Background

A year ago I began learning about Developmental Psychology and Trauma, which tied together a number of ideas I was becoming familiar with, including Attachment Theory, Polyvagal Theory, Internal Family Systems Theory, EMDR, and various traditional practices that are becoming increasingly popular in therapeutic settings. After reading a dozen books that bring these themes together, I felt that I had unearthed the mysteries of my own psychology (after a decades-long quest) and became inspired by a common thread that binds them.

My intention was to organize the information found in these books in a cohesive fashion so that I could write about this idea. But I quickly realized that it could take as long as a year at the rate I was going, and I wanted tools to simplify the process of organizing this body of work into a cohesive framework.

Over the next six months, I immersed myself in the world of Large Language Models (LLMs). I explored various models, discovering which ones were best suited to my specific task. Through careful fine-tuning, I worked towards production-quality consistency in the results.

The outcome of this effort is a content curation tool that has transformed my workflow. It not only accelerates my learning process but also lets me share knowledge more readily, without the need for extensive manual content creation.

A different way to chat with PDF

While my current focus is on eBook summaries, this project represents a fundamental shift in how we can interact with PDFs and other document formats.

The conventional approach to working with documents typically involves chunking them and inserting them into a Retrieval-Augmented Generation (RAG) enabled database. This method allows an LLM to search documents and answer queries based on its findings. However, this approach often lacks precision and comprehensiveness.

My method, while similar in some aspects, introduces a crucial difference.
I pay meticulous attention to the chunking process, ensuring that documents are divided according to their inherent structure, respecting chapter boundaries. This preserves the logical flow and context of the original material. From there, I chunk each chapter individually and direct my queries to specific sections of the document. This targeted approach yields more accurate and precise knowledge of each subsection within a document.

Mistral 7b Instruct v0.2 - Bulleted Notes

To achieve consistent, high-quality summaries in a standardized format, I fine-tuned the Mistral 7b Instruct v0.2 model. This custom model specializes in creating bulleted note summaries. You can find the base model, GGUF, and LoRA versions in this Hugging Face collection.

cognitivetech/Mistral-7B-Inst-0.2-Bulleted-Notes
cognitivetech/Mistral-7b-Inst-0.2-Bulleted-Notes_GGUF
cognitivetech/Mistral-7B-Inst-0.2_Bulleted-Notes_LoRA

Models available on Ollama.com

Mistral 7b Instruct v0.2 Bulleted Notes quants of various sizes are available, along with a Mistral 7b Instruct v0.3 GGUF loaded with a template and instructions for creating the subtitles of our chunked chapters.

obook_summary
obook_title

Ollama eBook Summary: Bringing It All Together

To streamline the entire process, I've developed a Python-based tool that automates the division, chunking, and bulleted note summarization of EPUB and PDF files with embedded ToC metadata. While PDFs currently require a built-in clickable ToC to function properly, EPUBs tend to be more forgiving.

You can explore and contribute to this project on GitHub: ollama-ebook-summary.

Beyond Summaries: Arbitrary Queries

Once a book is split into manageable chunks, we create a bulleted note summary for each section. The end result is a markdown document that distills even a 1000-page book into content that can be reviewed in just a couple of hours.

But the possibilities don't end there. Once chunked, you can pose arbitrary questions to the document. For instance, asking "What questions does this text answer?" or "What arguments does this text make?" can quickly reveal the core ideas of a research paper or book chapter.

This feature is particularly valuable when reviewing numerous research papers. By asking targeted questions, you can swiftly filter out less relevant materials and focus on the most pertinent information for your needs.

Looking Ahead: Future Developments

As we continue to refine and expand this tool, we're exploring new chunking methods for various file types, including Markdown, raw PDF, raw TXT, Word documents, and additional eBook formats. Whether you're a developer, researcher, or enthusiast, your input can help shape the future of this project.

Stay tuned for the upcoming launch of our paid web application, which will bring these powerful features to a wider audience.

I hope you'll find this tool as invaluable as I do. The eBook summary tool can transform how you interact with and extract knowledge from documents. We invite you to try it out, contribute to its development, and join in revolutionizing the way we interact with and reason around knowledge in the digital age.
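To make the chapter-first idea concrete, here is a minimal sketch of chunking each chapter on its own and pairing every chunk with an arbitrary question. It is not the code from ollama-ebook-summary: the function names (chunk_chapter, chunk_book, build_prompt, ask_ollama), the paragraph-based splitting, and the word budget are all assumptions made for illustration, and the chapters are assumed to be already extracted as (title, text) pairs. The optional ask_ollama helper assumes a local Ollama server on its default port and leaves the model name to the caller.

```python
import json
import urllib.request


def chunk_chapter(text, max_words=300):
    """Group a chapter's paragraphs into chunks of roughly max_words words.

    A single paragraph longer than max_words is kept whole (illustrative choice).
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        # Start a new chunk when this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_book(chapters, max_words=300):
    """Chunk every chapter separately so no chunk crosses a chapter boundary."""
    return [
        {"chapter": title, "part": part, "text": chunk}
        for title, body in chapters
        for part, chunk in enumerate(chunk_chapter(body, max_words), start=1)
    ]


def build_prompt(question, chunk):
    """Pair one question with one chunk of text (hypothetical prompt layout)."""
    return f"{question}\n\nTEXT:\n{chunk['text']}"


def ask_ollama(prompt, model, url="http://localhost:11434/api/generate"):
    """Send one prompt to a local Ollama server (assumed to be running)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Toy input standing in for chapters pulled from a book's ToC.
    book = [
        ("Chapter 1", "First paragraph.\n\nSecond paragraph."),
        ("Chapter 2", "Another chapter."),
    ]
    for chunk in chunk_book(book, max_words=3):
        print(chunk["chapter"], chunk["part"])
        print(build_prompt("What questions does this text answer?", chunk))
```

Because each record carries its chapter title and part number, the answers can be reassembled in book order, or a single chapter can be queried in isolation by filtering on the title before calling ask_ollama.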
You can explore and contribute to this project on GitHub: cognitivetech/ollama-ebook-summary

Background

PrivateGPT for Book Summarization: Testing and Ranking Configuration Variables
“Instead of spending weeks per summary, I completed my first 9 book summaries in only 10 days.”

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (2024-02-19; Mosh Levy, Alon Jacoby, Yoav Goldberg)
“Our findings show a notable degradation in LLMs' reasoning performance at much shorter input lengths than their technical maximum”


Bulleted Notes eBook Summary: A Different Way to Chat with PDF
