Imagine reading something and never losing track of that information. I usually find it difficult to remember what I read as time passes. As shown in the graph above, memory retention drops off steeply within the first few days. I try to take thorough notes and look them over regularly, but I usually need a trigger event to revisit them. This isn't sustainable, and I’m sure the same is true for most people. Wouldn’t it be great if you could revisit your highlights more regularly? As the graph above also shows, the more you review something you’ve learned, the more it becomes a part of you.

Motivation: I searched the internet for a passive way to re-read notes and found readwise.io, a service that emails you your highlights every day from various sources. I have been learning about the object-oriented features of Python and software architecture design patterns (and mostly forgetting them), so I decided to put those skills to use and build a DIY version of the service for myself.

Together, we’re going to build this application using Python (and its object-oriented features). The application makes sure that anything you read (and highlight) gets presented to you on a regular basis, so you never forget the material. Through spaced repetition, you can internalize your notes.

Things this app does:
- Selects notes and highlights you’ve compiled from your dataset
- Sends an email with the selected notes to a specified email account
- Emails on a user-defined schedule using Cron

Let’s get started

We’re going to need data. This is the most manual step of the entire process. I use PDF Expert to read PDFs, and it has a feature to export all annotations. I put these in an Excel document, which I then convert to JSON (using a generic Excel-to-JSON service on the internet). See the sample JSON file (data.json) below. Each block represents one highlight/note.
```json
{"Sheet1": [
        {"date_added": "May 12, 8:59 AM, by Ankush Garg", "source": "Book", "title": "Fundamentals of Software Architecture", "chapter": "N/A", "note": "N/A", "highlight": "The microkernel architecture style is a relatively simple monolithic architecture consisting of two architecture components: a core system and plug-in components.", "page_number": "Page 165", "has_been_chosen_before": "0", "id": "48"},
        {"date_added": "Apr 12, 10:50 AM, by Ankush Garg", "source": "Book", "title": "Genetic Algorithms with Python", "chapter": "Chapter 4: Combinatorial Optimization - Search problems and combinatorial optimization", "note": "N/A", "highlight": "A search algorithm is focused on solving a problem through methodic evaluation of states and state transitions, aiming to find a path from the initial state to a desirable final (or goal) state. Typically, there is a cost or gain involved in every state transition, and the objective of the corresponding search algorithm is to find a path that minimizes the cost or maximizes the gain. Since the optimal path is one of many possible ones, this kind of search is related to combinatorial optimization, a topic that involves finding an optimal object from a finite, yet often extremely large, set of possible objects.", "page_number": "Page 109", "has_been_chosen_before": "0", "id": "21"}
]}
```

I’ll be using PyCharm to build this app. Let’s create empty .py files in a project directory, as shown in the image below. Feel free to put these files in any folder you prefer. The main idea is that these services rely on each other for their inputs and outputs: each one takes data, transforms it, and then does something with it.

[Image: Folder structure]

A very reasonable question at this point is why I decided to create 4 separate scripts simply to read in the data, select some entries, and email them to a specified email account. The reason is MODULARITY. I want each of these services to do exactly what it’s designed to do, and nothing more.
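As a rough illustration of that principle (a minimal sketch, not the code of this app), each stage can be a small function that only sees the previous stage’s output. The names `load_sample_entries`, `select_first`, `format_entries` and `run_pipeline` are hypothetical, and `print` stands in for the email step.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]


# Stage 1 (hypothetical stand-in for database.py): return the list of entries.
def load_sample_entries() -> List[Entry]:
    return [
        {"title": "Fundamentals of Software Architecture", "page_number": "Page 165"},
        {"title": "Genetic Algorithms with Python", "page_number": "Page 109"},
    ]


# Stage 2 (stand-in for selector_service.py): pick entries; here simply the first k.
def select_first(entries: List[Entry], k: int) -> List[Entry]:
    return entries[:k]


# Stage 3 (stand-in for parse_content.py): turn the entries into plain text.
def format_entries(entries: List[Entry]) -> str:
    return "".join(f"TITLE: {e['title']}\nPAGE-NUMBER: {e['page_number']}\n" for e in entries)


def run_pipeline(
    load: Callable[[], List[Entry]],
    select: Callable[[List[Entry], int], List[Entry]],
    fmt: Callable[[List[Entry]], str],
    deliver: Callable[[str], object],
    k: int = 3,
) -> object:
    # Each stage only receives the previous stage's output, so any one can be swapped.
    return deliver(fmt(select(load(), k)))


if __name__ == "__main__":
    # Stage 4 (stand-in for mail_service.py): print instead of emailing.
    run_pipeline(load_sample_entries, select_first, format_entries, print, k=1)
```

In the services walked through next, each one imports the one before it directly (ContentParser creates a SelectorService, for example). Passing the stages in as arguments, as in this sketch, is one alternative wiring: swapping in a loader that reads from somewhere else becomes a change to a single argument.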
In the future, if I want to swap functionality out, I can do that easily because there’s minimal dependency between the services. Here’s an example: database.py currently reads in the data file locally, but in the future, as the dataset grows, it may pull data stored in S3. In a tightly coupled application, accommodating this change would require a massive overhaul; with separate, modular services that have minimal dependencies, big pieces of functionality can be swapped at will.

Let’s walk through each of the service files:

1. database.py

```python
import json

# Ended up using http://beautifytools.com/excel-to-json-converter.php to convert Excel to Json
# URL where data is stored - local on my computer for now
url = '/Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/data/data.json'


def read_json_data():
    # Load the whole JSON file into a Python dict
    with open(url) as json_file:
        response = json.load(json_file)
    return response
```

The database file is simple. It loads the locally stored data using the read_json_data function. We now have access to the data in our application.

2. selector_service.py

```python
# This script reads in the data (locally for now, possibly S3 later) and selects highlights
import numpy as np
from database import read_json_data


def increment_has_chosen_before(item):
    count_now = int(item['has_been_chosen_before'])
    item['has_been_chosen_before'] = count_now + 1


class SelectorService:
    def __init__(self):
        # Read in JSON data
        self.raw_response = read_json_data()
        self.sampled_object = None
        self.sheet_name_to_sample_by = 'Sheet1'
        self.num_of_entries_to_sample = 3  # Number of entries to select

    def select_random_entries(self):
        # Randomly choose entries from the dataset (with replacement)
        self.sampled_object = np.random.choice(self.raw_response[self.sheet_name_to_sample_by],
                                               self.num_of_entries_to_sample)
        # For each selection increment the field "has_been_chosen_before"
        # (in memory only; the change is not written back to data.json)
        # In the future can use probability to make selections to notes that haven't gotten selected
        for note in self.sampled_object:
            increment_has_chosen_before(note)
        return self.sampled_object
```

SelectorService has an attribute, self.raw_response, that relies on read_json_data as we saw above and holds the returned response. Three entries are selected randomly in select_random_entries and stored in self.sampled_object.

We have sampled the entries now and are ready to parse that content.

3. parse_content.py

```python
from selector_service import SelectorService


class ContentParser:
    def __init__(self):
        self.sample_entries = SelectorService().select_random_entries()
        self.content = None

    def parse_selected_entries(self):
        content = ''
        for item_index in range(len(self.sample_entries)):
            item = "DATE-ADDED: " + self.sample_entries[item_index]['date_added']
            content = content + item + "\n"
            item = "HIGHLIGHT: " + self.sample_entries[item_index]['highlight']
            content = content + item + "\n"
            item = "TITLE: " + self.sample_entries[item_index]['title']
            content = content + item + "\n"
            item = "CHAPTER: " + self.sample_entries[item_index]['chapter']
            content = content + item + "\n"
            item = "SOURCE: " + self.sample_entries[item_index]['source']
            content = content + item + "\n"
            item = "PAGE-NUMBER: " + self.sample_entries[item_index]['page_number']
            content = content + item + "\n" + "------------" + "\n"
        self.content = content
        return self.content
```

The ContentParser class takes in the random entries, stores them as the instance attribute self.sample_entries, and parses them into a format suitable for email using the parse_selected_entries method. parse_selected_entries simply formats the content for the email sent out in the next step. It looks complicated, but text formatting is all that’s happening. The parsed content can now be emailed.

4. mail_service.py

```python
# This service emails whatever it gets back from Content Parser
import smtplib
from email.message import EmailMessage

from parse_content import ContentParser


class MailerService:
    def __init__(self):
        self.msg = EmailMessage()
        self.content = ContentParser().parse_selected_entries()

    def define_email_parameters(self):
        self.msg['Subject'] = 'Your Highlights and Notes for today'
        self.msg['From'] = "example@gmail.com"  # your email
        self.msg['To'] = ["example@gmail.com"]  # recipient email
        self.msg.set_content(self.content)

    def send_email(self):
        with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
            # email account used for sending the email (Gmail typically requires an app password here)
            smtp.login("example@gmail.com", 'password')
            smtp.send_message(self.msg)
        return True

    def run_mailer(self):
        self.define_email_parameters()
        self.send_email()


def run_job():
    composed_email = MailerService()
    composed_email.run_mailer()


if __name__ == "__main__":
    run_job()
```

MailerService takes in the content parsed by ContentParser and stores it as the instance attribute self.content. define_email_parameters sets email parameters such as the subject, sender and recipient, and attaches the content; send_email then sends the email. Both methods are triggered by run_mailer, and the entire application is run by the run_job function at the very bottom. This sends out an email to the specified account. This is what the email looks like:

[Image: Sample Email]

Congrats, you’ve made it this far! One last thing is to run mail_service.py on a schedule. Let’s use Crontab for that. Cron is a long-running process that executes commands at specific dates and times, and can be used to schedule recurring tasks.

In your Crontab (crontab -e opens it for editing), add the following line with your own absolute paths:

```
0 19 * * * /Users/ankushgarg/.pyenv/shims/python /Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/mail_service.py >> /Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/cron.log 2>&1
```

This runs the script every day at 7 PM in the machine’s local time zone (PST in my case), and `>> cron.log 2>&1` appends both the script’s output and its errors to cron.log. Check out https://crontab.guru/ for help coming up with a schedule in Cron format.

You’re done!

My call to action is for you to make it better. Some ideas to enhance this project and make it yours:
- Data preparation is mostly manual at the moment. You can automate it by parsing PDFs using Python.
- The email isn’t pretty at the moment. You can use HTML to make the content look better.
- Use the has_been_chosen_before attribute to make the selection better. Currently the sampling happens randomly with replacement. You can change it so that has_been_chosen_before probabilistically informs which highlight to include next.
- Store your data on S3 and see if you can make it work. It’s a great exercise if you haven’t used S3 or any AWS service yet.
- Involve friends and send each other your highlights.
- Use NLP to parse the text and come up with the context, summary or sentiment of each highlight, and include that in the dataset.

Once you have the structure down, there’s so much you can do. If you do decide to enhance this app, reach out and let me know so I can get some ideas for improvement as well. If anything is unclear, let me know and I’d be happy to clarify. Cheers!
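For the has_been_chosen_before idea in the list above, here is a minimal sketch of one possible approach. It assumes NumPy’s `Generator.choice` (weighted sampling without replacement via `p=` and `replace=False`) and a weighting rule picked purely for illustration, weight = 1 / (1 + has_been_chosen_before). `select_weighted` is a hypothetical helper, not part of the app.

```python
import numpy as np


def select_weighted(entries, k, rng=None):
    """Pick up to k distinct entries, favouring ones that were shown less often.

    Hypothetical helper. Weight rule (an assumption, not from the app):
    weight = 1 / (1 + has_been_chosen_before), so an entry never shown gets
    twice the weight of an entry shown once.
    """
    if not entries:
        return []
    rng = rng if rng is not None else np.random.default_rng()
    counts = np.array([int(e["has_been_chosen_before"]) for e in entries], dtype=float)
    weights = 1.0 / (1.0 + counts)
    probabilities = weights / weights.sum()  # choice() needs probabilities summing to 1
    k = min(k, len(entries))  # without replacement we cannot draw more than exist
    indices = rng.choice(len(entries), size=k, replace=False, p=probabilities)
    chosen = [entries[i] for i in indices]
    # Same bookkeeping as increment_has_chosen_before, kept as a string like the JSON
    for entry in chosen:
        entry["has_been_chosen_before"] = str(int(entry["has_been_chosen_before"]) + 1)
    return chosen
```

In SelectorService it could replace the `np.random.choice` call, e.g. `self.sampled_object = select_weighted(self.raw_response[self.sheet_name_to_sample_by], self.num_of_entries_to_sample)`. One caveat: the incremented counts only live in memory, so for the weights to change from one day to the next, the updated entries would also have to be written back to data.json (for example with `json.dump`).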

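For the HTML idea, the standard library’s `EmailMessage` can hold a plain-text body plus an HTML alternative (`set_content` followed by `add_alternative(..., subtype="html")`). The sketch below assumes entries shaped like the records in data.json; `build_html_message` and the markup are illustrative choices, not part of the app. It only builds the message. Sending would still go through `smtplib.SMTP_SSL` as in mail_service.py.

```python
import html
from email.message import EmailMessage


def build_html_message(entries, sender, recipient):
    """Build a multipart/alternative email: plain-text fallback plus an HTML version.

    Hypothetical helper; entries are dicts shaped like the records in data.json.
    """
    msg = EmailMessage()
    msg["Subject"] = "Your Highlights and Notes for today"
    msg["From"] = sender
    msg["To"] = recipient

    # Plain-text body for clients that do not render HTML
    plain = "\n------------\n".join(f"{e['title']}: {e['highlight']}" for e in entries)
    msg.set_content(plain)

    # HTML body: one quoted highlight per entry; html.escape keeps "<" or "&" in notes from breaking markup
    blocks = "".join(
        f"<blockquote>{html.escape(e['highlight'])}</blockquote>"
        f"<p><b>{html.escape(e['title'])}</b>, {html.escape(e['page_number'])}</p><hr>"
        for e in entries
    )
    # Converts the message to multipart/alternative with the text and HTML parts
    msg.add_alternative(f"<html><body>{blocks}</body></html>", subtype="html")
    return msg
```

Keeping the plain-text part means mail clients that don’t render HTML still show the same highlights.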
Get The Most Out Of Everything You Read Using Python

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

The Noonification: How Amazon Treats Warehouse Workers Who Contracted COVID (11/30/2022)

10 Free Ways to Promote Your Amazon Products

10 Failed Startup Product Examples by Google, Microsoft and Amazon

10 Best Infographics Of 2018

The Noonification: The Destroyer (12/29/2022)

The Noonification: FTX: The Greatest Crypto Magic Trick in the World 🪄 (12/9/2022)

The Noonification: How Amazon Treats Warehouse Workers Who Contracted COVID (11/30/2022)

10 Free Ways to Promote Your Amazon Products

10 Failed Startup Product Examples by Google, Microsoft and Amazon

10 Best Infographics Of 2018

The Noonification: The Destroyer (12/29/2022)

The Noonification: FTX: The Greatest Crypto Magic Trick in the World 🪄 (12/9/2022)

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps