Dongjun Lee

@humanbrain.djlee

Personal Assistant Kino Part 4 — Smart Feed

Kino is a project to learn about myself through the Quantified Self, automate repetitive tasks, and improve my quality of life.
From: http://quantifiedself.com/

List of Kino series

In the last episode, we looked at Task Master, which automatically records and reports on my tasks. Today I want to talk about Smart Feed.

RSS Feed

Here, RSS Feed means getting notified when a new article is posted, using the RSS feeds that many websites provide. First, a little bit about RSS itself.

RSS (Rich Site Summary; originally RDF Site Summary; often called Really Simple Syndication) is a type of web feed which allows users to access updates to online content in a standardized, computer-readable format.
— From Wiki

Many websites offer RSS, and many services are built on top of it. One of them is Feedly: if you register the sites you visit often, you can easily check their new articles. I used this service a lot, but it didn’t support all the functions I wanted.
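To make "standardized, computer-readable format" concrete, here is a minimal sketch that parses a tiny hand-written RSS 2.0 document with Python’s standard `xml.etree.ElementTree`. The feed content and the `list_items` helper are illustrative only; Kino itself uses feedparser, which also copes with Atom and the many malformed feeds found in the wild.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal RSS 2.0 document (illustrative, not a real feed).
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
      <pubDate>Tue, 02 Jan 2018 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
      <pubDate>Mon, 01 Jan 2018 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Return (title, link, pubDate) for every <item>, in document order."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]


for title, link, published in list_items(RSS_SAMPLE):
    print(published, title, link)
```

Every RSS reader, Feedly included, builds on exactly this structure: a channel containing items, each with a title, a link and a publication date.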

Pocket

Another service I use is Pocket. What it does is very simple.

When you find something you want to view later, put it in Pocket.
 — Pocket

If you find an article you’d like to read later, you can put it in Pocket. I often put articles that look interesting in Pocket, read them later, and move the really good ones to the Favorite category.

Smart Feed

I wanted to automate this pattern: check new articles, put them in Pocket, read them carefully, and move the best ones to Favorite. That is why I created the Smart Feed function.

First, this function needs RSS URLs, so that it can read each feed and notify me when a new article appears. So I made an awesome-feeds repository. I thought Git would be a convenient way to manage the RSS feeds of my favorite websites, and I wanted to make it part of the "awesome" list series, full of good RSS feeds.

Now that the RSS feeds are ready, Kino just has to tell me when the latest article is published!
I used feedparser for this.

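```python
import arrow
import feedparser

# Part of a Kino method: feed_url, feed_name, category and cache_data
# (feed_url -> time of the previous check) come from the surrounding code.
f = feedparser.parse(feed_url)
# Sort entries newest first by their parsed update time
f.entries = sorted(
    f.entries, key=lambda x: x.get("updated_parsed", 0), reverse=True
)

# Collect the entries published since the previous check
noti_list = []
if feed_url in cache_data:
    previous_update_date = arrow.get(cache_data[feed_url])
    for e in f.entries:
        e_updated_date = arrow.get(e.updated_parsed)
        if e_updated_date > previous_update_date:
            noti_list.append(self.__make_entry_tuple(category, e, feed_name))
```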
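The snippet above does not show how `cache_data` is updated after a check. The following is a minimal, self-contained sketch of the whole "what’s new since last time?" step, assuming the cache stores the newest entry time per feed as epoch seconds. `find_new_entries` and `make_entry` are hypothetical names, and plain dicts stand in for feedparser entries (feedparser’s entries are dict subclasses with the same `updated_parsed` key, so the function should also accept them).

```python
import calendar
import time


def find_new_entries(entries, feed_url, cache_data):
    """Return entries newer than the previous check and advance the cache.

    Assumption: cache_data maps feed_url -> epoch seconds of the newest entry
    seen so far; entries carry a time.struct_time (UTC) under "updated_parsed".
    """
    dated = [e for e in entries if e.get("updated_parsed")]
    dated.sort(key=lambda e: calendar.timegm(e["updated_parsed"]), reverse=True)
    if not dated:
        return []

    newest = calendar.timegm(dated[0]["updated_parsed"])
    previous = cache_data.get(feed_url)
    # Remember the newest timestamp so the next check starts from here
    cache_data[feed_url] = newest if previous is None else max(newest, previous)

    if previous is None:
        # First check of this feed: record it, but don't flood notifications
        return []
    return [e for e in dated if calendar.timegm(e["updated_parsed"]) > previous]


def make_entry(title, day):
    # Stand-in for a feedparser entry dated 2018-01-<day> (UTC)
    return {"title": title, "updated_parsed": time.strptime(f"2018-01-{day:02d}", "%Y-%m-%d")}


cache = {}
feed = "https://example.com/rss"
find_new_entries([make_entry("a", 1)], feed, cache)  # first check: nothing to notify
new = find_new_entries([make_entry("a", 1), make_entry("b", 2), make_entry("c", 3)], feed, cache)
print([e["title"] for e in new])  # ['c', 'b']
```

Skipping notifications on the very first check mirrors the `if feed_url in cache_data` guard in the original snippet.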

As described in Part 2 (Skill & Scheduler), the scheduler lets you specify when a function runs. Checking the feeds every minute would be a big overhead; in my tests, a 20-minute interval was enough.

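```python
import schedule


# A method of Kino's class (hence `self`): every `interval` minutes,
# schedule calls self.__run_threaded(self.function_runner, {...}),
# which runs the "feed_notify" function.
def __excute_feed_schedule(self, interval):
    schedule.every(interval).minutes.do(
        self.__run_threaded,
        self.function_runner,
        {
            "repeat": True,
            "func_name": "feed_notify",
            "params": {},
            "day_of_week": [0],
            "not_holiday": False,
        },
    )
```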
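Kino’s own `__run_threaded` is not shown here. The sketch below illustrates the general idea with only the standard library: fire a job at a fixed interval and run each invocation on its own thread, so a slow feed check never delays the next tick. `run_threaded` and `run_every` are hypothetical names, and this is a simplified stand-in for what the `schedule` library does, not its implementation.

```python
import threading
import time


def run_threaded(job, *args, **kwargs):
    """Start `job` on its own thread so a slow job never blocks the scheduling loop."""
    worker = threading.Thread(target=job, args=args, kwargs=kwargs, daemon=True)
    worker.start()
    return worker


def run_every(interval_seconds, job, times):
    """Fire `job` every `interval_seconds`, `times` times, each run on its own thread.

    A simplified stand-in for schedule.every(n).minutes.do(...) plus the
    usual `while True: schedule.run_pending()` loop.
    """
    workers = []
    next_run = time.monotonic() + interval_seconds  # first run after one interval
    for _ in range(times):
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        workers.append(run_threaded(job))
        next_run += interval_seconds  # fixed cadence, independent of job duration
    return workers


FEED_INTERVAL_SECONDS = 20 * 60  # the 20-minute interval chosen above

# Quick demo with a tiny interval instead of 20 minutes:
calls = []
workers = run_every(0.01, lambda: calls.append(time.monotonic()), times=3)
for w in workers:
    w.join()
print(len(calls))  # 3
```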

Now Kino can notify me of the latest RSS feeds. That’s already useful, but I wanted to go one step further: automatically save articles I already trust to my Pocket!
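The article doesn’t show how Kino talks to Pocket. As a minimal sketch, assuming Pocket’s v3 `add` endpoint (a POST to `https://getpocket.com/v3/add` with `url`, `consumer_key` and `access_token`), saving an article boils down to one JSON request. The key and token below are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

POCKET_ADD_URL = "https://getpocket.com/v3/add"


def build_pocket_add_request(url, consumer_key, access_token, title=None):
    """Build (but do not send) a request that saves `url` to Pocket."""
    payload = {"url": url, "consumer_key": consumer_key, "access_token": access_token}
    if title:
        payload["title"] = title  # optional; Pocket can also derive it from the page
    return urllib.request.Request(
        POCKET_ADD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "X-Accept": "application/json",  # ask for a JSON response
        },
        method="POST",
    )


req = build_pocket_add_request(
    "https://example.com/posts/1", "CONSUMER_KEY", "ACCESS_TOKEN", title="First post"
)
print(req.get_method(), req.full_url)  # POST https://getpocket.com/v3/add
# With real credentials: urllib.request.urlopen(req)
```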

To do this, Kino needs to connect to Pocket, and a simple classification algorithm can make it smarter. The most important thing in machine learning is data, and here the data can be created from the raw logs. First, treat every article the Feed function has notified you about as the entire dataset. If only the articles saved to Pocket get a label of 1, the dataset splits into articles of interest and articles not of interest. If you also add the article’s category or the website’s name as features, you can build a simple but useful Decision Tree.
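Here is a minimal sketch of that labeling step. The log format (one dict per notified article with `link`, `category` and `site`) and the name `build_dataset` are assumptions for illustration; the article does not show Kino’s actual log schema.

```python
def build_dataset(notified, pocket_links):
    """Turn the notification log into features X and labels y.

    notified:     one dict per notified article (assumed shape:
                  {"link": ..., "category": ..., "site": ...})
    pocket_links: set of links that were saved to Pocket
    Label 1 = saved to Pocket (interested), 0 = not saved.
    """
    X = [{"category": e["category"], "site": e["site"]} for e in notified]
    y = [1 if e["link"] in pocket_links else 0 for e in notified]
    return X, y


notified = [
    {"link": "https://example.com/ai/1", "category": "AI", "site": "Google AI Blog"},
    {"link": "https://example.com/ai/2", "category": "AI", "site": "Google AI Blog"},
    {"link": "https://example.com/news/1", "category": "News", "site": "Example News"},
]
X, y = build_dataset(notified, pocket_links={"https://example.com/ai/1"})
print(y)  # [1, 0, 0]
```

The features are kept as dicts of strings here; a scikit-learn tree needs them encoded as numbers (for example one-hot columns) before training.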

Decision Tree (from http://ccg.doc.gold.ac.uk/)

For example, suppose a new article is published on the Google AI Blog. If I’ve seen five articles from that site in total and saved four of them to Pocket, the new one can also be considered interesting.
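The arithmetic behind this example: 4 saved out of 5 seen is a save rate of 4/5 = 0.8. A decision-tree leaf that isolates one site predicts that leaf’s majority label, which amounts to asking whether the save rate is above 0.5. The `save_rates` helper below is a hypothetical illustration of that reasoning, not Kino code.

```python
from collections import defaultdict


def save_rates(X, y):
    """Per-site fraction of notified articles that were saved to Pocket."""
    seen = defaultdict(int)
    saved = defaultdict(int)
    for features, label in zip(X, y):
        seen[features["site"]] += 1
        saved[features["site"]] += label
    return {site: saved[site] / seen[site] for site in seen}


X = [{"site": "Google AI Blog"}] * 5
y = [1, 1, 1, 1, 0]  # four of the five were saved
rates = save_rates(X, y)
print(rates["Google AI Blog"])        # 0.8
print(rates["Google AI Blog"] > 0.5)  # True: treat new posts from this site as interesting
```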

You can use a Decision Tree very easily with scikit-learn.

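```python
from sklearn import tree


class FeedClassifier:
    def __init__(self):
        data = FeedData()  # Kino's loader for the logged feed data
        train_X = data.train_X
        train_y = data.train_y

        model = tree.DecisionTreeClassifier()
        model.fit(train_X, train_y)  # Training
        self.clf = model

    def predict(self, link, category):
        # Assumption: `category` is already encoded in the same feature
        # layout as one row of train_X. scikit-learn expects a 2-D input
        # (one row per sample), hence the extra brackets.
        result = self.clf.predict([category])[0]
        if result == FeedDataLoader.TRUE_LABEL:
            ...  # predicted "interesting", e.g. save `link` to Pocket
        else:
            ...
```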
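To see the same idea end to end, here is a toy, self-contained version using scikit-learn’s `DictVectorizer` to one-hot encode the site and category strings before fitting a `DecisionTreeClassifier`. The encoding choice and the toy data are assumptions; the article does not show how `FeedData` builds its features.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Toy log: five Google AI Blog posts (four saved to Pocket),
# three posts from another site (none saved).
X_raw = [{"site": "Google AI Blog", "category": "AI"}] * 5 + [
    {"site": "Example News", "category": "News"}
] * 3
y = [1, 1, 1, 1, 0, 0, 0, 0]

# One-hot encode string values as "key=value" columns
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(X_raw)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # Training

new_posts = [
    {"site": "Google AI Blog", "category": "AI"},
    {"site": "Example News", "category": "News"},
]
print(clf.predict(vectorizer.transform(new_posts)))  # [1 0]
```

The Google AI Blog leaf holds four saved posts and one skipped post, so it predicts 1, matching the 4-out-of-5 intuition.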

Online Learning

The next important piece is online learning. The feeds I put in Pocket change over time, so the model must detect these changes and make its judgements with the latest information. The method for this is online learning.

Keep models up to date by continuously applying new data to them.

This is how Kino’s Smart Feed keeps getting smarter. Online learning is possible by creating a cycle like the one below.

  1. Logging:
    Record every feed that was notified, and which of them were put in Pocket.
  2. Data Processing:
    Parse the log into categories, titles, dates, links, etc.
    and add labels (0: not put in Pocket / 1: put in Pocket).
  3. Model:
    Fit the prepared data to the model (training).
  4. Predict:
    Use the trained model to decide whether a new feed should be put in Pocket. Then give feedback on the model's wrong predictions, so that the correct labels are stored.
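The cycle above can be sketched with a model that learns one example at a time. To keep the sketch short, this swaps the decision tree for a simple count-based stand-in (predict "save" when more than half of a site’s articles were saved), so each feedback label updates the model immediately. `OnlineSaveRateModel` is a hypothetical name, not part of Kino.

```python
class OnlineSaveRateModel:
    """Count-based stand-in model that learns one labelled example at a time.

    Predicts 1 ("put it in Pocket") when more than half of the articles seen
    from a site were saved; sites it has never seen default to 0.
    """

    def __init__(self):
        self.seen = {}   # site -> number of notified articles
        self.saved = {}  # site -> number of those saved to Pocket

    def predict(self, site):
        seen = self.seen.get(site, 0)
        if seen == 0:
            return 0
        return 1 if self.saved.get(site, 0) / seen > 0.5 else 0

    def update(self, site, label):
        """Feedback step: fold the correct label (0 or 1) into the counts."""
        self.seen[site] = self.seen.get(site, 0) + 1
        self.saved[site] = self.saved.get(site, 0) + label


model = OnlineSaveRateModel()
for label in [1, 1, 1, 1, 0]:  # 4 of 5 Google AI Blog posts were saved
    model.update("Google AI Blog", label)
print(model.predict("Google AI Blog"))  # 1
for _ in range(3):  # later posts are skipped: 4 of 8 saved
    model.update("Google AI Blog", 0)
print(model.predict("Google AI Blog"))  # 0
```

As the interests change, the prediction changes with them, without retraining anything from scratch.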

If retraining in real time becomes a bottleneck, one option is to retrain the model once a day instead.
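scikit-learn’s `DecisionTreeClassifier` has no `partial_fit`, so a tree has to be refit on all accumulated data. A minimal sketch of the once-a-day option: buffer feedback as it arrives and refit only when a full day has passed. `DailyRetrainer` and its `fit` hook are hypothetical; with scikit-learn, `fit` could be a function that encodes `X` and fits a fresh tree.

```python
from datetime import datetime, timedelta


class DailyRetrainer:
    """Buffer labelled feedback and refit from scratch at most once per `period`."""

    def __init__(self, fit, period=timedelta(days=1)):
        self.fit = fit  # callable(X, y) -> trained model (hypothetical hook)
        self.period = period
        self.X, self.y = [], []
        self.model = None
        self.last_trained_at = None

    def add_feedback(self, features, label):
        """Store one labelled example; no training happens here."""
        self.X.append(features)
        self.y.append(label)

    def maybe_retrain(self, now):
        """Refit on all data collected so far if a full period has passed."""
        due = self.last_trained_at is None or now - self.last_trained_at >= self.period
        if due and self.X:
            self.model = self.fit(self.X, self.y)
            self.last_trained_at = now
            return True
        return False


# Demo with a stand-in "model" that just counts the training examples.
retrainer = DailyRetrainer(fit=lambda X, y: len(y))
t0 = datetime(2018, 1, 1, 9, 0)
retrainer.add_feedback({"site": "Google AI Blog"}, 1)
print(retrainer.maybe_retrain(t0))                       # True: first training
retrainer.add_feedback({"site": "Google AI Blog"}, 0)
print(retrainer.maybe_retrain(t0 + timedelta(hours=3)))  # False: not a day yet
print(retrainer.maybe_retrain(t0 + timedelta(days=1)))   # True: refit on 2 examples
```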

Conclusion

That’s the Smart Feed feature: very simple, but really useful.
For now, the predictions aren’t very sophisticated, because they are based simply on how many articles I put in Pocket.
In the future, I’m going to try to predict whether a feed interests me as a text classification problem, using its title or introduction. With text summarization, Kino might even create a summary of a feed for me when I’m in a hurry. I think there is plenty of room for the Smart Feed function to grow. Let’s collect a lot of data and replace it with a deep learning model!

All code can be found here
Anyone who helps make Kino smarter is always welcome :)
