ML: Telegram chat message classification

by Ferdinand Mütsch, February 28th, 2017

Too Long; Didn't Read

First of all, a short disclaimer: I’m not an expert in machine learning at all. In fact, I’m at a rather early stage of the learning process and have only basic knowledge; this project is more or less my first practical, hands-on ML project. I’ve taken the <a href="https://www.youtube.com/user/UWCSE/playlists?sort=dd&amp;shelf_id=16&amp;view=50" target="_blank">machine learning course</a> by <a href="https://homes.cs.washington.edu/~pedrod/" target="_blank">Pedro Domingos</a> at the University of Washington, <a href="https://www.udacity.com/course/intro-to-machine-learning--ud120" target="_blank">Intro to Machine Learning</a> by Udacity and Google, and the <a href="https://his.anthropomatik.kit.edu/english/28_315.php" target="_blank">Machine Learning 1 lecture at Karlsruhe Institute of Technology</a>, all of which I can highly recommend. Having gathered all that theoretical knowledge, I wanted to try something practical on my own, so I decided to train a simple classifier on the chat messages in my <a href="https://telegram.com/" target="_blank">Telegram</a> messenger history: a program that, given a chat message, can tell who sent it. I was further inspired by the papers on Facebook’s <a href="https://github.com/facebookresearch/fastText" target="_blank">fastText</a> text classification algorithm. In their examples, they classify Wikipedia abstracts into <a href="https://dbpedia.org/" target="_blank">DBpedia</a> classes, or news headlines into their respective news categories, based only on plain natural-language words. These problems are essentially very similar to mine, so I decided to give it a try.
Since I found that many text classifiers are trained with the Naive Bayes algorithm (it is especially popular in spam detection and is part of <a href="http://spamassassin.apache.org/" target="_blank">SpamAssassin</a>), and since it is easy to understand, I decided to use it as well. Inspired by <a href="http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/" target="_blank">this article</a>, which analyzes the sentiment of tweets, I also chose the <a href="http://www.nltk.org/" target="_blank">Natural Language Toolkit</a> (NLTK) for Python. Another option would have been <a href="http://scikit-learn.org/" target="_blank">scikit-learn</a>, but NLTK also provides some useful utilities beyond the pure ML scope.
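To make the Naive Bayes idea concrete for this task, here is a minimal sketch in plain Python. It deliberately does not use NLTK (NLTK's `NaiveBayesClassifier` works on feature dictionaries rather than raw word counts), so it runs without extra dependencies. It assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing over lowercased bag-of-words tokens; the class name, the tokenizer, the example messages and the sender names are all hypothetical and not taken from the author's Telegram data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesSenderClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing: predicts a message's sender."""

    def __init__(self):
        self.message_counts = Counter()          # sender -> number of training messages
        self.word_counts = defaultdict(Counter)  # sender -> word -> occurrences
        self.total_words = Counter()             # sender -> total number of tokens
        self.vocabulary = set()                  # every token seen during training

    def train(self, labeled_messages):
        """labeled_messages: iterable of (message_text, sender) pairs."""
        for text, sender in labeled_messages:
            tokens = tokenize(text)
            self.message_counts[sender] += 1
            self.word_counts[sender].update(tokens)
            self.total_words[sender] += len(tokens)
            self.vocabulary.update(tokens)

    def log_score(self, text, sender):
        """log P(sender) + sum of log P(token | sender); the shared evidence term P(text) is dropped."""
        n_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[sender] / n_messages)
        denominator = self.total_words[sender] + len(self.vocabulary)
        for token in tokenize(text):
            if token not in self.vocabulary:
                continue  # tokens never seen in training are ignored
            # Add-one smoothing: a word this sender never used still gets a small, non-zero probability.
            score += math.log((self.word_counts[sender][token] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the sender with the highest score for the given message."""
        return max(self.message_counts, key=lambda sender: self.log_score(text, sender))


# Hypothetical toy data; real training pairs would come from an exported chat history.
EXAMPLE_MESSAGES = [
    ("see you at the gym tonight", "alice"),
    ("leg day at the gym again", "alice"),
    ("did you push the fix to the repo", "bob"),
    ("the build is broken again", "bob"),
]

classifier = NaiveBayesSenderClassifier()
classifier.train(EXAMPLE_MESSAGES)
print(classifier.classify("gym tonight?"))         # alice
print(classifier.classify("is the build broken"))  # bob
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on longer messages, and the smoothing keeps a single word that one sender never used from zeroing out that sender's score entirely.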
