Build a naive Article Spell-checker in 10 Lines of Python Code

by AMR (@amrwrites), June 5th, 2017

For lost beginners who’re struggling to find a dope project: here’s an easy but effective one to add to your portfolio!

Hello Beginner! Always start your project with the goal of solving a problem. (Sorry, this isn’t the typical Silicon Valley problem-solving mindset, just a clearly defined goal for what to do next.)

Our Objective: To create an article spell-checker that lists the misspelled words in an article.

Language: Python (a beginner-friendly language whose rich ecosystem of packages lets any beginner do magic)

Algorithm/Flow:

  1. Input: Define the URL (link) of the article whose spelling must be checked
  2. Extract: Tokenize (split the complete article into a bag of words)
  3. Match: Check the extracted words against English dictionary words
  4. Output: List the words that didn’t match (those are misspelt / non-dictionary words)
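
The four steps above can be sketched offline before bringing in any packages. This is only a rough illustration, not the article's actual code: `TINY_DICTIONARY`, `tokenize` and `find_non_dictionary_words` are hypothetical names, the regex tokenizer is a crude stand-in for a real tokenizer, and the lower-casing before lookup is a simplification of this sketch.

```python
import re

# Hypothetical stand-in for a real English dictionary (assumption for this sketch).
TINY_DICTIONARY = {"the", "car", "has", "four", "speakers", "and", "connectivity"}


def tokenize(text):
    """Step 2: split the text into runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[A-Za-z]+", text)


def find_non_dictionary_words(text, dictionary):
    """Steps 3 and 4: return sorted, de-duplicated tokens missing from the dictionary."""
    return sorted({tok for tok in tokenize(text) if tok.lower() not in dictionary})


# Step 1 would normally fetch the article from a URL; a literal string
# keeps this sketch runnable without network access.
sample_text = "The car has four speakers and bluetooth connectivity."
print(find_non_dictionary_words(sample_text, TINY_DICTIONARY))
```

With this tiny word set, only "bluetooth" is reported. The real project swaps the string for a downloaded article and the set for a full dictionary.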

Python Packages/Modules in use:




import newspaper
from nltk import word_tokenize
import enchant
import re

  - newspaper: for downloading the article and extracting its text
  - NLTK: for tokenizing the extracted content
  - Enchant: for the English dictionary
  - re: for regular expressions / pattern matching

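Below is how you can install the packages if you don’t have them already (on Python 3, newspaper is published on PyPI as newspaper3k, and Enchant's Python bindings as pyenchant):

pip install newspaper3k nltk pyenchant

If word_tokenize complains about missing tokenizer data, NLTK’s error message shows the exact nltk.download(...) command to run once.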

Input:

url = 'https://hackernoon.com/dilemmas-of-a-digital-lifestyle-27c044940157'

Extracting the Article Content



my_article = newspaper.Article(url, language='en')
my_article.download()  # fetch the page HTML
my_article.parse()     # extract the title and body text

Printing the parsed Article Text

print(my_article.text)

Dilemmas of a Digital Lifestyle

Is two blueteeth one bluetooth too many?

A few weeks ago, I finally traded in my old car for a newer model. Among other things, the new car came with a decent audio system that had four speakers, and bluetooth connectivity.

After a bit of fiddling, I was able to connect my phone via Bluetooth to the car’s system. I was late to the party, but it was still quite a thrill to take handsfree calls while driving, and enjoy the luxury of listening to the caller’s voice over the car’s stereo system. However the Bluetooth only connects to one device at a time. And that can be an issue if you have kids at home.

My 14 year old and all her friends have their own personal collections of songs on their phones or iPods. And they are all dying to try out their songs on the car’s music system every time I cart them around.

I didn’t want them using the car’s Bluetooth as that would mean losing my handsfree calls, and going through the whole circus of disconnecting and reconnecting my phone. The car does have USB and auxiliary ports but the first can’t connect to the phones, and the second needs a cable that has to be extra long to reach the kids in the back seat, which can be kind of messy.

My solution was a little Bluetooth adapter that I picked up on eBay for around ₹300 ($5). One end has a USB connector, and the other an auxiliary cable. You power it by plugging it in to the car’s USB port. You then pair the bluetooth adapter with the phone, after which it pipes music played on the phone into car’s audio system, via the car’s auxiliary port. It’s simpler than it sounds, and it worked seamlessly, even temporarily turning off the music whenever a call comes on the car’s bluetooth.

But it felt a bit weird to sitting there in my tiny car, with two active Bluetooth connections.

Deep down, I couldn’t help worrying whether all those electromagnetic Bluetooth waves madly bouncing around inside the metal car were cooking up our brains.

Yes, most of believe Bluetooth is harmless as everyone’s using it with seemingly no ill-effects. But then everyone used to happily smoke not so long ago, and nearly everyone is still swilling down tonnes of sugar without a care, in cokes, cakes and almost every other packaged food.

What wouldn’t I give to travel twenty years into the future, and see science’s verdict on the effects of Bluetooth. But there’s only so much a Dad can do.

As of now, the kids have their music, I have my phone, and all’s well in the world.

Spell-Checking the tokenized words



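d = enchant.Dict("en_US")
# Keep each distinct token that the dictionary rejects and that is purely alphabetic.
# (On Python 3 the tokens are already str, so no .encode('ascii', 'ignore') is needed.)
non_dict_words = list(set(
    word for word in word_tokenize(my_article.text)
    if not d.check(word) and re.match('^[a-zA-Z ]*$', word)
))
non_dict_words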

Output:

['USB', 'Bluetooth', 'bluetooth', 'blueteeth', 'eBay', 'handsfree']
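
To see why punctuation, numbers and prices never show up in that list, it helps to look at the regular expression on its own. The sketch below reuses the same pattern; the `tokens` list is a hypothetical hand-written sample, not real tokenizer output, and `LETTERS_ONLY` is just a name chosen for this illustration.

```python
import re

# Same pattern as in the spell-checking step: ASCII letters (and spaces) only,
# anchored at both ends so the whole token must match.
LETTERS_ONLY = re.compile(r'^[a-zA-Z ]*$')

# Hypothetical sample tokens of the kind a tokenizer might emit (assumption).
tokens = ['Bluetooth', ',', '14', 'ill-effects', '₹300', 'handsfree']

# Only tokens made up entirely of letters survive the filter.
kept = [tok for tok in tokens if LETTERS_ONLY.match(tok)]
print(kept)  # ['Bluetooth', 'handsfree']
```

Note that the filter is strict: a hyphenated word such as "ill-effects" is dropped before it can ever be reported, which is part of what makes this checker naive. Because of the `*`, the pattern would also accept an empty string, though a tokenizer is not expected to produce one.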

Full Code:

Here’s the full code hosted on GitHub:

Tada! You’ve just built yourself a cool project and earned a pat on the back. Don’t you love coding now?