What the f*ck is a Bayes Classifier?

The Bayes Rule, "The theory that never died"

A simple 18th-century theory for evaluating knowledge, criticized for most of the 20th century. It helps people evaluate their initial ideas, then update and modify them with new information in order to make better decisions.

Initial beliefs + recent objective data = a new, improved belief.

The theory is very robust. In practice, Bayes' rule requires many calculations and powerful computers that recalculate the probability of an initial belief, millions of times over, each time new information arrives. Bayes' rule does not produce an absolutely true, exact answer; instead, it uses probability to move step by step toward the most likely conclusion.

It was discovered and published by two clergymen and amateur mathematicians, the Englishman Thomas Bayes and his Welsh friend Richard Price, during the 18th century. The French mathematician Pierre-Simon Laplace developed it into the form in which it is used today. We could now call it the Bayes-Price-Laplace theory, or BPL for short.

Part of the initial controversy stems from the fact that in the 1740s a harsh debate had opened about the improbability of Christian miracles. The question was whether there was evidence in the natural world that would help us reach rational conclusions about God the creator, which in the 18th century was known as "the cause" or "the first cause". We do not know if Bayes was trying to prove the existence of God as a cause, but we do know that he tried to deal mathematically with the problem of cause and effect.

During the Cold War, the United States Air Force lost a hydrogen bomb off the coast of Palomares, and the United States Navy began to secretly develop Bayesian theory for finding underwater objects. In 2009, Air France Flight 447 disappeared over the Atlantic Ocean with 228 people on board.
By then, the United States Navy had developed Bayesian search theory far enough that, after two years of unsuccessful searching, AF447 was located within a week of underwater search.

Today we use it for an immense number of things, such as filtering spam and training autopilot systems. If you want to know more about the history behind it, take a look at this video.

Ok, nice. But you haven't answered the previous question!

Naive Bayes Classification

The Naive Bayes classifier is a machine learning technique that can be used to classify objects, such as text documents, into two or more classes. A new object is classified by how similar it is to the examples already seen in each class. Despite its "naivete", the naive Bayes method tends to work very well in practice.

"All models are wrong, but some are useful" - George Box

Cool, go for it!

So, for our purpose (spam filtering), we need data (a lot of emails). We classify each email as spam or ham (legitimate) and then analyze the words independently, in order to get the most common words in each of those classes.

P.S.: You can have as many classes as you want, like Gmail does with Promotions, Updates, Forums or Social.

Initial beliefs: For simplicity, let's say that half of the emails we get are spam, so the prior odds are 1:1.

Recent objective data: Now we need to obtain the probability that each of those words appears in spam or ham. The simplest way is to count how many times each word appears in the data and divide that number by the total word count.

word     spam      ham
Free     184       12
total    104342    294554

In this example, the probability that the word "Free" appears in a spam message is about 1 in 567 words (184 / 104342). Same exercise for ham: about 1 in 24,546 (12 / 294554). So if we find the word "Free" in our message, it multiplies the odds of the message being spam by about 43.3 (24546 / 567).

1:1 (initial belief: half of the messages we receive are spam) * 43.3 (we just found the word "Free" in our email) = 43.3. There are on average about 43.3 spam messages for each ham message or, to use whole numbers, 433 (43.3 * 10) spam messages for every 10 ham messages. So, the probability would be: 433 / (433 + 10) * 100 = 97.7%

To handle the rest of the words in an email, we can use exactly the same procedure. The posterior odds after one word (what we just calculated) become the prior odds (or the initial belief) for the next word, and so on.

You may have noticed how much weight the running belief carries: by the time we analyze the second word, the filter already strongly believes that the email is spam, because it has the word "Free" in it, and the same goes for the following words. The order of the words does not change the final odds (the ratios are simply multiplied together), but the starting odds matter a lot, and ignoring them is known as the base rate fallacy. You can read more about it here.

But let's keep it simple, "Land-and-Expand"

How can I play with this?

I just found classifier-reborn, a gem that keeps it pretty simple. And this dataset.

Here you go:

require 'classifier-reborn'
require 'csv'

# Load the dataset (expects "label" and "text" columns)
dataset = CSV.parse(File.read("spam_ham_dataset.csv"), headers: true)

# Create our Bayes classifier with two categories
classifier = ClassifierReborn::Bayes.new('Spam', 'Ham')

# Train the classifier on every row
# (headers: true already keeps the header line out of the rows)
dataset.each do |email|
  if email["label"] == "spam"
    classifier.train "Spam", email["text"]
  else
    classifier.train "Ham", email["text"]
  end
end

# Play with it
puts "Insert your email here (txt)"
puts classifier.classify gets

You can also check this repo to see how it looks without the gem :p

Speaking of spam... I'm going to launch a job board for devs that want to work remotely, and I'm looking for feedback, take a look!

Bye.
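P.S.: If you'd rather see the odds arithmetic itself, here is a minimal sketch in plain Ruby, without the gem. It reruns the "Free" example straight from the counts in the table and chains the update word by word, so the posterior odds after one word become the prior odds for the next. The names COUNTS, TOTALS, likelihood_ratio, spam_odds and odds_to_probability are made up for this sketch, and treating unknown words as neutral is my assumption, not something classifier-reborn necessarily does.

```ruby
# Word counts from the "Free" example: [times seen in spam, times seen in ham].
COUNTS = { "free" => [184, 12] }.freeze
# Total number of words seen in each class.
TOTALS = { spam: 104_342, ham: 294_554 }.freeze

# How many times more likely a word is to show up in spam than in ham.
# Assumption of this sketch: words without counts are neutral (ratio 1.0).
def likelihood_ratio(word)
  spam_count, ham_count = COUNTS[word.downcase]
  return 1.0 if spam_count.nil?

  (spam_count.to_f / TOTALS[:spam]) / (ham_count.to_f / TOTALS[:ham])
end

# Start from the prior odds (1.0 means 1:1) and multiply in each word's ratio.
# The odds after one word are the starting odds for the next one.
def spam_odds(text, prior_odds: 1.0)
  text.split.reduce(prior_odds) { |odds, word| odds * likelihood_ratio(word) }
end

# Convert spam:ham odds into a probability between 0 and 1.
def odds_to_probability(odds)
  odds / (odds + 1.0)
end

odds = spam_odds("Free")
puts format("odds %.1f : 1, probability %.1f%%", odds, odds_to_probability(odds) * 100)
```

Pass a different prior_odds to spam_odds to see how much the starting belief moves the result. A real filter would also strip punctuation and handle words that never appear in one of the classes, which this sketch skips.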
