This is the first part of an X-part article.
Part 2: The Outbreak — Detecting fake Viral News, automatically (hackernoon.com)
Part 3: The Outbreak — How to detect the real viral posts compared to the one hour share spike (medium.com)
Part 4: Understanding Facebook Reactions using Google Sentiment Analysis (medium.com)
First of all, a question: how do you find out about a fake viral article?
Does somebody tell you about it? Do you use special software? Do you rely on Facebook and Twitter trends to find out what is viral?
To find out how journalists do this today, I made this Google form. I encourage you to complete it, so we can get a better understanding of how journalists and bloggers find out about viral news articles.
The Outbreak is a service that I want to develop for journalists, allowing them to spot articles before they go viral.
By automatically crawling the Facebook pages that usually share fake or misleading articles, journalists can see the “next lies” before they become so popular that, even if you explain that something is not true, most people have already heard the fake version.
Every article that now has 600K shares had around 5K shares in its first hour, 11K in its second and 19K in its third.
Using this growth pattern, we can detect these articles well before they spread from one vertical into another and become viral.
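To illustrate the hour-over-hour growth signal described above, here is a minimal sketch in Python. The `looks_viral()` helper and its `min_gain` threshold are my own hypothetical illustrations, not the actual detection logic of the Outbreak.

```python
def hourly_gains(cumulative_shares):
    """Turn cumulative share counts (one value per hour) into per-hour gains."""
    return [b - a for a, b in zip(cumulative_shares, cumulative_shares[1:])]


def looks_viral(cumulative_shares, min_gain=5000):
    """Flag an article whose hourly share gain keeps growing hour after hour."""
    gains = hourly_gains(cumulative_shares)
    if len(gains) < 2 or gains[0] < min_gain:
        return False
    return all(later > earlier for earlier, later in zip(gains, gains[1:]))


# The example from the text: 5K shares after hour one, 11K after hour two,
# 19K after hour three -- each hour adds more shares than the previous one.
print(looks_viral([0, 5_000, 11_000, 19_000]))   # True
print(looks_viral([0, 5_000, 9_000, 12_000]))    # False, the growth is slowing down
```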
Using a custom Python web parser, we crawl, every hour, a list of the top 1000 news websites in the US and add the articles to a database. The second time we parse the same link, we add the new snapshot to the database and calculate the difference in the number of likes, reactions and shares using the Facebook API. We monitor each story for three days before we stop indexing that particular link. If the story resurfaces in our database later, we monitor it again for three days before stopping.
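A minimal sketch of this hourly crawl-and-diff step, assuming a SQLite store and the Graph API "engagement" field for URL objects. The table layout, the `open_db()`/`record_snapshot()` helpers and the token handling are my illustrations, not the actual Outbreak code; a scheduler (e.g. cron) would call `record_snapshot()` for every tracked link each hour and retire links after the three-day window.

```python
import sqlite3
import time

import requests

GRAPH_URL = "https://graph.facebook.com/v2.8/"   # API version current at the time of writing
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"               # placeholder, not a real token


def open_db(path="outbreak.db"):
    """Open (or create) the SQLite table holding one row per hourly snapshot."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS snapshots (
                      url TEXT, ts INTEGER, shares INTEGER,
                      reactions INTEGER, comments INTEGER, share_delta INTEGER)""")
    return db


def fetch_engagement(article_url):
    """Ask the Graph API for the share/reaction/comment counts of a URL."""
    resp = requests.get(GRAPH_URL, params={
        "id": article_url,
        "fields": "engagement",
        "access_token": ACCESS_TOKEN,
    })
    resp.raise_for_status()
    return resp.json().get("engagement", {})


def record_snapshot(db, article_url):
    """Store the current counts and the change in shares since the last crawl."""
    engagement = fetch_engagement(article_url)
    shares = engagement.get("share_count", 0)

    row = db.execute(
        "SELECT shares FROM snapshots WHERE url = ? ORDER BY ts DESC LIMIT 1",
        (article_url,)).fetchone()
    share_delta = shares - row[0] if row else 0

    db.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?, ?, ?)",
        (article_url, int(time.time()), shares,
         engagement.get("reaction_count", 0),
         engagement.get("comment_count", 0), share_delta))
    db.commit()
    return share_delta
```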
Only for around 5% of the posts, the ones we see becoming viral, do we download the comments, so that we can analyze them and see what people are talking about.
The rationale is that we can automate the process of learning what an article is about, and how credible it is, based on the comments users post on it.
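As a rough illustration of what that analysis could look like, here is a sketch that scores how often commenters themselves call a story out and lists the most frequent words. The keyword list, the threshold-free scoring and both helpers are hypothetical examples, not the tool's actual model.

```python
import re
from collections import Counter

SKEPTICISM_WORDS = {"fake", "hoax", "false", "debunked", "lie", "clickbait"}


def skepticism_score(comments):
    """Fraction of comments containing at least one skepticism keyword."""
    if not comments:
        return 0.0
    flagged = sum(
        1 for text in comments
        if SKEPTICISM_WORDS & set(re.findall(r"[a-z']+", text.lower()))
    )
    return flagged / len(comments)


def top_topics(comments, n=10):
    """Most frequent longer words, a rough view of what people are talking about."""
    words = re.findall(r"[a-z']{4,}", " ".join(comments).lower())
    return Counter(words).most_common(n)


comments = ["This is fake news, already debunked", "wow, sharing this!", "total hoax"]
print(skepticism_score(comments))   # 0.66...
print(top_topics(comments))
```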
Some back-of-envelope numbers for what this means in bandwidth: crawling 1,000 Facebook pages every hour means 1,000 * 24 = 24,000 requests per day. If the average request size is 1 MB, that is 24 GB per day downloaded from Facebook, or about 720 GB per month.
1,000 news websites * 50 articles per day * 5 days (the average time an article stays in the crawl) = 250,000 links to re-check every hour, which means 6,000,000 requests per day. If the average request size is 100 KB, that is 25 GB per hour, 600 GB per day and a minimum bandwidth need of 18 TB per month.
Around 1-5% of all articles end up in this viral/top-news category.
Per day we will download around 10,000 viral/top news articles. Per month this means 300,000 requests * 100 nested comment pages = 30,000,000 requests. If the average request size is 1 MB, we need to download 10 GB per day from Facebook, or a minimum of 300 GB of bandwidth per month. This is data we keep almost entirely, so it is costly on the server side.
The article payloads themselves: 1,000 news websites * 50 articles per day = 50,000 requests per day. If the average request size is 1 MB, this means 50 GB per day, or 1.5 TB per month.
In total, per month we need to send over 200M API requests to Facebook and download over 20 TB of data.
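The same back-of-envelope estimate, restated as a short script so every assumption (requests per source, payload sizes, a 30-day month, decimal units) is explicit and easy to tweak. The figures are the ones quoted above.

```python
MB, GB, TB = 1, 1000, 1000 * 1000   # decimal units, everything in megabytes
DAYS = 30

# (requests per day, average payload in MB)
workloads = {
    "FB pages, crawled hourly":      (1000 * 24,          1.0),
    "article engagement re-checks":  (1000 * 50 * 5 * 24, 0.1),
    "comments of viral/top stories": (10_000,             1.0),
    "article payloads":              (1000 * 50,          1.0),
}

mb_per_day = sum(reqs * size for reqs, size in workloads.values())
print(f"bandwidth per day  : {mb_per_day / GB:.0f} GB")          # ~684 GB
print(f"bandwidth per month: {mb_per_day * DAYS / TB:.1f} TB")   # ~20.5 TB, the "over 20 TB" above
```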
I will officially kickstart this project at the second @Debug Politics hackathon, on the 9th to 11th of December in San Francisco.
But I am already working on it. What I need is somebody with experience in database design to create the back-office architecture, and somebody with experience in design.
I also need financing to push this project forward, for the server and the other costs associated with the project. If you want to be part of this project, either as a sponsor or as a developer, email me at [email protected]
I collaborate with Rise Project, where I do data analysis and pattern recognition to uncover patterns in unstructured datasets.
You can find me online on Medium (Florin Badita), AngelList, Twitter, LinkedIn, OpenStreetMap, GitHub, Quora and Facebook.
Sometimes I write on my blog, http://florinbadita.com/