Every day, we casually scroll through our Hackernoon, Twitter, Reddit or Facebook feeds without ever seeing the “hidden hand” of the algorithmic overlords that recommend news, promotions and custom-tailored content on various topics of interest to us.
For example, Facebook puts more posts from your friends and family on your feed, while other platforms focus more on the relevance of a post.
These algorithms have a lot of power in shaping our intuition about what is happening out there in the world, but they can sometimes be unfair, biased or exploited, mostly to the end user's disadvantage. We are not going to talk about the ethics of such algorithms here, but rather describe the basic technical idea and the challenges behind it. So let’s explore what it takes and how we would go about it.
A social media feed is made up of the content its users create. All posts go to one giant global queue, from which they are delivered to users’ feeds. Thousands of such posts compete with one another to rise in the eyes of our algorithmic arbiter.
Once the algorithm has made its decision, only a few of those posts will make it to our news feed. This decision-making is called scoring, and it takes certain parameters into consideration.
There are a lot of ways to score a post depending on its textual, visual or audio content. Some posts include videos or images, for example, which requires machine learning to distinguish NSFW or spam content from a normal post, but that is a different topic.
When it comes to quantity, the more users there are on a platform, the more content there is to score, filter and match. We need a fast, natural way of keeping each user's feed relevant to them.
The challenge here is to make a scalable and efficient scoring algorithm that makes sense to the end user and delivers organic and authentic content.
In a perfect world, this would be done in a transparent and standardized manner, but these rules are usually hidden behind complicated metrics.
Let’s imagine a simple post and its structure, so we can examine its properties and pick the most relevant parameters for our purpose.
Here is an example structure that we can use to model our scoring formula:
We want our ranking algorithm to be GEO aware. This means that the primary feed of each user is their local country/language-based stream. These local streams are sometimes called buckets.
Further, we want to take some basic parameters from the post. Two key parameters are the time of post creation and the number of points. The time of creation is useful since we want to make the feed chronological, showing posts from newest to oldest, going from the top down. The number of points or likes informs us about user engagement, but this could be any other parameter that you have access to or that is relevant for your use case. In this example we will use the number of points as a viral indicator.
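To make these parameters concrete, here is a minimal sketch of what such a post record could look like in Python. The class name `Post`, its field names and the bucket format `"us/en"` are all hypothetical and chosen only for illustration; a real platform would have its own schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    """Hypothetical post record; field names are illustrative only."""
    post_id: int
    bucket: str           # local stream key, e.g. country/language
    created_at: datetime  # time of post creation (UTC)
    points: int           # likes/upvotes, including the submitter's own vote


def age_in_hours(post: Post, now: datetime) -> float:
    """Time since submission, expressed in hours."""
    return (now - post.created_at).total_seconds() / 3600.0


def group_by_bucket(posts):
    """Group posts into local streams (buckets) keyed by their bucket name."""
    buckets = {}
    for post in posts:
        buckets.setdefault(post.bucket, []).append(post)
    return buckets


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    posts = [
        Post(1, "us/en", now - timedelta(hours=3), 42),
        Post(2, "de/de", now - timedelta(hours=1), 7),
    ]
    for name, items in group_by_bucket(posts).items():
        print(name, [(p.post_id, round(age_in_hours(p, now), 1)) for p in items])
```

Keeping the bucket on the post itself means each local stream can be scored and sorted independently of the others.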
The third parameter that we need to introduce is a general control parameter, called gravity, which we can tweak manually to apply global overrides.
Here is what a scoring function would look like:
Score = (P-1) / (T+2)^G
where,
P = points of a post (-1 is just to negate the submitter's vote on their own post)
T = time since submission (in hours - this is important)
G = gravity
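The formula translates almost directly into code. Below is a small Python sketch of it, together with a helper that sorts a feed by score. The function names, the tuple layout of a post and the default gravity of 1.8 are assumptions made for this example; the article itself does not fix a gravity value.

```python
def score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Score = (P - 1) / (T + 2)^G, with T measured in hours.

    The default gravity of 1.8 is an assumed value for illustration.
    """
    return (points - 1) / (age_hours + 2) ** gravity


def rank_feed(posts, gravity: float = 1.8):
    """Return post names ordered from highest to lowest score.

    `posts` is a list of (name, points, age_hours) tuples (hypothetical layout).
    """
    ordered = sorted(
        posts,
        key=lambda post: score(post[1], post[2], gravity),
        reverse=True,
    )
    return [name for name, _, _ in ordered]


if __name__ == "__main__":
    feed = [
        ("old_popular", 200, 24),  # many points, but a day old
        ("new_small", 10, 1),      # few points, one hour old
        ("fresh", 2, 0),           # just submitted, one extra vote
    ]
    print(rank_feed(feed))
```

Note that a brand-new post with only the submitter's own vote scores exactly zero, because of the -1 in the numerator.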
Gravity and time have a significant impact on the score.
Generally, these things should hold true:
It’s much easier to understand if we plot the algorithm visually. We can use Wolfram Alpha to make this plot and tweak its parameters live.
Using the formula above, over a span of 24 hours with 3 different posts, the plot shows that the yellow one, which has the most points (199), falls down the user's feed much more slowly. The other posts decay much faster, but this is balanced by the time parameter in the formula, which gives us the nice curve that makes the decay smooth.
As you can see, the score decreases a lot as time goes by. For example, a 24-hour-old post will have a very low score regardless of how many points it got, which makes space for new posts and keeps posts flowing through the feed in a natural way.
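If you prefer numbers to a plot, the same decay can be tabulated with a few lines of Python. This is only a sketch: the 199-point post comes from the example above, while the other two point counts (60 and 20), the sampled hours and the gravity of 1.8 are assumed values picked for illustration.

```python
def score(points, age_hours, gravity=1.8):
    """Score = (P - 1) / (T + 2)^G; gravity=1.8 is an assumed default."""
    return (points - 1) / (age_hours + 2) ** gravity


def decay_curve(points, hours, gravity=1.8):
    """Scores of a post with fixed points, sampled at the given ages (in hours)."""
    return [score(points, h, gravity) for h in hours]


if __name__ == "__main__":
    hours = [0, 1, 3, 6, 12, 24]
    print("points " + " ".join(f"{h:>8}h" for h in hours))
    # 199 is taken from the article's plot; 60 and 20 are hypothetical.
    for points in (199, 60, 20):
        row = " ".join(f"{s:9.3f}" for s in decay_curve(points, hours))
        print(f"{points:>6} {row}")
```

With these assumptions, the score of any post shrinks roughly a hundredfold between submission and the 24-hour mark, since the denominator grows from 2^G to 26^G while the points stay fixed.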
Here we tweaked only the gravity, while keeping the number of points the same for all 3 posts. As you can see on the graph, the larger the gravity, the faster the score decreases.
The more gravity a post has, the faster it falls down the user’s feed.
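One way to get a feel for gravity is to ask how long a post stays above a given score. Solving the formula for T gives T = ((P - 1) / s)^(1/G) - 2 for a threshold score s. The sketch below uses that rearrangement; the function name, the threshold of 1.0, the 101 points and the gravity values are all hypothetical choices for this example.

```python
def score(points, age_hours, gravity):
    """Score = (P - 1) / (T + 2)^G, with T in hours."""
    return (points - 1) / (age_hours + 2) ** gravity


def hours_until_below(points, gravity, threshold):
    """Age (in hours) at which the score reaches `threshold`.

    Derived by solving (P - 1) / (T + 2)^G = threshold for T.
    Returns 0.0 if the post is already at or below the threshold when submitted.
    """
    hours = ((points - 1) / threshold) ** (1.0 / gravity) - 2
    return max(0.0, hours)


if __name__ == "__main__":
    points = 101  # same number of points for every run (hypothetical)
    for gravity in (1.0, 1.8, 2.0, 2.5):  # illustrative gravity values
        hours = hours_until_below(points, gravity, threshold=1.0)
        print(f"gravity={gravity}: above 1.0 for about {hours:.1f} hours")
```

Under these assumptions, a post that would stay above the threshold for days at a gravity of 1.0 drops below it within hours at a gravity of 2.0, which is what makes gravity useful as a global override.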
This formula has its weaknesses, and it can be developed further to overcome them, but even in its simplest form it shows how just a couple of parameters can sometimes be enough to steer and serve a large amount of content in a natural, organic way.
No algorithm is perfect; a good one is always evolving.
Lead photo by charlesdeluvio on Unsplash