In the last couple of years, short videos have become the new darling of the digital mediascape. Since the internet boom in India, new influencers emerge every day. We all have our favorite creators and can spend hours watching their content. For a platform like ours, we needed a user-creator affinity recommendation model that recommends creator stories to users based on affinity (likeability), where a consumer’s (user’s) liking for a creator is signalled by interactions such as Follow, Profile Visit, Like, Comment, and Share.
Affinity means a natural liking for and understanding of someone or something. Affinity is a temporal factor that changes with time and interest niche. Our goal is to capture user-creator affinity strength, which also captures users’ interest niche i.e., what type of stories a consumer (user) prefers more.
The task: recommend a list of story IDs from creators for whom the user-creator affinity is high.
NOTE: A creator is also a user on the platform. Hence, I will address users as consumers who watch a creator’s video.
Out of these different interactions between consumer and creator, we decided to pick profile visits as a stronger signal to map out similarity between creators.
We divided the problem into 2 parts:
*In summary: First, we find the true high-affinity creators for a consumer using MCDM. Then we find creators similar to those high-affinity creators.*
Step 1: Finding True Top Affinity creator for a Consumer (user) from interactions.
This is a multi-criteria decision-making (MCDM) or multi-criteria decision analysis (MCDA) problem as we wanted to rank all creators for a consumer (user) with whom the consumer interacted in the last 30 days.
Consumer-Creator Interaction Data
We use the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), an MCDM algorithm, to rank creators in order of affinity. TOPSIS is based on the concept that the chosen alternative should have the shortest geometric distance from the ideal solution and the longest geometric distance from the worst solution.
Scikit-Criteria: Link
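To make the TOPSIS idea concrete, here is a minimal sketch in plain Python rather than Scikit-Criteria. The criteria columns, the weights, and the sample counts are illustrative assumptions, not the values used on the platform.

```python
import math

def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per row (higher = closer to ideal).

    matrix  : rows are creators, columns are interaction criteria
    weights : relative importance per criterion (assumed values)
    benefit : True where larger is better, False where smaller is better
    """
    n_cols = len(weights)
    total_w = sum(weights)
    w = [x / total_w for x in weights]

    # 1. Vector-normalise every column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    v = [[w[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]

    # 2. Ideal and worst value per criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    # 3. Relative closeness: far from the worst, near the ideal.
    scores = []
    for row in v:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores

# Hypothetical 30-day counts: follow, profile visits, likes, comments, shares.
creators = ["c1", "c2", "c3"]
interactions = [
    [1, 12, 30, 4, 2],
    [0, 3, 8, 0, 0],
    [1, 7, 15, 2, 1],
]
weights = [0.30, 0.25, 0.15, 0.15, 0.15]  # assumed, not the production weights
benefit = [True] * 5

scores = topsis(interactions, weights, benefit)
ranking = sorted(zip(creators, scores), key=lambda cs: cs[1], reverse=True)
```

Sorting by the closeness score gives the per-consumer creator ranking; the weights are where domain knowledge about signal strength would go.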
One can check out my blogs to get a detailed understanding of MCDM: Ranking of entities with Multi-Criteria Decision Making Methods (MCDM) — Part One | Ranking and Selection of the best with Multi-Criteria Decision Making (MCDM) — Part Two
To sum up: after Step 1, for every consumer we have a set of creators the consumer has interacted with in the last 30 days, ranked by affinity factors.
**Step2 & Step3: Creator Graph, Profile Visits to Embedding**

We constructed a Creator-Creator graph based on consumers' profile visits. Two creators were connected when a consumer's visits to both of their profiles co-occurred on a particular day.
The graph weights were defined by co-occurrence strength (number of times profile visits by a consumer co-occurred).
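The graph construction can be sketched as follows. The `(consumer_id, day, creator_id)` event shape and the function name are assumptions made for illustration; the weight of an edge is the number of consumer-days on which both profiles were visited.

```python
from collections import defaultdict
from itertools import combinations

def build_creator_graph(visits):
    """Build weighted creator-creator edges from profile-visit events.

    visits: iterable of (consumer_id, day, creator_id) tuples (assumed shape).
    Returns {(creator_a, creator_b): weight} with creator_a < creator_b.
    """
    # Group the creators each consumer visited on each day.
    per_consumer_day = defaultdict(set)
    for consumer, day, creator in visits:
        per_consumer_day[(consumer, day)].add(creator)

    # Every pair visited by the same consumer on the same day adds 1.
    weights = defaultdict(int)
    for creators in per_consumer_day.values():
        for a, b in combinations(sorted(creators), 2):
            weights[(a, b)] += 1
    return dict(weights)

visits = [
    ("u1", "2022-08-01", "A"), ("u1", "2022-08-01", "B"), ("u1", "2022-08-01", "C"),
    ("u2", "2022-08-01", "A"), ("u2", "2022-08-01", "B"),
    ("u1", "2022-08-02", "A"), ("u1", "2022-08-02", "B"),
    ("u3", "2022-08-01", "C"),  # a single visit creates no edge
]
edges = build_creator_graph(visits)
```

Here the A-B edge gets weight 3, while A-C and B-C each get weight 1.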
We computed the creator embeddings with Node2vec+ (Paper Link), which uses the word2vec skip-gram model.
Node2vec Params: How to set p and q?
The top and bottom panels correspond to the node2vec embeddings generated using q = 0.5 and q = 2, respectively. One can see that in the top panel, nodes that fall into the same local network neighborhood (i.e., homophily) are colored the same. On the other hand, in the bottom panel, structurally equivalent nodes are colored the same.
With q = 0.5 and p = 1, node2vec discovers clusters/communities of characters that frequently interact with each other, since the edges between nodes are based on co-appearances.
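To show where p and q enter, here is a minimal sketch of the second-order biased random walk from plain node2vec (not the Node2vec+ variant). The toy graph, walk length, and number of walks are assumptions for illustration.

```python
import random

def node2vec_walk(graph, start, length, p, q, rng):
    """One biased random walk over a weighted, undirected graph.

    graph: {node: {neighbour: edge_weight}}
    p    : return parameter (higher p = less backtracking)
    q    : in-out parameter (q < 1 favours moving outward)
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: plain weighted choice, no previous node yet.
            probs = [graph[cur][n] for n in nbrs]
        else:
            prev = walk[-2]
            probs = []
            for n in nbrs:
                w = graph[cur][n]
                if n == prev:
                    probs.append(w / p)   # step back to the previous node
                elif n in graph[prev]:
                    probs.append(w)       # stay one hop away from prev
                else:
                    probs.append(w / q)   # move further away from prev
        walk.append(rng.choices(nbrs, weights=probs, k=1)[0])
    return walk

# Toy creator graph with co-occurrence weights (assumed data).
graph = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "C": 1},
    "C": {"A": 1, "B": 1, "D": 1},
    "D": {"C": 1},
}
rng = random.Random(42)
walks = [node2vec_walk(graph, node, 10, p=1.0, q=0.5, rng=rng)
         for node in graph for _ in range(5)]
```

Each walk is then treated as a "sentence" of creator IDs and passed to a skip-gram trainer to obtain the embeddings.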
To sum up: after Step2 & Step3, we have creator embeddings computed from a creator-creator graph built on the co-occurrence of profile visits.
**Step4: Recommending Top Creators**

Now we have the true Consumer (User)-Creator affinity ranking based on MCDM, and we have embeddings of all (active) creators on our platform.
We pick the top 5 True Affinity Creators ranked by the MCDM technique and use them as queries in a nearest-neighbour search to get the top 100 high-affinity creators.
Why were the top 5 True Affinity Creators picked as query vectors? Why not pick only the top 1, or build a mean vector of the top 5 creators, and show creators similar to that query vector in embedding space?
The idea of picking the top 5 creators is inspired by the Pinterest research paper PinnerSage.
It is true that a user cannot be represented by one particular “interest” embedding. In general, even with movies, everyone shows interest in multiple genres, like horror, action, sci-fi, comedy, etc. To identify a user's interests, we pick the top 5 creators from the ranked set.
For fast vector similarity search over the creator embeddings, we used ScaNN, an Approximate Nearest Neighbour (ANN) algorithm.
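The multi-query lookup can be sketched with an exact brute-force cosine search standing in for ScaNN. The 2-D embeddings and creator names are made up, and scoring each candidate by its best similarity to any query creator is an assumption about how the per-query results are merged.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_creators(embeddings, query_ids, k):
    """Return the top-k (creator_id, score) pairs, excluding the queries.

    Each candidate is scored by its highest similarity to any query creator,
    so every interest of the consumer can contribute neighbours.
    """
    best = {}
    for cid, vec in embeddings.items():
        if cid in query_ids:
            continue
        best[cid] = max(cosine(vec, embeddings[q]) for q in query_ids)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Toy embeddings: two comedy creators, two dance creators, one tech creator.
embeddings = {
    "comedy_1": [1.0, 0.0],
    "comedy_2": [0.9, 0.1],
    "dance_1": [0.0, 1.0],
    "dance_2": [0.1, 0.9],
    "tech_1": [-1.0, 0.0],
}
# In the article these would be the top 5 creators from the MCDM ranking.
top_affinity = ["comedy_1", "dance_1"]
neighbours = recommend_creators(embeddings, top_affinity, k=2)
```

With one query vector per interest, both the comedy and the dance neighbour surface, which is the motivation for not collapsing the top creators into a single vector.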
Our expected outcome is a list of story IDs. Hence, from the top 100 affinity creators for each consumer (from the above approach), we pick each creator's latest unwatched story and add it to the consumer's recommendation pool, with stories ranked by the creator's similarity score with respect to the user’s true creator affinity.
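A minimal sketch of this last step, assuming hypothetical data shapes: a similarity-ranked creator list, a map from creator to `(story_id, posted_at)` pairs, and a set of watched story IDs. "Latest unwatched" is read here as the newest story the consumer has not yet seen.

```python
def build_story_pool(ranked_creators, stories_by_creator, watched):
    """Pick one story per creator, keeping the creator ranking order.

    ranked_creators   : [(creator_id, similarity)] sorted by similarity, descending
    stories_by_creator: {creator_id: [(story_id, posted_at), ...]}
    watched           : set of story IDs the consumer has already seen
    """
    pool = []
    for creator_id, score in ranked_creators:
        unwatched = [s for s in stories_by_creator.get(creator_id, [])
                     if s[0] not in watched]
        if not unwatched:
            continue  # nothing new from this creator
        story_id, _ = max(unwatched, key=lambda s: s[1])  # newest unwatched
        pool.append((story_id, score))
    return pool

ranked = [("comedy_2", 0.99), ("dance_2", 0.97), ("tech_1", 0.40)]
stories = {
    "comedy_2": [("s1", 100), ("s2", 200)],
    "dance_2": [("s3", 150)],
    "tech_1": [("s4", 120), ("s5", 90)],
}
watched = {"s2", "s3"}
pool = build_story_pool(ranked, stories, watched)
```

Because the input is already sorted by similarity, the resulting pool is ordered the same way without a second sort.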
This approach of TOPSIS MCDM and Node2Vec+ not only ranks creators for a consumer but also helps us find similarities between creators of the same niche using a profile-visit co-occurrence graph.
I am open to consulting; you can reach out to me on LinkedIn: https://www.linkedin.com/in/shaurya-uppal/