Ugh! To think that this article started from a rant about my K-drama search results. SMH.
I’m thrilled that it did because it led me down the rabbit hole of Algorithms, Deep Personalization, Search optimization, and…wait for it…Netflix’s Research Library! That site is to my brain what a pâtisserie is to my eyeballs and belly. I digress, but seriously, if you haven’t checked it out, you should.
That curiosity led me to write this article on optimizing Netflix's ML ranker outcomes for a more efficient and accurate deep-personalization experience of subscribers' long- and short-term viewing preferences, addressing both Fetch and Explore intents. I found writing it very interesting because it takes the business and customer angles, as well as the infrastructure, into consideration.
Netflix seems to focus more on home-page recommendations that enhance users' pre-query experience and less on the post-search experience. To some extent, this is fine: if the priorities had been reversed from the start, we'd all have dropped Netflix years ago, because the user experience (UX) would have been a hot mess!
According to Das et al. (2022) in their paper on
There’s been some recent
The way I see it, as Alex Ratner stated in a tweet, "…Real value from AI always hinges on solving the hard 20%- and this almost always requires an entirely different set of approaches." Thus, Netflix should start looking into the features it doesn't focus much on; based on the evidence, that would be query-related recommendations and Search. What if the "different set of approaches" is simple?
Let's be real: Netflix's recommender systems are SUPREME (All Hail!), and the homepage recommendations are great a lot of the time. But it becomes stressful when I need to find films within a genre or category from Netflix's "movie genome" ("This show is …" e.g., Swoonworthy), because the homepage recommendations are finite. From my research, which included an admittedly 'leading' and informal Twitter
As a subscriber, what do I want? I want to be able to find more movies that meet my specific taste so that my recommendations are more accurately tailored to my preferences.
What does Netflix want? According to their Research website, they want me, the subscriber, to spend less time searching and more time watching what I like.
So, what if I could input search components once in a blue moon, with filters detailed enough that those queries fine-tune the recommender systems to provide better in-session, pre-query recommendations during Fetch and Explore user-intent actions, such as idly scrolling/hovering through recommendation lists and typing to search, respectively?
For instance, this has happened to me more times than I can count, and apparently many others experience the same:
Scenario: Uchenna prefers to watch Korean TV shows in the romance genre that are swoonworthy, comedic, and heartfelt. That doesn't mean that if Netflix recommends a Chinese TV show in the romance genre with the genome tags 'Bittersweet, Heartfelt and Emotional', she wouldn't watch it. She most likely would, or she would have to scroll for several minutes looking for something that meets her current mood and preference.
In my humble opinion, these are Netflix's problems:
Firstly, some of its core metrics are extrinsically motivated, quite presumptive, and based on the premise that success means completing the main goal: watching something on the platform in video format. In a perfect product world with business KPIs at the forefront of decision-making, yes, this is right. But in a world of flawed humans with varying moods, preferences, and intents, it is a flawed and unempathetic approach: it doesn't account for the fact that I might enjoy something that was recommended to me even though it isn't what I would usually go for, or what my mood calls for at the point of selection.
Because Netflix is in the growth stage of the product lifecycle, its current metrics seem to be based on customer acquisition and retention without much focus on customer satisfaction. The bottom line is that I'm generally happy, but my need in that moment hasn't been met. In addition, in a scenario where I have to find films with labels close to what I have in mind, it takes longer to find something, which explains why most people are more likely to watch something recommended by a friend than to search or always follow the recommendations.
Secondly, another recurring complaint on the internet concerns the tags. The genre tags are inconsistent and don't follow standard genres, and the genome tags and categories seem inconsistent and disorganized as well. With inconsistent, scattered labeling of video content, the recommender systems become less likely to produce pre-query and post-query recommendations that closely match 'previously watched' content, and less able to accurately predict the subscriber's taste.
Thus, Netflix has a great recommender-systems infrastructure, but the labels aren't organized well enough to help the customer decide or to help the systems make fully optimized anticipatory decisions. For instance, last night I started watching a TV show with the genome tags 'Swoonworthy, comedy, quirky,' and there's absolutely nothing comedic about it. It's quirky and romantic but definitely not a comedy: wrong labeling.
Lastly, e-commerce, content-sharing, and video tech companies are beginning to optimize their Search functions despite already having great algorithms that predict content based on user preferences, yet Netflix's search function barely has any filters.
I found lots of comments online during my research, but one of them seems to encapsulate everything I know is wrong with Netflix.
Option 1: Optimization of Search Filters and Creation of 'Saved Search' Labels
Applying optimized search filters and member pre-selected 'saved search' results could help improve the outcomes of Netflix's recommender systems for pre-query, in-session, and long-term recommendations. A saved search establishes a baseline for behavior and user interest. User interactions with the baseline recommendation results can then help predict user preferences within the confines of that saved search; e.g., a saved search named Quirky K-Drama can surface more related results as the user interacts with its recommendations ('Because you saved Quirky K-Drama'). A minimal sketch of the mechanics follows, and below that is a design of the Advanced Search/Saved Search feature recommendation.
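To make the mechanics of Option 1 a bit more concrete, here's a minimal sketch in Python. Everything in it is hypothetical (the SavedSearch/Title structures, the boost weight, the tag-overlap scoring); this is not Netflix's ranker, just one way a saved search could act as a soft preference signal that re-ranks candidates instead of hard-filtering them:

```python
from dataclasses import dataclass

@dataclass
class SavedSearch:
    # Hypothetical: a member-named filter, e.g. "Quirky K-Drama"
    name: str
    tags: set            # tags the member selected, e.g. {"Korean", "Quirky", "Romance"}
    boost: float = 0.15  # soft weight; could be tuned as the member interacts with the saved search

@dataclass
class Title:
    name: str
    tags: set
    base_score: float    # whatever score the existing ranker already produces

def rerank(candidates, saved_searches):
    """Boost titles whose tags overlap a saved search, then re-sort.

    The saved search is a soft signal, not a hard filter, so titles outside
    the saved tags can still surface (preserving the Explore intent).
    """
    def adjusted(title):
        bonus = sum(
            s.boost * len(title.tags & s.tags) / max(len(s.tags), 1)
            for s in saved_searches
        )
        return title.base_score + bonus
    return sorted(candidates, key=adjusted, reverse=True)

if __name__ == "__main__":
    saved = [SavedSearch("Quirky K-Drama", {"Korean", "Quirky", "Romance"})]
    pool = [
        Title("Show A", {"Korean", "Romance", "Swoonworthy"}, 0.62),
        Title("Show B", {"Action", "Gritty"}, 0.70),
        Title("Show C", {"Korean", "Quirky", "Comedy"}, 0.58),
    ]
    print([t.name for t in rerank(pool, saved)])
```

The point of boosting instead of filtering is that the saved search narrows the ranking without walling the member off from serendipitous recommendations, and the boost itself is something the system could learn from how the member interacts with results under that saved label.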
Option 2: Creating Expandable Pre-Query Recommendations Carousel Lists
'Because you watched…' and 'Because you liked…' rows/recommendations are finite, which limits the subscriber's Explore intent. Currently, the recommender system provides a carousel row for each, plus more carousel rows for movies with genome tags similar to what you've watched in the past. This creates quite a lot of clutter and diminishes the user experience, because the user has to keep scrolling to view films within each genome tag. Instead of creating so many lists/carousels that force the user to continuously scroll downwards and sideways to find videos within mislabeled genome tags, Netflix could optimize the 'Because you liked…' and 'Because you watched…' carousels by making them expandable. This way, there's less of a break in the user's exploratory actions, which creates a better experience for them. A rough sketch of the idea follows.
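As a rough illustration of what 'expandable' could mean under the hood (again, a sketch with made-up function names and a toy tag-overlap similarity, not Netflix's actual API), a 'Because you watched…' row could be backed by a paginated query, so expanding the row pulls the next page of ranked similar titles into the same carousel instead of spawning yet another row:

```python
def expand_row(seed_title, catalog_tags, page, page_size=10):
    """Return one 'page' of a 'Because you watched <seed_title>' row.

    Page 0 is what the carousel shows by default; each expansion simply
    requests the next page in place rather than adding a new carousel.
    Similarity here is a toy tag-overlap count, standing in for whatever
    the real ranker uses.
    """
    ranked = sorted(
        (t for t in catalog_tags if t != seed_title),
        key=lambda t: len(catalog_tags[seed_title] & catalog_tags[t]),
        reverse=True,
    )
    start = page * page_size
    return ranked[start:start + page_size]

if __name__ == "__main__":
    catalog_tags = {
        "Crash Landing on You": {"Korean", "Romance", "Swoonworthy"},
        "Hometown Cha-Cha-Cha": {"Korean", "Romance", "Heartfelt"},
        "Vincenzo": {"Korean", "Dark Comedy", "Crime"},
        "Dark": {"German", "Sci-Fi", "Cerebral"},
    }
    print(expand_row("Crash Landing on You", catalog_tags, page=0, page_size=2))
    print(expand_row("Crash Landing on You", catalog_tags, page=1, page_size=2))
```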
Option 3: Creating Genome Tags Ratings to Improve Labelling and Algorithmic Outcomes
I’m thinking out loud (on paper), but just keep up with me, and hopefully, this written soliloquy will be coherent enough for you to understand the point that I’m trying to make.
There’s no way that humans can possibly watch every video in the Netflix library to judge the genome tags and relabel them correctly. Besides, if this were possible, the labels might even be more disorganized because the perception of a video is, to some extent, subjective. Bringing that bias into the mix might make the data more inaccurate. But could there be a way for subscribers to rate a video’s genome tag accuracy? If this happens, the recommender systems can use the data to properly label the video content.
For instance, Grammarly does something similar with its text tone detector ratings. I believe these ratings feed into their algorithm that helps predict and detect the tone of text.
The drawback is that Netflix might have to introduce a more detailed rating system for videos, a feature that was discontinued years ago. However, if there were a way to introduce the genome tag rating system even for three months, I believe predictions would improve. Open beta testing provides the perfect environment and timeframe for Netflix to let subscribers help train the model for more accurate predictions.
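To sketch what Option 3 might look like under the hood (purely illustrative: the agree/disagree votes, the smoothing prior, and the thresholds are all my own assumptions, not anything Netflix has described), per-title, per-tag votes could be aggregated into a confidence score, and tags with enough votes but low agreement could be flagged for relabeling or down-weighted as ranker features:

```python
from collections import defaultdict

class TagAccuracy:
    """Aggregate subscriber votes on whether a genome tag fits a title.

    Hypothetical scheme: each vote is (title, tag, agree: bool). A tag's
    confidence is its agree rate, smoothed with a weak prior so that a
    couple of early votes can't swing the label.
    """

    def __init__(self, prior_agree=2, prior_total=4, min_votes=50, threshold=0.4):
        self.votes = defaultdict(lambda: [0, 0])  # (title, tag) -> [agrees, total]
        self.prior_agree, self.prior_total = prior_agree, prior_total
        self.min_votes, self.threshold = min_votes, threshold

    def record(self, title, tag, agree):
        entry = self.votes[(title, tag)]
        entry[0] += int(agree)
        entry[1] += 1

    def confidence(self, title, tag):
        agrees, total = self.votes[(title, tag)]
        return (agrees + self.prior_agree) / (total + self.prior_total)

    def flagged_for_review(self):
        """(title, tag) pairs with enough votes and low agreement: relabeling candidates."""
        return [
            (title, tag)
            for (title, tag), (_, total) in self.votes.items()
            if total >= self.min_votes and self.confidence(title, tag) < self.threshold
        ]
```

A three-month beta like the one described above would mainly be about collecting enough votes per (title, tag) pair to clear the minimum-votes bar; below that, the smoothed confidence stays close to the prior, so nothing gets relabeled on thin evidence.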
Cost-Benefit Dilemma: It is vital to figure out whether the benefit of implementing these 'delightful' feature recommendations justifies the required investment because, at the end of the day, Netflix is a business, and cost-benefit analysis is a given. There was not much Netflix-specific data to analyze; thus, the outcome of a cost-benefit analysis is unknown, and the feasibility of implementing these suggestions cannot be determined with any certainty.
Risk of Sampling Error: Qualitative research for this article was done using a random sampling approach on a very small cumulative sample of about 30 people. Other feedback gathered from different online forums might also indicate a recurring customer pain point. However, Netflix has over 200 million subscribers; so even though these may seem like widespread customer complaints, they might not be representative of the broader subscriber population, as the small sample size comes with the probability of a sampling error (Phew! What a long sentence). Anyway, only Netflix's data can truly confirm the validity and importance of these issues to their users. Regardless, hypothesis testing and further research would be required to know whether these recommendations are worth implementing. Without testing, implementing them would be a shot in the dark.
Potentially Erroneous Assumption of Priority: These suggestions presume that the features haven't already come up on the Netflix Product team's backlog or brainstorming board, only to be deprioritized or canned because they aren't feasible.
Reality-Hypothesis Scenario Divergence: Recommendation Option 3 assumes that genome tags are determined by Netflix's AI; however, it's possible that genome tags are actually added manually during the upload process (think of how labels and categories are added to products in a POS system). If that's the case, then we're back to square one: human error due to subjectivity, which might account for the discrepancies. On second thought, though, even under that scenario, I believe the validity and viability of the genome tag ratings recommendation (Option 3) could very well still stand. Just think about it for a second. Unfortunately, unless you work at Netflix, this information is unknown, so let's just stick with our imagination and inferences, okay? :)
Netflix arguably has the best Recommender systems in the game; its subscriber growth rate is increasing at a healthy pace due to expansion strategies, and its UX is probably the best in the competitive landscape as well. I envisage that they will continue to hold on to the majority of the market share for at least the next five years.
However, what happens if they eventually manage to capture the streaming markets in most target countries? What happens when they can't compete on price with local and international competitors? What happens when, like the US, most of these markets mature, and customers in these countries begin to look for better customer experiences with their video streaming? What happens when innovation at other streaming companies, like Hulu, AppleTV, and HBO, outpaces Netflix's holy-grail recommender systems?
Ultimately, in the words of Steven Van Belleghem, "What if customers want more than excellent service?"
I'm sure the amazingly talented people at Netflix are already thinking about these potential issues. These are long-term questions, but I believe that in the meantime, it's time to tweak Netflix's recommender systems to harness their full potential so that this much-loved service can deliver much better customer satisfaction. Customer experience (CX) is currently Netflix's strength; however, customer satisfaction is slipping. A great product can have amazing CX, but if customers want more than just excellent service to stay satisfied, more will need to be done. This is where these recommendations come in.
According to former VP of Product at Netflix Gibson Biddle, the Netflix product team is not just focused on understanding the customer's current wants and needs but instead on 'inventing and delivering on unanticipated future needs.' I also came across an article in
Source: Gibson Biddle (former VP of Product at Netflix)
Therefore, typical customer-satisfaction metrics, such as NPS, might not capture the sentiments observed during my research. Focus groups and interviews are the best research methods for uncovering such "nagging-but-dull" pain points, which don't cause churn per se. However, like a toothache, over time these dull pain points might eventually be the reason customers migrate to a different service provider.
This is why I strongly agree with Dixon et al. (2010), whose HBR article argues that customer satisfaction and customer retention/loyalty are more likely to be captured using the Customer Effort Score (CES) than the Net Promoter Score or CSAT: you simply learn more by asking CES's main question, "How much effort did it take the customer to achieve their desired goal?" I believe that if Netflix asked this question often, they would find out why customer experience (CX) is great while customer satisfaction (CS) and loyalty (especially in the US) are disproportionately lower, and the results would help deepen personalization efforts on the platform.
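For reference, CES is typically collected as a single post-task survey question on a numeric scale and then averaged (or reported as a top-box share). Here's a minimal sketch of that computation, assuming a 1-to-7 scale where higher means less effort; the exact wording and scale are common conventions that vary by team, not something taken from Dixon et al.:

```python
def customer_effort_score(responses, scale_max=7):
    """Summarize answers to a single effort question on a 1..scale_max scale.

    Assumes higher = easier (less effort). Returns the mean score and the
    share of respondents in the top two boxes, two common ways to report CES.
    """
    if not responses:
        raise ValueError("no responses to summarize")
    mean = sum(responses) / len(responses)
    top_two_share = sum(r >= scale_max - 1 for r in responses) / len(responses)
    return mean, top_two_share

if __name__ == "__main__":
    # Toy survey: "How easy was it to find something you wanted to watch tonight?"
    mean, top_two = customer_effort_score([6, 5, 7, 3, 4, 6, 2, 5])
    print(f"CES (mean): {mean:.2f} | top-two-box: {top_two:.0%}")
```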
In the context of Netflix, deep personalization stands at the intersection of customer satisfaction, customer experience, customer loyalty, and an optimized AI system. When deep personalization is supercharged with features built on insights from research methods such as CES, the ML ranker becomes more efficient at determining user preferences. The hypotheses in this article address business problems and user problems and encourage revisiting the current product metrics at Netflix. It's a win-win for all, right? Well, it's up to you to decide.
In a nutshell, there’s still a lot that the Netflix ML/AI systems are capable of. There’s still a lot that Netflix can do better to acquire and retain subscribers if they ask the right questions and implement features that reduce the users’ effort, create a better experience, and enhance the efficiency of the recommender systems. Thus far, they have done a great job, but it can certainly be better. The question is whether the cost of enhancement is commensurate with the benefit of a more efficient platform.
What are your thoughts on this subject? Do the pain points resonate with you?