Himanshu Khanna

@sparklinguy

The Psychology of Rating Systems

Uber, like other consumer services, has an interesting rating system, both for passengers and drivers. Arjun champions the passenger rating system with a score of 4.91, not because he strategizes but because he’s genuinely nice and loves to strike up a conversation!

He recently noticed that one of the drivers had given him a 1-star rating. Arjun was taken aback. Out of curiosity, he asked the driver the reason for such a low rating. The driver smiled and explained: “Sir, this is not low. I gave you a №1 rating!”

Shall we ditch the stars?

Earlier in 2017, Netflix made a big move, ditching its star rating and replacing it with a thumbs-up and thumbs-down system. YouTube had taken a similar turn almost a decade earlier, in 2009. When it comes to ratings, it’s pretty much all or nothing.

YouTube’s conclusion was that an overwhelming majority of its videos had a stellar five-star rating, which meant that users reacted in extremes: either when something was exceptionally amazing or when it was utterly pathetic. For the rest, they didn’t care to react and rate. A point to note here is that the average rating for a video on YouTube was the same for every user. On Netflix, five red stars meant the movie or series was a perfect match for you. The rating you saw next to each title was an average across like-minded users, not across all users as on YouTube.

How often do people rate?

Uber’s earlier experience made it almost mandatory to rate your driver before you could book the next ride. The latest experience makes it optional, and users skip rating quite often. In 2014, Uber (San Francisco) sent its drivers a guide that explained how the driver-rating system works. If a driver’s rating was 4.6 or lower, Uber could consider deactivating their account.

“Deactivating the accounts of the drivers who provide consistently poor experiences ensures that Uber continues to be known for quality.”

Uber drivers depend on a good rating to make a living. So do restaurants, which need good ratings to draw more footfall, and many other products and services whose earnings depend on ratings.

Do we all understand ratings the same way?

Perhaps not. A more informed way to answer this is to understand the intent of rating systems, especially in the case of today’s digital products.

What is a Rating System in the digital world?

A Rating System is an investment that your users may make in your product (digital or otherwise). Once they are clear about the return on this investment, and in favour of it, they are more likely to invest and to engage with your rating system. They will help the product grow by rewarding good actions and punishing the not so good ones.

We recently conducted a related survey for one of our client projects. More than 50 percent of the respondents take 7 as the lowest IMDb rating at which they will watch a movie. Anything less would make them skip the movie! Interestingly, the LAR (lowest acceptable rating) for a show or series, with the same set of respondents, is 8 (and not 7, as for movies). “I almost never rate a movie more than 8, given it’s an ideal movie”, one of the respondents revealed. While a 7 rating is acceptable, 9 is too rare and exceptional for movies. The same set of users would vote the same movie as ‘thumbs up’ on Netflix.

The platform, how it is used, its rating system and what users expect their rating to achieve all seem to play a great psychological role in how a user rates a piece of content. Colors, labels and the immediate effect of the rating are also major influencers.

About 80 percent of the respondents from the same survey claim to hire an Uber at least once a week, with more than 30 percent boarding one every day. Almost 50 percent of these Uber commuters will cancel their ride if the driver has a rating of less than 4.5 (out of 5)! The same users browse the listings on Zomato at least once a month. The LAR for a restaurant drops to 3.8 (out of 5)! Perhaps the frequency of usage and relevance to everyday life also influence our perception of ratings.

“A 4-star rating is for working as expected. 5 is for exceeding expectations!”

Instagram (or Facebook) likes and Twitter’s retweets are rating systems as well, the truly binary ones in fact. Usually, users don’t rate ‘blah’ experiences. They rate extremes: love or hate! Facebook, Twitter and Instagram do not care about the ‘hate’ inducing experiences. They value and popularize the experiences users ‘love’. Binary systems make ratings easier to collect, because users have to act only when they love something on these platforms.

Group Norms and Conformity

Muzafer Sherif conducted a classic experiment in 1936. Participants were placed in a dark room and asked to observe a small dot of light 15 feet away and estimate how far it moved. Participants tested individually reported widely varying amounts of movement, while participants tested in groups of three converged on a common estimate. Sherif’s experiment suggested that, rather than make individual judgments, people tend to conform to a group.

An Instagram user is shown that a certain 9GAG post has 559,031 likes, and the user conforms by adding another like. Conformity holds true for ratings, stars, votes and other systems, if the sum (or average) of all the ratings is shown before the act of rating. Perhaps this factor is one of the key contributors to virality on social media.

Clap to rate

Medium’s binary equivalent of a ‘like’ button, ‘recommend’ (the heart icon), was reinvented around the middle of 2017 as a ‘Clap’. Interestingly, this radical move transforms a reader from an appreciator into an evaluator. A user may clap as many as 50 times for an article, with 0–50 claps evaluating its likability (or quality…), an equivalent of the star rating system perhaps.

While you may ‘like’ your own posts on Facebook, Medium doesn’t want you clapping for yourself.

One intriguing difference between a star rating and Medium’s clap count is that the clap count shows no visible upper limit. A star rating asks one to evaluate on a scale of 5, whereas the total clap count is, in effect, a score out of infinity. Many fear that this change is going to inflate the ‘rating-currency’ of the platform. For an article that fetched about 2k recommendations, even 20k claps seem too few now. Though if conformity continues to work its magic, there is a better chance of securing another clap when the clap count reads 20k and not 2k.

Delving deeper!

Some rating systems suffer from a somewhat overlooked weakness: averaging. For instance, beyond a critical number of ratings, the 5-star rating for a service (or a product) settles at a certain average, let’s say 4.3. From then on, even if ratings continue to pour in at a good rate, it takes a considerable number of extreme ratings (1 or 5) to move 4.3 to 4.4 or 4.2. After N ratings, the score of 4.3 is effectively locked in. Anything amazing or pathetic from this point on might not affect the rating considerably, which prevents the real feedback from being visible.

The Uber driver app makes it compulsory for a driver to rate each passenger as soon as the trip ends. The same is not true of the passenger app; it’s optional for a passenger to rate a driver. Similarly, Zomato and Amazon have made it optional for users to review after their purchase. In fact, Amazon allows one to review a product even if it was not purchased from the platform, and the LAR (lowest acceptable rating) can drop to as low as 2 for a sought-after product.
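The averaging effect described above can be checked with a little arithmetic. The Python sketch below is illustrative only: the function name ratings_needed and the figure of 1,000 existing ratings are assumptions made for this example, and it assumes the displayed score is a plain arithmetic mean, which may not match how any real platform computes its score. It solves for the number of identical new ratings needed to move the average to a target.

```python
from fractions import Fraction
from math import ceil


def ratings_needed(current_avg, count, target_avg, new_rating):
    """Smallest number of identical new ratings that moves a plain mean
    from current_avg (over `count` ratings) to at least/at most target_avg.

    Hypothetical helper for illustration; assumes a simple arithmetic mean.
    """
    # Exact fractions avoid floating-point surprises such as 4.4 - 4.3 != 0.1.
    current = Fraction(str(current_avg))
    target = Fraction(str(target_avg))
    rating = Fraction(str(new_rating))

    if target == current:
        return 0
    # The new rating must lie beyond the target, on the side away from
    # the current average; otherwise the target can never be reached.
    if (target - current) * (rating - target) <= 0:
        raise ValueError("target cannot be reached with this rating")

    # Solve (current*count + rating*k) / (count + k) = target for k.
    k = count * (target - current) / (rating - target)
    return ceil(k)


# A service with 1,000 ratings averaging 4.3 (assumed figures):
print(ratings_needed(4.3, 1000, 4.4, 5))  # 167 five-star ratings to reach 4.4
print(ratings_needed(4.3, 1000, 4.2, 1))  # 32 one-star ratings to drop to 4.2
```

Under these assumptions, both numbers grow in proportion to the number of existing ratings, so at 10,000 ratings the same moves take roughly ten times as many extreme ratings.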

“I know a lot of cuckoos also weigh in reviews on Amazon, so I have on occasion bought stuff that is 2 or 3 star rated, and I have been very happy with it.”

A user’s personality, mood, environment, the urgency of the requirement, the eventual gratification (and its notional value) and the influencers close to the user all weigh heavily on how that user rates something. Other notable observations from our survey suggest that ease of use makes rating an app on mobile a breeze. A good 74 percent prefer to rate on mobiles over other devices. Some users, though, choose not to rate apps or related services when rating comes with a compulsion to write a review.

Are Rating Systems going to rule?

Black Mirror held a mirror to this digital age and the psychology of rating systems in one of its episodes, Nosedive (S03E01). In the satirical episode, people could rate each other on all their online and in-person interactions using a five-star system. Everything, from status in society, to access to certain services, to employability, was a function of a person’s current rating.

We have already transformed ourselves into a generation of critics and entitled managers (who hardly get paid). We observe every move of the server in a restaurant, gauge the quality of the clinking sound the spoon makes when he places it on our table, and wait to notice the level of politeness when he agrees to our choice of dish, keenly taking notes and running mental algorithms to deduce a suitable rating, before we move on to our next entitled project to critique.

Having said that, with digital product usage hitting a new high and the concepts of User Experience & Gamification ruling the psychology, rating systems are bound to score. I hope this gives us all some insight into what works and why, when it comes to ratings.
