
Online Dating From A Data Analysis Perspective: A Deep Dive

by Kyla, November 1st, 2020



Love in the time of COVID is a… challenge, to say the least.

I’ve downloaded and deleted Hinge every few months since December 2019, and decided to run the data on my most recent download to see how things were going. I did not go on any dates with anyone new due to the pandemic, but I did chat with a few.

However, I wanted to do a deep dive into the science of the algorithm – what drives this fast-growing matchmaking process?

What is Online Dating?

Online dating is algorithmic matchmaking. Most apps ask you a series of questions or require you to list preferences, the answers to which are assessed by an algorithm and used to pair you with potential partners. It’s really a gamification of connection with others. There are a host of issues that can accompany use (such as safety, objectification, superficiality, etc.) but there are also benefits.

The apps assume that ‘love’ is quantifiable, to an extent. Love has patterns, and these algorithms take advantage of those patterns to recommend compatible partners across the network. 

And it’s a BUSINESS. Revenue was almost $1B in the U.S. in 2019, and is expected to be $1.1B in 2024. The number of users is expected to grow by 5M, up to 35.4M, over the same timeframe.

Source: Statista
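For a sense of scale, those forecasts imply fairly modest yearly growth. Here is a quick sketch of the compound annual growth rate (CAGR) arithmetic. It assumes that "almost $1B" means $1.0B and that the 5M increase starts from 30.4M users (35.4M minus 5M). The function name `cagr` is my own label, not something from the sources.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text; "almost $1B" is rounded to 1.0 (an assumption).
revenue_cagr = cagr(1.0, 1.1, 5)        # U.S. revenue in $B, 2019 -> 2024
users_cagr = cagr(35.4 - 5.0, 35.4, 5)  # users in millions, same window

print(f"Revenue CAGR: {revenue_cagr:.1%}")
print(f"User CAGR:    {users_cagr:.1%}")
```

Under those assumptions, that is roughly 2% a year for revenue and 3% a year for users.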

Match Group, the online dating conglomerate, owns Hinge, Tinder, Match.com, OkCupid, PlentyofFish, and many more. They recently separated from IAC, the details of which are outside the scope of this article and don’t impact the apps noticeably.

Source: Match Group 10Q

The apps seem to be doing well. Most of them rely on a freemium model, in which the core features of the app are free, but premium features are offered on either a subscription or a one-time purchase basis. Tinder is definitely the biggest focus of Match Group, with a 123% 5Y Revenue CAGR, but the company has also invested substantially in Hinge.

Source: Match 10Q

The pandemic has driven a lot of users to the apps, as the more traditional ways of meeting someone (the bars, the gyms, etc.) are closed down. People are also paying for more match opportunities, as shown by the growth in Average Revenue per User to $0.60.

Source: Match Group Letter

Hinge has grown its user base 10x over the past three years, with a +60% increase in ARPU year-over-year, showing that users are more willing to pay for matches.

Source: Match Group Letter

What is Hinge?

Specifically, the company describes Hinge as follows:

Hinge was launched in 2012 and has grown to be a popular app for the relationship-minded, particularly among the millennial and younger generation… Hinge is a mobile-only experience and employs a freemium model. Hinge focuses on users with a higher level of intent to enter into a relationship and its product is designed to reinforce that approach.

Source: Photofeeler

From a user perspective, Hinge is kind of like Tinder, but less aggressive. It’s the “app that is designed to be deleted” and you have to like someone back before they can message you. You answer 3 questions of your choice that others see, and upload 6 pictures of yourself, like above.

You can get matches in two ways:

1. They initiate the conversation by ‘liking’ either an answer to one of the questions or one of your pictures.

2. Or you can initiate the conversation by ‘liking’ them in the same fashion.

You can also set ‘dealbreakers’. For example, if you are looking for a person who might follow a certain religion or doesn’t drink, you can set it as such.

The Gale-Shapley Algorithm

Hinge uses the Gale-Shapley algorithm, which pairs people “who are likely to mutually like one another”. It measures this based on your engagement and who engages with you, and it also matches you to people with similar preferences.

For a brief overview (skip this part if you don’t want the juicy algo details):

The dating market is two-sided: one person seeks out another, with the platform serving to enable interaction. It broadly relies on network effects: the larger the pool the app pulls from, the higher the probability of finding a person that meets your preferences.

This gets into the ‘Stable Marriage Problem’, which searches for a stable matching between two entities, given the preferences of said entities. More specifically:

Given n men and n women, where each person has ranked all members of the opposite sex in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. When there are no such pairs of people, the set of marriages is deemed stable.

(Source: Wiki)
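To make that definition concrete, here is a minimal sketch of a stability check: it lists every "blocking pair", meaning two people who would both rather have each other than their current partners. The function name `blocking_pairs` and the tiny two-by-two preference lists are hypothetical, made up purely for illustration.

```python
def blocking_pairs(matching, prefs_a, prefs_b):
    """Return every (a, b) who both prefer each other to their assigned partners.

    matching: dict mapping each a to its partner b (a perfect matching)
    prefs_a / prefs_b: dict of name -> list of the other side, best first
    """
    partner_of_b = {b: a for a, b in matching.items()}
    found = []
    for a, ranked in prefs_a.items():
        # Only the b's that a ranks above its current partner can block.
        for b in ranked[: ranked.index(matching[a])]:
            b_ranked = prefs_b[b]
            if b_ranked.index(a) < b_ranked.index(partner_of_b[b]):
                found.append((a, b))
    return found


# Hypothetical instance: everyone agrees on who ranks first.
prefs_a = {"a1": ["b1", "b2"], "a2": ["b1", "b2"]}
prefs_b = {"b1": ["a1", "a2"], "b2": ["a1", "a2"]}

stable = blocking_pairs({"a1": "b1", "a2": "b2"}, prefs_a, prefs_b)
unstable = blocking_pairs({"a1": "b2", "a2": "b1"}, prefs_a, prefs_b)
print(stable)    # no blocking pairs, so this matching is stable
print(unstable)  # a1 and b1 would both rather be together
```

A matching is stable exactly when this list comes back empty.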

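The Gale-Shapley algorithm solves this through a series of rounds in which each unmatched element A proposes to its highest-ranked element B that has not yet rejected it. Element B responds yes or no: it tentatively accepts if it is free or if it prefers this A to its current partner, and otherwise it says no. A rejected element A then proposes to its next most-preferred element B, and this repeats until everyone is engaged. The matching is considered stable when there is no pair (A, B), not matched to each other, who both prefer each other over their current partners.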

The Gale Shapley Proposal

For a more in-depth example, let’s say that there are 4 bugs and 4 trees. We need to match the bug to a tree, through stable pairings. 

Stable doesn’t mean perfect — not everyone is going to be completely satisfied with their pair, but they wouldn’t prefer anyone else who is available over the pair that they currently have (Pareto-optimal).

The Trees and the Bugs Optimality:

Inspired by Anant Sahai's notes for Discrete Mathematics and Probability Theory

So we have four bugs: a bumblebee, a ladybug, a caterpillar, and a butterfly. We also have four trees: a pinetree, a cactus, a tulip, and an oak tree. 

The Preferences of the Trees

We can build out preference orders in matrices to begin the matchmaking.

All the trees prefer the bumblebee the most (of course, this makes perfect sense, as bumblebees support 85% of all plants and pollinate 30% of our nutrition).

The Preference of the Bugs

The bumblebee prefers the pinetree, the ladybug prefers the oak tree, caterpillar likes the cactus the most, and the butterfly likes the tulip.

So the preference web looks something like this:

The matching process is as follows: all the trees go to the bug that they prefer — and the bug accepts whoever they prefer most out of who comes over.

1. The bumblebee selects the pine tree, as that is their most preferred tree

2. The other trees go to their second most preferred bug

3. The ladybug chooses the oak tree, as that is their most preferred tree

4. The cactus selects the butterfly, their second most preferred bug

5. The tulip selects the caterpillar, their third most preferred bug

The cactus knows that it can never get the bumblebee — the bumblebee is completely enamored with the pinetree, so it is happy with the butterfly. The butterfly would be happier with the tulip, but because the tulip is with the caterpillar (which it prefers over the butterfly) the matching is stable.

Thus, we have stable matches. Basically, no tree and bug would both rather be with each other than with the partners they ended up with. There is a chance that a bug prefers some other tree more, but it can only go to that tree if the tree is available (which it isn’t). It’s not perfect, but it’s optimal.

S = {(Pinetree, Bumblebee), (Cactus, Butterfly), (Tulip, Caterpillar), (Oak Tree, Ladybug)}

T = {(Pinetree, Bumblebee), (Cactus, Caterpillar), (Tulip, Butterfly), (Oak Tree, Ladybug)}.

This is the pseudocode for our tree-bug scenario:
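As a runnable stand-in, here is a minimal Python sketch of the propose-and-accept loop. The full preference lists are assumptions: only the facts stated in the walkthrough (every tree ranks the bumblebee first, each bug's favorite tree, the cactus ranking the butterfly second, the tulip ranking the caterpillar third and above the butterfly) come from the example, and the remaining ranks are filled in arbitrarily. The function name `gale_shapley` is my own label.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Proposer-side Gale-Shapley; returns a dict of proposer -> receiver."""
    # rank[r][p] = position of proposer p in receiver r's list (0 is best)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index to propose to
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # r was free: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # r trades up
            free.append(current)
        else:
            free.append(p)                        # r says no
    return {p: r for r, p in engaged_to.items()}


# Assumed preference lists (best first), consistent with the walkthrough.
trees = {
    "Pinetree": ["Bumblebee", "Ladybug", "Caterpillar", "Butterfly"],
    "Cactus":   ["Bumblebee", "Butterfly", "Caterpillar", "Ladybug"],
    "Tulip":    ["Bumblebee", "Ladybug", "Caterpillar", "Butterfly"],
    "Oak Tree": ["Bumblebee", "Ladybug", "Butterfly", "Caterpillar"],
}
bugs = {
    "Bumblebee":   ["Pinetree", "Oak Tree", "Tulip", "Cactus"],
    "Ladybug":     ["Oak Tree", "Tulip", "Pinetree", "Cactus"],
    "Caterpillar": ["Cactus", "Tulip", "Pinetree", "Oak Tree"],
    "Butterfly":   ["Tulip", "Cactus", "Oak Tree", "Pinetree"],
}

trees_propose = gale_shapley(trees, bugs)  # tree -> bug
bugs_propose = gale_shapley(bugs, trees)   # bug -> tree
print(trees_propose)
print(bugs_propose)
```

With these assumed lists, letting the trees propose reproduces the pairs in S, while letting the bugs propose gives the pairs in T. Both are stable; which side does the proposing decides whose preferences win out.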

Hitsch et al wrote a comprehensive article applying the above algorithm to the online dating world (in 2010) and their ultimate finding was:

Our results suggest that the particular site that we study leads to approximately efficient matching outcomes (within the set of stable matches), and that search frictions are mostly absent. Hence, the site appears to be efficiently designed…Based on the results obtained in this study, we conclude that online dating data provide valuable insights toward understanding the economic mechanisms underlying match formation and the formation of marriages

Source: Matching and Sorting in Online Dating

This stable matching pairing is actually quite effective. Hinge applies this through the ‘stable roommates problem’, grouping people into a common pool without the gender division. The same effect applies: organizing people based on a set of preferences (with the knowledge that your person will probably never be ‘perfect’) does actually work well.

My Date Data

So, I wanted to see what my data looked like. I knew I wasn’t going to be able to backtest any algorithm due to information asymmetry, but I wanted to see what the iterations of interaction did look like. 

The Data

The data came in a JSON file which I imported into R using jsonlite. The data was super messy (lots of nested lists), so I ended up converting it into a CSV and doing output through Excel.

Snippet of my Data
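For anyone who wants to try the same nested-JSON-to-CSV step, here is a rough sketch. It uses Python's standard library rather than the R and jsonlite workflow I actually used, and every field name in the sample (`match`, `like`, `chats`, `timestamp`, `body`) is a made-up placeholder, not the real export schema.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            row.update(flatten(value, f"{prefix}{i}."))
    else:
        row[prefix.rstrip(".")] = obj
    return row


# Placeholder records; the field names are hypothetical.
raw = """
[
  {"match": {"timestamp": "2020-06-13"},
   "chats": [{"body": "hi"}, {"body": "hey"}]},
  {"like": {"timestamp": "2020-06-14"}}
]
"""

rows = [flatten(record) for record in json.loads(raw)]
columns = sorted({key for row in rows for key in row})

# Write to an in-memory buffer; swap in open("out.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each nested list turns into numbered columns (`chats.0.body`, `chats.1.body`, and so on), which is wide but easy to open in Excel.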

The data contained:

* The number of times I selected ‘No’

* The number of times I selected ‘Yes’

* The number of ‘Yes’ I received

* If I ‘chatted’ with someone

* If they ‘chatted’ back

About Me

My profile is pretty dorky. Most of my pictures are me either wearing a pi shirt or doing yoga, but I do have a serious selfie in there. 

Source: My Profile

My ‘thought prompts’ are:

* Do you disagree or agree that breakfast foods are a conspiracy

* I geek out on Math, preferably statistics

* I won’t shut up about Polish Electronica

Most people tend to comment about the breakfast food question, but most of the time, interactions are led by someone liking a picture. 

Notes

* The timeframe is from June 13th 2020 to August 23rd 2020

* This was in Los Angeles and distance was set to <25 miles

* I had a standard membership (not premium)

* I date everyone (as long as they're kind)

* Most often I was responding at 8pm and I got on about 3–4 days a week for about ten minutes

This chart was compiled using SankeyMATIC. It details who ‘liked’ me, who I ‘liked’, who messaged first, and if a conversation was had. As you can see, there was a large pool initially, which was whittled down through interaction (or lack thereof).

Key Terms

Remove: Those who liked me and I didn’t like back

Match: A mutual like

Ghost: Conversation ends

Key Takeaways

* Over the two-month time span, I matched with 10.6% of my available pool of likes

* I initiated only 22% of the time, and 25% of my attempts were successful

* The other 78% of interactions were them ‘liking’ me, to which I responded 81% of the time

* 9.8% of my interactions ended up as a ‘conversation’, which I defined as more than 5 messages
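Rates like these are simple ratios over an interaction log. Here is a small sketch of the arithmetic. The log is placeholder data, not my actual export, and the field names (`initiator`, `matched`, `messages`) are hypothetical.

```python
# Placeholder interaction log, NOT real data: one dict per person.
log = [
    {"initiator": "them", "matched": True,  "messages": 8},
    {"initiator": "them", "matched": True,  "messages": 2},
    {"initiator": "them", "matched": False, "messages": 0},
    {"initiator": "me",   "matched": True,  "messages": 6},
    {"initiator": "me",   "matched": False, "messages": 0},
]


def share(rows, predicate):
    """Fraction of rows for which predicate(row) is true (0.0 if no rows)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)


mine = [row for row in log if row["initiator"] == "me"]
theirs = [row for row in log if row["initiator"] == "them"]

initiated_by_me = share(log, lambda row: row["initiator"] == "me")
my_success_rate = share(mine, lambda row: row["matched"])
my_response_rate = share(theirs, lambda row: row["matched"])
conversation_rate = share(log, lambda row: row["messages"] > 5)

print(f"I initiated:       {initiated_by_me:.0%}")
print(f"My likes matched:  {my_success_rate:.0%}")
print(f"I liked back:      {my_response_rate:.0%}")
print(f"Became a convo:    {conversation_rate:.0%}")
```

The main choice is the denominator: each rate above is measured against a different pool (all interactions, only my likes, or only their likes), which is why the percentages don't add up to 100.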

Conclusion

There is a lot of analysis on the nuances of dating apps, including the weird science of attractiveness, in which 

“Being very, very attractive as a man offers no advantages over being fairly average. Women like men who rate themselves as five out of 10 as much as men who think they are 10 out of 10s, whereas men would ideally date someone who self-rates their physical appearance as eight out of 10.” Source: BBC

Source: HCMST

Search costs are still relatively high on most apps, due to information asymmetry and the potential gaps in the matching process. Online dating does increase the sample size of available partners, but it can also depersonalize the entire exchange (primarily through the gamification). However, online dating has become the most popular way that people meet their partners, as shown above.

Source: Pew Research

People who have had a positive experience with the apps have cited the increased opportunity to meet people as the top upside, but dishonesty and misrepresentation as the biggest downside. Pictures and openness about intent seem to be the most important to users.

Overall, it seems that if someone were actively pursuing a relationship (which I am not), the best thing to do is to optimize for the algorithm: make a fun profile, be responsive, and engage actively. But please don’t be creepy. Long-term happiness is something that you yourself create, not an app.

Also, this question from HBS is worth considering:

In the long term, should Hinge be worried that it may stunt its own growth by improving its matching protocols and tools? In other words, if the implementation of machine learning increases the number of stable matches created and leads to happy couples leaving the platform, will Hinge lose the user growth that makes it so compelling to its investors? Source: HBS

If the app matches everyone perfectly, does that mean it’s working? Or does that mean it’s losing users?

The paradox of online dating.