If you're working on something that users actually use, then you're most likely also acquiring data en masse. When it comes to free text feedback, this data might get lost or stay in the hands of a few analysts. Here's how a few easy steps can turn that data into actionable insights instead.
I'm running a platform called MentorCruise. It's a marketplace of over 150 mentors that people in tech can browse and filter, and then apply to get matched up with the mentor of their choice.
Over the past year, close to 2,000 people applied to be in a mentorship, but while browsing our database, I found that only 48% of people got accepted. I wanted to find out why mentors decided to reject mentees, and how I could take steps to make the process easier.
Acquiring the data
When an application gets rejected, mentors can add a tiny piece of feedback to their decision. I always recommend that product builders collect feedback wherever they can – understanding why your users do what they do is invaluable, and it usually only takes a second to provide. With that in mind, it was easy to filter out all applications with feedback attached, using the Django ORM.
MenteeApplication.objects.filter(rejection_feedback__isnull=False)
Done! The result? I had 200 applications in hand with some feedback attached. I iterated through those results and wrote each piece of feedback to a text file. I've got my data.
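For completeness, here's a minimal sketch of that export – the model and field names come from the query above, while the loop and the output file name are just placeholders:

# Every rejected application that has feedback attached
# (model and field names as in the query above; the file name is a placeholder).
applications = MenteeApplication.objects.filter(rejection_feedback__isnull=False)

with open("rejection_feedback.txt", "w") as f:
    for application in applications:
        # One piece of feedback per line keeps the later steps simple.
        f.write(application.rejection_feedback.strip() + "\n")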
Data cleaning
Things looked messy at first. Cleaning is really a case-by-case thing, but it's worth running a few basic normalization passes before doing anything else.
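As a rough sketch – the exact steps are a judgment call, and these are just the usual suspects (lowercasing, stripping punctuation, collapsing whitespace), not a fixed recipe:

import re

def clean(line):
    # Lowercase, strip punctuation, and collapse whitespace so that
    # "Good luck!" and "good luck" later count as the same words.
    line = line.lower()
    line = re.sub(r"[^a-z\s]", " ", line)
    return re.sub(r"\s+", " ", line).strip()

with open("rejection_feedback.txt") as f:
    cleaned = [clean(line) for line in f if line.strip()]

with open("rejection_feedback_clean.txt", "w") as f:
    f.write("\n".join(cleaned))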
Let's assume we're all cleaned up now and continue.
Making a Word Cloud
I wanted to visualize what I could do to help mentors, and thought a Word Cloud would be a good way to do so. It would give me some keywords to look into, and would be a quick way of filtering all those sentences and pieces of feedback I'd acquired. To start, I downloaded wordcloud_cli, which would allow me to do exactly that.
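wordcloud_cli is the command-line front end of the wordcloud Python package: point it at a text file and it renders an image. Roughly the same thing, done through the package's Python API and using the placeholder file names from above, looks like this:

from wordcloud import WordCloud

with open("rejection_feedback_clean.txt") as f:
    text = f.read()

# No filtering yet: every word in the feedback ends up in the cloud.
WordCloud(width=800, height=400).generate(text).to_file("feedback_cloud.png")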
I was naive at first: I fed my whole document to the tool, and this is what it spat out.
There's maybe some signal in there, but it's heavily diluted by words I'd expect in every English sentence, and by others I'd expect in every single piece of feedback on this platform. Let's filter those out.
For the first part, I'm looking for a stop word list. Stop words are very common words in a language that tend to get filtered out for most natural language processing tasks; out of context, they simply don't carry much meaning. To do so, I downloaded this stop word list from NLTK: https://gist.github.com/sebleier/554280.
For the second part, I simply added some words that don't provide much value for me – looking at that initial word cloud, things like "Mentor" or "Good luck". Let's try again.
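Put together, the second attempt might look like the sketch below – the stop word file is the NLTK list from the gist above, and the extra domain words are just examples of the kind of terms I dropped:

from wordcloud import WordCloud

# Stop words from the NLTK list saved from the gist above, plus
# domain words that show up in almost every piece of feedback.
with open("nltk_stopwords.txt") as f:
    stopwords = set(f.read().split())
stopwords |= {"mentor", "mentorship", "good", "luck"}

with open("rejection_feedback_clean.txt") as f:
    text = f.read()

cloud = WordCloud(width=800, height=400, stopwords=stopwords).generate(text)
cloud.to_file("feedback_cloud_filtered.png")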
It's not perfect, but at least I can identify some terms that ring my alarm bells. What do we do with those now?
Cross-Reference
Right off the bat, I can identify some key issues that mentors face when deciding whether they want to take on a new mentee.
There are also some words and phrases which are more ambiguous. How are they used in context? In these cases, I need to go back and look for the feedback sentences that use them.
Usually you can identify some things right away; others need context. It's a good way to filter a vast amount of unstructured data like this and boil it down to a smaller set.
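Going back to the raw feedback is just a keyword search over the exported file; a small sketch, with the keyword chosen arbitrarily here:

keyword = "commitment"  # example keyword pulled from the cloud

with open("rejection_feedback.txt") as f:
    matches = [line.strip() for line in f if keyword in line.lower()]

for sentence in matches:
    print(sentence)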
Conclusion
So, what do we get out of this? With a few tricks, I was able to turn 200 unstructured sentences into a set of maybe a dozen keywords that are interesting to me. There's no bias and no way I could ignore the big, bold words that appear on my screen.
Next up, I can apply the same principle to all the other unstructured data I have access to: Why did people apply? Where did they find us? Why do people cancel their mentorships? You can use this in the same way when building something of your own. It's difficult to get honest feedback, and this is the most foolproof way I've found to get there.