Our current design sprints aim to improve how we add languages to our text classification system.

[Image: One of the final layouts we tested.]

[Image: An early HTML wireframe that was DOA.]

Some Background

Translations, l33t speak and other nuances make it impossible for the machine to do it all. Human verification ensures a high level of accuracy, although it's often a slow and arduous process.

A small (but significant) starting point is checking groupings of words for inaccuracies. For example, we can take a set of translations from a new language and compare them to an established language. We let the machine make some assumptions (i.e. we assume most of the words are "correct") and have humans vet those assumptions.

What's an appropriate UI for verifying a list of words? Where is the balance between speed and accuracy?

Ideation

Using a set of words for colours in Spanish, translated through Google Translate, we prototyped our ideas. After some talking and sketching, we had some good insights and were able to make some early decisions.

A/B Testing Layouts

After some initial HTML wireframes, we landed on two different layouts. We wanted to see which one was optimal for speed and accuracy: speed being how many words a person can "verify" per second, and accuracy being how many words they verified correctly.

Setting the complications of language aside, here's what we learned about our first interface ideas. The patterns of commonly missed words, though, prompted some great new ideas.

But first… please take a moment: can you get 90% or higher in under 30 seconds?

Test users: internal staff, plus links posted on Designer News, Dribbble and Hacker News.

[Image: The final test variants and current results.]

Check out the live scoreboard. FREE CODE: dig through the repo.

What should we do next? Or differently? Any feedback or suggestions that would help us are much appreciated.
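The speed and accuracy metrics described above can be sketched in code. This is a minimal illustration, not the actual test harness from the repo; the `Session` shape and all names here are hypothetical.

```typescript
// Hypothetical record of one verification session (illustrative only).
interface Session {
  wordsShown: string[];     // words presented to the tester
  flagged: string[];        // words the tester marked as wrong
  actuallyWrong: string[];  // ground-truth bad translations
  durationSeconds: number;  // how long the session took
}

// Speed: how many words a person can "verify" per second.
function speed(s: Session): number {
  return s.wordsShown.length / s.durationSeconds;
}

// Accuracy: fraction of words verified correctly. A word counts as
// correct when the tester's decision (flagged or left alone) matches
// the ground truth.
function accuracy(s: Session): number {
  const wrong = new Set(s.actuallyWrong);
  const flagged = new Set(s.flagged);
  const correct = s.wordsShown.filter(
    (w) => flagged.has(w) === wrong.has(w)
  ).length;
  return correct / s.wordsShown.length;
}
```

For example, a tester who reviews four Spanish colour words in two seconds, catching one of two bad translations, scores a speed of 2 words/second and an accuracy of 75%.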