Crowdsourcing in Research: Top 5 Projects to Inspire You by @dustalov

Dmitry Ustalov

Ph.D. in Natural Language Processing | Head of Research at

By offering vast quantities of accurately labeled data at scale, crowdsourcing has become an invaluable tool in academic and industry research. So, let's take a look at how harnessing the power of the crowd helped to make some of the most recent scientific discoveries.

Night Photography Rendering

Images taken at night come with their own unique set of challenges not typically encountered in daytime photography. For example, there are usually multiple light sources visible in the scene at night. The lack of light, unusual lighting environments, and artificial light sources make it difficult to use traditional image processing techniques for night image rendering and correction.

The New Trends in Image Restoration and Enhancement (NTIRE) 2022 challenge is focused on studying images captured by cameras in low-light conditions to create new methods that transform them into visually appealing photographs.

The submitted images were evaluated through a visual comparison task carried out by the crowd force of Toloka. Michael Freeman, a world-renowned photographer, then ranked the solutions with the highest mean opinion scores. The ranking derived from the Toloka ratings was nearly as accurate as the professional's, which suggests that crowdsourcing is suitable for this kind of evaluation.
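To make the comparison between a crowd ranking and an expert ranking concrete, here is a minimal sketch in Python. It is not the challenge's actual evaluation code: it assumes each crowd vote has already been reduced to a numeric score per solution, and the solution names, scores, and expert ranking are made-up illustration data. It averages the scores into mean opinion scores, ranks the solutions, and measures agreement with the expert using Kendall's tau.

```python
from collections import defaultdict
from itertools import combinations


def mean_opinion_scores(votes):
    """Average the numeric scores given to each solution.

    votes: iterable of (solution_id, score) pairs (hypothetical input format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for solution, score in votes:
        totals[solution] += score
        counts[solution] += 1
    return {s: totals[s] / counts[s] for s in totals}


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items (no ties)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        agreement = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / pairs


# Made-up illustration data, not results from the challenge.
votes = [("A", 5), ("A", 4), ("B", 3), ("B", 4),
         ("C", 2), ("C", 1), ("D", 3), ("D", 2)]
expert_ranking = ["A", "D", "B", "C"]

scores = mean_opinion_scores(votes)
crowd_ranking = sorted(scores, key=scores.get, reverse=True)
tau = kendall_tau(crowd_ranking, expert_ranking)

print(crowd_ranking)   # ['A', 'B', 'D', 'C']
print(round(tau, 2))   # 0.67
```

A tau of 1.0 would mean the crowd and the expert ordered every pair of solutions the same way; values close to 1.0 correspond to the "nearly as accurate" outcome described above.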

Machine Translation

Machine Translation (MT) is an area of Computer Science that investigates automatic translation between languages. Given the difficulty of this problem, researchers organize evaluation campaigns, called shared tasks, to measure the effectiveness of new approaches and to analyze their strengths and weaknesses. One of the key events in MT research is the annual Conference on Machine Translation (WMT), which is backed by Meta and other big names in tech.

At the WMT '21 shared task on translation, Toloka made it possible to gather ground truth datasets for all the relevant language pairs using annotators from across the globe. Previously, WMT relied on Appraise, an open-source data-labeling application. Toloka was integrated into the established labeling process with Appraise and enabled high-quality annotation at scale, which was instrumental to the success of the evaluation campaign.

AI for Good: Framework to Empower Digital Workers

Saiph Savage, an Assistant Professor at Northeastern University and Director of the Northeastern Civic AI Lab, collaborated with Toloka to lead a research initiative called "A.I. For Good Framework to Empower Digital Workers" to help rural workers get better wages and conditions.

To better understand why digital workers are so often undervalued, the respondents first completed detailed surveys on Toloka. The surveys covered their skill development, hourly wages, creativity at work as it relates to wellness, possible invisible labor, and unfair treatment. This preliminary stage was followed by training sessions to develop digital skills. As a result, many adults in Mexico and underserved regions in the US managed to launch their careers online.

The research was named "most impactful" by UNESCO's International Research Institute on AI (IRCAI). The initiative contributes to the UN's 17 Sustainable Development Goals (SDGs), which aim to address some of the world's most pressing issues.

Plot Writing From Pre-Trained Language Models

Automatic story generation technology aims to help writers develop new ideas and produce well-crafted stories that can be enjoyed by readers. However, as most current language models rely heavily on common sense knowledge, generating long-form stories is still a challenge. Scratchplot, a new automatic story writing tool, uses a completely different approach. It first performs in-depth content planning and then, based on that, generates the story's body and ending.

Because of the increased complexity of the tasks performed by machines, both automatic and human evaluation are required to rate the generated texts on fluency, understandability, naturalness, grammatical correctness, cohesiveness, logic, and relevancy.

For human evaluation, the researchers turned to Toloka's international network of crowd annotators. Tolokers selected for the project had to be fluent in English, rank among the top 20% of annotators by rating, and pass a short training session. To make sure that the process would provide high-quality results, quality control rules were applied, such as limiting each crowd annotator to no more than 50 tasks and banning those who repeatedly submitted tasks too fast.
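The two quality control rules mentioned above can be sketched as a simple filter over submissions. This is an illustrative Python sketch, not Toloka's API or the researchers' actual configuration: the 50-task cap comes from the text, while the submission format, the 5-second threshold, and the "three fast submissions" ban rule are assumptions chosen for the example.

```python
from collections import defaultdict

MAX_TASKS_PER_ANNOTATOR = 50  # cap stated in the article
MIN_SECONDS_PER_TASK = 5.0    # assumed threshold for "too fast"
MAX_FAST_SUBMISSIONS = 3      # assumed number of fast submissions before a ban


def apply_quality_control(submissions):
    """Return (accepted_submissions, banned_annotators).

    submissions: list of dicts with "annotator", "task", and "seconds" keys,
    in chronological order (hypothetical format).
    """
    # First pass: find annotators who repeatedly answered too fast.
    fast_count = defaultdict(int)
    for sub in submissions:
        if sub["seconds"] < MIN_SECONDS_PER_TASK:
            fast_count[sub["annotator"]] += 1
    banned = {a for a, n in fast_count.items() if n >= MAX_FAST_SUBMISSIONS}

    # Second pass: drop banned annotators and enforce the per-annotator cap.
    accepted = []
    kept = defaultdict(int)
    for sub in submissions:
        who = sub["annotator"]
        if who in banned or kept[who] >= MAX_TASKS_PER_ANNOTATOR:
            continue
        kept[who] += 1
        accepted.append(sub)
    return accepted, banned


# Made-up data: one careful annotator who exceeds the cap, one hasty annotator.
submissions = (
    [{"annotator": "careful", "task": i, "seconds": 12.0} for i in range(60)]
    + [{"annotator": "hasty", "task": i, "seconds": 1.0} for i in range(4)]
)

accepted, banned = apply_quality_control(submissions)
print(len(accepted))  # 50
print(banned)         # {'hasty'}
```

In this sketch, all work from a banned annotator is discarded, including answers given before the ban. A real project could instead keep the earlier answers, which is a design choice rather than something the article specifies.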

By combining automatic and human benchmarking, the researchers were able to confirm the tool's high reliability and validity.

Detecting Deforestation in Tropical Forests

Tropical forests play an important role in the global ecosystem by absorbing billions of tons of carbon, promoting cloud formation and rain, and being home to indigenous people. However, millions of hectares of tropical forests are lost each year through deforestation and degradation.

Several governmental and private initiatives have been created to detect deforestation in tropical forests through analyses of remote sensing images. However, due to the large amount of data, this approach is very laborious and time-consuming.

The ForestEyes Project is a unique initiative that combines machine learning and crowdsourcing to detect problem areas and generate deforestation maps and alerts, supporting the conservation of at-risk areas.

The volunteers who took part in the project completed more than 5,000 classification tasks on remote sensing images of the Brazilian Legal Amazon, and the results were compared to ground truth from the Amazon Deforestation Monitoring Project (PRODES). The volunteers built high-confidence labeled collections of remote sensing data that will be used as the training set for classification algorithms.
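One common way to turn several volunteers' answers into a single high-confidence label is majority voting with an agreement threshold, followed by a comparison against a reference. The Python sketch below illustrates that idea only. The article does not describe the aggregation method ForestEyes actually uses, and the task IDs, labels, reference values, and the 0.7 threshold are all made up for the example.

```python
from collections import Counter, defaultdict


def majority_vote(answers, min_agreement=0.7):
    """Keep the most common label per task if enough volunteers agree.

    answers: iterable of (task_id, label) pairs (hypothetical format).
    min_agreement: assumed minimum share of votes for the winning label.
    """
    by_task = defaultdict(list)
    for task_id, label in answers:
        by_task[task_id].append(label)

    consensus = {}
    for task_id, labels in by_task.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            consensus[task_id] = label
    return consensus


def accuracy(consensus, reference):
    """Share of consensus labels that match the reference labels."""
    shared = [t for t in consensus if t in reference]
    if not shared:
        return 0.0
    return sum(consensus[t] == reference[t] for t in shared) / len(shared)


# Made-up volunteer answers and reference labels.
answers = [
    ("seg1", "forest"), ("seg1", "forest"), ("seg1", "forest"),
    ("seg2", "deforested"), ("seg2", "deforested"), ("seg2", "forest"),
    ("seg3", "deforested"), ("seg3", "deforested"), ("seg3", "deforested"),
]
reference = {"seg1": "forest", "seg2": "deforested", "seg3": "deforested"}

consensus = majority_vote(answers)
print(consensus)                       # {'seg1': 'forest', 'seg3': 'deforested'}
print(accuracy(consensus, reference))  # 1.0
```

Here `seg2` is left out of the consensus because only two of three volunteers agreed, which is below the assumed threshold. Dropping such uncertain cases is one way to keep a training set high-confidence at the cost of coverage.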

Bottom Line

Crowdsourcing proves to be an important and useful approach to research that expands the scale at which science can be conducted while also helping to cross-reference the results of human and automatic evaluations. On top of that, crowdsourcing tools allow scientists to tap into the collective wisdom of the crowd more quickly and affordably than they could with more conventional approaches, while creating job opportunities and supporting ethical compensation for the people doing the work.
