Date: 2022-07-12
Project manager at Toloka Academy
The IT sector goes through major changes every few years, and so does the job landscape. Some positions that were essentially unheard of a decade ago are now among the most sought-after. And with
We can’t talk about what CSAs do without talking about data labeling and crowdsourcing.
Every ML-supported product or service rests on three things: the algorithms, the hardware, and the data. We’re less concerned with the first two because they are generally accepted and readily available on the market. Data labeling, however, is a different story altogether – here, a whole range of methodologies is available to choose from.
One of the most up-and-coming labeling methods today is crowdsourcing, mainly due to its time- and cost-efficiency. It relies on large-scale input from data labelers known as crowd workers, who are spread all across the globe. Their efforts are then aggregated to deliver tagged data to the client. This is where CSAs come in.
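As a rough illustration of what "aggregated" can mean here, the simplest approach is a majority vote over the labels that several crowd workers give to the same item. The sketch below assumes a hypothetical input of `(task_id, worker_id, label)` tuples; the function name and the label values are made up for this example, and real projects often rely on more elaborate aggregation models.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate crowd labels per task by simple majority.

    `labels` is an iterable of (task_id, worker_id, label) tuples
    (a hypothetical format chosen for this sketch).
    Returns {task_id: (winning_label, share_of_votes)}.
    Ties are resolved in favor of the label that was seen first.
    """
    votes = defaultdict(Counter)
    for task_id, _worker_id, label in labels:
        votes[task_id][label] += 1

    aggregated = {}
    for task_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        aggregated[task_id] = (label, count / sum(counter.values()))
    return aggregated


raw_labels = [
    ("t1", "w1", "match"),
    ("t1", "w2", "match"),
    ("t1", "w3", "no_match"),
    ("t2", "w1", "no_match"),
    ("t2", "w2", "no_match"),
]
aggregated = majority_vote(raw_labels)
```

The share of votes behind the winning label doubles as a crude confidence score: items with a low share are natural candidates for extra labeling or manual review.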
Every CSA has to be a proficient coder, a capable architect, and a savvy manager of people, possessing a balanced mix of hard and soft skills. To deliver a well-labeled data set in a timely manner, CSAs need to:
Ironically, many companies are actively searching for CSAs – or at least for what CSAs can do – without realizing that these professionals exist or that this is what they’re called.
For instance, many companies from
A high-caliber CSA should possess both strong analytical and algorithmic skills.
Analytical skills come in two forms – general and specific.
General skills have to do with your social skills and ability to communicate.
Specific skills are about knowing how to write Structured Query Language (SQL) scripts.
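To give a feel for the kind of SQL script this might involve, here is a minimal sketch that summarizes how many tasks each crowd worker completed and how long they spent on average. The `assignments` table, its columns, and the `worker_stats` helper are all assumptions made for this example; it runs against an in-memory SQLite database from Python's standard library rather than any particular production system.

```python
import sqlite3


def worker_stats(rows):
    """Return (worker_id, tasks_done, avg_seconds) per worker.

    `rows` holds (task_id, worker_id, label, seconds_spent) tuples,
    a hypothetical schema used only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assignments ("
        "task_id TEXT, worker_id TEXT, label TEXT, seconds_spent REAL)"
    )
    conn.executemany("INSERT INTO assignments VALUES (?, ?, ?, ?)", rows)

    query = """
        SELECT worker_id,
               COUNT(*)           AS tasks_done,
               AVG(seconds_spent) AS avg_seconds
        FROM assignments
        GROUP BY worker_id
        ORDER BY tasks_done DESC, worker_id
    """
    stats = conn.execute(query).fetchall()
    conn.close()
    return stats


sample_rows = [
    ("t1", "w1", "match", 12.0),
    ("t2", "w1", "no_match", 18.0),
    ("t1", "w2", "match", 4.0),
]
stats = worker_stats(sample_rows)
```

A query like this is often a first step toward spotting workers who answer suspiciously fast or who contribute very little.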
To become a CSA, you should have sufficient training and/or experience and know what to expect when you apply.
For comprehensive training that will familiarize you with the ins and outs of being a CSA/data scientist, you may want to consider taking an online course from ML experts or crowdsourcing practitioners.
If you prefer the former, you have a wide range of choices:
If becoming a CSA sounds appealing to you, here’s a recent case from ecommerce of the kind you may well face yourself. It’s about large-scale price matching – a common strategy that e-platforms use to stay ahead of their competition. The end goal is optimal pricing: prices can’t be so low that revenue is lost, nor so high that shoppers go elsewhere.
One large e-marketplace has a whole array of different products on display and the following business goals:
While automatic algorithms offer a partial solution, human-handled labeling is normally needed to improve price-matching quality and identify missing matches, especially identical items listed under different product names. The company turned to crowdsourcing, and one of our CSAs was handed this project. Having spoken to the client, they quickly set the following objectives:
The offered solution consisted of two sub-tasks that the CSA had to supervise throughout:
Our CSA designed a sturdy pipeline that didn’t disrupt the company’s internal processes. The pipeline contained a two-way feedback loop with crowd-based labeling situated in the middle, going from the pricing system to the CSA and in the opposite direction. There were four distinct stages within the pipeline:
The company used their automated pre-labeling tool to prepare a pool of URLs. The pool was then delivered to the CSA to supervise human-handled labeling. After removing incorrect matches, the CSA distributed approved URLs among crowd workers.
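The distribution step can be pictured with a small sketch. It assumes, purely for illustration, that each approved URL is shown to a fixed number of distinct crowd workers (the `overlap`) and that workers are picked in round-robin order; the `distribute` function and its parameters are hypothetical, and in practice a crowdsourcing platform usually handles task assignment itself.

```python
from itertools import cycle


def distribute(urls, workers, overlap=3):
    """Assign every URL to `overlap` distinct workers, round-robin.

    Returns {worker_id: [urls assigned to that worker]}.
    Assumes `workers` contains no duplicates.
    """
    if overlap > len(workers):
        raise ValueError("overlap cannot exceed the number of workers")

    assignments = {worker: [] for worker in workers}
    pool = cycle(workers)
    for url in urls:
        # Consecutive picks from the cycle are distinct workers
        # as long as overlap <= len(workers).
        for _ in range(overlap):
            assignments[next(pool)].append(url)
    return assignments


approved_urls = ["url_a", "url_b", "url_c", "url_d"]
crowd = ["w1", "w2", "w3", "w4"]
plan = distribute(approved_urls, crowd, overlap=3)
```

Showing the same URL to several workers is what makes later aggregation possible, since a single answer per item leaves nothing to compare against.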
To keep labeling quality consistent throughout, the CSA updated the project’s honeypots (control tasks with known answers) every few hours. This ensured that the URLs were still active and the items remained in stock. After the project’s completion, the new pipeline delivered a 2.5% increase in matching accuracy compared to the previous solution, which translates to substantial savings in the context of large-scale ecommerce.
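Honeypots are control tasks whose correct answers are known in advance, so they can be used to estimate how reliable each crowd worker is. The sketch below shows one simple way to do that; the data format, the function names, and the 0.8 threshold are assumptions for this example rather than details of the project described above.

```python
from collections import defaultdict


def honeypot_accuracy(responses, honeypots):
    """Compute each worker's accuracy on honeypot tasks only.

    `responses` is an iterable of (worker_id, task_id, label) tuples;
    `honeypots` maps task_id -> known correct label.
    Workers who saw no honeypots are left out of the result.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker_id, task_id, label in responses:
        if task_id in honeypots:
            total[worker_id] += 1
            if label == honeypots[task_id]:
                correct[worker_id] += 1
    return {worker: correct[worker] / total[worker] for worker in total}


def flag_workers(accuracy, threshold=0.8):
    """Return workers whose honeypot accuracy falls below the threshold."""
    return sorted(w for w, acc in accuracy.items() if acc < threshold)


known_answers = {"h1": "match", "h2": "no_match"}
responses = [
    ("w1", "h1", "match"),
    ("w1", "h2", "no_match"),
    ("w2", "h1", "no_match"),
    ("w2", "h2", "no_match"),
    ("w2", "t7", "match"),  # regular task, ignored for accuracy
]
accuracy = honeypot_accuracy(responses, known_answers)
flagged = flag_workers(accuracy)
```

Because product pages change, a honeypot whose known answer has gone stale would penalize careful workers unfairly, which is one reason to refresh the control set regularly.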
Now that you know what CSAs do and how to become one, it’s worth considering your prospects, not least financial. Based on last year’s figures from