
A Cool New Career in Data Science: Crowd Solutions Architect (CSA)

by Alena Johnson July 12th, 2022




The IT sector goes through major changes every few years, and consequently so does the job landscape. Some positions that were essentially unheard of a decade ago have now become the hottest new kids on the block. With the tech industry expanding at an ever-accelerating pace, one of the best examples of such a position is the Crowd Solutions Architect (CSA).

Machine Learning and Data Labeling

We can’t talk about what CSAs do without talking about data labeling and crowdsourcing. Data labeling is everywhere these days, particularly (but not exclusively) where there’s Machine Learning (ML) – from search engine optimization and virtual assistants to self-driving vehicles and medical AI. The list is growing longer by the day.


As is the case with every ML-supported product or service, there are three important aspects that make it possible: the algorithms, the hardware, and the data. We’re less concerned with the first two because of their general acceptance and availability on the market. However, data labeling is a different story altogether – here, a whole host of different methodologies are up for grabs.


What do CSAs do?

One of the most up-and-coming labeling methods today is crowdsourcing, mainly due to its time- and cost-efficiency. It relies on large-scale input from data labelers, known as crowd workers, who are spread all across the globe. Their efforts are then aggregated to deliver tagged data to the client. This is where CSAs come in.
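To make the aggregation step concrete, here is a minimal sketch in Python that combines several crowd workers' answers per item by majority vote. Majority voting is only one possible aggregation rule and is an assumption here, not something this article prescribes. The function name `majority_vote` and the sample data are hypothetical.

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Return (winning label, share of votes) for every item that has answers.

    Ties go to the label that was submitted first for that item.
    """
    result = {}
    for item_id, labels in labels_by_item.items():
        if not labels:
            continue  # no answers yet, nothing to aggregate
        label, votes = Counter(labels).most_common(1)[0]
        result[item_id] = (label, votes / len(labels))
    return result

# Hypothetical responses: three crowd workers labeled each image.
responses = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}
aggregated = majority_vote(responses)
```

The vote share gives a rough confidence signal: an item where workers split two to one probably deserves a second look before it goes to the client.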


Every CSA has to be a proficient coder, a capable architect, and a savvy manager of people, possessing a balanced mix of hard and soft skills. To deliver a well-labeled data set in a timely manner, CSAs need to:


  • Assess, understand, and decompose the labeling task at hand.
  • Have the vision and the know-how to create a suitable execution strategy.
  • Arm the labelers with the right instructions and examples.
  • Consistently monitor their progress and offer guidance.
  • Obtain, verify, aggregate, and submit labeled data to the client.


Ironically, many companies are actively searching for CSAs – or at least for what CSAs can do – without being consciously aware that these professionals exist or this is what they’re called.


For instance, many companies from the booming AI sector in Israel – among them Mobileye, Yotpo, Innoviz, Pagaya, and Medtronic – advertise for positions like “technical project manager for data annotation department,” “operational data manager for annotators,” “data product manager,” “data architect,” or “data engineer.” What they actually want is a CSA, which is why awareness of this position is important.


The Job’s Specifics

A high-caliber CSA should possess both strong analytical and algorithmic skills.


Analytical Skills


  • Analytical skills come in two forms – general and specific.

  • General skills have to do with your social skills and ability to communicate.

  • Specific skills are about knowing how to write Structured Query Language (SQL) scripts.
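As a rough illustration of the kind of SQL script a CSA might write, the sketch below counts how many workers chose each label for each item. It runs the query through Python's built-in `sqlite3` module so that it is self-contained. The `labels` table, its columns, and the rows are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table of crowd answers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (worker_id TEXT, item_id TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO labels VALUES (?, ?, ?)",
    [
        ("w1", "item_1", "match"),
        ("w2", "item_1", "match"),
        ("w3", "item_1", "no_match"),
        ("w1", "item_2", "no_match"),
        ("w2", "item_2", "no_match"),
    ],
)

# Count the votes per item and label, most popular label first.
query = """
    SELECT item_id, label, COUNT(*) AS votes
    FROM labels
    GROUP BY item_id, label
    ORDER BY item_id, votes DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
```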


Algorithmic Skills


  • All algorithmic skills are about problem-solving.
  • Well-developed coding skills are one of the main prerequisites, especially for “internal CSAs” who work on turnkey solutions.
  • Logical thinking is crucial for robust pipeline development and consistently high results.
  • Soft skills are also necessary, especially for “external CSAs” who maintain relationships with international clients.

How to Become a CSA

To become a CSA, you should have sufficient training and/or experience and know what to expect when you apply.


Training


For comprehensive training that will familiarize you with the ins and outs of being a CSA/data scientist, you may want to consider taking an online course from ML experts or crowdsourcing practitioners.


If you prefer the former, you have a wide range of choices: Stanford’s Machine Learning taught by Andrew Ng himself, Data Science: Foundations using R Specialization from Johns Hopkins, or Machine Learning: Fundamentals and Algorithms from Carnegie Mellon among many others. If you prefer the latter, some options are also available: for instance, Toloka’s Practical Crowdsourcing course.


A Real-life Case

If becoming a CSA sounds appealing to you, here’s a recent case from eCommerce that resembles what you may face on the job. It’s about large-scale price matching – a common strategy that e-platforms use to stay ahead of their competition. The end goal is optimal pricing: prices can’t be too low, which leads to revenue loss, nor can they be too high, which sends shoppers elsewhere.


Backstory


One large e-marketplace has a whole array of different products on display and the following business goals:


  • Market analysis to learn about their competitors’ products and prices.
  • Overlap estimation in different product categories (for example, finding the same charger under “phone accessories” and “electrical appliances”).
  • Market equilibrium discovery and dynamic price adjustment that follows.


Objectives


While automatic algorithms offer a partial solution, human-handled labeling is normally needed to improve price-matching quality and identify missing matches, especially identical items listed under different product names. The company turned to crowdsourcing, and one of our CSAs was handed this project. Having spoken to the client, they quickly set the following objectives:


  • Gauge the quality of automatic matching.
  • Remove erroneous matches, thereby improving overall quality.
  • Locate URLs of matching items on other e-platforms, thereby increasing match coverage.
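The first two objectives can be sketched in a few lines of Python: crowd verdicts on a sample of automatic matches give an estimate of matching precision, and the rejected pairs are the ones to remove. This is an illustrative sketch, not the pipeline used in the project. The function name, the SKU identifiers, and the example.com URLs are hypothetical.

```python
def review_auto_matches(auto_matches, crowd_verdicts):
    """Split automatic matches by crowd verdict and estimate precision.

    Pairs without a verdict are left out of the precision estimate.
    """
    confirmed = [m for m in auto_matches if crowd_verdicts.get(m) is True]
    rejected = [m for m in auto_matches if crowd_verdicts.get(m) is False]
    reviewed = len(confirmed) + len(rejected)
    precision = len(confirmed) / reviewed if reviewed else None
    return confirmed, rejected, precision

# Hypothetical (own product, competitor URL) pairs from an automatic matcher.
auto_matches = [
    ("sku_1", "https://example.com/a"),
    ("sku_2", "https://example.com/b"),
    ("sku_3", "https://example.com/c"),
    ("sku_4", "https://example.com/d"),
]
# Hypothetical aggregated crowd verdicts: True means "same product".
crowd_verdicts = {
    ("sku_1", "https://example.com/a"): True,
    ("sku_2", "https://example.com/b"): False,
    ("sku_3", "https://example.com/c"): True,
    ("sku_4", "https://example.com/d"): True,
}
confirmed, rejected, precision = review_auto_matches(auto_matches, crowd_verdicts)
```

In this toy sample, three of the four automatic matches are confirmed, so the estimated precision is 0.75.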


Solution and Results


The offered solution consisted of two sub-tasks that the CSA had to supervise throughout:

  • Crowd workers had to use URLs to identify products on third-party websites.
  • They also had to run a side-by-side comparison and decide whether pairs of items represented the same product.


Our CSA designed a sturdy pipeline that didn’t disrupt the company’s internal processes. The pipeline contained a two-way feedback loop with crowd-based labeling situated in the middle, going from the pricing system to the CSA and in the opposite direction. There were four distinct stages within the pipeline:


  • Preparation
  • Data collection
  • Quality control
  • Labeling + verification



The company used their automated pre-labeling tool to prepare a pool of URLs. The pool was then delivered to the CSA to supervise human-handled labeling. After removing incorrect matches, the CSA distributed approved URLs among crowd workers.
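Distributing approved URLs among crowd workers could look roughly like the sketch below. It assumes that each URL is shown to several different workers (an "overlap" of three) so that their answers can later be compared. The article does not state the overlap or the assignment scheme, so both are assumptions, and all names are hypothetical.

```python
from itertools import cycle

def assign_with_overlap(urls, workers, overlap=3):
    """Assign each URL to `overlap` distinct workers in round-robin order.

    Assumes worker IDs are unique.
    """
    if overlap > len(workers):
        raise ValueError("overlap cannot exceed the number of workers")
    assignments = {worker: [] for worker in workers}
    rotation = cycle(workers)
    for url in urls:
        # Consecutive picks from the rotation are distinct workers
        # because overlap <= len(workers).
        for _ in range(overlap):
            assignments[next(rotation)].append(url)
    return assignments

# Hypothetical approved URLs and worker IDs.
approved_urls = ["https://example.com/a", "https://example.com/b"]
workers = ["w1", "w2", "w3", "w4"]
assignments = assign_with_overlap(approved_urls, workers, overlap=3)
```

Round-robin keeps the workload roughly even. A real crowdsourcing platform would normally handle this assignment itself.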


To maintain consistent labeling quality throughout, the CSA updated the project’s honeypots (control tasks with known correct answers) every few hours. This ensured that the URLs were active and the items remained in stock. After the project’s completion, the new pipeline provided a 2.5% increase in matching accuracy compared to the previous solution. This translates to substantial savings in the context of large-scale eCommerce.
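Honeypots are control tasks with known correct answers that are mixed in with regular tasks. A minimal sketch of how they can be used for quality control: score each worker on the honeypots they answered and keep only the workers above a threshold. The 0.8 threshold, the function name, and the sample data are assumptions for illustration, not figures from the project.

```python
def worker_accuracy(answers, honeypots):
    """Share of honeypot tasks that each worker answered correctly."""
    stats = {}
    for worker, task, label in answers:
        if task in honeypots:  # regular tasks have no known answer
            correct, total = stats.get(worker, (0, 0))
            stats[worker] = (correct + (label == honeypots[task]), total + 1)
    return {worker: c / t for worker, (c, t) in stats.items()}

# Hypothetical honeypots with their known correct answers.
honeypots = {"hp_1": "match", "hp_2": "no_match"}
# Hypothetical (worker, task, label) answers; "task_9" is a regular task.
answers = [
    ("w1", "hp_1", "match"),
    ("w1", "hp_2", "no_match"),
    ("w1", "task_9", "match"),
    ("w2", "hp_1", "no_match"),
    ("w2", "hp_2", "no_match"),
]
accuracy = worker_accuracy(answers, honeypots)
trusted = [worker for worker, acc in accuracy.items() if acc >= 0.8]
```

Because prices and stock change quickly in eCommerce, a honeypot's known answer can go stale. That is one reason to refresh the honeypot set regularly.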


Takeaway


Now that you know what CSAs do and how to become one, it’s worth considering your prospects, not least financial. Based on last year’s figures from the US Bureau of Labor Statistics, the data science market will exceed $320 billion over the next five years (compared to around $100 billion today). This will inevitably lead to a 20-30% rise in employment of data specialists – including CSAs – by the end of this decade. This trajectory is even more impressive considering that the median annual salary for a US-based data scientist today stands at around $130,000.