Today, the IT sector goes through major changes every few years, and so does the job landscape. Some positions that were essentially unheard of a decade ago have become the hottest new kids on the block. And with the tech industry expanding at an ever-accelerating pace, one of the best examples of such a position is the Crowd Solutions Architect (CSA).

Machine Learning and Data Labeling

We can’t talk about what CSAs do without talking about data labeling and crowdsourcing. Data labeling is everywhere these days, particularly (but not exclusively) where there’s Machine Learning (ML): from search engine optimization and virtual assistants to self-driving vehicles and medical AI. The list is growing longer by the day.

As is the case with every ML-supported product or service, three important aspects make it possible: the algorithms, the hardware, and the data. The first two are less of a concern because they are widely accepted and readily available on the market. Data labeling is a different story altogether: here, a whole host of different methodologies are up for grabs.

What do CSAs do?

One of the most up-and-coming labeling methods today is crowdsourcing, mainly due to its time- and cost-efficiency. It relies on the large-scale input of data labelers, known as crowd workers, who are spread all across the globe. Their efforts are then aggregated to deliver labeled data to the client.

This is where CSAs come in. Every CSA has to be a proficient coder, a capable architect, and a savvy manager of people, with a balanced mix of hard and soft skills. To deliver a well-labeled data set in a timely manner, CSAs need to:

Assess, understand, and decompose the labeling task at hand.
Have the vision and the know-how to create a suitable execution strategy.
Arm the labelers with the right instructions and examples.
Consistently monitor their progress and offer guidance.
Obtain, verify, aggregate, and submit labeled data to the client.

Ironically, many companies are actively searching for CSAs (or at least for what CSAs can do) without being aware that these professionals exist or that this is what they’re called. For instance, many companies from the booming AI sector in Israel, among them Mobileye, Yotpo, Innoviz, Pagaya, and Medtronic, advertise for positions like “technical project manager for data annotation department,” “operational data manager for annotators,” “data product manager,” “data architect,” or “data engineer.” What they actually want is a CSA, which is why awareness of this position is important.

The Job’s Specifics

A high-caliber CSA should possess strong analytical and algorithmic skills.

Analytical Skills

Analytical skills come in two forms: general and specific. General skills have to do with your social skills and ability to communicate. Specific skills are about knowing how to write Structured Query Language (SQL) scripts.

Algorithmic Skills

All algorithmic skills are about problem-solving. Well-developed coding skills are one of the main prerequisites, especially for “internal CSAs” who work on turnkey solutions. Logical thinking is crucial for robust pipeline development and consistently high-quality results. Soft skills are also necessary, especially for “external CSAs” who maintain relationships with international clients.

How to Become a CSA

To become a CSA, you should have sufficient training and/or experience and know what to expect when you apply.

Training

For comprehensive training that will familiarize you with the ins and outs of being a CSA/data scientist, you may want to consider taking an online course from ML experts or crowdsourcing practitioners. If you prefer the former, you have a wide range of choices: Stanford’s Machine Learning taught by Andrew Ng himself, the Data Science: Foundations using R Specialization from Johns Hopkins, or Machine Learning: Fundamentals and Algorithms from Carnegie Mellon, among many others. If you prefer the latter, some options are also available, for instance, Toloka’s Practical Crowdsourcing course.

A Real-life Case

If becoming a CSA sounds appealing to you, here’s a recent case from ecommerce, the likes of which you may face in the future. It’s about large-scale price matching, a common strategy used by e-platforms to stay ahead of their competition. The end goal is optimal pricing: prices can’t be too low, which leads to revenue loss, nor too high, which sends shoppers elsewhere.

Backstory

One large e-marketplace has a whole array of different products on display and the following business goals:

Market analysis to learn about competitors’ products and prices.
Overlap estimation across product categories (for example, finding the same charger under “phone accessories” and “electrical appliances”).
Market equilibrium discovery and the dynamic price adjustment that follows.

Objectives

While automatic algorithms offer a partial solution, human labeling is normally needed to improve price-matching quality and identify missing matches, especially identical items listed under different product names. The company turned to crowdsourcing, and one of our CSAs was handed the project. After speaking to the client, the CSA quickly set the following objectives:

Gauge the quality of automatic matching.
Remove erroneous matches, thereby improving overall quality.
Locate URLs of matching items on other e-platforms, thereby increasing match coverage.

Solution and Results

The solution consisted of two sub-tasks that the CSA had to supervise throughout:

Crowd workers had to use URLs to identify products on third-party websites.
They also had to run a side-by-side comparison and decide whether pairs of items represented the same product.
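Assuming each product pair in the side-by-side sub-task is shown to several crowd workers, their overlapping judgments still have to be aggregated into one label per pair. The sketch below shows one common approach: plain majority voting after filtering out workers who do poorly on honeypots (control pairs whose correct answer is known in advance). It is an illustrative sketch, not the CSA’s actual pipeline; the data layout, the function names `honeypot_accuracy` and `aggregate_matches`, and the 0.8 accuracy threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: each judgment is (worker_id, pair_id, answer), where
# answer is True if the worker says both URLs show the same product.


def honeypot_accuracy(judgments, golden):
    """Return each worker's share of correct answers on honeypot pairs.

    `golden` maps honeypot pair IDs to their known correct answer.
    Workers who never saw a honeypot do not appear in the result.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, pair, answer in judgments:
        if pair in golden:
            seen[worker] += 1
            correct[worker] += int(answer == golden[pair])
    return {worker: correct[worker] / seen[worker] for worker in seen}


def aggregate_matches(judgments, golden, min_accuracy=0.8):
    """Majority vote per product pair, counting only trusted workers.

    Returns {pair_id: True/False}, or None for pairs whose vote is tied.
    """
    accuracy = honeypot_accuracy(judgments, golden)
    votes = defaultdict(Counter)
    for worker, pair, answer in judgments:
        if pair in golden:
            continue  # honeypots are used only to score workers
        if accuracy.get(worker, 0.0) < min_accuracy:
            continue  # skip workers with no or poor honeypot results
        votes[pair][answer] += 1

    result = {}
    for pair, counter in votes.items():
        (top_answer, top_count), *rest = counter.most_common()
        tied = bool(rest) and rest[0][1] == top_count
        result[pair] = None if tied else top_answer
    return result


if __name__ == "__main__":
    golden = {"g1": True, "g2": False}  # hypothetical honeypot pairs
    judgments = [
        ("alice", "g1", True), ("alice", "g2", False),
        ("alice", "p1", True), ("alice", "p2", False), ("alice", "p3", True),
        ("bob", "g1", True), ("bob", "g2", False),
        ("bob", "p1", True), ("bob", "p2", True), ("bob", "p3", False),
        ("carol", "g1", False), ("carol", "g2", True), ("carol", "p1", False),
        ("dave", "g1", True), ("dave", "g2", False),
        ("dave", "p1", False), ("dave", "p2", False),
    ]
    print(aggregate_matches(judgments, golden))
    # → {'p1': True, 'p2': False, 'p3': None}
```

Here "carol" fails both honeypots, so her vote on `p1` is ignored. Returning None for ties lets such pairs be routed to additional workers or to the CSA for review. A hard accuracy cutoff keeps the sketch simple; weighting each vote by the worker’s honeypot accuracy is a common alternative.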
Our CSA designed a sturdy pipeline that didn’t disrupt the company’s internal processes. The pipeline formed a two-way feedback loop running from the pricing system to the CSA and back again, with crowd-based labeling in the middle. There were four distinct stages within the pipeline:

Preparation
Data collection
Quality control
Labeling + verification

The company used its automated pre-labeling tool to prepare a pool of URLs. The pool was then delivered to the CSA, who supervised the human labeling. After removing incorrect matches, the CSA distributed the approved URLs among crowd workers. To maintain consistent labeling quality throughout, the CSA updated the project’s honeypots every few hours, ensuring that the URLs were still active and the items remained in stock.

After the project’s completion, the new pipeline provided a 2.5% increase in matching accuracy compared to the previous solution. At the scale of a large ecommerce platform, this translates into substantial savings.

Takeaway

Now that you know what CSAs do and how to become one, it’s worth considering your prospects, not least financial ones. Based on last year’s figures from the US Bureau of Labor Statistics, the data science market will exceed $320 billion over the next five years (compared to around $100 billion today). This will inevitably lead to a 20-30% rise in employment of data specialists, including CSAs, by the end of this decade. This trajectory is even more impressive considering that the median annual salary for a US-based data scientist today stands at around $130,000.

A Cool New Career in Data Science: Crowd Solutions Architect (CSA)
