Machine learning has emerged as a must-have tool for any serious data team: augmenting processes, generating smarter and more accurate predictions, and generally improving our ability to make use of data. However, discussing applications of machine learning, in theory, is much different than actually applying machine at scale in production. In this article, we walk through common challenges and corresponding solutions to making machine learning a force multiplier for your data organization. learning models 4 Reasons why Machine Learning Projects Fail From generating your weekend bike route on Google Maps to helping you discover your next binge-worthy show on Netflix, machine learning (ML) has evolved well beyond a theoretical buzzword into a powerful technology that most of us use every day. For the modern business, the appetite for machine learning has never been stronger. But while certain industries have been transformed by the automation made possible by ML—think in finance and in e-commerce—the hard truth is that many machine learning projects fail before they ever see the light of day. fraud detection personalized product recommendations In October 2020, that only 53% of projects make it from prototype to production—and that’s at organizations with some level of AI experience. For companies still working to develop a data-driven culture, that number is likely far higher, with soaring to nearly 90%. Gartner reported some failure-rate estimates Data-first tech companies like Google, Facebook, and Amazon are transforming our daily lives with machine learning, while many other well-funded and highly talented teams are still struggling to get their initiatives off the ground. But why does this happen? And how can we fix it? We share the four biggest challenges modern data teams face when adopting machine learning at scale— and how to overcome them. Misalignment between actual business needs and machine learning objectives The first challenge is strategic, not technical: starting with a solution instead of a clearly defined problem. As companies race to incorporate machine learning into their organizations, leaders may hire data scientists and to automate or improve processes without a mature understanding of to solve. machine learning practitioners which problems are actually suitable for ML And even when the business problem is a good fit for machine learning, without a shared definition of what success looks like, projects can languish in experimentation mode for months while stakeholders wait for an idealized level of machine-like perfection that can never be reached. Machine learning is not magic, it will not solve every problem perfectly, and should, by nature, continue to evolve over time. Sometimes, a merely achieving the same results as humans is a worthy project—errors and all. model Before starting any project, ask your team or your stakeholders: What business problem are we trying to solve? Why do we believe that machine learning is the right path? What is the measurable threshold of business value this project is trying to reach? What does “good enough” look like? Without these clear, shared definitions articulated at the outset, many worthy machine learning projects will never reach production and valuable resources will be wasted. Solve a business problem using machine learning and not just embark on a machine learning project for checking off the ML box. Machine Learning Model Training that doesn’t Generalize With a clearly defined business problem and targeted success metrics, your potential pitfalls get more technical. During the model training stage, issues related to your training data or model fit are the likeliest culprit for future failure. The goal of is to develop a model that can generalize, or make accurate predictions when given new data because it understands the relationships between data points and can identify trends. model training Your training dataset should be clean, sizable, and representative of the real-time data your model is expected to process once in production. Nowhere has one seen clean data in a production environment. Expect to spend considerable time cleaning, labeling and feature engineering just to get the data ready. Representative training data is also key: If your training data doesn’t reflect the actual datasets your model will encounter, you may end up with a model that won’t perform once you’ve reached testing or production. Another issue that can occur during training is overfitting and underfitting. Overfitting happens when a model learns and produces outputs that fit too closely with your training data. too much Underfitting is simply the opposite—your model doesn’t learn enough to make useful predictions on even the training data itself, let alone new data it will encounter in testing or production. Machine learning testing and validation issues As you test and validate your models, new challenges can arise from merging multiple data streams and making updates to improve performance. Changes to data sources, model parameters, and all introduce room for error. feature engineering This may also be the stage when you detect overfitting or underfitting in your model—a model that performed wonderfully during training but fails to produce useful results during testing may be overfit. Even at companies like Google, where machine learning engineers abound, in your product models can—and will—arise. surprises Machine learning deployment and serving hurdles Deploying machine learning projects is rarely simple, and teams typically can’t use consistent workflows to do so—since machine learning projects solve a wide range of business problems, there’s a similarly wide range of ways to host and deploy them. For example, some projects require batched predictions on a regular basis, while others need to generate and deliver predictions on-demand when an application makes an API request to predict using the model. (This is part of why it’s challenging to make models apply to different use cases, no matter how appealing it may sound to executives who may view machine learning models as more magic than narrowly focused functions.) Additionally, some machine learning projects can require a lot of resources, and cross-functional teams need to agree upon priorities of deployment. Engineers only have so many things they can productionalize, and as we’ve discussed, machine learning projects are much more than models and algorithms: most will need infrastructure, alerting, maintenance, and more to be successfully deployed. This is why it’s so important to articulate the business problem clearly at the outset, agree upon what success looks like, design an end-to-end solution, and have a shared understanding of your machine learning project’s value compared to other priorities. Without this strategic plan, your project may never receive the engineering resources it needs to finally reach production. For just one example, Netflix never productionalized its due to the winning model’s complexity to implement—instead choosing another submission that was simpler to integrate. million-dollar prize-winning recommendation algorithm Tactics for scalable machine learning in production Beyond strategic planning and staffing, there are concrete steps you can take to help scale your machine learning production. Lean into the cloud If your teams are working locally instead of in the cloud, it’s time to shift. Working in the cloud is the “glue” that keeps model training and deployment workflows in lockstep. Most vendors and open-source tools are developed for the cloud, and once there, it’s much easier to automate processes. Testing, training, validation and model deployment needs to be a repeatable process, it should not go from local Python code to a production environment. Leverage a DevOps approach Just as we’ve talked about applying DevOps practices to data, like and measuring data health along , ML teams can follow in DevOps’ footsteps by implementing the , while introducing . By setting up agile build cycles and tools, ML teams can more rapidly deliver changes into the codebase and improve overall performance. setting data SLAs observability pillars model Continuous Integration (CI) and Continuous Delivery (CD) Continuous Training (CT) Similar to DevOps best practices, ML teams should use containerization to consistently run software across any type of device and make it simpler for engineers to productionalize their work. Keeping a consistent and visible build process that deploys smaller changes more frequently allows the team to work more smoothly and have more insight into what’s working well, and what’s not. Visibility also helps would-be code “gatekeepers” trust the build process and speed up the ML team’s workflow. Investing time to build a strategic team and processes will help by reducing the likelihood of a project stalling out before production, and making continuous improvements feasible—setting every project up for long-term success. MLOps Invest in observability and monitoring Finally, the primary rule of machine learning is that your outcomes will only be as good as your inputs. Healthy data is absolutely essential to ML. Without clean data and working pipelines, models won’t be able to perform to their fullest potential and may fail to make accurate predictions. And when you’re relying on ML to make important business decisions, you don’t want to find out about a broken pipeline or inaccurate data after those outputs have already been delivered. That’s why data observability, which provides a full understanding and comprehensive monitoring of data health—and can prevent bad data from reaching your ML models in the first place—is well worth the investment. Achieving machine learning production at scale Machine learning isn’t magic, but it is powerful—and often misunderstood. Still, with the right mix of strategy, processes, and technology, ML can deliver competitive advantages and fuel growth across every industry. Even if you’re not building the next fraud detection algorithm or virtual personal assistant, we hope these best practices help you on your journey to (successfully!) deploying machine learning at scale. Also Published here

Amazon

Facebook

Glue

Google

Netflix

5 Ways to Become a Leader That Data Engineers Will Love 

What Is A Data Mesh — And Is It Right For Me?

Learn more about data observability

Too Long; Didn't Read

How to Build Machine Learning Algorithms that Actually Work

How to Build Machine Learning Algorithms that Actually Work

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

5 Ways to Become a Leader That Data Engineers Will Love

The Noonification: Use This 7-Step McKinsey Framework to Solve Any Problem (1/10/2023)

The Noonification: A Taxonomy of Inclusiveness (1/11/2024)

The Noonification: What is the InfiniteNature-Zero AI Model? (11/19/2022)

10 Ways AI Has Changed Our Lives

100 Days of AI, Day 8: Experimenting With Microsoft's Semantic Kernel Using GPT-4

5 Ways to Become a Leader That Data Engineers Will Love

The Noonification: Use This 7-Step McKinsey Framework to Solve Any Problem (1/10/2023)

The Noonification: A Taxonomy of Inclusiveness (1/11/2024)

The Noonification: What is the InfiniteNature-Zero AI Model? (11/19/2022)

10 Ways AI Has Changed Our Lives

100 Days of AI, Day 8: Experimenting With Microsoft's Semantic Kernel Using GPT-4

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps