The MLOps Conference took place earlier this week at Hudson Mercantile in New York City. Experts from the New York Times, Twitter, Netflix, and Iguazio, the host company, spoke about best practices for implementing machine learning across a variety of organizations.
I learned about the technological void that data scientists run into when they want to implement machine learning. With this context in mind, I can approach conversations with our data team from a new perspective and take the time to understand how we can put new models into practice on our team.
Machine learning as a technology has been around for more than 50 years, beginning with Arthur Samuel’s pioneering work at IBM in 1952, where his checkers program improved with every game it played. Despite that long history, deploying new models remains a challenge: the pipeline to production can take weeks or months, and many models never make it to production at all.
Hence the importance of applying DevOps methods to machine learning (MLOps). As Julie Pitt and Ashish Rastogi of Netflix explained, data scientists are mostly focused on the modeling layer of a solution, but companies need the entire infrastructure underneath that layer before they can actually implement machine learning.
Thankfully, new solutions and best practices are coming onto the market to address these problems. Brittany Wills explained that Twitter created a Feature Store (essentially a library) where data scientists can publish their latest work.
By sharing best practices and verified models across the organization, various teams no longer have to constantly build from scratch, enabling them to activate models in days rather than months.
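To make the idea concrete, here is a minimal sketch of what a feature store's interface might look like. The class and method names are hypothetical and not Twitter's actual implementation; the point is simply that a feature gets published once and reused by other teams instead of being rebuilt from scratch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

import pandas as pd


@dataclass
class FeatureStore:
    """A toy in-memory feature store: register a feature once, reuse it everywhere."""
    _features: Dict[str, Callable[[pd.DataFrame], pd.Series]] = field(default_factory=dict)

    def register(self, name: str, transform: Callable[[pd.DataFrame], pd.Series]) -> None:
        # Publish a named feature transformation so other teams can discover it.
        self._features[name] = transform

    def get(self, name: str, raw: pd.DataFrame) -> pd.Series:
        # Compute a published feature against a raw dataset.
        return self._features[name](raw)


# One team publishes a feature, another team reuses it on their own data.
store = FeatureStore()
store.register("days_since_signup",
               lambda df: (pd.Timestamp.now() - df["signup_date"]).dt.days)

users = pd.DataFrame({"signup_date": pd.to_datetime(["2019-01-01", "2019-06-15"])})
print(store.get("days_since_signup", users))
```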
Additionally, David Aronchick from Microsoft argued for the importance of regularly retraining your model. The data a model was originally trained on goes stale quickly in production, and if you’re not careful, drift between the training data and the live data will quietly degrade your predictions.
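As a rough illustration of that point, the sketch below uses a two-sample Kolmogorov–Smirnov test to compare a feature's training distribution against what is arriving in production and flags when retraining might be warranted. The threshold and the single-feature check are simplifications of my own, not a recipe from the talk.

```python
import numpy as np
from scipy.stats import ks_2samp


def needs_retraining(train_feature: np.ndarray,
                     live_feature: np.ndarray,
                     p_threshold: float = 0.01) -> bool:
    """Flag drift when the live distribution differs significantly from training."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < p_threshold


# Simulated example: production traffic has shifted relative to the training data.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.5, scale=1.0, size=5_000)   # the mean has drifted

if needs_retraining(train, live):
    print("Feature distribution has drifted -- schedule a retraining run.")
```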
While a Feature Store may require organizational buy-in, focusing on the micro elements of one model can have monumental impacts on the way consumers experience your products.
One of the most interesting technologies shared during the conference was Iguazio’s Nuclio. Iguazio is creating the data science platform for production, enabling companies ranging from startups to large enterprises to introduce machine learning and AI into their products in a fast and scalable manner.
Orit Nissan-Messing, their VP of R&D, sat down with me to explain their latest innovation.
Could you explain what it is that Iguazio does?
Iguazio builds architecture to streamline data science to production. We recognize that data scientists waste a majority of their time on “plumbing” rather than building actual models. We seek to solve that problem.
And how does your latest innovation, Nuclio, address this challenge?
Nuclio is an open source serverless platform that seeks to automate the delivery of code and models to production. It is unique in that it addresses ML workloads and provides high performance: it can increase data throughput and accelerate ML training while decreasing latency by scaling compute resources dynamically. Essentially, serverless technology lets companies focus on the application without worrying about DevOps, performance, or scalability.
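For readers who haven't seen serverless in practice, a Nuclio function is just a small handler that the platform packages, deploys, and scales for you. The snippet below follows the Python handler signature shown in Nuclio's documentation; response details may vary by version, so treat it as a sketch rather than a canonical example.

```python
# A minimal Nuclio Python function: the platform invokes handler() for each event
# and takes care of packaging, deployment, and scaling.
def handler(context, event):
    # Log the incoming request body (bytes for HTTP triggers).
    context.logger.info(f"Received event body: {event.body}")

    # Return a plain-text HTTP response.
    return context.Response(body="Hello from a serverless function",
                            headers={},
                            content_type="text/plain",
                            status_code=200)
```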
How does this complement the work that Iguazio is already doing?
Our data science platform is unique in the market because it is the only one that can run on any cloud, or even on-prem, and handle any data type. We work with AWS and Microsoft to pull in data from their cloud resources, enabling companies to build insightful models with data that would otherwise have been siloed. Adding Nuclio supercharges this data and simplifies the delivery of intelligent business applications.
What would you tell Product Managers who rely on machine learning or are interested in learning more about it?
The way you collect your data will ultimately have a large impact on the final model and application. It’s critical to ensure that your data is accessible and stored properly. When you have data from different sources, you must also normalize it to prevent false quantitative bias. Ultimately, Product Managers are responsible for driving the vision and bringing the business problems to the data scientists, but it helps them to have context around the ML infrastructure so they can gauge the cost of developing a model and how reliable it will be.
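To illustrate the normalization point, here is a small sketch that puts numeric features from two differently scaled sources onto a common scale with scikit-learn's StandardScaler. The column names and data are made up for the example.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Two sources report the same quantity on different scales
# (e.g., one logs revenue in dollars, the other in cents).
source_a = pd.DataFrame({"revenue": [12.0, 18.5, 25.0]})
source_b = pd.DataFrame({"revenue": [1150.0, 2075.0, 2490.0]})
source_b["revenue"] = source_b["revenue"] / 100.0   # convert cents to dollars first

combined = pd.concat([source_a, source_b], ignore_index=True)

# Standardize to zero mean and unit variance so neither source dominates the model.
scaler = StandardScaler()
combined["revenue_scaled"] = scaler.fit_transform(combined[["revenue"]]).ravel()
print(combined)
```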
Overall, it was an insightful glimpse into the infrastructure that underlies machine learning and how each component can still present an obstacle for a company that is keen to venture into ML.
For Product Managers, here are a few key takeaways: