MLOps and AIOps are two similar-sounding terms that refer to vastly different disciplines within the industry today. Ever since these terms were introduced a few years ago, interest in them has surged, as this Google Trends chart shows.
And yet, outside of the handful of practitioners actively working on projects in these areas, most casual readers, and even enthusiasts looking to explore the space, find the meanings and benefits of MLOps and AIOps ambiguous, overlapping, and hard to tell apart.
In my experience, there are two reasons for this.
The first is the implicit reference in the words MLOps and AIOps to the more widely understood practice of DevOps. It makes one wonder: are MLOps and AIOps related to DevOps? Do they derive from it? If so, how are they different from it?
The second is the obvious ambiguity regarding how ML is different from AI since they're often used interchangeably. Are they the same? Are they on a continuum? If so, where does one end and the other begin?
Let's formulate these as questions, which we must be able to answer in order to understand MLOps and AIOps.
Question #1
Are MLOps and AIOps related to DevOps? If so, how?
Question #2
Since ML and AI tend to be used interchangeably, what does their inclusion in the words MLOps and AIOps imply?
Make a note of these! We will come back to them later in this post.
It's also important to keep in mind the relative infancy of both disciplines. The terms MLOps and AIOps were coined no more than 6-7 years ago, which means their hype / buzzword factor is currently high relative to how widely their semantics, applications, and benefits are understood. This will likely continue for a little while longer, until the technology matures and the use cases become more prevalent and widely understood.
O'Reilly's AI Adoption in the Enterprise 2021 report illustrates this point using this compelling pie chart, which shows that just a quarter of the surveyed respondents said they have mature deployments of AI technology.
Reported hindrances to mature adoption were a lack of skilled people, data quality issues, difficulties in identifying relevant business use cases, an unsupportive company culture, and technical infrastructure issues. The report also found a distinct lack of standardization among today's tools for deploying, monitoring, versioning, and tracking models and training data.
Given these challenges, it's not surprising that non-practitioners today run into comprehensibility barriers regarding MLOps and AIOps technology, toolsets, and practices.
In this post, I'll clarify what MLOps and AIOps mean, what problems they are meant to solve, and what tools exist for teams that are looking to adopt them into their product and service building strategies.
Before we get into it, though, we must take a quick detour through DevOps to build context around what it means and what problems it solves. This will help us better understand the rationale for MLOps and AIOps, and draw clear lines of distinction between them later on!
DevOps started to become mainstream around 2007 in response to a common organizational problem that hampered product teams' ability to ship software at a brisk pace. Despite following the Agile methodology, teams took weeks, if not months, to release new software versions and deploy them to production.
The reason for this was that teams that built the software (developers), and the ones that deployed and supported it in production (IT / operations), worked in their own siloes. They reported to different executive leaders within the organization, and worked independently of each other – sometimes even physically on different floors of a building, or in separate buildings.
DevOps was a way to get the developer and operations teams to collaborate through every stage of the software development life cycle (SDLC), and to share common objectives and KPIs, so that high-quality software could be shipped much more frequently (often, many times in a single day) using Agile.
At its core, DevOps is about three things:
The DevOps lifecycle has six phases, shown here using the well-known Infinity Loop.
With this context, let's dive into MLOps!
MLOps rose to prominence around 2015 with the promise of solving critical operational problems pertaining to the end-to-end delivery of machine learning pipelines, similar to the ones that DevOps had solved almost a decade earlier.
You must be wondering – what are these problems with machine learning pipelines? To make it more tangible, think about what a typical ML pipeline (source: Gartner) looks like.
Three distinct skillsets are necessary to operate this pipeline. First, there's the data pipeline itself, where data is sourced, cleansed, and transformed; this stage is owned by Data Engineers. Then there's the curation of training datasets, followed by model creation and verification, which is owned by Data Scientists. Lastly, deployment, monitoring, and ongoing maintenance are owned by Operations.
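To make these hand-offs a little more tangible, here's a minimal, self-contained sketch of the three stages in Python with scikit-learn. The toy dataset, model choice, and artifact file name are illustrative assumptions, not a prescription for a real pipeline.

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1) Data pipeline (Data Engineers): source, cleanse, and transform the raw data.
raw = load_breast_cancer(as_frame=True)
df = raw.frame.dropna()  # cleansing step (a no-op for this toy dataset)

# 2) Training and verification (Data Scientists): curate a training set,
#    fit a model, and verify it against a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"], test_size=0.2, random_state=42
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 3) Deployment and maintenance (Operations): package the verified model as an
#    artifact that can be versioned, deployed behind a service, and monitored.
joblib.dump(model, "model-v1.joblib")  # hypothetical artifact name
```

Even in this tiny example, you can see how the three stages hand work off to one another, which is exactly where siloes start to hurt.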
So we have three teams with specialized skillsets that need to coordinate with one another to own and operate the entire pipeline end-to-end. If these teams operate in siloes and cannot collaborate using Agile practices, the result is shipping delays and quality issues for the overall product.
Recall that these problems are similar to the ones DevOps aims to solve, and they arise when siloed teams with specialized skillsets do not have a tight interlock among them. So in that respect, you can think of MLOps as the application of DevOps principles to machine learning pipelines. Whereas DevOps comprised a multi-disciplinary team of Developers and IT / Operations, MLOps adds Data Engineers and Data Scientists to the mix, and eliminates siloes among them.
The MLOps lifecycle has nine phases, shown here using a modified version of the DevOps Infinity Loop.
Quick disclaimer: I've squashed some of the phases below together in the interest of simplicity.
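To make one slice of this lifecycle more concrete, here's a minimal, hypothetical sketch of experiment tracking and model versioning using MLflow (one of the tools listed at the end of this post). The experiment name, dataset, and parameters are assumptions chosen purely for illustration.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("iris-demo")  # hypothetical experiment name
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    mlflow.log_param("max_iter", 200)                                # track hyperparameters
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))  # track verification results
    mlflow.sklearn.log_model(model, "model")                         # version the model artifact
```

Tracking parameters, metrics, and model artifacts like this is what makes later deployments reproducible and auditable across the three teams.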
Hopefully, this clarifies what MLOps means, and how it relates to DevOps. Onto AIOps, next!
The term AIOps (Artificial Intelligence for IT Operations) was coined by Gartner in 2016, but unlike MLOps it has almost nothing to do with DevOps! Rather, it refers to the usage of AI / ML techniques and algorithms to automate common, sometimes repetitive, IT tasks.
Before we rabbit-hole further into AIOps, this is a perfect time to revisit and answer the two questions we had asked at the beginning of the post. Doing so will make the rest of this content more comprehensible!
Are MLOps and AIOps related to DevOps? If so, how?
Yes, MLOps is related to DevOps in that it brings the DevOps multi-disciplinary and Agile principles to ML pipelines. MLOps makes Data Engineering, Data Science, and Operations teams managing these pipelines more efficient.
No, AIOps is not related to DevOps, but rather refers to the usage of ML / AI techniques and algorithms to automate common, sometimes repetitive, IT tasks. AIOps makes IT teams more efficient.
Since ML / AI tend to be used interchangeably, what does their inclusion in the words MLOps and AIOps imply?
In the MLOps context, ML refers to the entire machine learning pipeline, soup to nuts, including data sourcing and cleansing, model creation and verification, and deployments and monitoring.
In the AIOps context, AI / ML refer to the techniques and algorithms (decision trees, random forests, etc.) used in anomaly detection, root cause analysis, and help desk automation.
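As a concrete, toy-sized illustration of the anomaly-detection use case just mentioned, here's a minimal sketch using a tree-based ensemble (scikit-learn's IsolationForest) on synthetic CPU metrics. The metric values and contamination rate are made-up assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=7)
normal_cpu = rng.normal(loc=40, scale=5, size=(500, 1))  # typical CPU-utilization samples
spikes = rng.normal(loc=95, scale=2, size=(5, 1))        # a few incident-like spikes
metrics = np.vstack([normal_cpu, spikes])

# Fit a tree-based anomaly detector; predict() returns -1 for points flagged as anomalous.
detector = IsolationForest(contamination=0.01, random_state=7).fit(metrics)
flags = detector.predict(metrics)
print("anomalies flagged:", int((flags == -1).sum()))
```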
With the nagging questions out of the way, let's resume our focus on AIOps. We should ask:
What business outcomes does AIOps target? What are some examples of common / repetitive IT tasks that could be automated using AIOps?
Gartner uses the following framework to define the applicability and benefits of AIOps. Let's drill down!
At its core, just as we concluded above, AIOps is about applying machine learning to big data in order to achieve the following business outcomes:
Hopefully, this clarifies the need for AIOps and how it makes IT and Operations teams more efficient with their common / repetitive tasks!
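Before we wrap up AIOps, here's one more minimal sketch, this time of the event-correlation idea behind root cause analysis: grouping similar log messages so that a single incident doesn't surface as hundreds of separate alerts. The log lines and cluster count are made-up assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

logs = [
    "disk /dev/sda1 usage above 90 percent on host web-01",
    "disk /dev/sda1 usage above 95 percent on host web-02",
    "connection timeout while reaching payments-api",
    "connection timeout while reaching payments-api, retrying",
    "disk /dev/sdb1 usage above 92 percent on host db-01",
]

# Vectorize the messages and group similar ones; related events land in the same cluster.
features = TfidfVectorizer().fit_transform(logs)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for cluster, line in sorted(zip(clusters, logs)):
    print(cluster, line)
```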
To conclude and summarize this MLOps vs AIOps discussion, here's a handy TLDR version of the main points from this post.
You may have noticed that we didn't talk about any of the MLOps or AIOps toolsets. I intentionally omitted them from the discussion in order to focus on the task of qualitatively evaluating MLOps and AIOps. However, I've included some of the commonly used tools in the table below, so you can check them out for more details if you're interested.
| | MLOps | AIOps |
|---|---|---|
| Definition | An extension of DevOps principles and practices for operationalizing end-to-end machine learning pipelines. | An application of AI / ML techniques and algorithms to automate common and repetitive IT tasks. |
| Audience | Multi-disciplinary Data Engineering, Data Science, and Operations teams that need to collaborate on managing machine learning pipelines. | IT teams responsible for infrastructure monitoring, security, service availability, and help desks. |
| Benefits | Makes multi-disciplinary teams efficient, helps track and version training datasets and models, and makes deployments predictable and reproducible. | Makes IT efficient in responding to security threats, making infrastructure sizing decisions, and improving customer experience for help desk engagements. |
| Toolsets | MLflow, Kubeflow, Amazon SageMaker, Azure ML. | Dynatrace, Datadog, AppDynamics, New Relic, ServiceNow, Splunk. |
Hope you found this post useful in clarifying the semantics of MLOps and AIOps, their respective applications, and how they differ!
Cheers!