There is a new hype in the DevOps world: Istio and service meshes. But why exactly is everyone suddenly migrating to Istio? In this post, we will examine whether it actually makes sense to use Istio, which use cases it can help with, and when it is simply overkill.
But before discussing Istio, we should first introduce the concept of a service mesh.
A service mesh is a new paradigm for abstracting network infrastructure, communication between services, and part of what used to be coded into the application's own logic. It allows us to manage services more easily and consistently, and ultimately saves us time and money.
The idea is that, in the new world of microservices, apps should not care about the underlying network infrastructure, just as they no longer care about the runtime environment thanks to the popularity of Docker and containers. Abstracting away the network layer may sound terrifying at first, but that is exactly the point.
What you actually need to know is that a service mesh is basically a way to proxy traffic in your cluster: both external traffic (from load balancers) and internal traffic between your services (usually via sidecar containers injected into each pod alongside your service). This setup allows you to control all network traffic in a completely new way, both into the cluster and between the apps inside it. Our series of articles on Cloud Native Computing Foundation (CNCF) tools has already covered Jaeger, the popular tracing tool.
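To make the sidecar idea concrete, here is a minimal, hand-written sketch of what a pod with a proxy sidecar looks like. In a real mesh the proxy container is injected automatically; the app name, image, and proxy version below are purely illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: orders                  # hypothetical application
  labels:
    app: orders
spec:
  containers:
  - name: orders                # your application container
    image: example/orders:1.0   # illustrative image
    ports:
    - containerPort: 8080
  - name: mesh-proxy            # the sidecar a service mesh would inject
    image: envoyproxy/envoy:v1.27.0   # illustrative proxy image/version
    # the proxy transparently intercepts the pod's inbound and outbound traffic
```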
It’s always worth asking “why” something is being considered for implementation. This short section will provide the pros and cons of a service mesh.
Underlying networking has been a bit of a mystery for a lot of developers. When no one knew what else could be wrong, it was always simply "the network's fault." All those packets, firewalls, CIDRs, and other concepts, especially in the new cloud world, made it too complex to understand clearly. One of the promises of a service mesh is to bring networking closer to engineering teams in declarative, understandable, and versioned statements. Of course, team members who are more experienced in ops/networking work will still have to manage the network infrastructure underlying the cluster. But at least you get a common layer where the majority of network-related tasks are performed in one standardized, easy-to-follow way.
The architecture of a manageable edge proxy (for external traffic) plus sidecar containers alongside every single app in your cluster gives you control over a significant portion of an app's connectivity logic, and maybe even some business logic, at the network layer. Everything is written in a single coherent way, allowing for control over the whole network layer. So, instead of implementing various connectivity features in each of your apps, you can have them bundled together in a single place: the service mesh.
You know that changing applications in today's world of microservices can be quite a daunting task. So why not declare the majority of connectivity logic and default failure handling at the network layer? This way, if a crucial app is down, the apps relying on it don't need to handle the failure themselves, because the service mesh can do it for them. All the retries, health checks, reroutings, and retransmissions can be abstracted away, and you can declare all the apps in a single format and a single language. On top of that, you also get an easier way to create rules around what kind of external traffic gets into or out of the cluster and which services handle it.
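As a rough illustration, here is what moving retries and timeouts out of application code and into the mesh can look like with an Istio VirtualService. The service name `inventory` and the exact values are assumptions made for this sketch, not a recommendation.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inventory               # hypothetical service
spec:
  hosts:
  - inventory
  http:
  - route:
    - destination:
        host: inventory
    timeout: 10s                # overall per-request timeout
    retries:
      attempts: 3               # retry failed calls up to three times
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure
```

With something like this in place, callers of `inventory` no longer need their own retry loops; the sidecar handles them.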
All of the above can be achieved via a well-designed low-level service mesh. And Istio is essentially a control plane that lets you write rules and configuration for a mesh of Envoy proxies in a much nicer and cleaner way.
Let’s take a look at the more practical features of Istio that leverage the service mesh architecture.
If you want to connect microservices and require finer control over the flow of traffic they receive, Istio is a powerful choice. Circuit breakers ensure that if an app is down, other apps trying to connect to it won't cause an accidental DDoS while it's coming back up. You can effectively block all traffic to an app for a specific amount of time in order to give it the time it needs to reboot safely. It may sound like something that can easily be encoded into each app, but then every app developer has to spend time on it; and if even one app doesn't implement it, that app becomes the weakest link in the whole cluster, potentially causing a cascade of failures across the entire service.
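A minimal sketch of a circuit breaker in Istio, assuming a hypothetical `reviews` service: a DestinationRule with connection-pool limits and outlier detection that temporarily ejects unhealthy pods from the load-balancing pool.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker   # hypothetical name
spec:
  host: reviews                   # hypothetical service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100       # cap concurrent connections to the service
      http:
        http1MaxPendingRequests: 10
    outlierDetection:
      consecutive5xxErrors: 5     # eject a pod after 5 consecutive 5xx responses
      interval: 30s
      baseEjectionTime: 60s       # keep it out of rotation for at least a minute
      maxEjectionPercent: 100
```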
With advanced network control, we can also enjoy blue-green deployments or canary releases in an elegant fashion. Another thing Istio provides is automated security: the previously mentioned sidecar containers, injected into each pod and proxying its traffic, take care of encryption, authentication, and authorization via mutual TLS.
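For example, a canary release can be expressed as a weighted route, and mesh-wide mutual TLS as a single policy. The `checkout` service, its `stable`/`canary` subsets (which would be defined in a separate DestinationRule), and the 90/10 split are assumptions made for this sketch.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout                  # hypothetical service
spec:
  hosts:
  - checkout
  http:
  - route:
    - destination:
        host: checkout
        subset: stable
      weight: 90                  # 90% of traffic stays on the stable version
    - destination:
        host: checkout
        subset: canary
      weight: 10                  # 10% goes to the canary
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system         # applying it here makes it mesh-wide
spec:
  mtls:
    mode: STRICT                  # only mutual-TLS traffic is accepted between sidecars
```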
Last, but definitely not least, Istio provides observability into your distributed applications by allowing for logging, tracing, and monitoring of your entire cluster network. Some people even predict that in the future, such granular network tracing will enable you to bill customers for their usage of specific microservices in your cluster. So far, billing by actual service usage has been a privilege of only the biggest cloud providers.
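As one example, more recent Istio releases expose a Telemetry resource that lets you tune tracing from configuration rather than code; the sampling percentage below is just an illustrative value.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system         # mesh-wide defaults live in the root namespace
spec:
  tracing:
  - randomSamplingPercentage: 10.0   # trace 10% of requests (illustrative)
```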
Even though its authors claim that Istio should be compatible with a range of technologies, most resources are focused on Kubernetes at the moment. If you're fine with Kubernetes but feel your microservices could use more security, then Istio is a valid option. It's also one of the few tools that allow for network access policy management at such a high level. That's why it's great for creating a robust, secure, auditable, and compliant infrastructure with Kubernetes.
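A sketch of such a high-level access policy, assuming a hypothetical `payments` service that should only accept calls from an `orders` service account:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-orders     # hypothetical policy
  namespace: payments             # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/orders/sa/orders-api"]   # identity of the allowed caller
    to:
    - operation:
        methods: ["GET", "POST"]
```

Because an ALLOW policy now exists for the workload, any request that doesn't match it is denied.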
Kiali, a separate tool made by different authors, visualizes Istio’s service mesh in a Web UI, allowing you to interactively browse the connections between your microservices.
Istio's greater number of features, unfortunately, means that not all of them are stable and mature at the moment. If you want many ways to manage your cluster network and its policies, then Istio may be a good choice for you. The same is true if you work in an industry with higher compliance requirements, such as the financial sector, where Istio is becoming a standard.
The main competition for Istio at the moment is Linkerd. Linkerd has a slightly different philosophy and definitely fewer features, as a smaller team is working on it. It doesn't try to cover as much ground as Istio, but it does focus on developer experience and on being ready to go live from day one. Its features are usually very stable and robust, and the learning curve is definitely gentler than Istio's.
Despite all the benefits of a service mesh and Istio, it's not always a good idea to implement one. In fact, it can definitely be a bad idea if you already struggle with managing an existing Kubernetes cluster. Istio adds another abstraction layer, which may add extra complexity. If you don't have a team in place that can manage it properly, implementing a service mesh may actually slow you down. It may also simply be overkill if you don't really need advanced network control. If you only have a few apps running in your cluster, then Linkerd might be a better choice.