The cybersecurity market is constantly coming up with new technologies and approaches to counter the challenges of an evolving threat landscape. Security threats and challenges have grown more complex over the years as organizations have moved most of their applications to the cloud. Moreover, organizations have realized that the typical lift-and-shift approach to cloud migration is not a silver bullet and can be costly. Hence, they now take a more careful approach, choosing the right workloads for the cloud and retaining others on their private servers. Though beneficial, this hybrid approach poses significant challenges:
· Security, compliance, and data-locality challenges
· Hybrid/multi-cloud cost and performance optimization challenges
The IT and security teams in most enterprises have tried to upgrade their threat detection and mitigation processes for complex hybrid and cloud-native environments. New tools, protocols, and playbooks for monitoring, security information and event management (SIEM), and threat response have offered varying levels of success. However, cloud security observability has emerged as the most viable approach to solving the challenges above.
The cloud security observability approach applies observability concepts to cloud security operations. It infers the internal state and health of applications and workloads from their external outputs, such as metrics, logs, and traces. The keyword here is context: observability builds on existing monitoring technologies but goes a step further, adding context to disparate telemetry data from different systems. Observability is also a property of the system itself. The more observable a system is, the easier it becomes to keep it efficient, secure, and reliable.
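The following minimal Python sketch makes the idea of "adding context" concrete. It assumes that each telemetry event carries a raw resource identifier, such as a pod name. The `TelemetryEvent` class, the `INVENTORY` mapping, and the pod names are all hypothetical. The sketch joins each event with service, team, and environment metadata. After that join, a metric, a log line, and a trace from different systems can be viewed side by side for each service.

```python
from dataclasses import dataclass, field


@dataclass
class TelemetryEvent:
    source: str      # "metrics", "logs", or "traces"
    resource: str    # raw identifier emitted by the source, e.g. a pod name
    message: str
    context: dict = field(default_factory=dict)


# Hypothetical inventory mapping raw resource IDs to business context.
# In practice this might come from Kubernetes labels, cloud resource tags, or a CMDB.
INVENTORY = {
    "checkout-7d9f-abc12": {"service": "checkout", "team": "payments", "env": "prod"},
    "search-5c4b-xyz89": {"service": "search", "team": "discovery", "env": "prod"},
}


def enrich(events, inventory):
    """Return copies of the events with service/team/env context attached."""
    enriched = []
    for ev in events:
        ctx = inventory.get(ev.resource, {"service": "unknown"})
        enriched.append(TelemetryEvent(ev.source, ev.resource, ev.message, {**ev.context, **ctx}))
    return enriched


def group_by_service(events):
    """Group enriched events so signals from different sources line up per service."""
    groups = {}
    for ev in events:
        groups.setdefault(ev.context["service"], []).append(ev)
    return groups


raw = [
    TelemetryEvent("metrics", "checkout-7d9f-abc12", "cpu=95%"),
    TelemetryEvent("logs", "checkout-7d9f-abc12", "ERROR payment gateway timeout"),
    TelemetryEvent("traces", "search-5c4b-xyz89", "span GET /search took 1200 ms"),
]
by_service = group_by_service(enrich(raw, INVENTORY))
for service, evs in by_service.items():
    print(service, [e.source for e in evs])
```

Run as written, the sketch prints `checkout ['metrics', 'logs']` and `search ['traces']`. The CPU spike and the payment error land in the same group once both are tied to the checkout service. Production tools typically work with far richer metadata. The sketch only shows the basic join on a shared identifier.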
While security vendors have written and promoted a great deal on cloud security and observability, there's still no definitive answer, and enterprises have to weigh the pros and cons of different approaches.
Most traditional vendors have added a layer of AI to their log management and analysis tools and now claim to offer observability in cloud-native environments. It is important to understand that log analysis becomes significantly more complicated in hybrid cloud environments. The causes include the lack of centralization, variations in log formats, and ephemeral components such as containers. Sometimes the logs are overwritten by the time an issue is detected. At other times, a problem detected via logs is difficult to reproduce.
This is where the AI capabilities of commercial solutions offer some advantage, simplifying the tracing and correlation of log entries.
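The normalization and correlation steps that these tools automate can be illustrated with plain rule-based parsing. The Python example below assumes two hypothetical log sources. One emits JSON, and the other emits plain text of the form `<timestamp> <LEVEL> [req=<id>] <message>`. Both are assumed to use UTC ISO-8601 timestamps in the same format. The code normalizes both formats into a common schema and groups entries by request ID to rebuild a per-request timeline. The field names and the `req=` convention are illustrative assumptions, not a standard.

```python
import json
import re
from collections import defaultdict

# Hypothetical plain-text format: "<ISO timestamp> <LEVEL> [req=<id>] <message>"
TEXT_LOG = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+\[req=(?P<request_id>[\w-]+)\]\s+(?P<msg>.*)$")


def normalize(line):
    """Convert a JSON or plain-text log line into a common dict schema, or None if unparseable."""
    line = line.strip()
    if line.startswith("{"):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            return None
        return {
            "ts": rec.get("timestamp"),
            "level": str(rec.get("severity", "")).upper(),
            "request_id": rec.get("request_id"),
            "msg": rec.get("message", ""),
        }
    m = TEXT_LOG.match(line)
    return m.groupdict() if m else None


def correlate(lines):
    """Group normalized entries by request_id and order each group by timestamp."""
    by_request = defaultdict(list)
    for line in lines:
        rec = normalize(line)
        if rec and rec["request_id"]:
            by_request[rec["request_id"]].append(rec)
    for entries in by_request.values():
        # Lexicographic sort is valid only because both sources use the same UTC ISO-8601 format.
        entries.sort(key=lambda r: r["ts"] or "")
    return dict(by_request)


logs = [
    '{"timestamp": "2024-05-01T10:00:02Z", "severity": "error", "request_id": "r-42", "message": "db timeout"}',
    "2024-05-01T10:00:01Z INFO [req=r-42] checkout started",
    "2024-05-01T10:00:03Z WARN [req=r-7] slow response",
    "garbage line",
]
timeline = correlate(logs)
print([e["msg"] for e in timeline["r-42"]])
```

In this example, `timeline["r-42"]` holds the INFO entry from the text log followed by the ERROR entry from the JSON log. The unparseable line is dropped. AI-assisted tools aim to handle formats and correlations that simple rules like these cannot anticipate.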
Yet, to make the most of such solutions, development and operations teams need to fully understand the complexity of their cloud-native deployments. They might be dealing with hundreds of microservices connected through a service mesh and deployed on Kubernetes, all on elastic cloud infrastructure. An AI-powered cloud observability platform is useful only when its users understand how their applications are deployed and which datasets can provide the desired insights.
Cloud-Native (Kubernetes) Observability: This involves aggregating and analyzing logs and metrics from the Kubernetes (K8s) control plane, cluster nodes, and application resource usage (CPU, memory). It also involves ranking nodes, namespaces, and other resources by usage. While evaluating solutions, teams should explore the workflows for drilling down into workloads, pods, containers, and so on to identify issues.
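The Python sketch below roughly illustrates the ranking and drill-down workflow described above. It aggregates container-level CPU and memory samples by node, namespace, or pod. The sample records and field names are hypothetical, with `cpu_m` for millicores and `mem_mi` for MiB. In a real cluster, such numbers might come from metrics-server or Prometheus, but the sketch assumes they have already been collected and flattened.

```python
from collections import defaultdict

# Hypothetical container-level usage samples, already collected and flattened.
SAMPLES = [
    {"node": "node-a", "namespace": "shop", "pod": "checkout-1", "container": "app", "cpu_m": 450, "mem_mi": 512},
    {"node": "node-a", "namespace": "shop", "pod": "checkout-1", "container": "sidecar", "cpu_m": 50, "mem_mi": 64},
    {"node": "node-b", "namespace": "shop", "pod": "cart-1", "container": "app", "cpu_m": 200, "mem_mi": 256},
    {"node": "node-b", "namespace": "search", "pod": "search-1", "container": "app", "cpu_m": 900, "mem_mi": 1024},
]


def rank_by(samples, level, metric="cpu_m"):
    """Sum a metric at the given level (node, namespace, or pod) and rank in descending order."""
    totals = defaultdict(int)
    for s in samples:
        totals[s[level]] += s[metric]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def drill_down(samples, **filters):
    """Keep only samples matching every filter, e.g. namespace="shop"."""
    return [s for s in samples if all(s[k] == v for k, v in filters.items())]


print(rank_by(SAMPLES, "namespace"))                           # namespaces ranked by CPU
print(rank_by(drill_down(SAMPLES, namespace="shop"), "pod"))   # pods inside one namespace
```

With the sample data, `search` ranks above `shop` (900 vs. 700 millicores). Drilling into `shop` shows that `checkout-1` accounts for most of that namespace's usage.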
DevOps Enhancement: This involves leveraging operational or telemetry data early in the software development lifecycle. Developers often lack visibility into real-world resource usage and see only the resulting issues and events. Tools like Terraform automate infrastructure provisioning from code. Even so, developers usually have no means of visualizing performance, bugs, and similar signals. Early visibility into operational metrics allows developers to test and fine-tune their applications and reduces noise and false alarms later in the cycle.
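One hypothetical way to give developers early visibility is a performance-budget check in the CI pipeline. The check runs against a staging or load-test run. The Python sketch below computes a nearest-rank 95th-percentile latency and an error count, then reports any budget violations. The budget values and sample latencies are assumptions for illustration and are not taken from any real project.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_budget(latencies_ms, p95_budget_ms, error_count, max_errors=0):
    """Return human-readable budget violations for a test or staging run."""
    violations = []
    p95 = percentile(latencies_ms, 95)
    if p95 > p95_budget_ms:
        violations.append(f"p95 latency {p95} ms exceeds budget of {p95_budget_ms} ms")
    if error_count > max_errors:
        violations.append(f"{error_count} errors exceed the allowed {max_errors}")
    return violations


# Hypothetical latency samples from a short load test against a staging deployment.
samples = [120, 135, 140, 150, 160, 180, 210, 240, 380, 900]
problems = check_budget(samples, p95_budget_ms=300, error_count=1)
for p in problems:
    print("BUDGET VIOLATION:", p)
```

Both checks fail here. The p95 latency is 900 ms against a 300 ms budget, and one error exceeds the allowance of zero. In a pipeline, a non-empty violation list could be turned into a non-zero exit code so the build fails before the change reaches production.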
Multi-cloud Observability: As organizations embrace hybrid and multi-cloud environments, they have to monitor different dashboards to assess security, availability, performance, and costs. Because there is usually no unified dashboard, it is difficult to get a holistic view of an application's health and performance. Teams need to collect data from different clouds for unified analytics and automation.
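Unified multi-cloud analytics starts with mapping each provider's data onto a shared schema. The Python sketch below does this for billing rows. The column names in `FIELD_MAP` are simplified placeholders. The real AWS, Azure, and Google Cloud billing exports use different and much richer schemas. The sketch also assumes all costs are already in a single currency.

```python
from collections import defaultdict

# Simplified, hypothetical column names per provider; real billing exports differ.
FIELD_MAP = {
    "aws":   {"service": "product_name", "cost": "unblended_cost", "app": "tag_app"},
    "azure": {"service": "meter_category", "cost": "cost_in_billing_currency", "app": "tag_app"},
    "gcp":   {"service": "service_description", "cost": "cost", "app": "label_app"},
}


def normalize_row(provider, row):
    """Map one provider-specific billing row onto a common schema."""
    mapping = FIELD_MAP[provider]
    return {
        "provider": provider,
        "service": row[mapping["service"]],
        "app": row.get(mapping["app"], "untagged"),
        "cost": float(row[mapping["cost"]]),  # assumes one shared currency
    }


def cost_by_app(rows_by_provider):
    """Total spend per application tag across all providers."""
    totals = defaultdict(float)
    for provider, rows in rows_by_provider.items():
        for row in rows:
            rec = normalize_row(provider, row)
            totals[rec["app"]] += rec["cost"]
    return {app: round(total, 2) for app, total in totals.items()}


rows = {
    "aws":   [{"product_name": "Amazon EC2", "unblended_cost": "120.50", "tag_app": "checkout"}],
    "azure": [{"meter_category": "Virtual Machines", "cost_in_billing_currency": "80.25", "tag_app": "checkout"}],
    "gcp":   [{"service_description": "BigQuery", "cost": "45.00", "label_app": "analytics"},
              {"service_description": "Cloud Storage", "cost": "5.10"}],
}
print(cost_by_app(rows))
```

This prints `{'checkout': 200.75, 'analytics': 45.0, 'untagged': 5.1}`. The checkout application's spend is combined across AWS and Azure. The untagged Google Cloud row is surfaced instead of silently disappearing.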
AI-Enabled Insights & Actions: AI and machine learning algorithms can help organizations find trends and patterns that would otherwise go undetected by standard security policies and threshold- or event-based alerts. AI can offer significant advantages over traditional incident management methods, from analyzing huge volumes of logs and metrics to correlating events across systems.
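The Python sketch below illustrates why static thresholds miss some patterns. It compares a fixed threshold with a rolling z-score, which flags values that deviate sharply from a trailing baseline. A rolling z-score is a simple statistical technique and is not the kind of machine learning model that commercial platforms use. It stands in here only to show the idea of learning a baseline from the data. The failed-login counts, window size, and cutoff are hypothetical.

```python
import statistics


def static_alerts(values, threshold):
    """Indices where a fixed threshold is exceeded."""
    return [i for i, v in enumerate(values) if v > threshold]


def zscore_alerts(values, window=10, z=3.0):
    """Indices whose value deviates from the trailing-window mean by more than z standard deviations."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev > 0 and abs(values[i] - mean) / stdev > z:
            alerts.append(i)
    return alerts


# Hypothetical hourly counts of failed logins for one service account.
failed_logins = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 30, 5, 6]
print(static_alerts(failed_logins, threshold=100))  # -> [] (the spike stays under the static limit)
print(zscore_alerts(failed_logins))                 # -> [10] (the spike stands out from its baseline)
```

The static threshold of 100 is set high to avoid noisy alerts, so it never fires. The z-score check flags the jump to 30 at index 10 because it falls far outside the recent baseline of about 5 per hour.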
It is easy to get overwhelmed by commercial tools and solutions with a wide range of overlapping features and capabilities. Chances are that your existing tools and security solutions already offer similar features and capabilities but seem inadequate because they are not integrated.
Some enterprise teams also dive into the open-source ocean, exploring potential solutions built on tools like Nagios, Prometheus, Elastic, Fluentd, Grafana, Jaeger, and more. However, there's an alternative approach: start small and build solutions iteratively, a bottom-up approach to improving observability, if you will.
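Starting small with open-source tools can be as simple as querying data you already collect. The Python sketch below assumes a Prometheus server is reachable at a URL you supply. It runs an instant query through the Prometheus HTTP API (`GET /api/v1/query`) and lists scrape targets whose `up` metric is 0. The server URL in the usage comment is a placeholder. The parsing function can be tested offline against a saved response.

```python
import json
import urllib.parse
import urllib.request


def query_prometheus(base_url, expr, timeout=5):
    """Run an instant query against the Prometheus HTTP API (GET /api/v1/query)."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(payload):
    """Return (job, instance) pairs whose `up` sample is 0 in an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # values are returned as strings
        if float(value) == 0:
            labels = series["metric"]
            down.append((labels.get("job"), labels.get("instance")))
    return down


# Example against a placeholder local server:
# print(down_targets(query_prometheus("http://localhost:9090", "up")))
```

A dashboard or alert built on a query like this is a small, testable first step that can grow iteratively, in keeping with the bottom-up approach.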
Here's an example: we created a DevOps observability solution for one of our clients using Klera's no-code application builder platform. Klera offers several out-of-the-box connectors for DevOps tools (Jenkins, Jira, GitHub, SonarQube, TestRail, etc.), cloud platforms (AWS, Google Cloud, Microsoft Azure), cost-monitoring tools such as Kubecost, and databases.
With these connectors, we were able to collect data from different sources, apply computations using Excel-like custom formulas and Klera's own formula engine, and build dashboards for unified visibility. We also leveraged some of Klera's existing apps and templates from its App Store, which gave us a significant head start in development. Here's a sneak peek into the solution:
The first dashboard in the solution provides a high-level view across five major parameters, namely availability, security score, SLA adherence, cloud spend, and KPI index. These are aggregated scores based on several metrics and on information from different subsystems. One can drill down into the details behind every score and find contextual details down to the code level. The KPI index, for instance, is based on the DORA metrics and two additional quality metrics: average test coverage and quality violations.
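The source does not describe how the KPI index is computed. The following Python code is therefore a hypothetical scoring scheme, not Klera's formula. It covers the four DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) plus average test coverage and quality violations. Each metric is scored from 0 to 100 against a target, and the scores are combined as a weighted average. All targets, weights, and values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Metric:
    value: float
    target: float           # hypothetical threshold that earns a full score
    higher_is_better: bool


def score(m: Metric) -> float:
    """Map a metric to 0-100: 100 when the target is met, scaled down linearly otherwise."""
    if m.higher_is_better:
        ratio = m.value / m.target
    else:
        ratio = m.target / m.value if m.value > 0 else 1.0
    return round(100 * min(ratio, 1.0), 1)


def kpi_index(metrics: Dict[str, Metric], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-metric scores; equal weights unless specified."""
    weights = weights or {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    weighted = sum(score(m) * weights[name] for name, m in metrics.items())
    return round(weighted / total_weight, 1)


# Hypothetical values for one team over one reporting period.
metrics = {
    # The four DORA metrics
    "deployments_per_week": Metric(3, target=5, higher_is_better=True),
    "lead_time_hours": Metric(48, target=24, higher_is_better=False),
    "change_failure_rate_pct": Metric(10, target=15, higher_is_better=False),
    "time_to_restore_hours": Metric(2, target=1, higher_is_better=False),
    # The two additional quality metrics
    "avg_test_coverage_pct": Metric(72, target=80, higher_is_better=True),
    "quality_violations": Metric(40, target=20, higher_is_better=False),
}
print(kpi_index(metrics))  # -> 66.7
```

Each target is a plain parameter, so adding metrics or adjusting thresholds only requires editing the `metrics` dictionary or passing different `weights`.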
It is possible to introduce more metrics or adjust their thresholds as needs evolve. The solution also provides visibility into cloud costs and resource usage.
The solution is highly flexible and extensible. Klera's connectors make it simple to aggregate and centrally visualize data from SIEM tools, vulnerability scanners, threat intelligence feeds, and other existing enterprise sources. Teams can then build solutions for their cloud and security operations command center. This offers a practical, simple way to improve cloud security observability iteratively. It avoids lengthy proofs of concept (POCs) for all-encompassing, complex, and costly solutions.
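The Python sketch below roughly illustrates what aggregating these sources makes possible. It combines three inputs and then re-ranks the findings: vulnerability scanner findings, a threat-intelligence list of actively exploited vulnerabilities, and a set of assets with open SIEM alerts. The inputs, placeholder CVE IDs, and boost values are hypothetical, and the code is not part of Klera.

```python
# Hypothetical, already-normalized inputs from three kinds of sources.
# CVE IDs are placeholders, not real advisories.
scanner_findings = [
    {"asset": "web-01", "cve": "CVE-0000-0001", "cvss": 9.8},
    {"asset": "db-01", "cve": "CVE-0000-0002", "cvss": 6.5},
    {"asset": "build-02", "cve": "CVE-0000-0003", "cvss": 7.5},
]
known_exploited = {"CVE-0000-0002"}   # e.g. from a threat-intelligence feed
assets_with_alerts = {"db-01"}        # e.g. assets with open SIEM alerts

# Hypothetical boosts; tune to your own risk model.
EXPLOITED_BOOST = 5.0
ALERT_BOOST = 3.0


def prioritize(findings, exploited_ids, alerted_assets):
    """Rank findings by CVSS plus context boosts from threat intel and SIEM data."""
    ranked = []
    for f in findings:
        priority = f["cvss"]
        if f["cve"] in exploited_ids:
            priority += EXPLOITED_BOOST
        if f["asset"] in alerted_assets:
            priority += ALERT_BOOST
        ranked.append({**f, "priority": priority})
    return sorted(ranked, key=lambda r: r["priority"], reverse=True)


for row in prioritize(scanner_findings, known_exploited, assets_with_alerts):
    print(f'{row["priority"]:5.1f}  {row["asset"]:<9} {row["cve"]}')
```

The medium-severity finding on `db-01` rises to the top with a priority of 14.5. It is both known to be exploited and on an asset that already has alerts, and a CVSS-only view would miss that.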