Rethinking Programming: Automated Observability
Software architect and evangelist; Director of Developer Relations @ WSO2 Inc.
Observability is the ability to understand the internal state of your system by looking at what is happening externally. In a software system, we mainly achieve observability through three aspects: logging, metrics, and tracing. Observability becomes a key part of system design especially when we move away from monolithic software systems to microservices-based architectures.
Compared to monoliths, it’s much harder to troubleshoot issues and tune performance in microservices deployments, mainly because of the added complexity of working with a distributed system. Your software system should therefore be designed with these challenges in mind and be ready to handle any issues that arise. Observability tools allow us to do this.
In this article, we will focus on three observability tools:
- Metrics: Provides us with aggregatable information. These are numerical values computed over a period of time such as request error rates and latency values.
- Logs: Records individual system events such as resource access logs and audit logs.
- Tracing: Allows us to understand request flows in the system. This is done by grouping the individual events that occur in a system so that each group identifies a unique flow. In a distributed environment such as a microservice architecture, this becomes a distributed tracing exercise.
To implement these features, developers normally have to write code explicitly to emit this information. But what if we could automate the most commonly used observability scenarios? One approach would be for the programming libraries themselves to emit this information with little intervention from the user.
The Ballerina programming language takes this a step further by incorporating automatic observability features into the language itself.
This is possible because of facets such as the language’s network awareness: it has explicit knowledge of network actions and can therefore instrument them automatically.
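To make the library-level approach concrete, here is a minimal Python sketch: a hypothetical `observed` decorator records call counts, error counts, and latencies for any function it wraps, so the business logic itself contains no metrics code. This is only an analogy; Ballerina performs its instrumentation inside the language and runtime, not through user-applied decorators, and the names below are illustrative.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process metrics registry: operation name -> recorded values.
CALL_COUNTS = defaultdict(int)
ERROR_COUNTS = defaultdict(int)
LATENCIES_MS = defaultdict(list)


def observed(func):
    """Wrap func so every call is counted and timed without touching its body."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        CALL_COUNTS[name] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            ERROR_COUNTS[name] += 1  # count the failure, then let it propagate
            raise
        finally:
            # Runs for both successful and failed calls.
            LATENCIES_MS[name].append((time.perf_counter() - start) * 1000.0)

    return wrapper


@observed
def search_inventory(query):
    # Business logic only; no observability calls here.
    if not query:
        raise ValueError("empty query")
    return [item for item in ("shirt", "shoes", "socks") if query in item]
```

Calling `search_inventory("sh")` returns `["shirt", "shoes"]` and increments `CALL_COUNTS["search_inventory"]`. A call with an empty query also increments `ERROR_COUNTS`. The point of the design is that the instrumentation lives entirely outside the function body.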
Let’s create a sample microservices deployment scenario in Ballerina and see how the automatic observability features for metrics and distributed tracing work.
Microservices Use Case: E-Commerce Backend
We will be implementing an e-commerce backend that simulates the services required for searching for goods, adding them to a shopping cart, making payments, and shipping. The solution does not implement every feature of a real-world system; rather, it follows the basic patterns and functionality you would face. After all the services are implemented, we will look at how the Ballerina language and platform provide automated observability features without the need for any additional coding.
The services are exposed through HTTP. We will also be using MySQL as the persistence mechanism. Figure 1 shows the overall architecture of how the services interact with each other.
Figure 1 - E-Commerce Backend Services Architecture
Each main domain has its own microservice that handles tasks such as searching the current inventory for goods, storing items in the shopping cart, and creating orders from the items in the shopping cart. A single “Admin” service fronts these services to provide a unified interface for the website to interact with the backend. The “Admin” service is therefore the main orchestrator connecting the backend services.
Some of the services have their own databases for persistence. For simplicity, some of our services use in-memory data stores where a real system would persist data in a database. It’s generally a good idea not to share databases between services, in order to keep them loosely coupled.
So whenever some information is required, we always go through the relevant service to access it. This can be seen in scenarios such as shipping and billing, where the admin service passes a reference to the order, and the respective services call the order management service to look up the order details they need to carry out their tasks.
Let’s go through the service implementations in order to see how they are implemented using Ballerina and how the interactions with each other are done.
The shopping cart service provides the shopping cart features. The basic functionality we have implemented is adding items to the cart, getting all the items, and clearing the cart.
These operations are mapped to HTTP resources, each using the HTTP verb that matches the operation. Persistence is done using a relational database, and we use the Ballerina JDBC connector to execute the SQL statements.
Listing 1 - cart.bal
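Listing 1 is written in Ballerina against MySQL. As a rough, language-neutral illustration of the same three operations, the following Python sketch uses the standard-library `sqlite3` module as a stand-in for MySQL. The table name and columns are hypothetical, not taken from cart.bal, and the HTTP verb noted on each function is an assumption about how the resources are mapped.

```python
import sqlite3

# Hypothetical schema standing in for the MySQL table used by the cart service.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cart_items (
    account_id INTEGER NOT NULL,
    item_id    INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (account_id, item_id)
)
"""


def open_cart_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_item(conn, account_id, item_id, quantity):
    """POST: add an item to the account's cart (parameterized to avoid SQL injection)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cart_items (account_id, item_id, quantity) VALUES (?, ?, ?)",
            (account_id, item_id, quantity),
        )


def get_items(conn, account_id):
    """GET: return all items in the account's cart."""
    rows = conn.execute(
        "SELECT item_id, quantity FROM cart_items WHERE account_id = ? ORDER BY item_id",
        (account_id,),
    )
    return [{"itemId": item_id, "quantity": qty} for item_id, qty in rows]


def clear_cart(conn, account_id):
    """DELETE: remove every item from the account's cart."""
    with conn:
        conn.execute("DELETE FROM cart_items WHERE account_id = ?", (account_id,))
```

Because of the composite primary key, adding the same item twice for one account raises `sqlite3.IntegrityError`. This failure is similar in kind to the duplicate-entry error that shows up in a trace later in this article.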
The Ballerina services share some common data types that will be used in multiple services. These are defined in a common module that is imported as required. The code for the general structure is shown in Listing 2.
Listing 2 - types.bal
The inventory service simply implements a search operation for the users to search for goods with the given keywords. In our sample scenario, the database will be already populated with entries.
The search resource uses data binding to take in the parameter “query”, which is used to do a search operation in the database using an SQL query. The result set from the query is converted to a JSON value and returned to the caller. Listing 3 shows the code that is used to implement the service.
Listing 3 - inventory.bal
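The search resource binds a keyword to a parameterized SQL query and serializes the result set to JSON. That flow can be sketched in Python as follows. Again, `sqlite3` stands in for MySQL, and the `inventory` table, its columns, and the sample rows are hypothetical.

```python
import json
import sqlite3


def open_inventory_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (id INTEGER PRIMARY KEY, description TEXT, price REAL)"
    )
    # Hypothetical pre-populated entries, standing in for the data loaded by db.sql.
    conn.executemany(
        "INSERT INTO inventory (id, description, price) VALUES (?, ?, ?)",
        [(1, "Red shirt", 20.0), (2, "Blue shirt", 22.5), (3, "Running shoes", 80.0)],
    )
    return conn


def search(conn, query):
    """Return matching rows as a JSON array string, similar to the search resource."""
    rows = conn.execute(
        "SELECT id, description, price FROM inventory WHERE description LIKE ? ORDER BY id",
        ("%" + query + "%",),  # the keyword is bound as a parameter, never concatenated into SQL
    )
    result = [{"id": i, "description": d, "price": p} for i, d, p in rows]
    return json.dumps(result)
```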
The order management service takes in the order details, which are mainly the items and their quantities included in the order, and persists it to be queried later for information lookup. Here, for the sake of simplicity, we use an in-memory data storage mechanism to store the information. The service code for this can be found in Listing 4.
Listing 4 - ordermgt.bal
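Here is a minimal Python sketch of such an in-memory order store. It assumes UUID-based order IDs, which is an assumption: Listing 4 may generate identifiers differently. A lock guards the dictionary because an HTTP service handles requests concurrently.

```python
import threading
import uuid


class OrderStore:
    """In-memory order storage; all data is lost when the process exits."""

    def __init__(self):
        self._orders = {}
        self._lock = threading.Lock()

    def add_order(self, items):
        """Persist an order (a list of {"itemId", "quantity"} dicts) and return its ID."""
        order_id = str(uuid.uuid4())
        with self._lock:
            # Copy the items so later changes by the caller do not alter the stored order.
            self._orders[order_id] = [dict(item) for item in items]
        return order_id

    def get_order(self, order_id):
        """Look up an order for other services (billing, shipping); None if unknown."""
        with self._lock:
            items = self._orders.get(order_id)
            return None if items is None else [dict(item) for item in items]
```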
The billing service simply responds to calls from the admin service. It looks up the order details through the order management service, creates a receipt for the user’s payment, and returns it.
The actual payment operations are mocked here; the implementation only performs the interactions with the required services. The code for this can be found in Listing 5.
Listing 5 - billing.bal
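The billing flow (look up the order by reference, then build a receipt) is sketched below in Python. The order lookup is injected as a plain function so it can stand in for the HTTP call to the order management service. The price table, receipt fields, and function names are hypothetical.

```python
import uuid

# Hypothetical unit prices keyed by item ID.
PRICES = {1: 20.0, 2: 22.5, 3: 80.0}


def bill_order(order_id, lookup_order):
    """Create a receipt for order_id.

    lookup_order stands in for the call to the order management service; it
    returns the order's items, or None if the order does not exist.
    """
    items = lookup_order(order_id)
    if items is None:
        raise LookupError(f"unknown order {order_id}")
    total = sum(PRICES[item["itemId"]] * item["quantity"] for item in items)
    # The payment itself is mocked, as in the article's billing service.
    return {"receiptId": str(uuid.uuid4()), "orderId": order_id, "total": round(total, 2)}
```

Passing the lookup in as a function keeps billing ignorant of where order data lives. This mirrors the article’s rule that services never read another service’s storage directly.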
The shipping handling service, similar to the billing service, takes in the directive from the admin service, contacts the order management service to retrieve the details, and does the shipping operation. The response from the service is a generated delivery tracking number to be presented to the user. The code for this service is shown in Listing 6.
Listing 6 - shipping.bal
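Shipping follows the same lookup-then-act pattern. In the Python sketch below, the delivery tracking number is randomly generated. The `TRK-` prefix and the 10-digit format are invented for illustration and are not the format used in Listing 6.

```python
import secrets


def ship_order(order_id, lookup_order):
    """Look up the order via the (stubbed) order management call and 'ship' it."""
    items = lookup_order(order_id)
    if not items:
        raise LookupError(f"nothing to ship for order {order_id}")
    # Hypothetical tracking-number format: "TRK-" followed by 10 random digits.
    digits = "".join(secrets.choice("0123456789") for _ in range(10))
    return "TRK-" + digits
```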
The admin service acts as the gateway for the website frontend to the backend services functionality. It also does the main orchestration of the operations for the services in the system. In our sample implementation, it contains a few straightforward proxy operations for the shopping cart and the inventory services.
It also implements the checkout operation, which brings together multiple backend services to complete the request. Listing 7 contains the code of the admin service.
Listing 7 - admin.bal
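The shape of the checkout orchestration can be summarized with a Python sketch in which each backend service is an injected client object. The order of calls (read cart, create order, bill, ship, clear cart) is an assumption based on the service descriptions above, not a transcription of Listing 7, and the method names are hypothetical.

```python
def checkout(account_id, cart, orders, billing, shipping):
    """Orchestrate a checkout across backend services.

    Each argument after account_id stands in for an HTTP client to the
    corresponding service; the method names are hypothetical.
    """
    items = cart.get_items(account_id)
    if not items:
        raise ValueError("cart is empty")
    order_id = orders.add_order(items)   # order management service
    receipt = billing.bill(order_id)     # billing looks the order up itself
    tracking = shipping.ship(order_id)   # so does shipping
    cart.clear(account_id)               # clear only once everything succeeded
    return {"orderId": order_id, "receipt": receipt, "trackingNumber": tracking}
```

Clearing the cart last is a design choice in this sketch. If billing or shipping fails, the cart stays intact for a retry. In the real system, each of these calls crosses the network, which is why each one appears as a separate child span under “checkout” in a trace.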
The diagram in Figure 2 shows the auto-generated sequence diagram view of the admin service from the Ballerina VS Code plugin.
Figure 2 - Admin Service Sequence Diagram View
This view is possible in Ballerina because of its language design, in which constructs are created to be compatible with sequence diagramming concepts. To learn more about this, read how Ballerina uses sequence diagrams for programming.
System Deployment and Observability
Now that we have looked at how the services are implemented in the code, let’s see how we would build the project and deploy it. We will also see how to set up the tools required for the observability features in Ballerina and how to enable the automatic observability features in applications.
After the repo is cloned and you have access to the “ecommerce” directory, you will be able to follow the instructions in the README.md in order to set up the MySQL database, build and run the Ballerina services, run the observability tools, and execute a traffic simulator for our services.
First, you will need to edit the cart and inventory service source code to update the MySQL username and password. In a more practical scenario, we would externalize these properties using the Ballerina config API. Afterward, we will create the database and populate it with some sample data using the following command.
$ mysql -u user -p < db.sql
Next, we will build all the modules in our Ballerina project using the following command.
This would compile the source code and build the following executable artifacts.
Now, we can run the above executable files. Generally in Ballerina, we run an application using the following command.
$ ballerina run target/bin/cart.jar
But in our instance, in order to enable the observability features, we have to pass in a runtime flag. So, the above command is changed to the following.
$ ballerina run target/bin/cart.jar --
We simply need to set the property “--b7a.observability.enabled” to true, and the Ballerina application will emit tracing information and generate metrics data. In our case, we also specify the port on which each service exposes its metrics data to a Prometheus server.
The default port is 9797, but we set other port values explicitly, since we are running multiple services on the same host. We run the rest of the services in the same manner.
$ ballerina run target/bin/ordermgt.jar --
$ ballerina run target/bin/billing.jar --
$ ballerina run target/bin/shipping.jar --
$ ballerina run target/bin/inventory.jar --
$ ballerina run target/bin/admin.jar --
Now all our services are up and running, and they are ready to emit and generate observability data.
In Ballerina, the distributed tracing functionality is implemented using the OpenTracing API, and Ballerina ships a Jaeger provider by default for publishing tracing information.
So here, we will use Jaeger to receive the tracing data from the Ballerina services. To simplify setting up the required software, we will use Docker to run a Jaeger server.
$ docker run -p 5775:5775/udp -p6831:6831/udp -p6832:6832/udp -p5778:5778 -p16686:16686 -p14268:14268 jaegertracing/all-in-one:latest
For metrics, Ballerina uses Prometheus to process and store the data. First, we need to configure a “prometheus.yml” to provide our service endpoints to pull the metrics data. In the project directory, open up the provided prometheus.yml file and update the host/IP values in order to represent a non-loopback address, preferably the external IP address of a network interface, which can be accessed from the Docker environment.
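The provided prometheus.yml is not reproduced here. As a sketch of what such a file typically contains, the Python function below renders a minimal scrape configuration with one target per service. The job name, host IP, scrape interval, and every metrics port other than the default 9797 are hypothetical placeholders. Substitute the host and the ports you actually pass to each service.

```python
def render_prometheus_config(host, ports, job_name="ballerina", interval="15s"):
    """Render a minimal prometheus.yml that scrapes host:port for each service.

    host must be reachable from inside the Prometheus container; a loopback
    address such as localhost would refer to the container itself.
    """
    if host in ("localhost", "127.0.0.1"):
        raise ValueError("use a non-loopback address reachable from Docker")
    targets = ", ".join(f"'{host}:{port}'" for port in ports)
    return (
        "global:\n"
        f"  scrape_interval: {interval}\n"
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{targets}]\n"
    )


# Hypothetical host IP and metrics ports, one per service.
print(render_prometheus_config("192.168.1.10", [9797, 9798, 9799]))
```

No `metrics_path` is set, so Prometheus uses its default path for every target.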
Now we can run a Prometheus server using Docker in the following manner.
$ docker run -p 9090:9090 -v
To visualize the metrics from Prometheus, we use Grafana to deploy a dashboard predefined for Ballerina applications.
$ docker run -p 3000:3000 grafana/grafana
We now have all the services deployed and all the observability tools running. Now let’s invoke our services to simulate some user sessions. A Ballerina application is available in the “simulator” module in order to do this. It can be invoked in the following manner.
$ ballerina run target/bin/simulator.jar 100 1000
This application simulates a user session by searching for goods, adding a few items to the shopping cart, and finally checking out the items. The simulator takes two runtime parameters: the first is the delay between session executions in milliseconds, and the second is the total number of transactions to execute. The above execution will generate log entries similar to the following.
With the sample requests executed, let’s take a look at what the automated observability features are providing us.
Let’s navigate to the Jaeger web console at http://localhost:16686/. Here, we are provided with numerous options for searching traces. If we expand the “Service” dropdown, we can already see all the Ballerina services we have deployed.
Let’s select the “Admin” service and do a search on it with the default options.
Figure 3 - Jaeger Trace Search
A trace is the entity that contains the information on a single request. This is made up of multiple spans. A span contains the information on a single unit of work done in a distributed system. Let’s take a look at a “checkout” trace, which represents a checkout request we have executed.
Figure 4 - Checkout execution trace
Here, the horizontal bars represent the spans of the operations, and spans can have other related spans associated with them. In this instance, one “checkout” span runs through the whole duration of the request. Under it, multiple child spans represent the individual operations done within a checkout. Using this view, we can quickly get an idea of the duration of each operation and the relationships between operations.
Each span contains metadata in the form of tags and logs. Tags contain information that applies to the whole span, while logs record events that happen at a specific point in time within a span. Let’s look at a scenario where an error happens during a checkout and see how it is shown in the trace.
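The parent/child structure in that view can be modeled with a few fields per span. The Python sketch below rebuilds the tree from parent IDs and prints each span indented under its parent with its duration, roughly mirroring the Jaeger timeline. The span names and millisecond timings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    operation: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def render_trace(spans):
    """Return lines showing each span indented under its parent, in start order."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.operation} ({span.duration_ms:.0f} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical checkout trace: one root span and its child operations.
trace = [
    Span("1", None, "checkout", 0, 120),
    Span("2", "1", "getItems", 5, 20),
    Span("3", "1", "addOrder", 22, 40),
    Span("4", "1", "bill", 42, 80),
    Span("5", "1", "ship", 82, 115),
]
print("\n".join(render_trace(trace)))
```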
Figure 5 - Checkout error execution trace inspection
In this trace, we see that we have incurred an error at the “addItem” operation in the “ShoppingCart” service. The respective span is marked as an error, and we can drill down into the tags and logs to find more information on the context.
Here, we see the SQL query that was executed, and the log entry shows that we tried to insert a duplicate entry, which caused the database operation to fail.
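The difference between tags and logs can be illustrated with a small Python model. Tags are key/value pairs that hold for the whole span, while logs are timestamped events within it. The keys used below (`error`, `db.type`, `db.statement`, and the `event`/`message` log fields) follow the OpenTracing semantic conventions. The span contents and the error message are illustrative placeholders, not output captured from the sample.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    operation: str
    tags: dict = field(default_factory=dict)  # apply to the whole span
    logs: list = field(default_factory=list)  # (timestamp, fields) events inside the span

    def set_tag(self, key, value):
        self.tags[key] = value

    def log(self, **fields):
        self.logs.append((time.time(), fields))


def record_failed_insert(statement, message):
    """Build a span describing a failed SQL insert, the way a tracer would flag it."""
    span = SpanRecord("addItem")
    span.set_tag("db.type", "sql")
    span.set_tag("db.statement", statement)
    span.set_tag("error", True)               # marks the whole span as failed
    span.log(event="error", message=message)  # records when and why it failed
    return span


def failed_spans(spans):
    """Return operation names of spans tagged as errors, as a trace search might."""
    return [s.operation for s in spans if s.tags.get("error") is True]
```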
Figure 6 shows a similar scenario of the logs in a trace becoming useful, where we can track down the shipping tracking number by looking at the span logs.
Figure 6 - Shipping operation span logs
Using the above features in tracing, we can easily track down request information. It especially helps in finding performance bottlenecks and optimizing the flows in the services.
Let’s take a look at how we can visualize the metrics information of our deployment. To do this, we first have to deploy the Ballerina metrics dashboard in Grafana. Let’s navigate to http://localhost:3000/ and log in with the default username/password “admin”/“admin”.
In the first step, we need to add a data source to point to the Prometheus server we have deployed earlier.
Here, select “Prometheus” as the database type, and for the HTTP URL, provide “http://<external-ip>:9090”, where external-ip is the IP address we used earlier in the prometheus.yml configuration.
Keep the other fields as they are and click “Save & Test”. If everything is working, you will get a success message, and we can move on to importing the dashboard.
In Grafana, click on the “+” sign and select “Import”. Here, provide the dashboard ID “5841”. After it’s imported, the next page will ask you to select the Prometheus data source, where we can select the earlier created data source from the drop-down.
Press “Import”, which will take us to the Ballerina dashboard. Here, the following three sections are visualized in the dashboard.
HTTP Service Metrics
This dashboard row contains panels to provide metrics information on the services that are deployed in the system. This can be seen in Figure 7.
Figure 7 - HTTP Service Metrics Dashboard
This contains information such as request/error counts, throughput information, response time percentiles, and the top services in relation to traffic.
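Response time percentiles are easy to misread, so a short worked example helps. The dashboard’s values come from the metrics Ballerina exports to Prometheus, not from raw samples. The Python sketch below instead uses the simple nearest-rank method on a handful of made-up latencies, purely to show what a p50 or p99 value means.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times (ms) for ten requests, including one slow outlier.
latencies = [12, 15, 11, 14, 13, 16, 12, 18, 14, 250]
for p in (50, 90, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
```

Here the median (p50) is 14 ms and p90 is 18 ms, so both hide the single 250 ms request, which only the p99 value surfaces. This is why latency panels usually plot several percentiles rather than a single figure.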
HTTP Client Metrics
This contains the information on the HTTP clients used in our services. These are, for example, the communication between the admin service and other services, and the communication between billing and shipping services with the order management service. This dashboard section is shown in Figure 8.
Figure 8 - HTTP Client Metrics Dashboard
Similar to the service metrics, this contains the corresponding HTTP client information.
SQL Client Metrics
This dashboard section visualizes the metrics related to SQL database operations. In our deployment, we have the shopping cart and the inventory services executing SQL statements for persistence operations. The stats on these operations will be shown here as illustrated in Figure 9.
Figure 9 - SQL Client Metrics Dashboard
Here, we can see the SQL execution counts that are happening at the moment, the error rates, response time percentiles, and the top SQL statements that are executed.
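The “top SQL statements” and error-rate panels boil down to counting executions per statement. The Python sketch below does that over a hypothetical list of (statement, succeeded) records. In the real deployment, these numbers come from the metrics the Ballerina runtime exports, not from a list like this.

```python
from collections import Counter


def sql_summary(executions, top_n=3):
    """Summarize (statement, succeeded) records.

    Returns the top_n most executed statements with their counts, and the
    overall error rate as a fraction between 0 and 1.
    """
    if not executions:
        return [], 0.0
    counts = Counter(statement for statement, _ in executions)
    failures = sum(1 for _, succeeded in executions if not succeeded)
    return counts.most_common(top_n), failures / len(executions)


# Hypothetical execution log for the cart and inventory services.
runs = [
    ("SELECT * FROM inventory WHERE description LIKE ?", True),
    ("INSERT INTO cart_items VALUES (?, ?, ?)", True),
    ("SELECT * FROM inventory WHERE description LIKE ?", True),
    ("INSERT INTO cart_items VALUES (?, ?, ?)", False),
    ("SELECT * FROM inventory WHERE description LIKE ?", True),
]
top, error_rate = sql_summary(runs)
print(top)
print(f"error rate: {error_rate:.0%}")
```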
The Ballerina metrics dashboard visualizes the most commonly used information, but if required, we can of course update it to include more. Also, if we need to enrich the observability data with more information, we can use the Ballerina observe API to do so.
In this article, we saw how Ballerina’s automatic observability features allowed us to keep our application code clean and concentrate mainly on our business logic. Generally, when we want meaningful observability, we may end up littering our codebase with library calls to emit metrics and tracing information. In an environment like Ballerina, with the language’s explicit awareness of networked systems, the most commonly required observability information is emitted intelligently without any intervention from the developer.
More information on Ballerina and its observability features can be found in the resources below: