In large monolithic applications, error tracking and monitoring can become ineffective due to unclear ownership. This guide proposes a structured approach to assigning accountability using domain annotations, improving both monitoring efficiency and team responsibility.
Monitoring such applications, especially with multiple teams, poses challenges. Without clear ownership, error tracking tends to be generic and often ignored. While having on-call engineers determine which team should respond to monitoring alarms is one solution, a more efficient approach is to embed domain and team information directly into your monitoring system.
A microservices architecture can effectively include domain and team information by breaking a monolithic application into smaller, independent services. However, adopting microservices comes with significant drawbacks of its own.
Creating a modular monolith is another approach to establishing clear ownership within a monolithic application. Unlike microservices, a modular monolith organizes the application into distinct, cohesive modules that can be developed and maintained independently by different teams. This approach has its own advantages and challenges.
Imagine marking classes and functions with domains, and mapping these domains to the respective teams within the codebase. This is where domain annotations come in.
Domain annotations allow you to label every part of your application's code, clearly indicating accountability. By tagging parts of your code with domain annotations, you can assign ownership unambiguously, filter logs and traces by team, and monitor each domain separately.
As an example, let's explore how domain annotations are processed for REST requests. Here's a high-level overview of the process, depicted in the following diagram:
For more information, refer to the monolith-domain-splitter library.
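To make the flow concrete, here is a minimal sketch of how an interceptor could resolve a handler's domain and tag the active span, assuming a Spring HandlerInterceptor and OpenTracing's GlobalTracer. The class name, tag key, and annotation lookup are illustrative assumptions, not the library's actual API.

import io.opentracing.util.GlobalTracer
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse
import org.springframework.web.method.HandlerMethod
import org.springframework.web.servlet.HandlerInterceptor

// Illustrative sketch only; the real interceptor ships with monolith-domain-splitter.
class SketchDomainInterceptor : HandlerInterceptor {
    override fun preHandle(
        request: HttpServletRequest,
        response: HttpServletResponse,
        handler: Any,
    ): Boolean {
        if (handler is HandlerMethod) {
            // A method-level @Domain takes priority over the class-level one.
            val domain = handler.getMethodAnnotation(Domain::class.java)
                ?: handler.beanType.getAnnotation(Domain::class.java)
            // Tag the active span so Datadog can group it by domain and team.
            domain?.let { GlobalTracer.get().activeSpan()?.setTag("domain", it.value) }
        }
        return true
    }
}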
Defining ownership at the class level is straightforward with domain annotations. By applying top-level annotations to main classes, ownership propagates down to all detailed resources within those classes. Each team can label classes they own with the appropriate domain annotations, ensuring clarity and accountability without marking every single method.
If multiple teams own code in one class and immediate refactoring isn’t appropriate, you can mark individual methods with different domain annotations. These method-level annotations take priority over class-level annotations, allowing specific methods to be assigned to different teams, and providing flexibility without complicating the overall structure.
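For instance, a hypothetical endpoint class (using the @Domain annotation covered in the setup section below) might look like this:

@Domain("PROJECT")
class ProjectEndpoint {
    // Inherits the class-level PROJECT domain.
    fun listProjects() { /* ... */ }

    // The method-level annotation takes priority, so this endpoint
    // is attributed to the FILE domain's team instead.
    @Domain("FILE")
    fun getProjectFiles() { /* ... */ }
}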
While domain annotations are handy, some cases may not support them. For instance, we encountered issues with Quartz job creation due to a clash between Quartz's AOP logic and the AOP logic used for domain annotations.
For jobs and processes that cannot be annotated directly, we used the DomainTagsService within the parent job implementations. This approach allowed us to add domain tags manually within the job's execution logic. Here's an example of how we integrated DomainTagsService into a Quartz job:
// Wrap the whole job execution so spans and logs produced inside
// carry the owning team's domain tags.
final override fun execute(context: JobExecutionContext) {
    domainTagsService.invoke(domain) {
        withLoggedExecutionDetails(context, ::doExecute)
    }
}
To simplify monitoring each team's activities in Datadog, you can assign artificial service names to the spans of different teams. This approach gives every team a dedicated section in Datadog's monitoring tools. While artificial service names can become confusing when you have many services to manage, they stay manageable with a limited number of backend services.
Adding prefixes to these artificial service names helps maintain organization and clarity in your Datadog setup, making it easier to distinguish them from real services.
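As a rough sketch of the naming scheme, following the artificialServiceName = getServiceNamePrefix() + team rule from the configuration section later (the helper function below is hypothetical, and whether the library lowercases team names is an assumption):

// Hypothetical helper that mirrors the naming convention only.
fun artificialServiceName(prefix: String, team: TeamImpl): String =
    prefix + team.name.lowercase()

// artificialServiceName("monolith-", TeamImpl.LIONS) == "monolith-lions"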
Using artificial service names for logs can create confusion, as the same log entry might appear under different services.
For example, consider two endpoints using the same authentication service. The authentication logic will produce logs under different artificial services if these endpoints are annotated with different domains. This could create confusion when exploring logs, as they appear under multiple service names.
To avoid this issue, it's better to apply artificial service names only to spans that are aggregated together in traces, reducing confusion.
Here is a visual representation of this problem:
Using artificial services enables you to work with APM traces and filter by service in Datadog metrics, which are retained long-term, so you can track changes over an extended period.
Below is a screenshot of a monitor in Datadog that uses the artificial service name monolith-assets in the query:
Below is a screenshot of a dashboard in Datadog that uses the artificial-services prefix monolith-* in the filter. As you can see, the chart shows a separate latency line for each service, and all other service metrics are likewise available per service.
This guide outlines the steps for integrating domain annotations into your project using the monolith-domain-splitter library, which requires the Datadog agent for full functionality. While adding domain and team annotations to logs may work without Datadog, it has not been thoroughly tested in such scenarios.
Use enums to represent the different domains and teams within your application:
enum class DomainValueImpl(
    override val team: Team,
) : DomainValue {
    PROJECT(TeamImpl.LIONS),
    FILE(TeamImpl.SNAILS),
}

enum class TeamImpl : Team {
    LIONS,
    SNAILS,
}
Add the monolith-domain-splitter library and the OpenTracing dependencies. If you use Gradle, add the following to your build.gradle.kts file:
dependencies {
    api("io.opentracing:opentracing-api:0.33.0")
    api("io.opentracing:opentracing-util:0.33.0")
    implementation("io.github.feddena.monolith.splitter:monolith-domain-splitter:0.0.2")
}
Annotate your main application class to include the monolith-domain-splitter package in component scanning.
package your.app.pkg

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication

@SpringBootApplication(scanBasePackages = ["your.app.pkg", "io.github.feddena.monolith.splitter"])
class StorageApplication

fun main(args: Array<String>) {
    runApplication<StorageApplication>(*args)
}
Use the @Domain annotation to mark your classes and methods with specific domains.
@Domain("FILE")
class FileEndpoint {
// Your endpoint logic
}
For cases that cannot be annotated directly, use DomainTagsService to wrap the logic:
fun executeNotSupportedByAnnotationsLogic() {
    domainTagsService.invoke(domain) { executeLogic() }
}
Register your domain values in a Spring configuration class so the library knows about them:

@Configuration
@EnableAspectJAutoProxy
class DomainConfiguration(
    private val domainRegistry: DomainRegistry,
) {
    init {
        DomainValueImpl.entries
            .forEach { domainRegistry.registerDomainValue(it) }
    }
}
Next, configure the artificial service names:

@Configuration
class DomainTraceInterceptorConfigurationImpl : DomainTraceInterceptorConfiguration {
    // Name of the service in Datadog, if you wish to override it.
    override fun getServicesToOverride(): Set<String> {
        return setOf("real-service-to-override")
    }

    // Prefix used to build artificial service names:
    // artificialServiceName = getServiceNamePrefix() + team
    override fun getServiceNamePrefix(): String {
        return "monolith-"
    }
}
Finally, register the interceptor so incoming REST requests are tagged with their domain:

@Component
class WebMvcConfigurerDomain(
    private val domainHandlerInterceptor: DomainHandlerInterceptor,
) : WebMvcConfigurer {
    override fun addInterceptors(registry: InterceptorRegistry) {
        registry.addInterceptor(domainHandlerInterceptor)
    }
}
Use artificial-service filters in Datadog monitors, dashboards, and APM trace searches to keep track of different domains and teams. Ensure your project has the Datadog agent configured for full functionality.
Domain annotations provide a straightforward way to simplify monitoring of monolithic applications in Datadog. Use the monolith-domain-splitter library in your project to ensure that each domain in your monolithic application is well organized and tracked, for better observability and accountability.
Enhanced Ownership and Accountability: By annotating parts of your code with domain annotations, you can clearly define which team is responsible for each domain. This facilitates better organization and targeted monitoring.
Improved Log and Trace Management: Domain annotations allow you to filter both logs and traces based on specific criteria, such as team responsibility, enabling quick identification and resolution of issues.
Flexibility with Artificial Services: Using artificial service names for spans (not logs) ensures that logs remain clear and traceable to their true origins, avoiding confusion.
Overcoming Integration Challenges: For cases where annotations cannot be applied directly, such as with certain job execution frameworks like Quartz, using services like DomainTagsService directly in the job implementations ensures that domain-specific monitoring can still be maintained.