
Integrating Manticore with Vector.dev

by Sergey Nikolaev, August 14th, 2023

Too Long; Didn't Read

Vector by Datadog is an open-source, high-performance observability data pipeline that collects, transforms, and routes logs and metrics. While Vector can work as an aggregator, it's more effective when combined with specialized data storage tools like Manticore. This combination is showcased through an example of indexing dpkg.log, a Debian package manager log file. Vector's configuration file, in TOML format, includes a source for the log file, a transformation to modify timestamp fields, and a sink to send data to Manticore. After setting up, data is passed to Manticore and properly indexed, allowing efficient log data management, transformations, and storage. This collaboration between Vector and Manticore provides a robust solution for collecting, transforming, and storing log data.

Vector by Datadog is an open-source, high-performance, end-to-end (agent and aggregator) observability data pipeline that lets you collect, transform, and route all your logs and metrics. Although Vector can act as an aggregator on its own, it is often more effective when paired with a specialized data storage tool such as Manticore.


Let’s look at how they can work together. As an example, we’ll index dpkg.log, the standard log file of the Debian package manager.


The log itself has a simple structure, as shown below:

2023-05-31 10:42:55 status triggers-awaited ca-certificates-java:all 20190405ubuntu1.1
2023-05-31 10:42:55 trigproc libc-bin:amd64 2.31-0ubuntu9.9 <none>
2023-05-31 10:42:55 status half-configured libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 trigproc systemd:amd64 245.4-4ubuntu3.21 <none>
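
To make the structure concrete, here is a minimal Python sketch that splits one such line into a timestamp, an action, and the remaining text. It is not part of the Vector or Manticore setup; the function name `parse_dpkg_line` and the field names are hypothetical, and the sketch assumes every line starts with a date, a time, and an action word, as in the sample above.

```python
from datetime import datetime


def parse_dpkg_line(line: str) -> dict:
    """Split one dpkg.log line into timestamp, action and remaining details.

    Assumes the layout seen in the sample: "<date> <time> <action> <rest>".
    """
    parts = line.strip().split(" ", 3)
    if len(parts) < 3:
        raise ValueError(f"unexpected dpkg.log line: {line!r}")
    date_part, time_part, action = parts[:3]
    # Everything after the action is kept as-is; it may be empty.
    details = parts[3] if len(parts) == 4 else ""
    return {
        "timestamp": datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S"),
        "action": action,
        "details": details,
    }


sample = "2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9"
print(parse_dpkg_line(sample))
```

In the setup described below, Vector forwards each line unparsed in the message field, so a step like this would be optional post-processing.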


Configuration

Here is an example Vector.dev configuration file in TOML format:

[sources.test_file]
type = "file"
include = [ "/var/log/dpkg.log" ]

[transforms.modify_test_file]
type = "remap"
inputs = [ "test_file" ]
source = """
.vec_timestamp = del(.timestamp)"""

[sinks.manticore]
type = "elasticsearch"
inputs = [ "modify_test_file" ]
endpoints = ["http://127.0.0.1:9308"]
bulk.index = "dpkg_log"


Note that this example assumes Manticore listens on its default HTTP port, 9308. If you use a custom HTTP port, change the Vector.dev config accordingly. Also note that the transforms section is there to rename the default timestamp field, because timestamp is a reserved word in Manticore.
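
To illustrate what the transform and the sink do together, the following Python sketch mimics the rename on a plain dictionary and builds a request body in the Elasticsearch bulk (newline-delimited JSON) style. This is an approximation under stated assumptions, not Vector's actual implementation: the helper names `rename_timestamp` and `to_bulk_payload` are hypothetical, and the exact body Vector emits may differ between versions.

```python
import json


def rename_timestamp(event: dict) -> dict:
    """Mimic the VRL line `.vec_timestamp = del(.timestamp)` on a dict copy."""
    renamed = dict(event)
    # Like del() in VRL, a missing field yields a null (None) value.
    renamed["vec_timestamp"] = renamed.pop("timestamp", None)
    return renamed


def to_bulk_payload(events: list, index: str) -> str:
    """Build an Elasticsearch-style bulk body: one action line, then one document line."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # Bulk bodies are newline-delimited and end with a trailing newline.
    return "\n".join(lines) + "\n"


event = {
    "file": "/var/log/dpkg.log",
    "message": "2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9",
    "source_type": "file",
    "timestamp": "2023-08-04T15:32:50.203091741Z",
}
print(to_bulk_payload([rename_timestamp(event)], "dpkg_log"))
```

The point of the sketch is the order of operations: the field is renamed before the document is serialized, so Manticore never sees a field called timestamp.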


Results

Now start Vector.dev with the config above, and the data from the dpkg log will be passed to Manticore and indexed. Here is the schema of the resulting table and a few examples of the inserted documents:

mysql> DESCRIBE dpkg_log;
+-----------------+---------+--------------------+
| Field           | Type    | Properties         |
+-----------------+---------+--------------------+
| id              | bigint  |                    |
| file            | text    | indexed stored     |
| host            | text    | indexed stored     |
| message         | text    | indexed stored     |
| source_type     | text    | indexed stored     |
| vec_timestamp   | text    | indexed stored     |
+-----------------+---------+--------------------+


mysql> SELECT * FROM dpkg_log LIMIT 3\G
*************************** 1. row ***************************
id: 7856533729353672195
file: /var/log/dpkg.log
host: logstash-787f68f6f-nhdd2
message: 2023-06-05 14:03:04 startup archives install
source_type: file
vec_timestamp: 2023-08-04T15:32:50.203091741Z
*************************** 2. row ***************************
id: 7856533729353672196
file: /var/log/dpkg.log
host: logstash-787f68f6f-nhdd2
message: 2023-06-05 14:03:04 install base-passwd:amd64 <none> 3.5.47
source_type: file
vec_timestamp: 2023-08-04T15:32:50.203808861Z
*************************** 3. row ***************************
id: 7856533729353672197
file: /var/log/dpkg.log
host: logstash-787f68f6f-nhdd2
message: 2023-06-05 14:03:04 status half-installed base-passwd:amd64 3.5.47
source_type: file
vec_timestamp: 2023-08-04T15:32:50.203814031Z
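
Once the documents are in the table, they can also be queried over HTTP. The sketch below builds a full-text search request for Manticore's JSON /search endpoint and defines a helper that would send it. It assumes the same address as the Vector config (127.0.0.1:9308) and the request shape with "index", "query", and "limit" keys; the function names are hypothetical, and the request shape should be checked against the documentation for your Manticore version.

```python
import json
import urllib.request

MANTICORE_URL = "http://127.0.0.1:9308"  # assumed: same address as in the Vector config


def build_search_request(index: str, text: str, limit: int = 3) -> dict:
    """Build a JSON body that full-text matches `text` against the message field."""
    return {
        "index": index,
        "query": {"match": {"message": text}},
        "limit": limit,
    }


def search(index: str, text: str, limit: int = 3) -> dict:
    """Send the request to the /search endpoint and return the decoded JSON reply."""
    body = json.dumps(build_search_request(index, text, limit)).encode("utf-8")
    request = urllib.request.Request(
        f"{MANTICORE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the request body is printed here; call search("dpkg_log", "installed")
# against a running Manticore instance to execute the query.
print(json.dumps(build_search_request("dpkg_log", "installed")))
```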


Conclusion

With the integration outlined in this guide, you can index your log data by pairing Manticore with Vector by Datadog, a high-performance, end-to-end observability data pipeline. The combination streamlines log management and adds transformation and routing on top of storage. Whether your log structures are simple or complex, it gives you a single pipeline for collecting, transforming, and storing your data.

