Native Analytics On Elasticsearch With Knowi
Table of Contents
- Creating Your First Visualization
- Connecting to Elasticsearch
- Writing Your First Query
- Creating Your First Visualization
- Multi-Index Joins
- Joining Your Indexes
- Search-based Analytics & Self-Service Analytics with Knowi
Elasticsearch is a distributed, open-source, highly scalable search and analytics engine built on Apache Lucene and developed in Java. It allows you to store, search, and analyze huge volumes of data in near real-time and returns answers in milliseconds. With its fast search responses, flexible structure, and extensive API, it’s no wonder it’s being used for a growing number of use cases, from simple text search to log analysis.
Knowi is an analytics platform that natively integrates with Elasticsearch so you can leverage the speed of Elasticsearch to visualize large amounts of data from multiple indexes rapidly. Knowi allows you to query Elasticsearch directly using its native DSL, or use a drag-and-drop interface to build queries quickly without prior knowledge of the query syntax. Knowi also natively integrates with over 30 data sources, allowing you to join your Elasticsearch data across indexes, or blend it with other SQL/NoSQL/REST-API sources on the fly to create new datasets that can be used for downstream analytics. From there, you can choose from a host of visualization options to create custom interactive dashboards, run ad-hoc analysis, use search-based analytics to ask questions of your data, and more.
This post is an end-to-end tutorial on using Knowi for Elasticsearch analytics. We’ll start by natively connecting to data in your Elasticsearch cluster. From there, we’ll show you how to create visualizations from it in just a few minutes, perform joins across multiple indexes, and use Knowi’s search-based analytics feature to ask questions of your data in plain English to glean instant insights.
Sign up for a free Knowi account to get started.
Creating Your First Visualization
In this section, we’ll step through using the Knowi UI to connect to your Elasticsearch cluster in the cloud to visualize and analyze data from it.
Connecting to Elasticsearch
Knowi integrates natively with a broad range of NoSQL, SQL, REST-API, and JSON/CSV data sources. To get started, select your data source and configure the connection. Your data stays in the source, so there are no ETL processes to build or ODBC drivers to install.
After logging in to Knowi, we’ll start by establishing a connection to your Elasticsearch data source.
- From the Playground dashboard, select ‘Datasources’ on the left-hand side panel, then click ‘New Datasource’
- Select Elasticsearch from the list of datasources
- Once in the ‘New Datasource’ page, start by giving your datasource a name
- Enter your Elasticsearch credentials including the Endpoint URL and deployment username and password
- Choose your Elasticsearch version (Version 5+ by default)
- Click ‘Test Connection’ to confirm successful connection to the Elasticsearch cluster
- Hit ‘Save’
Connecting to Elasticsearch cluster (Source - knowi.com)
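If ‘Test Connection’ fails, it can help to check the endpoint and credentials outside of Knowi. The sketch below is not part of Knowi; it only builds a basic-auth request for Elasticsearch’s cluster health API using the Python standard library. The endpoint URL, username, and password are placeholders, and we assume the deployment accepts HTTP basic authentication.

```python
import base64
import urllib.request


def build_health_request(endpoint_url, username, password):
    """Build (but do not send) a GET request for the cluster health API."""
    # Elasticsearch exposes cluster health at /_cluster/health.
    url = endpoint_url.rstrip("/") + "/_cluster/health"
    # HTTP basic auth header: "Basic " + base64("username:password").
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values; substitute your own deployment details.
request = build_health_request(
    "https://example-cluster.invalid:9243", "elastic", "changeme"
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; a JSON response containing a `status` field suggests the same endpoint and credentials should work in the ‘New Datasource’ form.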
Writing Your First Query
Once connected to your Elasticsearch cluster, Knowi automatically pulls a list of your indexes along with field samples. To start building your queries, Knowi gives you the option to auto-generate your queries using its drag and drop Query Builder via the UI. This is especially useful for users not as familiar with the native DSL. For more advanced users, you also have the option to write your queries directly in the smart Query Editor, a versatile text editor specialized for editing code.
In this example, we’ll select the sending_activity index (which contains email sending activity data) and select the fields we want to analyze from the auto-generated fields from the Query Builder.
- Open the Query Generator by clicking ‘Start Querying’
- In the ‘Indexes’ drop down menu, choose the sending_activity index
- In the ‘Metrics’ dropdown, select the fields customer, message_type, sent, and opened
- Notice that in the Query Editor to the right, a native Elasticsearch JSON Query is being auto-generated (If you already knew the query you needed, you could’ve pasted or written it directly)
- Click ‘Preview’ to instantly preview the results, returned in tabular format
- After previewing the results, give your query a name then hit ‘Save & Run Now’
Use Query Builder to generate queries or write queries directly with the Query Editor (Source - knowi.com)
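For readers new to the Elasticsearch query DSL, here is a rough idea of what a request body selecting those four fields might look like. This is a hand-written sketch, not the exact query Knowi generates; the `size` value and the `match_all` clause are assumptions chosen for illustration.

```python
import json

# Fields selected in the 'Metrics' dropdown in the steps above.
fields = ["customer", "message_type", "sent", "opened"]

# Hypothetical search body for the sending_activity index.
# Knowi's auto-generated query may be structured differently.
query = {
    "size": 100,                 # assumed number of documents to return
    "_source": fields,           # return only the selected fields
    "query": {"match_all": {}},  # no filtering: match every document
}

print(json.dumps(query, indent=2))
```

A body like this would typically be sent to the index’s `_search` endpoint, and could be pasted into the Query Editor as a starting point.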
Creating Your First Visualization
Once the query is saved, Knowi creates a “Virtual Dataset” from the query results and stores it in Knowi’s “Elastic Store” data warehouse that can store and track the results. Unlike traditional warehouses that require complex ETL processes and pre-defined schema, the elastic store is a flexible, scalable, schema-less warehouse. The stored virtual dataset is reusable, and will be the foundation for most of what you’ll do in Knowi, like creating visualizations, adding them to dashboards, and much more.
In this example, we’ll create a stacked column bar chart that shows the total sent emails for each customer by message type. First, we’ll create a new dashboard then the chart itself.
Creating A New Dashboard
- On the left-hand side panel, click ‘Dashboards’
- Hit the ‘+’ icon to create a new dashboard and give it a name then click ‘OK’
- Drag the widget/report you previously created into the dashboard. By default, it will be in grid form
Creating a New Dashboard in Knowi (Source - knowi.com)
The Analyze Screen
- On the top-right corner of the widget, click the ‘More Settings’ icon then select ‘Analyze’
- In the following screen drag customer and message_type to the ‘Groupings/Dimensions’ section
- Drag sent to the ‘Fields/Metrics’ section. In the ‘Operation’ dropdown, select ‘Sum’
- Notice that for Wells Fargo (first row), there were 22,406,800 total marketing messages sent
The Widget Analyze screen (Source - knowi.com)
- At the top of the screen, click the ‘Visualization’ tab which takes you to the visualization settings screen. We want to create a stacked column chart with customer in the x-axis and sum of sent in the y-axis
- In the ‘Settings’ section under the ‘Visualization Type’ dropdown, select ‘Stacked Column’
- In the ‘Options’ section under the ‘Grouping/Legend’ dropdown, select message_type. We can now visualize the total number of emails by message type for each customer
- Hit the ‘Clone’ icon on the top right, to create a new widget derived from the original. This allows us to keep the original widget as is
- Give the cloned widget a name, then add it to the dashboard
The Visualization Settings screen (Source - knowi.com)
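Conceptually, the Analyze and Visualization steps above amount to a group-by with a sum, followed by a pivot into one series per legend value. The following sketch reproduces that logic in plain Python on made-up rows; the numbers and the "Acme" customer are invented for illustration, and this is not how Knowi implements it internally.

```python
from collections import defaultdict

# Made-up sample rows in the shape of the query results.
rows = [
    {"customer": "Wells Fargo", "message_type": "marketing", "sent": 1200},
    {"customer": "Wells Fargo", "message_type": "marketing", "sent": 800},
    {"customer": "Wells Fargo", "message_type": "transactional", "sent": 500},
    {"customer": "Acme", "message_type": "marketing", "sent": 300},
]


def sum_sent_by_customer_and_type(rows):
    """Group by (customer, message_type) and sum the sent field."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["customer"], row["message_type"])] += row["sent"]
    return dict(totals)


def to_stacked_series(totals):
    """Pivot totals into one series per message_type, keyed by customer."""
    series = defaultdict(dict)
    for (customer, message_type), total in totals.items():
        series[message_type][customer] = total
    return dict(series)


totals = sum_sent_by_customer_and_type(rows)
series = to_stacked_series(totals)
print(series)
```

Each key in `series` corresponds to one colored segment of the stacked column, and each customer is one position on the x-axis.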
Drilldowns
Drilldowns allow you to visually navigate and analyze data in powerful ways. They can be set into another widget, another dashboard, or the same dashboard. Drilldowns can be many levels deep with support for combining different drilldown modes. Data from the parent widget can be used as keys into the drilldown widget or dashboard to filter the data specifically for the point selected. Drilldowns can be configured using the ‘Drilldowns’ menu option on each widget in the dashboard.
In this example, we’ll set up a “Widget” drilldown from the stacked column chart (Parent) widget into the original data grid chart that filters the results based on a specific customer.
- On the top-right corner of the bar chart widget, click the ‘More Settings’ icon then select ‘Drilldowns’. The drilldown menu box will appear
- Under the ‘Drilldown type’ dropdown, select ‘Widget’
- For ‘Drill into’, select the name of the widget you want to drill into.
- For ‘Optional Drilldown Filters’ select ‘customer’ = ‘customer’
- Hit ‘Save’
- Remove the original grid chart widget from the dashboard
- In the bar chart, click on any of the bars representing each customer (i.e. Wells Fargo)
- Clicking on Wells Fargo “drills down” into the original grid chart, this time showing details only for the customer Wells Fargo
Adding a Drilldown (Source - knowi.com)
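The ‘customer’ = ‘customer’ drilldown filter can be thought of as passing the clicked point’s value down as a filter on the target widget’s data. Here is a minimal sketch of that idea; the `drill_down` function and the sample rows are hypothetical and only illustrate the filtering behavior, not Knowi’s implementation.

```python
def drill_down(detail_rows, parent_point, filters):
    """Return the detail rows that match the clicked parent point.

    filters maps a parent field name to a detail field name,
    e.g. {"customer": "customer"}.
    """
    return [
        row
        for row in detail_rows
        if all(
            row.get(child_field) == parent_point.get(parent_field)
            for parent_field, child_field in filters.items()
        )
    ]


# Made-up rows standing in for the original data grid.
grid_rows = [
    {"customer": "Wells Fargo", "message_type": "marketing", "sent": 1200},
    {"customer": "Acme", "message_type": "marketing", "sent": 300},
    {"customer": "Wells Fargo", "message_type": "transactional", "sent": 500},
]

# Simulate clicking the Wells Fargo bar in the parent chart.
clicked = {"customer": "Wells Fargo"}
details = drill_down(grid_rows, clicked, {"customer": "customer"})
print(details)
```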
Multi-Index Joins
As part of the “ELK Stack,” Kibana is, unsurprisingly, considered the default visualization tool for Elasticsearch. However, its drawback is that each visualization can only work against a single index. So if you have indexes holding different data, you’ll have to create separate visualizations for each.
Knowi provides a solution for this, as it allows you to join your Elasticsearch data across multiple indexes and blend it with other SQL/NoSQL/REST-API datasources, then create visualizations from it on the fly with a user-friendly UI.
In the following steps, we’ll join our initial sending_activity index with another index in our cluster with customer-specific information to create a new combined dataset that can be used for downstream analytics and visualizations.
Joining Your Indexes
Since we’ve already created a query for the sending_activity index, let’s go back and edit it to add a join to our second index sending_activity_customer.
- From the left-hand side panel of the Knowi UI, select ‘Queries’
- Look for the ‘Elasticsearch - Demo’ query we ran earlier and click the ‘Edit’ icon
- In our first query, let’s add the date and conversions field to the metrics
- Click the ‘Join’ button on the lower-left side of the screen. Select your Elasticsearch datasource from the dropdown
- This will populate the ‘Join Fields’ section and another ‘Query Builder’ and ‘Query Editor’ sections below the first one
- In the ‘Indexes’ dropdown menu, choose the sending_activity_customer index
- In the ‘Metrics’ dropdown menu, select your key field customer, followed by the street and state fields.
- Notice that this index contains mostly customer information-related fields
Use the Query Builder to select metrics from your second index (Source - knowi.com)
So far, we have the query from our first index that gives us the customer, the type of email sent, and how many were sent, opened, and converted. In our second index query, we get address information from the same customers. Now, it’s time to join these two indexes together.
- In the ‘Join Fields’ section, click ‘Join Builder’. Note that you can also type in the join free-hand in the text bar
- Once the fields are retrieved, select ‘INNER JOIN’ as the ‘Join Type’
- Under ‘Left Field’ (sending_activity index side), select the key field customer
- Under ‘Right Field’ (sending_activity_customer index side), you’ll also select the key field customer then save
- Now, let’s click ‘Preview’ to see what our combined dataset looks like
Use the Join Builder to combine your indexes (Source - knowi.com)
In the new combined dataset, we have the customer, message_type, sent, opened, conversions, and date fields from our first index and the street and state fields from our second index, joined on the key field customer. As you can see, we were able to run the queries from each side of the join and then combine them to get the results with just a few clicks. We can now use this combined dataset to create new reports and visualizations.
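To make the join semantics concrete, the sketch below performs the same kind of inner join on the customer key in plain Python. The rows, addresses, and the "Acme" customer are made up, and `inner_join` is a hypothetical helper written for this illustration rather than anything Knowi exposes.

```python
from collections import defaultdict


def inner_join(left_rows, right_rows, left_key, right_key):
    """Inner join two lists of dicts; unmatched rows on either side are dropped."""
    # Index the right side by its key so each left row is a fast lookup.
    index = defaultdict(list)
    for right in right_rows:
        index[right[right_key]].append(right)

    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            # On a field-name collision, the left row's value is kept.
            joined.append({**right, **left})
    return joined


# Made-up rows standing in for the two index queries.
sending_activity = [
    {"customer": "Wells Fargo", "message_type": "marketing", "sent": 1200},
    {"customer": "Acme", "message_type": "marketing", "sent": 300},
]
sending_activity_customer = [
    {"customer": "Wells Fargo", "street": "1 Example St", "state": "CA"},
]

combined = inner_join(
    sending_activity, sending_activity_customer, "customer", "customer"
)
print(combined)
```

Because this is an inner join, the "Acme" row disappears from the result: it has no matching customer record on the right side.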
Search-based Analytics & Self-Service Analytics with Knowi
Knowi’s search-based analytics is a powerful Google-search-like feature, allowing you to type in questions about your data in plain English and get answers instantly. This is especially useful for non-technical end users, allowing them to gain quick insights from the data even without prior knowledge of the underlying data structure or query syntax of the datasource.
In the following steps, we’ll use a brief example of using Knowi’s search-based analytics to ask questions from the blended email sending activity dataset we created in the previous section.
- Let’s ask our data for the “maximum emails sent”
- Notice that as you type, Knowi auto-suggests the question you are asking
- When the results are returned, you can check to confirm that the ‘Max’ operation was performed on the sent field
Type question in the NLP Text Bar to find the maximum emails sent (Source - knowi.com)
- Now, let’s find out where each of the customers is located by asking “street and state by customer”
- Notice that it automatically knew to group the results by customer then return the address for each
Type question in the NLP Text Bar to find out Customer address (Source - knowi.com)
- Finally, let’s take things a little further and find out what our conversion rate is by customer on a weekly basis
- Notice that it automatically knew to apply the ‘Week’ operation on the date field, then group the results by customer and by week
Type question in the NLP Text Bar to find conversion rate by customer weekly (Source - knowi.com)
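To see what operations these questions translate into, here is a rough Python equivalent of the first and last ones: a ‘Max’ on the sent field, and a weekly grouping by customer. The sample rows are invented, and the conversion rate formula (conversions divided by sent) is our assumption, since the exact definition Knowi applies is not stated here.

```python
from collections import defaultdict
from datetime import date

# Made-up rows in the shape of the combined dataset.
rows = [
    {"customer": "Wells Fargo", "date": date(2020, 1, 6), "sent": 1000, "conversions": 50},
    {"customer": "Wells Fargo", "date": date(2020, 1, 8), "sent": 1000, "conversions": 30},
    {"customer": "Wells Fargo", "date": date(2020, 1, 13), "sent": 500, "conversions": 10},
]

# "maximum emails sent": a Max operation on the sent field.
max_sent = max(row["sent"] for row in rows)


def weekly_conversion_rate(rows):
    """Group by (customer, ISO year, ISO week); rate = conversions / sent (assumed)."""
    sums = defaultdict(lambda: [0, 0])  # key -> [conversions, sent]
    for row in rows:
        iso = row["date"].isocalendar()
        key = (row["customer"], iso[0], iso[1])
        sums[key][0] += row["conversions"]
        sums[key][1] += row["sent"]
    # Skip groups with nothing sent to avoid dividing by zero.
    return {key: conv / sent for key, (conv, sent) in sums.items() if sent}


rates = weekly_conversion_rate(rows)
print(max_sent, rates)
```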
As we’ve seen, by simply typing in questions in plain English, we were able to get answers back instantly from our combined Email Sending Activity dataset. You also have the option to take these results and create new widgets that can be added to your dashboard.
In summary, we used Knowi to connect to an Elasticsearch cluster, write native queries against its data, and create visualizations from the results in minutes. We also demonstrated how to join multiple indexes in your cluster on the fly, and used search-based analytics to ask questions of the data without prior knowledge of the underlying query language. Visit Knowi to learn more about how its analytics capabilities can leverage the strengths of your Elasticsearch implementation.