Building a virtual coworker who understands what you mean, not only what you say

To build a virtual analyst and not a robot, our first challenge was conversation management. We have all seen hundreds of videos on YouTube where Alexa and Siri can’t handle basic questions… and indeed, it’s hard.

So how do we tackle this problem?

Aiden is an expert system that can intelligently understand and fulfill requests using knowledge that we have taught it from the marketing world. You interact with Aiden with ordinary language, not commands, keywords or buttons, just like you would with any colleague.

Our users interact with Aiden using “natural language”. Most people think of it as extracting intent and entities from utterances, and that is indeed a key component. Yet handling a real conversation with a user involves many more steps. In fact, extracting entities is just the beginning of a complex process.

In a nutshell, when a user asks a question, we start with the user’s query, which is a plain string, build typed objects that can be manipulated, and route them to the correct action handler.

A summary of the phases we built:

Extract intent and entities
Contextualize
Consolidate
Execute
Compose an answer

1. Extract intent and entities

From the input phrase we extract raw data that we will be able to manipulate. Our recognizer is a combination of custom pre-processors and Microsoft LUIS (https://www.luis.ai/home).

It returns a JSON data object with the recognized intents, the score, and the annotated entities.
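To illustrate the step from JSON to typed objects, here is a minimal Python sketch. It is not Aiden's actual code. The field names (`topScoringIntent`, `entities`, `entity`, `type`) are assumed to match the shape of a LUIS prediction response, and the intent and entity names in the sample are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    type: str  # e.g. "metric" or "country" (hypothetical entity types)
    text: str  # the part of the utterance that was annotated


@dataclass(frozen=True)
class Recognition:
    intent: str
    score: float
    entities: tuple


def parse_recognition(payload: dict) -> Recognition:
    """Turn the recognizer's JSON (already decoded to a dict) into a typed object."""
    top = payload["topScoringIntent"]
    entities = tuple(
        Entity(type=e["type"], text=e["entity"])
        for e in payload.get("entities", [])
    )
    return Recognition(top["intent"], float(top["score"]), entities)


# Hand-written sample payload, not real recognizer output.
sample = {
    "query": "what is my spend in france",
    "topScoringIntent": {"intent": "GetMetric", "score": 0.97},
    "entities": [
        {"entity": "spend", "type": "metric"},
        {"entity": "france", "type": "country"},
    ],
}

recognition = parse_recognition(sample)
print(recognition.intent, [e.type for e in recognition.entities])
```

Once the response is a `Recognition`, the rest of the pipeline can work with attributes instead of raw dictionary keys.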

2. Contextualize

User/Aiden interactions are rarely a single back and forth. Aiden needs to be able to understand and follow the context of the whole conversation.

A typical example is what we call “follow-ups” where the user can amend a previous query without repeating it entirely.

Handle follow-up queries

When the query contains conjunctions or pronouns such as “and” or “it”, the recognizer assigns the intent Follow-up and we extend the previous query.
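One way to picture “extending the previous query” is a merge where the follow-up overrides only what it mentions. The sketch below assumes queries are flat dictionaries of parameters, which is a simplification; the function and parameter names are hypothetical.

```python
from typing import Optional

FOLLOW_UP_INTENT = "Follow-up"


def contextualize(intent: str, entities: dict, previous: Optional[dict]) -> dict:
    """Return the query to handle, reusing the previous one for follow-ups."""
    if intent == FOLLOW_UP_INTENT and previous is not None:
        merged = dict(previous)   # copy, so the stored query is not mutated
        merged.update(entities)   # new entities override the old ones
        return merged
    return dict(entities)


# "What was my spend in the US last week?" followed by "and in France?"
previous_query = {"metric": "spend", "country": "US", "period": "last week"}
follow_up = contextualize("Follow-up", {"country": "FR"}, previous_query)
print(follow_up)
```

Here the follow-up keeps the metric and the period from the previous query and changes only the country.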

User preferences

Every marketer has their own way to assess the performance of their campaigns. They might be looking for volume, or they might prefer to keep their spend under control, and so on. This becomes their “Key Performance Indicator (KPI)” and it is part of what we call a user’s “preferences”.

We store them in a database, so when the user asks for her KPI, or her “performance” on various campaigns, we interpret the query based on the preferences stored in the database for this user.

Aiden also lets the user sort the results. For instance, they can ask: “what are my best performing campaigns”. Aiden needs to sort the results and interpret the qualifier. For example, “best” in terms of CPI (Cost Per Installation) actually means “lowest CPI”.
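The qualifier logic can be sketched as a small lookup of which metrics are “lower is better”. This is an illustrative assumption about how such a rule could be encoded, with hypothetical names and made-up campaign data.

```python
# Metrics where a lower value means better performance (assumed list).
LOWER_IS_BETTER = {"cpi"}


def sort_descending(qualifier: str, kpi: str) -> bool:
    """Decide the sort direction for "best" or "worst" on a given KPI."""
    higher_is_better = kpi not in LOWER_IS_BETTER
    wants_best = qualifier == "best"
    # "best installs" and "worst cpi" both put the highest values first.
    return higher_is_better == wants_best


def rank(campaigns: list, qualifier: str, kpi: str) -> list:
    return sorted(campaigns, key=lambda c: c[kpi], reverse=sort_descending(qualifier, kpi))


campaigns = [
    {"name": "A", "cpi": 2.5, "installs": 400},
    {"name": "B", "cpi": 1.2, "installs": 150},
    {"name": "C", "cpi": 3.1, "installs": 900},
]

best_by_cpi = [c["name"] for c in rank(campaigns, "best", "cpi")]
best_by_installs = [c["name"] for c in rank(campaigns, "best", "installs")]
print(best_by_cpi, best_by_installs)
```

With this data, “best” by CPI puts campaign B first, while “best” by installs puts campaign C first.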

3. Consolidate

Unlike other web or desktop applications, a chat interface has free navigation, so a query can be missing some required parameters. Where other applications can rely on forms that need to be validated, we prompt for missing parameters until the query can be handled.

Keeping the same example as previously, if the user asks “What are my best performing campaigns?” for the first time, we have no KPI in the database. We prompt for the KPI and give a list of possibilities from the metrics available to the user.

Then we will store the given KPI value and consolidate the query with the chosen metric.

Similarly, the account is required. For users with multiple accounts set up, we ask which account to use at the start of a new session.
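The prompt-until-complete loop can be sketched as follows. It assumes a flat query dictionary and an in-memory preferences dictionary standing in for the database; the names and the prompt wording are hypothetical.

```python
from typing import Optional, Tuple

# Parameters every query needs before it can be executed (assumed list).
REQUIRED_PARAMETERS = ("account", "kpi")


def consolidate(query: dict, preferences: dict) -> Tuple[Optional[dict], Optional[str]]:
    """Fill gaps from stored preferences, or return a prompt for the first gap."""
    consolidated = dict(query)
    for name in REQUIRED_PARAMETERS:
        if consolidated.get(name) is None:
            consolidated[name] = preferences.get(name)
        if consolidated[name] is None:
            return None, f"Which {name} should I use?"
    return consolidated, None


preferences = {"account": "acme"}          # no KPI stored yet
query = {"intent": "BestCampaigns"}

first_try, prompt = consolidate(query, preferences)
print(prompt)

preferences["kpi"] = "cpi"                 # the user answered the prompt
second_try, no_prompt = consolidate(query, preferences)
print(second_try)
```

The first call cannot complete the query and returns a prompt for the KPI. Once the answer is stored, the second call returns a query that is ready to execute.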

4. Execute

We send the finalized queries to the backends to get the results.

A single user request can translate into multiple queries. For example, if the user asks for an “account overview”, Aiden interprets it as:

Query on the 3 highest-spending campaigns
Query on 3 campaigns with the worst CPI
Query on 3 campaigns with the best CPI

All queries are sent to the backends and Aiden collects the results.
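As a sketch of this fan-out, the “account overview” request below expands into three typed queries that are then run against a stand-in for the backend (an in-memory list with made-up numbers). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CampaignQuery:
    metric: str
    descending: bool
    limit: int = 3


def expand_account_overview() -> list:
    return [
        CampaignQuery("spend", descending=True),   # 3 highest-spending campaigns
        CampaignQuery("cpi", descending=True),     # worst CPI means highest CPI
        CampaignQuery("cpi", descending=False),    # best CPI means lowest CPI
    ]


def run(query: CampaignQuery, campaigns: list) -> list:
    """Stand-in for a backend call: rank in memory and keep the top rows."""
    ranked = sorted(campaigns, key=lambda c: c[query.metric], reverse=query.descending)
    return [c["name"] for c in ranked[: query.limit]]


campaigns = [
    {"name": "A", "spend": 500.0, "cpi": 2.5},
    {"name": "B", "spend": 900.0, "cpi": 1.2},
    {"name": "C", "spend": 100.0, "cpi": 3.1},
    {"name": "D", "spend": 300.0, "cpi": 0.8},
]

results = [run(q, campaigns) for q in expand_account_overview()]
print(results)
```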

5. Compose an answer

The results are given to a message builder, which reverses some of the decoding done on the way in, turning internal values back into text the user can read.

For example, a country code UK is converted to “United Kingdom”.

The account name is also displayed using the accounts database.
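A minimal sketch of such a message builder, assuming simple lookup tables in place of the real accounts database. The identifiers, the account name, and the output format are all made up for illustration.

```python
# Lookup tables standing in for reference data and the accounts database.
COUNTRY_NAMES = {"UK": "United Kingdom", "FR": "France"}
ACCOUNT_NAMES = {"acc-42": "Acme Games"}


def build_line(row: dict) -> str:
    """Turn one backend result row into human-readable text."""
    # Fall back to the raw code when no display name is known.
    country = COUNTRY_NAMES.get(row["country"], row["country"])
    account = ACCOUNT_NAMES.get(row["account_id"], row["account_id"])
    return f"{account}, {country}: CPI {row['cpi']:.2f}"


line = build_line({"account_id": "acc-42", "country": "UK", "cpi": 1.2})
print(line)
```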

Output

The result is converted to a Message object of the Microsoft Bot Framework (https://docs.botframework.com).

On platforms that require it, such as Slack, we augment this Message object with custom functionality (https://docs.microsoft.com/en-us/bot-framework/dotnet/bot-builder-dotnet-channeldata).

For example, the account name can be displayed in a footer on Slack (https://api.slack.com/docs/message-attachments), so we convert the Message to a custom format before sending it.
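The Slack conversion could look roughly like the sketch below. It models the message as a plain dictionary instead of using the Bot Framework SDK. The `channelId` and `channelData` keys and the Slack attachment `footer` field follow the linked documentation as we understand it, so treat the exact payload shape as an assumption. The function name is hypothetical.

```python
import copy


def with_slack_footer(message: dict, account_name: str) -> dict:
    """Attach a Slack-specific payload that shows the account name in a footer."""
    if message.get("channelId") != "slack":
        return message  # other channels keep the generic message
    augmented = copy.deepcopy(message)
    augmented["channelData"] = {
        "text": message["text"],
        "attachments": [{"footer": account_name}],
    }
    return augmented


generic = {"type": "message", "channelId": "slack", "text": "Your best campaign is B."}
slack_message = with_slack_footer(generic, "Acme Games")
print(slack_message["channelData"])
```

Messages for other channels pass through unchanged, so the channel-specific logic stays in one place.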

If you’re also tackling NLP, we would love to hear your thoughts in the comments.

And if you like the work we do, we’re hiring! Find out about our job postings at https://angel.co/aiden/jobs