In Part 1 of our guide to AI, we introduced artificial intelligence through a process of defining increasingly sophisticated agents. Agents are programs that perceive their environment through sensors and act on the environment through effectors. They can be proactive in their actions and autonomous in their decision-making.
As we continue to build our understanding of AI, with a focus on chatbots, we turn our attention to sensing what a user says. For conversational interfaces, this is the most important sensor, and it relies on a combination of natural language processing tools and training provided by the chatbot designers.
When a human (or another agent) says something to a chatbot, the chatbot needs to convert that input into actionable information. We divide that process into four distinct steps: perception, analysis, disambiguation, and incorporation. Let’s look at each in turn.
First, we need to actually perceive the words spoken to us.
On platforms such as Facebook Messenger, this means that we need to have access to the text a user types. Simple enough and handled by the Facebook Messenger API. Whether it is Facebook, Microsoft Teams or Slack, every platform provides a way for our chatbot to perceive what is being said.
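As a rough illustration, here is a minimal sketch of that perception step, assuming a Flask webhook and the standard Messenger webhook payload shape. The endpoint and helper names are placeholders rather than part of the Messenger API itself.

```python
# Minimal sketch of perceiving a user's message on Facebook Messenger.
# Assumes a Flask app and the standard Messenger webhook payload shape;
# the route and helper names are placeholders.
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def receive_message():
    payload = request.get_json()
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            text = event.get("message", {}).get("text")
            if text:
                sender_id = event["sender"]["id"]
                handle_user_text(sender_id, text)  # our bot's perception step
    return "ok", 200

def handle_user_text(sender_id, text):
    # Hand the perceived text over to the analysis step.
    print(f"User {sender_id} said: {text}")
```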
On voice-based platforms such as Alexa, chatbot perception is more difficult. While the bulk of the complexity of voice recognition is dealt with by the platform itself, some issues remain. Developers must define how a user launches into a specific skill (a skill can be thought of as Alexa’s version of your own bot) and help Alexa determine when the user has finished speaking (or has said enough). A look at the Amazon Alexa API documentation reveals how much work there is to recognize that the user said something relevant.
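To give a flavour of what that looks like in practice, here is a rough sketch of a Lambda-style handler for a custom skill. The intent name is an invented placeholder that would be defined in the skill’s interaction model; the request and response shapes follow Alexa’s documented JSON format.

```python
# Rough sketch of an AWS Lambda handler for a custom Alexa skill.
# The intent name (ExhibitionIntent) is a placeholder defined in the
# skill's interaction model, not an Amazon built-in.
def lambda_handler(event, context):
    request = event["request"]

    if request["type"] == "LaunchRequest":
        # The user opened the skill without saying what they want yet.
        return speak("Welcome. Which exhibition are you interested in?", end_session=False)

    if request["type"] == "IntentRequest":
        if request["intent"]["name"] == "ExhibitionIntent":
            return speak("The Rembrandt exhibition is on until the end of the month.")
        return speak("Sorry, I didn't catch that.")

    # SessionEndedRequest or anything unexpected: close the session quietly.
    return {"version": "1.0", "response": {"shouldEndSession": True}}

def speak(text, end_session=True):
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }
```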
Nevertheless, whether dealing with written or spoken text, we can, with a useful degree of confidence, perceive what a user has said. The next step is to analyse what was said.
There are two sides to analysis. We need to analyse a sentence syntactically (identifying the individual words, their purposes and their relationships within the sentence) and we need to analyse it semantically (working out what the sentence actually means).
Consider for example the phrase: “I am looking to go to the Rembrandt exhibition on the 14th”
A syntactic analysis identifies the individual words, sentence structure, language used, which words are nouns or verbs, etc.
Google’s natural language processing (NLP) tool nicely illustrates this. Pop that phrase in and you get a breakdown of the sentence identifying three verbs (“am”, “look”, “go”), three nouns (“Rembrandt”, “exhibition”, “14th”), and so on.
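For the curious, a minimal sketch of that syntactic analysis using the google-cloud-language Python client might look like the following; exact method signatures vary slightly between library versions, and credentials need to be configured separately.

```python
# Sketch of a syntactic analysis with Google's Natural Language API,
# assuming the google-cloud-language client library is installed and
# application credentials are configured.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="I am looking to go to the Rembrandt exhibition on the 14th",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

response = client.analyze_syntax(request={"document": document})

for token in response.tokens:
    # Each token carries its text plus part-of-speech information,
    # e.g. NOUN for "exhibition" and VERB for "looking".
    print(token.text.content, token.part_of_speech.tag)
```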
A semantic analysis using the same Google tool is less successful.
The semantic analysis falls short because so much of the context is missing. The NLP software determines that a person is mentioned, and potentially an event, but it misses the overall purpose of the phrase.
Our chatbot would have to figure out that “I” refers to the user, “looking to go” means that the user is interested in finding out more or purchasing tickets, that Rembrandt and exhibition are related, etc. This is where disambiguation comes into play.
Disambiguation is the process of settling on a specific meaning — usually using context to help in the process. As chatbot developers our aim is to understand the user’s intent.
Tools such as api.ai or wit.ai assist by combining NLP with training interfaces that allow us to explicitly map sentences to specific intents.
We can provide phrases such as:
“I am looking to go to the Rembrandt exhibition on the 14th”
“I want to book tickets to the Rembrandt exhibition”
“I am interested in the Rembrandt exhibition”
“That Rembrandt show that is on — how can I go?”
and manually map them all to a specific intent — such as “Purchase exhibition tickets”.
The NLP-powered tools can then deal with phrases that do not exactly match the ones we entered but are semantically and syntactically similar enough. The tool will provide a confidence value (from 0 to 1) and we can fine-tune our bot to react accordingly.
Furthermore, “Rembrandt” can be turned into a variable representing any person or thing. So the same phrases can deal with “Da Vinci” or “Picasso” or “Impressionism”.
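To make this concrete, here is a hypothetical sketch of what our bot might do with the result such a tool returns. The result structure, intent names and the 0.7 threshold are illustrative assumptions rather than the API of any particular service.

```python
# Hypothetical handling of an NLU result of the kind api.ai or wit.ai
# might return: an intent name, a confidence score (0 to 1) and any
# extracted entities. The structure and threshold are illustrative.
CONFIDENCE_THRESHOLD = 0.7

def choose_intent(nlu_result):
    intent = nlu_result.get("intent")
    confidence = nlu_result.get("confidence", 0.0)

    if intent is None or confidence < CONFIDENCE_THRESHOLD:
        # Fall back to a default intent so every phrase resolves to something.
        return "fallback", {}

    return intent, nlu_result.get("entities", {})

# Example: "I am interested in the Picasso exhibition"
result = {
    "intent": "purchase_exhibition_tickets",
    "confidence": 0.86,
    "entities": {"artist": "Picasso"},
}

intent, entities = choose_intent(result)
print(intent, entities)  # purchase_exhibition_tickets {'artist': 'Picasso'}
```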
As bot designers, it is our task to make sure that the overall context is specific enough that we can usefully train our bots. We need to think through the various intents and avoid placing them too close to each other (semantically), so that one intent does not bleed (in terms of interpretation) into another. It is also useful to have a fallback or default intent, as in the sketch above, so that every phrase resolves to something.
Of course, we can also remove ambiguity entirely by using button-based choices, if the chat platform’s interface allows for it.
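On Messenger, for instance, that might be a quick-replies message; the payload below follows the documented quick_replies format, but the titles and payload strings are placeholders.

```python
# Sketch of a Messenger Send API message offering button-style
# "quick replies" instead of free text, removing ambiguity altogether.
# Titles and payload strings are placeholders.
message = {
    "recipient": {"id": "USER_ID"},
    "message": {
        "text": "Which exhibition are you interested in?",
        "quick_replies": [
            {"content_type": "text", "title": "Rembrandt", "payload": "EXHIBITION_REMBRANDT"},
            {"content_type": "text", "title": "Picasso", "payload": "EXHIBITION_PICASSO"},
        ],
    },
}
```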
With disambiguation done, we can now incorporate the information the user has supplied into our agent’s overall knowledge of the world. Using these new beliefs, we can react accordingly.
For example, if our chatbot’s goal is to sell a ticket to the user, and it believes that the user has just requested information on how to book tickets, it will reply with the appropriate booking process.
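A toy sketch of that incorporation step, building on the intent and entities from the earlier sketch, might look like this; the belief store and the replies are invented for illustration, not part of any specific framework.

```python
# Toy sketch of incorporating a disambiguated intent into the bot's
# beliefs and reacting to it. The belief store and replies are
# illustrative assumptions.
beliefs = {}  # per-user knowledge the bot accumulates over a conversation

def incorporate(user_id, intent, entities):
    user_beliefs = beliefs.setdefault(user_id, {})
    user_beliefs["last_intent"] = intent
    user_beliefs.update(entities)
    return react(user_beliefs)

def react(user_beliefs):
    if user_beliefs.get("last_intent") == "purchase_exhibition_tickets":
        artist = user_beliefs.get("artist", "that")
        return f"Great, I can help you book tickets to the {artist} exhibition. Which date suits you?"
    return "Sorry, I'm not sure how to help with that yet."

print(incorporate("user-42", "purchase_exhibition_tickets", {"artist": "Rembrandt"}))
```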
Current chatbot architectures don’t offer much sophistication at this level. Typically, a user’s input leads to a simple reaction, and previous interactions are not taken into account. This will soon change as we gain more experience in building chatbots and as users demand better ones.
We have better tools than ever before to perceive what a user says. The combination of NLP for analysis and disambiguation, together with context-specific training, makes chatbot design very exciting.
The next steps in chatbot development will focus on incorporating these tools within more sophisticated architectures. These should allow us to create increasingly better models of our users and a better understanding not only of stand-alone interactions and short conversations but also of longer flows.
In the next installment of “Making AI work for you” we will look at what other information a chatbot can sense in addition to what the user types.
This post first appeared on deeson.co.uk
Find out more about our chatbot agency services in the meantime.