I’m frustrated. Chatbots have the potential to be amazing. The Star Trek computer seemed like it was finally becoming a reality. So far, we’ve been let down. Most chatbots are rubbish and it’s the tools that are to blame. If I want to build a chatbot with some semblance of intelligence I have to design for a myriad of possibilities. Surely this is what AI should be doing for me?
It doesn’t have to be this way. There is some amazing technology that could help. That technology is still stuck in the world of academia, but I want to change that.
These are the technologies that could make chatbots intelligent:
- Semantic parsing converts user expressions into a form that the computer can understand
- Automated planning chooses a series of actions to achieve a desired goal
- Natural language generation allows computers to respond to people in their own language
Semantic parsing for pizza joy
“I’d like a pizza with anchovies.” It’s quite clear (to you and me) that I want the anchovies on the pizza. But what about “I want a pizza with fries”? I would probably be surprised if my pizza was delivered with a topping of fries.
This particular problem is called prepositional phrase attachment. Does the prepositional phrase “with fries” attach to “pizza” or to “want”?
Traditional natural language parsers such as Parsey McParseface also have to deal with this problem. However, a traditional parser will only tell you the sentence structure. It won’t attempt to describe the meaning of the sentence as a whole.
A semantic parser, on the other hand, will translate the sentence into a form the computer can understand. In the case of a chatbot, this would normally be a structure that is referred to as an intent. An intent for “I’d like a pizza with anchovies” might look like this:
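One plausible shape for such an intent, sketched here as a Python dictionary (the field names are illustrative, not the schema of any particular tool):

```python
# A hypothetical deep intent for "I'd like a pizza with anchovies".
# "With anchovies" attaches to "pizza", so it becomes a topping.
intent = {
    "intent": "order",
    "items": [
        {
            "type": "pizza",
            "toppings": ["anchovies"],
        },
    ],
}
```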
On the other hand, the intent for “I want a pizza with fries” would look like this:
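Sketched in the same illustrative style:

```python
# A hypothetical deep intent for "I want a pizza with fries".
# Here "with fries" attaches to "want": two separate items, no topping.
intent = {
    "intent": "order",
    "items": [
        {"type": "pizza"},
        {"type": "fries"},
    ],
}
```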
In the first case, the anchovies are bundled with the pizza in a “toppings” property. In the second case we have two items, pizza and fries.
Some chatbot tools, such as Dialogflow, incorporate semantic parsers that can do this type of analysis. However, their sophistication is currently limited. They do not output intents with the kind of deep structure we have described.
Semantic parsers that do deep analysis have been around for a while. For example, the Geoquery dataset was described in a paper by John Zelle and Raymond Mooney in 1996.
Their system took input queries like this:
- What are the high points of states surrounding Mississippi?
- How large is the largest city in Alaska?
It translated these to queries that could be evaluated directly on the database, returning the answer to the user.
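To make the idea concrete, here is a toy Python sketch of evaluating a structured query directly against a database (the data and query schema are invented placeholders; Zelle and Mooney’s system actually produced Prolog logical forms over the real Geoquery database):

```python
# Toy geography "database" (figures are placeholders, not real data).
states = {
    "alaska": {
        "largest_city": "anchorage",
        "populations": {"anchorage": 300_000},
    },
}

def answer(query):
    # Evaluate a parsed query such as "How large is the largest city
    # in Alaska?" directly against the database.
    if query["ask"] == "size_of_largest_city":
        record = states[query["state"]]
        return record["populations"][record["largest_city"]]

print(answer({"ask": "size_of_largest_city", "state": "alaska"}))
# → 300000
```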
The current state of the art for this dataset is 91% accuracy, achieved by Percy Liang and colleagues in 2011.
So if the technology is there, why don’t chatbot tools support this type of more complex analysis? There are probably several reasons:
- No-one is asking for it. Chatbot designers don’t even know that it’s possible. It doesn’t make sense to build something no-one wants.
- It adds complexity to the tools. The intents handled by most current systems are nice simple flat structures. If they became trees of arbitrary depth and complexity, it would make everything more difficult.
- It makes things harder for the chatbot designer, who now has to figure out how to handle these trees.
Basically, doing AI is hard. Who knew?
Until chatbot designers start demanding more from their tools, and until they are willing to commit serious effort to crafting intelligence, we can expect chatbots to remain stupid.
Automated planning to the rescue
Let’s say we’re able to overcome some of these barriers and we get some nice complex trees from our users. What are we to do with them?
We can view a user request as specifying a goal or desire of the user. It turns out there’s a whole branch of AI dedicated to figuring out how to satisfy these goals automatically: automated planning. Wouldn’t it be nice if we didn’t have to tell the chatbot what to do, but it just figured out the best sequence of actions in each situation? Well, that’s the promise of automated planning.
There would still be work to do, of course. We would need to describe the “world” that the chatbot resides in, so that it knows what actions it can take, and what the effect of those actions is.
There are formal languages designed to do this. The most commonly used is Planning Domain Definition Language, or PDDL for short.
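The core idea is easy to sketch. Here is a toy STRIPS-style planner in Python (not PDDL itself, and the pizza-ordering domain is invented for illustration): actions have preconditions and effects, and the planner searches for a sequence that reaches the goal.

```python
# A toy STRIPS-style planner: breadth-first search over actions with
# preconditions and add-effects. (Hypothetical pizza-bot domain.)
from collections import deque

actions = {
    "ask_address":  {"pre": set(),            "add": {"have_address"}},
    "take_payment": {"pre": {"have_address"}, "add": {"paid"}},
    "dispatch":     {"pre": {"paid"},         "add": {"pizza_en_route"}},
}

def plan(start, goal):
    # BFS from the start state until every goal fact holds.
    frontier = deque([(frozenset(start), [])])
    seen = {frozenset(start)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, action in actions.items():
            if action["pre"] <= state:
                nxt = frozenset(state | action["add"])
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))

print(plan(set(), {"pizza_en_route"}))
# → ['ask_address', 'take_payment', 'dispatch']
```

Notice that we never told the bot the order of the steps; it derived the sequence from the preconditions alone.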
In fact, there is a whole family of PDDL variants for different types of problem. One formulation that may be useful for chatbots is the Partially Observable Markov Decision Process, or POMDP for short. POMDPs allow for uncertainty not just in the results of actions, but also in the current state of the world. They have been applied with success to spoken dialogue systems by Steve Young and his colleagues at Cambridge.
In this diagram, “SLU” is a spoken language understanding unit, and “NLG” is a natural language generation unit.
The design is partly motivated by the challenges specific to speech recognition. Here the output of the spoken language understanding unit may include uncertainty about what the user said.
In my opinion, this also applies to chatbots, where there is still uncertainty and ambiguity about the meaning that the user intended.
For example, maybe I really did want fries as a topping.
This type of ambiguity relates to sentence structure. Ambiguous words can also be a problem: the word “run”, for example, has 606 meanings in the Oxford English Dictionary. Uncertainty from any type of ambiguity can easily be incorporated into a POMDP.
Instead of tracking a single state, in the POMDP approach the system keeps track of a distribution over possible states. The planning system chooses the best action to maximise the expected reward given this distribution over states. This means it is able to act rationally despite uncertainty. (The planning system is depicted as the “dialogue manager” in the above diagram.)
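A minimal Python sketch of this idea, using our pizza example (the belief probabilities and rewards are invented for illustration; real systems learn them from dialogue data):

```python
# POMDP-style action selection: the system holds a belief (a probability
# distribution over hidden states) and picks the action with the highest
# expected reward under that belief. All numbers are illustrative.
belief = {"anchovy_topping": 0.9, "fries_on_the_side": 0.1}

# Reward of each action in each possible hidden state: confirming the
# wrong order is costly, asking a clarifying question is mildly annoying.
rewards = {
    "confirm_order":  {"anchovy_topping": 1.0,  "fries_on_the_side": -5.0},
    "ask_clarifying": {"anchovy_topping": -0.2, "fries_on_the_side": -0.2},
}

def best_action(belief, rewards):
    # Maximise expected reward over the belief distribution.
    return max(
        rewards,
        key=lambda a: sum(belief[s] * rewards[a][s] for s in belief),
    )

print(best_action(belief, rewards))
# → confirm_order
```

With a more even belief (say 60/40), the expected cost of a wrong confirmation dominates and the same code would choose to ask a clarifying question instead.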
So we’ve saved ourselves a bunch of work. Instead of describing how the chatbot should act in every situation, we’ve just described the world, and the chatbot figures out the best thing to do. This should allow our bots to be much more intelligent.
But we still need to list out all the things the bot can say. What if we want our bot to be more expressive? What if it could construct sentences of its own accord? That’s where natural language generation comes in.
Talking ‘bout my natural language generation
This is another huge sub-field of natural language processing. Current systems will generally take a template sentence with perhaps one or two slots that can be filled, and return this to the user. But what if we want to give the user more detailed information?
An example from a pollen forecast system described in the Wikipedia article on the topic is:
Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.
This is generated automatically from data on the pollen levels in different parts of Scotland.
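At its simplest, data-to-text generation is template filling. A minimal Python sketch (the function and data are invented; research systems like the pollen forecaster are far more sophisticated, varying phrasing and aggregating across regions):

```python
# A minimal template-based NLG sketch: fill a sentence template
# from structured forecast data.
def describe_pollen(region, level, value):
    return (f"In {region}, pollen levels will be {level} "
            f"with values of around {value}.")

print(describe_pollen("Northern areas", "moderate", 4))
# → In Northern areas, pollen levels will be moderate with values of around 4.
```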
Imagine if you could do the same for your data. You could ask for a summary of your web analytics; you could ask about retention, new users, or landing pages. Your queries could be translated into a set of database queries whose results are then summarised for you. The possibilities are endless.
Back to the chatbot future
The chatbot market is expected to grow around 25% a year between now and 2025. But can that expectation be made a reality?
I talk to very few people who know what a chatbot is, and even fewer who have used one. I know only one person who enjoys using chatbots.
The growth of the chatbot market is predicated on the phenomenal growth of messaging platforms. But if these users cannot be converted to chatbot users, the chatbot market will not grow. Personally, I doubt we can convince a majority of people to use chatbots in their current state.
The future of chatbot technology is in our hands. Will we demand more from our chatbot tools? Or will we expect users to adapt to our limited technology, and learn not to expect intelligence? It’s up to us.