In this post we are going to use real machine learning and (behind the scenes) deep learning for Natural Language Processing / Understanding (NLP/U)!
We will use the RASA conversational AI solution both for the NLP/U engine and for the dialogue part.
RASA is an open-source Python implementation of an NLP engine, intent extraction and dialogue management, all of which run on your own machine / on premise → NO CLOUD!
As a side note, we are using machine learning and data science extensively in our product at Lemonade https://bit.ly/2NG88T0 and we are hiring :) (PM me)
The inspiration/reference for this post was the great tutorial video made by Justina Petraityte, thanks! I tried to break it down and used my own version, which is slightly different on Mac, so I hope you’ll enjoy it and find it useful.
TL;DR — skip to here
First, here is a short introduction (if you are already familiar with chatbot/dialogue architecture or want to get down to business, skip to here).
When the chatbot input is text, for example: “I want to order pizza”,
the bot first needs to run an NLP (Natural Language Processing) engine in order to parse the sentence (create a structure out of it, or “try to understand it” if you will).
…but what’s next? Let’s assume that the bot classifies it successfully to the correct “Intent”; in our case let’s say the “Intent” is: order_pizza.
Common sense says that we must get (from the user) the full name, address, pizza size and topping. What else? For this very purpose we need a “Dialogue management” component. In this component we configure the various “Intents” that our bot supports, and for each “Intent” (covered in the next part, part 2) we specify which “Slots”/“entities” are mandatory and which are just optional. This configuration is often called the “Domain definition”.
As you may have noticed in our example, the full name, address, pizza size and topping (slots) are mandatory. By mandatory we mean that our bot will keep interacting with the user until all of this information is provided, and only then will it be able to complete the task: ordering the pizza!
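To make the idea of mandatory slots concrete, here is a minimal sketch in plain Python. It is not how RASA implements dialogue management; the DOMAIN dictionary, the slot names and the ask_/complete_ action names are all made up for illustration.

```python
# Hypothetical domain definition: intent name -> mandatory and optional slots.
# The structure and names are assumptions for illustration, not RASA's format.
DOMAIN = {
    "order_pizza": {
        "mandatory": ["full_name", "address", "size", "topping"],
        "optional": ["delivery_notes"],
    },
}


def missing_slots(intent, filled):
    """Return the mandatory slots of `intent` that have no value yet."""
    required = DOMAIN[intent]["mandatory"]
    return [slot for slot in required if not filled.get(slot)]


def next_action(intent, filled):
    """Ask for the first missing slot, or complete the task when none are missing."""
    missing = missing_slots(intent, filled)
    if missing:
        return "ask_" + missing[0]
    return "complete_" + intent


if __name__ == "__main__":
    # The user has only told us the size so far.
    print(next_action("order_pizza", {"size": "large"}))
```

The bot would call something like next_action after every user message, so the conversation keeps going until every mandatory slot has a value.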
(I’m going to create a more detailed video on it — so stay tuned :))
And now let’s get down to business.
To get the best results, run with Python 3.6.4 (and no later). It is better to work in a virtualenv; if you are getting errors, try creating the env with:
pip3 install virtualenv
python3 -m virtualenv <env_name>
source <env_name>/bin/activate
Make sure you run the above commands with python3 pointing to the Python 3.6.4 mentioned above.
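If you are not sure which interpreter your virtualenv picked up, a small check like the one below can help. It is only a sketch: the MAX_SUPPORTED constant simply encodes the "3.6.4 and no later" recommendation from this post, and the is_supported helper is my own, not part of RASA.

```python
import sys

# Assumption taken from this post: 3.6.4 is the newest version to use.
MAX_SUPPORTED = (3, 6, 4)


def is_supported(version=None):
    """Return True for a Python 3 interpreter no newer than MAX_SUPPORTED."""
    major, minor, micro = tuple(version or sys.version_info)[:3]
    return major == 3 and (major, minor, micro) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = sys.version.split()[0]
    print(current, "ok" if is_supported() else "is not the recommended version")
```

Run it with the python3 of the activated env to see what you are actually using.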
You can choose to install Rasa NLU with pip:
pip3 install rasa_nlu==0.12.3
If you want to be able to debug your code and dive into the implementation, you should clone it:
git clone git@github.com:RasaHQ/rasa_nlu.git
cd rasa_nlu
# go to the relevant release (0.12.3):
git reset --hard d08b5765e92ea27741926b1246e5e22713158987
pip install -r requirements.txt
pip install -e .
Either way, create a working directory in which we will place all our files for this tutorial:
mkdir rasa_demo
cd rasa_demo
vim requirements.txt
If you don’t want to use vim, just create the file requirements.txt and edit it in a text editor (sublime or notepad).
If you cloned the code skip to configuration.
In the file “requirements.txt” add the content below:
Now run (from the directory that contains the file):
pip install -r requirements.txt
First let’s create the configuration files and the training file.
Let’s start with the configuration file. Create a dir named config:
mkdir config
cd config
vim config.json
If you don’t want to use vim, just create the file config.json and edit it in a text editor (sublime or notepad).
In this file “config.json” add the content below:
If you are using a newer version of Rasa, i.e. one where configuration is done via yml, create config/config.yml:
Go back to the parent workspace directory — in our case “rasa_demo”
Now let’s create the training data, that is, example sentences that we think our user is going to say, along with the Intent and entities our chatbot should extract from each one.
First let’s create a directory “data” and in it create a file “training_data.json”.
Run:
cd ../
mkdir data
cd !$
vim training_data.json
If you don’t want to use vim, just create the file training_data.json and edit it in a text editor (sublime or notepad)
with this content:
There is a great tool (rasa_nlu_trainer) you can use to add new examples/Intents/entities.
To install it, run in terminal:
npm i -g rasa-nlu-trainer
If you don’t have npm and nodejs go to here and follow the links to npm and nodejs in the installation part.
Now launch the trainer:
rasa-nlu-trainer -v <path to the training data file>
In our example we have the file under the data directory:
rasa-nlu-trainer -v data/training_data.json
Here is a screenshot of the trainer:
In the screenshot we can see the Intent, “order_pizza”, and the user input as the text: “I want to order large pizza”. In addition, I marked the word “large”, which gives me the option to add it as an entity: “size”, whose value in this example is “large”.
For this example we have only 2 examples per Intent (btw this is the minimum you must have for each Intent, otherwise you’d get an error at training time).
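If you prefer to write examples by hand instead of using the trainer UI, the sketch below builds the same “large” → “size” example in Rasa NLU’s JSON training format (a “rasa_nlu_data” object holding a “common_examples” list, with character offsets for each entity). The make_example helper is my own convenience function, not part of Rasa, and it assumes the entity value appears only once in the text.

```python
import json


def make_example(text, intent, entities=None):
    """Build one training example; `entities` maps an entity name to its value in `text`."""
    marked = []
    for name, value in (entities or {}).items():
        start = text.find(value)  # first occurrence only (assumption)
        if start == -1:
            raise ValueError("%r does not occur in %r" % (value, text))
        marked.append({
            "start": start,              # index of the first character
            "end": start + len(value),   # index just past the last character
            "value": value,
            "entity": name,
        })
    return {"text": text, "intent": intent, "entities": marked}


example = make_example("I want to order large pizza", "order_pizza", {"size": "large"})
training_data = {"rasa_nlu_data": {"common_examples": [example]}}

if __name__ == "__main__":
    print(json.dumps(training_data, indent=2))
```

The offsets are the fiddly part when editing the file manually, which is exactly what the trainer tool saves you from.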
Now we are ready to actually train our Machine Learning NLU model.
Create a file: “nlu_model.py” in the parent workspace directory with this code:
If you are using the newer version of Rasa (no NLU converters / config via yml), use this nlu_model.py:
Now run (make sure that you are in the parent workspace directory):
python nlu_model.py
Notice that in the above code the call to run() function is commented out.
Now try to uncomment it and comment out the call to the function train().
Run again:
python nlu_model.py
You should see the output:
{u'entities': [], u'intent': {u'confidence': 0.78863250761506243, u'name': u'order_pizza'}, 'text': u'I want to order pizza', u'intent_ranking': [{u'confidence': 0.78863250761506243, u'name': u'order_pizza'}, {u'confidence': 0.21136749238493763, u'name': u'greet'}]}
As you can see, our model has successfully classified the text to the Intent “order_pizza” with confidence 0.788…; to get a higher confidence, simply add more examples to the training data.
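In a real bot you would not act on every prediction. Here is a small sketch of how the parse result could be consumed: it takes the dictionary from the output shown earlier and accepts the Intent only above a confidence threshold. The pick_intent helper and the 0.6 default threshold are my own assumptions; tune the threshold to your data.

```python
# Parse result copied from the output shown earlier (written without the u prefixes).
result = {
    "entities": [],
    "intent": {"confidence": 0.78863250761506243, "name": "order_pizza"},
    "text": "I want to order pizza",
    "intent_ranking": [
        {"confidence": 0.78863250761506243, "name": "order_pizza"},
        {"confidence": 0.21136749238493763, "name": "greet"},
    ],
}


def pick_intent(parse_result, threshold=0.6):
    """Return the intent name if its confidence reaches `threshold`, else None."""
    intent = parse_result.get("intent") or {}
    if intent.get("confidence", 0.0) >= threshold:
        return intent.get("name")
    return None


if __name__ == "__main__":
    print(pick_intent(result))                 # accepted with the default threshold
    print(pick_intent(result, threshold=0.9))  # rejected with a stricter threshold
```

When the function returns None, the bot can fall back to something like “Sorry, I didn’t get that”.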
Congratulations! we’ve just built an Interpreter for our chatbot — half way :)
You can take a break and get to the second part later — don’t forget to clap :)
Now you are ready to build the dialogue side.
Go to part 2 of the guide (link will be added soon)
(BTW We are building amazing AI & chatbots at Lemonade)
If you’re experiencing issues!
git clone the latest version, install all the requirements like this:
Run:
virtualenv -p python3 <env name>
python3 -m virtualenv <env_name> pip3 install
pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  batch_size: [64, 256]
  epochs: 1500
  embed_dim: 20
2. Now train:
python3 rasa_nlu/train.py \
--config `pwd`/config/config.yml \
--data `pwd`/data/training_data.json \
--path `pwd`/projects/
3. Now run:
python3 rasa_nlu/server.py --path `pwd`/projects --config config/config.yml
Make sure the config.yml file is in the same place.
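Once the server is up you can talk to it over HTTP. The sketch below assumes the server listens on localhost port 5000 and exposes a /parse endpoint that accepts a JSON body with a “q” field; check the address against your own server output. Depending on how you trained, you may also need to add a “project” field to the body. The build_parse_request and parse helpers are my own.

```python
import json
import urllib.request


def build_parse_request(text, host="localhost", port=5000):
    """Build (but do not send) a POST request for the NLU server's /parse endpoint."""
    url = "http://%s:%d/parse" % (host, port)  # host and port are assumptions
    body = json.dumps({"q": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def parse(text):
    """Send the request; this only works while the NLU server is running."""
    with urllib.request.urlopen(build_parse_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_parse_request("I want to order pizza")
    print(request.get_method(), request.full_url)
```

Calling parse("I want to order pizza") against a running server should give you the same kind of dictionary as the interpreter returned in the Python example.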
If you have issues with Python pkgs follow this:
virtualenv -p python3 <env name>
python3 -m virtualenv <env_name> pip3 install
If you get the below error when trying to run the NLU (> python nlu_model.py):
“OSError: Can’t find model ‘en’”
It means that you haven’t got the spacy “en” language model downloaded.
run:
python3 -m spacy download en
Or:
python -m spacy download en