In a world full of Siri, Cortana and Alexa, have you ever wondered whether you could create a voice assistant of your own? It might not be as intelligent, but it is still worth trying to build something new. In most web apps today, we rely on various UI elements to interact with users. With the Web Speech API, we can develop rich web applications with natural user interactions and a minimal visual interface, using voice commands. This enables countless use cases for richer web applications. Moreover, the API can make web apps more accessible, helping people with physical or cognitive disabilities or injuries. The future web will be more conversational and accessible!
Here, we will use the API to create an artificial intelligence (AI) voice chat interface in the browser. The app will listen to the user’s voice and reply with a synthetic voice. Because the Web Speech API is still experimental, the app works only in supported browsers. Both features used in this article, speech recognition and speech synthesis, are currently available together only in Chromium-based browsers, including Chrome 25+ and Opera 27+, while Firefox, Edge and Safari support only speech synthesis at the moment.
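Since support differs between browsers, it can help to check for both interfaces before wiring up the UI. The following is a minimal sketch: the function name detectSpeechSupport and the shape of the object it returns are illustrative assumptions, not part of the Web Speech API.

```javascript
// Hypothetical helper: reports which Web Speech API interfaces a global object exposes.
// Pass `window` in the browser; taking it as a parameter lets the check be tried anywhere.
function detectSpeechSupport(globalObj) {
  const g = globalObj || {};
  return {
    // Chrome exposes speech recognition under the webkit prefix.
    recognition: Boolean(g.SpeechRecognition || g.webkitSpeechRecognition),
    synthesis: Boolean(g.speechSynthesis && g.SpeechSynthesisUtterance),
  };
}

// Usage (browser only), guarded so the snippet also loads outside a browser.
if (typeof window !== 'undefined') {
  const support = detectSpeechSupport(window);
  if (!support.recognition) {
    console.warn('Speech recognition is not available in this browser.');
  }
}
```

A page could use the result to hide the microphone button or show a short notice when recognition is missing.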
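To build the web app, we’re going to take three major steps:
1. Use the Web Speech API’s SpeechRecognition interface to listen to the user’s voice.
2. Send the user’s message as text to API.AI, a natural language processing service, and get a reply.
3. Use the SpeechSynthesis interface to speak the reply with a synthetic voice.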
Set up a web app framework with Node.js. Create your app directory, and set up your app’s structure like this:
.
├── index.js
├── files
│   ├── css
│   │   └── style.css
│   └── js
│       └── script.js
└── views
    └── index.html
Then, run this command to initialize your Node.js app:
npm init
This will generate a package.json file that contains the basic info for your app.
Now, install all of the dependencies needed to build this app:
$ npm i apiai
$ npm install socket.io
$ npm i dotenv-extended
$ npm install express --save
We are going to use Express, a Node.js web application framework, to run the server locally. To enable real-time bidirectional communication between the server and the browser, we’ll use Socket.IO. We’ll also install apiai, the Node.js client for the natural language processing service API.AI, to build an AI chatbot that can hold an artificial conversation.
Socket.IO is a library that enables us to use WebSocket easily with Node.js. By establishing a socket connection between the client and server, our chat messages will be passed back and forth between the browser and our server, as soon as text data is returned by the Web Speech API (the voice message) or by the API.AI API (the “AI” message).
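To picture that round trip before any real socket is involved, here is a small sketch in which Node’s built-in EventEmitter stands in for the socket. It is not Socket.IO and does not cross the network; it only mirrors the two event names used in this tutorial, and the reply text is a placeholder for the API.AI call.

```javascript
const { EventEmitter } = require('events');

// A plain EventEmitter stands in for the socket here. It is NOT Socket.IO;
// it only mirrors the event names used in this tutorial.
const fakeSocket = new EventEmitter();

// "Server" side: receive the transcribed text and answer with a reply.
fakeSocket.on('chat message', (text) => {
  const reply = 'You said: ' + text; // placeholder for the API.AI call
  fakeSocket.emit('bot reply', reply);
});

// "Browser" side: collect replies as they arrive.
const received = [];
fakeSocket.on('bot reply', (replyText) => {
  received.push(replyText);
});

// The browser would emit this once speech recognition returns text.
fakeSocket.emit('chat message', 'hello');
console.log(received[0]);
```

With Socket.IO the same two event names are used, but the emitter on each side is a socket object connected over the network.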
Now, let’s create an index.js file, instantiate Express and start the server:
'use strict';

const path = require('path');
const express = require('express');
const apiai = require('apiai');

// API.AI client: use an API token from the official site.
const APIAI_TOKEN = apiai(' ');
// Use a session ID that identifies this conversation.
const APIAI_SESSION_ID = ' ';

const app = express();

// Serve index.html from /views and the CSS and JS assets from /files.
app.use(express.static(path.join(__dirname, 'views')));
app.use(express.static(path.join(__dirname, 'files')));

const server = app.listen(process.env.PORT || 3000, () => {
  console.log('Server listening on port %d', server.address().port);
});

const io = require('socket.io')(server);

// Web UI
app.get('/', (req, res) => {
  // res.sendFile() needs an absolute path (or a root option).
  res.sendFile(path.join(__dirname, 'views', 'index.html'));
});

io.on('connection', (socket) => {
  console.log('a user connected');

  socket.on('chat message', (text) => {
    console.log('Message: ' + text);

    // Get a reply from API.AI
    const apiaiReq = APIAI_TOKEN.textRequest(text, {
      sessionId: APIAI_SESSION_ID
    });

    apiaiReq.on('response', (response) => {
      const aiText = response.result.fulfillment.speech;
      console.log('Bot reply: ' + aiText);
      socket.emit('bot reply', aiText);
    });

    apiaiReq.on('error', (error) => {
      console.log(error);
    });

    apiaiReq.end();
  });
});
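The dotenv-extended package installed earlier is not used in the index.js shown above, which keeps the token in the source file. One possible arrangement is to read the values from environment variables instead. The sketch below assumes variables named APIAI_TOKEN and APIAI_SESSION_ID (names chosen here for illustration) and a hypothetical helper, readApiaiConfig. It does not call dotenv-extended itself; it only reads from an object such as process.env.

```javascript
// Hypothetical helper: pulls the API.AI credentials out of an environment-like object.
// The variable names APIAI_TOKEN and APIAI_SESSION_ID are assumptions for this sketch.
function readApiaiConfig(env) {
  const token = (env.APIAI_TOKEN || '').trim();
  const sessionId = (env.APIAI_SESSION_ID || '').trim();
  if (!token) {
    throw new Error('APIAI_TOKEN is not set');
  }
  if (!sessionId) {
    throw new Error('APIAI_SESSION_ID is not set');
  }
  return { token, sessionId };
}

// Usage in index.js might look like:
//   const config = readApiaiConfig(process.env);
//   const APIAI_TOKEN = apiai(config.token);
//   const APIAI_SESSION_ID = config.sessionId;
```

Failing early with a clear message makes a missing token easier to spot than an error from the first API.AI request.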
Now, we will integrate the front-end code with the Web Speech API.
The UI of this app is simple: just a button to trigger voice recognition. Let’s set up our index.html file and include our front-end JavaScript file (script.js) and Socket.IO, which we will use later to enable the real-time communication:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="viewport" content="width=device-width">
    <title>Recognito</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
    <link rel="stylesheet" type="text/css" href="css/style.css">
  </head>
  <body>
    <section>
      <h1>Recognito</h1>
      <button id="btn"><i class="fa fa-microphone"></i></button>
      <div>
        <p>You said: <em class="output-you">...</em></p>
        <p>Recognito replied: <em class="output-bot">...</em></p>
      </div>
    </section>
    <script src="socket.io/socket.io.js"></script>
    <script src="js/script.js"></script>
  </body>
</html>
To style the button, refer to the style.css file in the source code.
In script.js, create an instance of SpeechRecognition, the controller interface of the Web Speech API for voice recognition:
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
We’re including both prefixed and non-prefixed objects, because Chrome currently supports the API with prefixed properties.
Also, we are using some ECMAScript 6 syntax in this tutorial, because the syntax, including const and arrow functions, is available in browsers that support both Speech API interfaces, SpeechRecognition and SpeechSynthesis.
Optionally, you can set a variety of properties to customize speech recognition:
recognition.lang = 'en-US';
recognition.interimResults = false;
Then, capture the DOM reference for the button UI, and listen for the click event to initiate speech recognition.
document.querySelector('button').addEventListener('click', () => {
  recognition.start();
});
Once speech recognition has started, use the result event to retrieve what was said as text. This will return a SpeechRecognitionResultList object containing the result, and you can retrieve the text in the array. Also, as you can see in the code sample, this will return confidence for the transcription, too.
recognition.addEventListener('result', (e) => {
  // Use the most recent result for both the transcript and its confidence.
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;
  console.log('Confidence: ' + e.results[last][0].confidence);
  // We will use the Socket.IO here later…
});
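Reading the transcript and the confidence from the same result keeps the two values in sync. As a sketch, that lookup can be pulled into a small function. The helper name latestTranscript and its return shape are assumptions made for illustration, and it accepts any array-like value shaped like a SpeechRecognitionResultList.

```javascript
// Hypothetical helper: returns the top alternative of the most recent result.
// `results` is expected to look like a SpeechRecognitionResultList: an indexed,
// array-like collection of results, each holding alternatives that carry
// `transcript` and `confidence`.
function latestTranscript(results) {
  if (!results || results.length === 0) {
    return null;
  }
  const best = results[results.length - 1][0];
  if (!best) {
    return null;
  }
  return { text: best.transcript, confidence: best.confidence };
}

// Usage inside the result handler might look like:
//   const latest = latestTranscript(e.results);
//   if (latest) console.log(latest.text, latest.confidence);
```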
Socket.IO is a library for real-time web applications. It enables real-time bidirectional communication between web clients and servers. We are going to use it to pass the result from the browser to the Node.js code, and then pass the response back to the browser.
You may be wondering why we are not using simple HTTP or AJAX instead. You could send data to the server via POST. However, we are using WebSocket via Socket.IO because sockets are the best solution for bidirectional communication, especially when pushing an event from the server to the browser. With a continuous socket connection, we won’t need to reload the browser or keep sending AJAX requests at frequent intervals.
Instantiate Socket.IO somewhere in script.js:
const socket = io();
Then, insert this code where you are listening to the result event from SpeechRecognition:
socket.emit('chat message', text);
Now, let’s go back to the Node.js code to receive this text and use AI to reply to the user.
To build a quick conversational interface, we will use API.AI because it provides a free developer account and allows us to set up a small-talk system quickly using its web interface and Node.js library.
You can use these values for reference:
var APIAI_TOKEN =apiai("5afc4bdf601046b39972ff3866cca392");
const APIAI_SESSION_ID = "chatbot-clvxfh";
Alternatively, get your own token by visiting the official site (Getting Started) and signing up.
Now we will use the server-side Socket.IO to receive the result from the browser.
io.on('connection', function(socket) {
  socket.on('chat message', (text) => {
    // Get a reply from API.AI.
    // APIAI_TOKEN is the client created earlier with apiai(<token>).
    let apiaiReq = APIAI_TOKEN.textRequest(text, {
      sessionId: APIAI_SESSION_ID
    });
    apiaiReq.on('response', (response) => {
      let aiText = response.result.fulfillment.speech;
      socket.emit('bot reply', aiText); // Send the result back to the browser!
    });
    apiaiReq.on('error', (error) => {
      console.log(error);
    });
    apiaiReq.end();
  });
});
Once the connection is established and the message is received, use the API.AI APIs to retrieve a reply to the user’s message. When API.AI returns the result, use Socket.IO’s socket.emit() to send it back to the browser.
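The response handler reads the reply from response.result.fulfillment.speech, which fails if any part of that path is missing. A defensive variant could look like the sketch below. The helper name extractReply and the fallback parameter are assumptions for illustration; only the property path comes from the code in this tutorial.

```javascript
// Hypothetical helper: reads the reply text from an API.AI response object,
// using the same path as the tutorial code (result.fulfillment.speech).
// Returns `fallback` when the path is missing or the reply is empty.
function extractReply(response, fallback) {
  const speech =
    response && response.result && response.result.fulfillment
      ? response.result.fulfillment.speech
      : '';
  return typeof speech === 'string' && speech.trim() !== '' ? speech : fallback;
}

// Usage inside the 'response' handler might look like:
//   socket.emit('bot reply', extractReply(response, '(No answer...)'));
```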
Create a function to generate a synthetic voice. This time, we are using the SpeechSynthesis controller interface of the Web Speech API.
The function takes a string as an argument and enables the browser to speak the text:
function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}
In the function, first, create a reference to the API entry point, window.speechSynthesis. You might notice that there is no prefixed property this time: this API is more widely supported than SpeechRecognition, and all browsers that support it have already dropped the prefix for SpeechSynthesis.
Then, create a new SpeechSynthesisUtterance() instance using its constructor, and set the text that will be synthesised when the utterance is spoken. You can set other properties too, such as voice, to choose one of the voices that the browser and operating system make available.
Finally, use SpeechSynthesis.speak() to let it speak!
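Besides text and voice, an utterance also has lang, pitch, rate and volume properties. The sketch below copies such settings onto an utterance and keeps the numeric ones inside the ranges the Web Speech API defines. The helper names clamp and configureUtterance and the options object are assumptions for illustration, not part of the API.

```javascript
// Keeps a number between min and max.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copies optional settings onto an utterance-like object.
// lang, pitch, rate and volume are real SpeechSynthesisUtterance properties;
// the helper itself and its clamping are assumptions of this sketch.
function configureUtterance(utterance, options) {
  const opts = options || {};
  if (opts.lang !== undefined) utterance.lang = opts.lang;
  if (opts.pitch !== undefined) utterance.pitch = clamp(opts.pitch, 0, 2); // range 0 to 2
  if (opts.rate !== undefined) utterance.rate = clamp(opts.rate, 0.1, 10); // range 0.1 to 10
  if (opts.volume !== undefined) utterance.volume = clamp(opts.volume, 0, 1); // range 0 to 1
  return utterance;
}

// Usage in the browser might look like:
//   const utterance = configureUtterance(new SpeechSynthesisUtterance('Hello'), {
//     lang: 'en-US',
//     rate: 1.2
//   });
//   window.speechSynthesis.speak(utterance);
```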
Now, get the response from the server using Socket.IO again. Once the message is received, call the function. The complete script.js looks like this:
'use strict';

const socket = io();

const outputYou = document.querySelector('.output-you');
const outputBot = document.querySelector('.output-bot');

// Chrome exposes the interface with a webkit prefix.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;

document.querySelector('button').addEventListener('click', () => {
  recognition.start();
});

recognition.addEventListener('speechstart', () => {
  console.log('Speech has been detected.');
});

recognition.addEventListener('result', (e) => {
  console.log('Result has been detected.');

  // Use the most recent result for both the transcript and its confidence.
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;

  outputYou.textContent = text;
  console.log('Confidence: ' + e.results[last][0].confidence);

  socket.emit('chat message', text);
});

recognition.addEventListener('speechend', () => {
  recognition.stop();
});

recognition.addEventListener('error', (e) => {
  outputBot.textContent = 'Error: ' + e.error;
});

function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}

socket.on('bot reply', function(replyText) {
  // Substitute the placeholder first, so an empty reply is neither spoken nor shown as blank.
  if (replyText === '') replyText = '(No answer...)';
  synthVoice(replyText);
  outputBot.textContent = replyText;
});
That’s it. Run the following command in your terminal:
$ node index.js
Then open http://localhost:3000 in any supported browser. You can refer to my repository for further help.