Speak, Translate and See with this AWS AI Services Chrome Extension

Written by ceyhun.ozgun | Published 2018/08/12
Tech Story Tags: aws | ai | amazon-polly | amazon-translate | amazon-rekognition


I have been learning AWS services for more than a year. On the one hand, I have been learning how to use them; on the other, I have been blogging about how to use them easily. In my posts, I try to find different use cases and explore different integration possibilities.

In my previous posts, I wrote about using Lex, Polly and Rekognition within web applications. In my last post, I developed a Scratch extension that integrates Polly and Translate with Scratch, providing an easy way to teach AI to our kids.

In this post, I will show how we can develop a Google Chrome extension to use AWS AI services in Google Chrome. With the extension, you can use Polly to read the selected text aloud, translate the selection using Amazon Translate and detect text in images using Amazon Rekognition.

You can view a demo of the extension below.

Google Chrome Extensions

Google Chrome extensions are specially packaged web applications that provide additional functionality to the Google Chrome browser. They are hosted on the Chrome Web Store. You can search for extensions in the store and install them in your Google Chrome browser with one click. You can find more information about the architecture of extensions here.

Our Extension

You can install the extension from the Chrome Web Store here. You can find the source of the extension here.

Our extension consists of four parts: the manifest file, the background script, the content script and the popup page.

The manifest file describes the extension. It defines the permissions, the scripts that will be loaded as background and content scripts, and the browser action that shows the popup page.

The background script is the main part that calls AWS services. It is loaded only once and accepts messages from the content scripts that are loaded onto the pages.

The content script is loaded onto the pages and accesses page elements such as text fields and images. It also shows the dialogs for accepting input and showing the output, and it sends messages to the background script to invoke AWS services.

The popup page is used for configuring the AWS credentials. Once the credentials are entered, they are sent to the background script, which stores them in the storage that is specific to the extension.

The Manifest File

The manifest.json file is the manifest file that describes the Chrome extension. It lists the scripts that will be loaded as background and content scripts and the permissions needed. The popup page is also defined in this file.
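As a rough illustration, a Manifest V2 file for an extension of this shape might carry the fields below. It is written here as a JavaScript object that is serialized to JSON. The extension name, the file names and the exact permission list are assumptions made for this sketch and are not taken from the extension's source.

```javascript
// Hypothetical Manifest V2 content; the name and file paths are assumptions.
const manifest = {
  manifest_version: 2,
  name: "AWS AI Services (sketch)",
  version: "0.1.0",
  // contextMenus for the right-click commands, storage for the credentials.
  permissions: ["contextMenus", "storage"],
  // Background scripts are loaded once for the whole extension.
  background: { scripts: ["aws-sdk.min.js", "background.js"] },
  // Content scripts are injected into every matching page.
  content_scripts: [{ matches: ["<all_urls>"], js: ["content.js"] }],
  // The toolbar button and the popup page it opens.
  browser_action: { default_popup: "popup.html" },
};

// manifest.json would contain this serialized form.
const manifestJson = JSON.stringify(manifest, null, 2);
console.log(manifestJson);
```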

The Background Script

This script provides the backbone for the extension. It provides access to AWS services and to the storage. Also, it creates context menus for the pages.

For accessing AWS services, we use the AWS SDK for JavaScript in the browser. The AWS credentials are taken from the user by the popup page and stored in the storage that is specific to the extension. This storage provides capabilities similar to the localStorage API.
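The storage step could be sketched roughly as follows, assuming the credentials are kept under a single key in chrome.storage.local and applied to the SDK with AWS.config.update. The key name awsCredentials and both function names are hypothetical. The storage area and the AWS object are passed in as parameters so that the sketch does not depend on a particular runtime.

```javascript
// `storageArea` is expected to behave like chrome.storage.local and `aws`
// like the global AWS object of the browser SDK (v2).
function saveCredentials(storageArea, credentials, done) {
  // chrome.storage.local.set(items, callback) stores a plain object.
  storageArea.set({ awsCredentials: credentials }, done);
}

function loadCredentials(storageArea, aws, done) {
  // chrome.storage.local.get(keys, callback) passes back the stored items.
  storageArea.get(["awsCredentials"], (items) => {
    const creds = items.awsCredentials;
    if (!creds) {
      done(false); // nothing configured yet
      return;
    }
    aws.config.update({
      accessKeyId: creds.accessKeyId,
      secretAccessKey: creds.secretAccessKey,
      region: creds.region,
    });
    done(true);
  });
}
```

In a background script, the calls would look like loadCredentials(chrome.storage.local, AWS, callback).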

The context menus are created when the extension is loaded. The Polly and Translate commands work on the selected text, so they are created for the selection context.

The Authy token read command is created for editable elements such as input and textarea.

Finally, the detect text command is created for image elements.
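The menu setup described above might look something like the following sketch. The menu ids and titles are invented for illustration. The context names "selection", "editable" and "image" are values accepted by chrome.contextMenus.create.

```javascript
// Hypothetical menu definitions; ids and titles are illustrative only.
const MENU_ITEMS = [
  { id: "read-selection", title: "Read selection", contexts: ["selection"] },
  { id: "translate-selection", title: "Translate selection", contexts: ["selection"] },
  { id: "read-token", title: "Read token", contexts: ["editable"] },
  { id: "detect-text", title: "Detect text in image", contexts: ["image"] },
];

// `contextMenus` is expected to behave like chrome.contextMenus.
function createMenus(contextMenus) {
  MENU_ITEMS.forEach((item) => contextMenus.create(item));
}

// In a background script: createMenus(chrome.contextMenus);
```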

These context menus are shown on the pages open in the browser tabs. When the user selects a menu item, the onClicked handler in the background script is called. This handler sends the corresponding message to the active tab so that the dialogs can be shown in the page.
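A minimal sketch of such a handler is shown here, assuming the message simply forwards the menu item id together with the data Chrome supplies for the click. The message shape (command, selectionText, srcUrl) is an assumption. The properties read from info are the ones Chrome passes to onClicked listeners.

```javascript
// `tabs` is expected to behave like chrome.tabs.
function makeMenuClickHandler(tabs) {
  // Signature matches chrome.contextMenus.onClicked listeners: (info, tab).
  return function onMenuClicked(info, tab) {
    if (!tab || tab.id === undefined) return; // no tab to talk to
    tabs.sendMessage(tab.id, {
      command: info.menuItemId,          // which menu item was chosen
      selectionText: info.selectionText, // set for the "selection" context
      srcUrl: info.srcUrl,               // set for the "image" context
    });
  };
}

// Wiring in a background script:
// chrome.contextMenus.onClicked.addListener(makeMenuClickHandler(chrome.tabs));
```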

The Content Script

While the background script is loaded only once, the content script is loaded separately into each tab. The content script provides a bridge between the background script and the elements of the page it is loaded onto.

Once loaded onto the page, it registers a “contextmenu” handler for storing the clicked element.

Once a menu item is selected, the background script sends a message and the onMessage handler is executed. The handler checks the status of the extension and calls the processContextMenu function, which handles the messages. Each command shows its dialog for taking input from the user and showing the output.
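One way to remember the right-clicked element is sketched below. The function name and the getter it returns are hypothetical. The root parameter stands in for document so that the sketch can run outside a browser.

```javascript
// `root` is expected to behave like `document` in a content script.
function trackClickedElement(root) {
  let clicked = null;
  // Capture phase, so the element is recorded even if a handler on the
  // element itself stops propagation.
  root.addEventListener("contextmenu", (event) => {
    clicked = event.target;
  }, true);
  // Getter for the most recently right-clicked element (null if none yet).
  return () => clicked;
}

// In a content script: const getClickedElement = trackClickedElement(document);
```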
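The flow above could be sketched as a small dispatcher. Only the name processContextMenu comes from the extension. Its signature, the isReady check, the commands map and the showError callback are assumptions made for illustration.

```javascript
// Builds a listener suitable for chrome.runtime.onMessage.addListener.
// `isReady` reports whether the extension is configured; `commands` maps
// command names to functions that show the matching dialog.
function makeMessageListener(isReady, commands, showError) {
  return function onMessage(message, sender, sendResponse) {
    if (!isReady()) {
      showError("Please configure your AWS credentials first.");
      return;
    }
    processContextMenu(message, commands, showError);
  };
}

function processContextMenu(message, commands, showError) {
  // Only accept commands that were explicitly registered.
  const known = Object.prototype.hasOwnProperty.call(commands, message.command);
  if (!known || typeof commands[message.command] !== "function") {
    showError("Unknown command: " + message.command);
    return;
  }
  commands[message.command](message);
}
```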

For example, the read selection command reads the selected text aloud using Polly.
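The Polly call behind such a command might look like the sketch below. It assumes the AWS SDK for JavaScript (v2), whose Polly client has a synthesizeSpeech(params, callback) method. The helper names and the default voice are arbitrary choices for illustration and do not come from the extension's code.

```javascript
// Builds the request parameters for Polly's synthesizeSpeech operation.
function buildSpeechParams(text, voiceId = "Joanna") {
  return { OutputFormat: "mp3", Text: text, VoiceId: voiceId };
}

// `polly` is expected to behave like `new AWS.Polly()`.
function synthesize(polly, text, done) {
  polly.synthesizeSpeech(buildSpeechParams(text), (err, data) => {
    if (err) {
      done(err);
      return;
    }
    // AudioStream carries the synthesized audio returned by the service.
    done(null, data.AudioStream);
  });
}

// In a page context the audio could then be played, for example:
// const url = URL.createObjectURL(new Blob([audio], { type: "audio/mpeg" }));
// new Audio(url).play();
```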

The Popup Page

The popup page is shown when the extension's button in the Chrome toolbar is clicked. This button is declared in the manifest file. The popup page is used to configure AWS credentials: it takes the credentials and the AWS region, then sends them in a message to the background script, which saves them for later use. The popup page is declared in the popup.html file.
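The hand-off from the popup to the background script could be sketched as follows. The message shape ({ type, credentials }), the "save-credentials" type and the field names are assumptions. chrome.runtime.sendMessage(message, responseCallback) is the standard way to send a message from a popup.

```javascript
// `runtime` is expected to behave like chrome.runtime; `fields` holds the
// raw strings read from the popup's input elements.
function submitCredentials(runtime, fields, done) {
  const credentials = {
    accessKeyId: fields.accessKeyId.trim(),
    secretAccessKey: fields.secretAccessKey.trim(),
    region: fields.region.trim(),
  };
  if (!credentials.accessKeyId || !credentials.secretAccessKey || !credentials.region) {
    done(new Error("All three fields are required."));
    return;
  }
  // The background script is assumed to store the credentials and reply.
  runtime.sendMessage({ type: "save-credentials", credentials }, () => done(null));
}
```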

Summary

In this post, I have shown how to develop a Chrome extension for using AWS AI services in the Google Chrome browser. Using the extension, you can have the selected text on a page read aloud with Amazon Polly, translate the text using Amazon Translate and detect the text in images using Amazon Rekognition.

In the future, I am planning to add celebrity recognition and object and scene detection features of Amazon Rekognition, so please stay tuned if you are interested.

You can find the code here. Also, you can find detailed information about using the extension here. You can install the extension from here.

If you liked this post, you might like my previous post about using AWS AI services in Scratch:

AI Is Hard? It’s Child’s Play With This AWS AI Services Scratch Extension (hackernoon.com)

Please clap, follow and share if you liked this post.


Published by HackerNoon on 2018/08/12