Don’t you think that a great many mobile apps would be a lot more convenient if they had voice control? And I don’t mean chatting with a banking bot: in most cases, voice navigation or conversational form-filling is quite enough. Using Habitica, an open-source Kotlin-based habit-tracking app, as an example, Vit Gorbachyov, solution architect at Just AI, will show you how to add a voice interface to any app swiftly and seamlessly.

## How convenient does that sound?

Let’s start with the obvious:

- We often need to use apps when our hands are full: while we cook, drive, or carry a suitcase.
- Voice is a major instrument for people with vision impairments.
- It’s obvious, but most of the time voice is simply quicker. Consider ordering a ticket by saying "get me a plane to London for tomorrow for two" instead of filling in a long form, with clarifying questions where needed: should that be morning or evening? Will there be luggage or not?

Voice is very useful in form-filling scenarios, and it suits almost any long form that requires a lot of information from the user. And these are exactly the kind of forms that almost any mobile app has.

Most companies keep trying to stuff all the functionality they can think of into a chat, because voice assistants are usually embedded into chat support: to refill the balance, to get item or service info, and so on. That is not always convenient, and sometimes it’s even counterproductive, because voice recognition is not perfect yet. The right approach here is to embed an assistant into the app’s existing functionality.

So we took Habitica as our example. It’s perfect for adding a voice assistant, because creating a new task means filling in a long form. Let’s try to swap this dreary process for one phrase with a few guiding questions.

## What we need to get started

**Aimybox.** We use Aimybox to create the dialog interface. It’s an open-source voice assistant SDK with a ready-to-use, customizable UI. You can use the built-in speech-to-text, text-to-speech, and NLU implementations, or create your own. Aimybox implements the assistant’s architecture, standardizes the interfaces, and organizes how the components work together, so it really cuts down the time it takes to develop a voice interface.

**Tools to create a scenario.** We will use JAICF (Just AI Conversational Framework), an open-source Kotlin-based framework for developing chatbots and voice assistants, which is totally free. Caila, an NLU service inside JAICP (Just AI Conversational Platform), will help us with intent recognition.

I’m covering these tools in the next part of this tutorial, where I show how to manage an application with voice only: invoking certain screens, implementing complex queries within the app, and changing habits.

**Smartphone.** We will need an Android phone to test the solution on Habitica.

## Action plan

We start with a project fork: we take the Release development branch and locate the most essential files. I used the Android Studio IDE:

- MainActivity.kt — that’s where we build the logic in
- HabiticaBaseApplication.kt — that’s where we will initialize Aimybox
- activity_main.xml — that’s where the interface element will go
- AndroidManifest.xml — that’s where the app’s structure and its permissions are stored

Following the instructions in Habitica’s repository, we rename habitica.properties.example and habitica.resources.example by dropping the "example" part, then create a new Firebase project and copy the google-services.json file into the project root.

Now we start the app to see whether it works. Ta-dah!

To begin with, we add the Aimybox dependencies to `dependencies`:

```groovy
implementation 'com.justai.aimybox:core:0.11.0'
implementation("com.justai.aimybox:components:0.1.8")
```

and these to `repositories`:

```groovy
maven { url 'https://dl.bintray.com/aimybox/aimybox-android-sdk/' }
maven { url "https://dl.bintray.com/aimybox/aimybox-android-assistant/" }
```

Right after `compileOptions` we add the following lines so that everything works properly:

```groovy
kotlinOptions {
    jvmTarget = JavaVersion.VERSION_1_8.toString()
}
```

Now the permissions. We enable RECORD_AUDIO and MODIFY_AUDIO_SETTINGS in AndroidManifest.xml, so that the permission block looks like this:

```xml
<uses-permission android:name="android.permission.READ_PHONE_STATE" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="com.android.vending.BILLING" />
<uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```

Now we initialize Aimybox in HabiticaBaseApplication by implementing the AimyboxProvider interface at component initialization.
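Here is roughly what the application class ends up looking like. This is a minimal sketch of the standard AimyboxProvider pattern from the Aimybox documentation, not Habitica’s full application class (all of its own initialization code is omitted), and the exact package of AimyboxProvider may differ between SDK versions; createAimybox is the function we define next.

```kotlin
import android.app.Application
import com.justai.aimybox.Aimybox
import com.justai.aimybox.components.AimyboxProvider

class HabiticaBaseApplication : Application(), AimyboxProvider {

    // A single, lazily created Aimybox instance shared by the whole app;
    // AimyboxAssistantFragment discovers it through the AimyboxProvider interface.
    override val aimybox: Aimybox by lazy { createAimybox(this) }
}
```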
And here is the initialization itself:

```kotlin
private fun createAimybox(context: Context): Aimybox {
    val unitId = UUID.randomUUID().toString()

    val textToSpeech = GooglePlatformTextToSpeech(context, Locale("Ru"))
    val speechToText = GooglePlatformSpeechToText(context, Locale("Ru"))

    val dialogApi = AimyboxDialogApi("YOUR KEY", unitId)

    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
}
```

Later we will replace YOUR KEY with an API key from the Aimybox Console.

Now we wire a fragment into MainActivity.kt. For a start, we embed a FrameLayout in activity_main.xml, right under the FrameLayout with the id bottom_navigation:

```xml
<FrameLayout
    android:id="@+id/assistant_container"
    android:layout_width="match_parent"
    android:layout_height="match_parent" />
```

In MainActivity’s onCreate we add an explicit permission request:

```kotlin
ActivityCompat.requestPermissions(this, arrayOf(android.Manifest.permission.RECORD_AUDIO), 1)
```

And when the result arrives, we add the fragment into the frame mentioned above:

```kotlin
@SuppressLint("MissingPermission")
override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<out String>,
    grantResults: IntArray
) {
    val fragmentManager = supportFragmentManager
    val fragmentTransaction = fragmentManager.beginTransaction()
    fragmentTransaction.add(R.id.assistant_container, AimyboxAssistantFragment())
    fragmentTransaction.commit()
}
```
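As written, this handler attaches the fragment regardless of the user’s answer, which is why the MissingPermission lint warning has to be suppressed. A slightly more defensive variant, not in the original tutorial but using only standard Android APIs, would check the result first (1 is the request code we passed above):

```kotlin
import android.content.pm.PackageManager

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<out String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    // Attach the assistant UI only if the microphone permission was actually granted.
    if (requestCode == 1 && grantResults.firstOrNull() == PackageManager.PERMISSION_GRANTED) {
        supportFragmentManager.beginTransaction()
            .add(R.id.assistant_container, AimyboxAssistantFragment())
            .commit()
    }
}
```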
Don’t forget to give the user a way to close the assistant once it has been opened: in onBackPressed we forward the back press to the assistant fragment first.

```kotlin
override fun onBackPressed() {
    val assistantFragment = (supportFragmentManager.findFragmentById(R.id.assistant_container)
            as? AimyboxAssistantFragment)
    if (assistantFragment?.onBackPressed() != true) {
        return super.onBackPressed()
    }
}
```

Besides, we add styles to AppTheme in styles.xml:

```xml
<item name="aimybox_assistantButtonTheme">@style/CustomAssistantButtonTheme</item>
<item name="aimybox_recognitionTheme">@style/CustomRecognitionWidgetTheme</item>
<item name="aimybox_responseTheme">@style/CustomResponseWidgetTheme</item>
<item name="aimybox_imageReplyTheme">@style/CustomImageReplyWidgetTheme</item>
<item name="aimybox_buttonReplyTheme">@style/CustomButtonReplyWidgetTheme</item>
```

And some custom styles a bit further down:

```xml
<style name="CustomAssistantButtonTheme" parent="DefaultAssistantTheme.AssistantButton"></style>
<style name="CustomRecognitionWidgetTheme" parent="DefaultAssistantTheme.Widget.Recognition"></style>
<style name="CustomResponseWidgetTheme" parent="DefaultAssistantTheme.Widget.Response"></style>
<style name="CustomButtonReplyWidgetTheme" parent="DefaultAssistantTheme.Widget.ButtonReply"></style>
<style name="CustomImageReplyWidgetTheme" parent="DefaultAssistantTheme.Widget.ImageReply"></style>
```

Now let’s see whether the microphone button has appeared. We launch the application. Okay, we get plenty of compile errors; we fix everything the IDE points out. Aaaand it works! But the mic button has slipped below the bottom navigation. Let’s give it a lift by adding this to CustomAssistantButtonTheme:

```xml
<item name="aimybox_buttonMarginBottom">72dp</item>
```

That’s better! Now let’s plug the assistant in and see whether it responds properly. We will need the Aimybox Console for that. We go to app.aimybox.com, sign in with a GitHub account, create a new project, add a couple of skills (I’ve added DateTime for testing purposes), and try asking questions. Then we open the settings in the upper right corner, take the apiKey, and put it into createAimybox instead of YOUR KEY:

```kotlin
private fun createAimybox(context: Context): Aimybox {
    val unitId = UUID.randomUUID().toString()

    val textToSpeech = GooglePlatformTextToSpeech(context)
    val speechToText = GooglePlatformSpeechToText(context)

    val dialogApi = AimyboxDialogApi("YOUR KEY", unitId)

    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
}
```

It works!

Here’s a repository link.

Read the second part.

Previously published at https://just-ai.com/en/blog/how-to-add-a-voice-assistant-to-your-mobile-app