Apple finds a shortcut to make Siri smarter

by SK Babu (@babulous), October 10th, 2018

The tale of how Apple cleverly persuaded me to train Siri

Siri and I have a troubled relationship, with miscommunication at the root of our problems. The exotic mix of my Indian accent and rapid speech often leaves Siri fumbling to catch what I’m saying. Unlike the advanced microphones on smart speakers, a phone’s microphone picks up my voice reliably only in normal conditions. And Siri really struggles to identify what I say if my voice changes even slightly due to factors like loudness, ambient noise or stress.

This is a vicious circle, with Siri sometimes becoming the source of my stress. In fact, I have observed (in test recordings) the tone of my own voice change, and edge towards hysteria when Siri messes up.

Vocal Vagaries

Take my music app. My dumbed-down Siri phrase to play music on my phone is simply, ‘Music.’ However, a few days ago, while I was driving in a hurry through traffic, Siri just couldn’t catch what I was saying. The more Siri struggled, the more strained my voice became. Sometimes she didn’t even seem to register that I had spoken, prompting me to panic and say the word ‘music’ twice in quick succession. I checked the network in the area later, but it was fine, so that wasn’t the cause (see below).

My sanity was saved when I had to stop at a signal and used the touchscreen to get the music going. However, I did manage to get a screen capture of part of my weird back-and-forth with Siri. As you can see, the word ‘Music’ became, among other things, ‘Hey Siri,’ ‘Is it, Is it,’ ‘Yes he’ and ‘Lucy.’ Eventually, Siri completely lost the plot and wanted to know who I was trying to message.

Lagging behind the frontrunners

Contrast Siri’s woes with Google, which entered the voice recognition arena much later. Google is now far better than Siri at picking up what I say. In fact, I often dictate whole long messages using the ‘mic’ feature in Google’s Gboard keyboard app on my phone. Considering that Siri runs on the same hardware, Google definitely has an edge in its software, as well as an unfair advantage via its access to tons of data on users, which makes it easier to learn and predict what its users are talking about.

As for Amazon, none of the existing phones can match the hardware of the new wave of smart speakers sparked off by the Echo’s success. Their multiple microphones are far better at identifying voices, picking up words, and ignoring tonal variations and ambient noise. My old Echo Spot (2nd gen) can catch what I say from across the room, identify complicated song names, and whatnot. That must be terribly demoralising for Apple, considering Amazon entered the field well after Google. And Amazon seems to be on a roll, with its new Alexa add-on speaker for cars looking like another winner.

Let the mountain come to Siri

Apple’s solution to this problem is a bit like ‘If Siri can’t understand language, then language must simplify itself so Siri can understand it.’ Or, to put it another way, if Siri finds it difficult to catch what’s being said to her, then the difficulty level of what is being said to her must be lowered. Theoretically, if Apple can give Siri such an edge, she should be able to keep up with Google and Alexa.

But how can you simplify language? Or more to the point, is it even possible for Apple to get users to dumb down their interactions with Siri?

Sound Sense

Let’s do a simple experiment. I ask Siri and Google the same question, “What happens when sodium is put in water?” They both get the right answer (the same one) but I think there’s a difference.

My hypothesis is that Siri correctly identifies the sounds/words in my sentence and points me to videos and articles with the same words in the same sequence. Google does the same, but I think it goes one step further. It actually listens to words and sentences, identifies them, figures out the context, and gives me a verbal reply that makes sense. In other words, both Siri and Google hear sounds/words, but only one is making sense of those sounds.

I’m most probably guilty of oversimplifying things, but I think those two answers give a rough idea of the edge that Google has over Siri. And if I’m right, that’s a huge advantage.

Removing sense from the equation

Now let’s do another experiment. I say the words, ‘Run DJ’ to both Siri and Google. Now ‘Run DJ’ doesn’t mean anything. It’s just two random words that come together without making sense. In short, it’s just a couple of sounds. Google figures that out, and points me to some videos with Run and DJ (see screen capture on the right below).

But Siri seemingly makes sense of this nonsensical term. It understands that ‘Run’ means I want to go running and ‘DJ’ means I want to listen to music while running. So when Siri hears me say ‘Run DJ,’ she opens up my running app, and then opens up my music app and plays music for me (see the three screen captures below, starting from far left).

If we didn’t know better, we would say Siri is a genius.

When Siri gets trained

Think of a dog that’s been trained to pee when it hears its master say the word ‘pee.’ The dog doesn’t understand the meaning of the word, ‘pee.’ But it has been trained to understand that it’s expected to pee when it hears that command. The dog is responding to a sound, not a word. Now if an Arab were to say ‘pee’, most probably that dog wouldn’t pee. Why? Because Arabic doesn’t have the letter ‘p.’ Arabs replace ‘p’ with ‘b,’ and say ‘bee’ when they mean ‘pee.’ So if that dog really needs to pee, he’s going to be a bit puzzled. I can see him thinking, “What’s bee? Do you mean pee? Should I cock my leg? Or not? Bow-wow, human, bow-wow.” (That reminds me of my Arab driving instructor who once asked me to ‘bark here.’ Seriously.)

Ok, that analogy may be a bit off. But Siri did something similar, when it heard ‘Run DJ.’ Here’s the real story behind that ‘Run DJ’ phrase.

I used Apple’s new iOS 12 app, Shortcuts, to create a shortcut that launches my running app and my music app with one tap of a button in my Shortcuts widget (how I learned to create shortcuts is another story). While creating the shortcut, I noticed there was an option to run the shortcut with a Siri phrase.

So I recorded the phrase, which turned the ‘Shortcut’ into a ‘Siri Shortcut.’

Please note that we are talking ‘sounds’ here, not ‘words.’ What I did was train Siri to run two particular apps when she hears the sound ‘Run DJ.’ Siri doesn’t really know what ‘Run DJ’ means, and doesn’t give two hoots about it either.
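For readers who think in code, here is a rough sketch of the idea in Python. It is only my mental model, not how Apple actually implements Siri Shortcuts: the phrase table, the action names and the `handle` function are all made up for illustration. The point is that a trained phrase works like a lookup key, and nothing in the lookup needs to know what the phrase means.

```python
# Hypothetical sketch: a trained phrase is a lookup key, not something "understood".
# The phrase table and action names are illustrative, not Apple's Shortcuts API.

def normalize(heard: str) -> str:
    """Reduce what was heard to a bare key: lowercase, single spaces."""
    return " ".join(heard.lower().split())

# Each recorded phrase maps to an ordered list of actions.
SHORTCUTS = {
    "run dj": ["open running app", "open music app", "play music"],
}

def handle(heard: str) -> list:
    """Return the actions for a trained phrase, or an empty list if none matches."""
    return SHORTCUTS.get(normalize(heard), [])

print(handle("Run DJ"))         # the trained phrase triggers its actions
print(handle("What is a DJ?"))  # anything else falls through to an empty list
```

Notice that the table would work just as well if the key were a nonsense sound, which is exactly why ‘Run DJ’ does not have to mean anything.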

In other words, the mountain has come to Siri.

Resistance to Change

The option to train Siri to recognise your voice has always been there. But unfortunately for Apple, no one seemed to have the time or patience to train Siri. Or maybe they didn’t know how to do it. For instance, Siri has been mispronouncing my wife’s name for years. It sounds so bad that I didn’t think Siri could ever get it right, and never even attempted to train her.

It was only while writing this post that I realised that training Siri wasn’t really a big deal. So I finally sat down and trained Siri on the correct way to pronounce my wife’s name. If a tech-savvy guy took so long to train Siri, is it surprising that laymen never got around to it? For them, Siri is better known for her hilarious goof-ups than as a serious voice assistant.

There are other reasons why voice assistants have struggled to catch on in India. Like I said, their pronunciation of Indian names is so atrocious that it straightaway puts off most Indians. Secondly, India is a warm and crowded country. This means your windows are always open, and your neighbours can hear what you say, so people end up preferring silent touchscreens to voice. Thirdly, the inability to pick up Indian accents leads to situations where I keep repeating the same words, again and again. This has the instant effect of killing my enthusiasm, besides making me conclude that voice commands are not for me. I think many iPhone users have had a similar experience, and that may be another reason why Siri has not caught on. Besides, Google and Alexa’s far more advanced voice capabilities don’t really help Siri’s case.

Knocking at voice’s door

However, after I got my Amazon Echo Spot and discovered how good it was, I realised voice assistants are viable. Despite that, I’ve only just begun to work voice assistants into my life, with that Echo, and now Siri Shortcuts, and a voice-activated TV on the way. Maybe it’s just that I prefer using touchscreens. Or I am probably suffering a serious case of resistance to change. But deep down, I know I spend too much time on screens, and it would be nice to give my eyes and fingers a break by using voice assistants. In that sense, Apple’s launch of Shortcuts was perfectly timed to catch me.

So, like the rest of India, which is slowly buying into voice-controlled gadgets, I think I too am now slowly becoming more open to voice.

How I warmed to Siri

Unlike voice, everyone loves widgets. Besides, a widget is so much easier and less stressful to set up than a voice command. So when Apple announced ‘Shortcuts,’ I, like many others, was thrilled. I loved being able to get my running app and music app going with a single tap of a widget. But then I found there was an option to ask Siri to do it for me. This was what eventually became my ‘Run DJ’ Siri Phrase. In contrast to her usual fumbles, Siri seems to catch ‘Run DJ’ every time.

It did seem like Siri had finally become ‘smart.’ After many years, my aversion to voice recognition has begun to recede. I tried out a couple of other Siri Phrase combinations with Siri Shortcuts, and they too worked flawlessly. For the first time, I am willing to try using Siri on a regular basis.

Yes, Apple may have cleverly persuaded me to train Siri. She is still nowhere near as smart as Google or Alexa. But all that training helps turn iOS into a level playing field for Siri. In fact, Siri may even have a bit of an advantage on the iOS platform as she’s built into it.

Training Siri

My success with the ‘Run DJ’ shortcut has improved my confidence in being able to train Siri. I’m ready to experiment some more. Like I mentioned, Siri’s take on Indian names is sheer torture on my poor ears. Take an Indian name like Babu. Pronounced baa-boo, it’s a common name in India but virtually unknown in the West. So Siri is never going to get it right. She will follow the Western phonetic system, and pronounce it like baboon. Which self-respecting Babu would like to be called a baboon? Really, Siri!

Anyway, I invoke Siri and tell her she’s pronouncing ‘babu’ wrong. At which, she asks me the correct way to pronounce babu. She then listens carefully, and offers up five optional ways in which she can pronounce it. If Siri doesn’t get it right, I tell her to try again till she gets as close as possible. Once I select an option, Siri thanks me for correcting her. Hopefully she permanently stops referring to babu as baboon.

So what makes Siri tick

Siri’s pronunciation of the name ‘babu’ is now close to how an Indian would say it. I’m impressed. Maybe I should do something about Siri’s inability to catch the word, ‘Music.’ Is it because the word music is very similar to a lot of other words/sounds? How about ‘sing’ or ‘song’? I try it with Siri.

Siri starts singing for ‘sing,’ but ‘song’ is bang on. She starts the music app. My guess is Apple has set ‘song’ as a key Siri Phrase for the music app. That would explain how she picked it up. But I also make another interesting observation.

Siri works better with unique sounds

Compared with the word ‘music,’ the word ‘song’ is harder to mishear. There are fewer words that sound like song. Or should I try a combination like Run DJ? That combination of sounds is quite distinctive and hard to mistake for something else. That may be why Siri almost always picks up my ‘Run DJ.’
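If you want to play with the idea that distinctive phrases are safer, here is a small Python sketch. Big caveat: it compares spellings using Python's standard `difflib` module, which is only a crude stand-in for how alike two phrases actually sound, and it has nothing to do with how Siri really works. The function names and the candidate list are my own inventions; the "confusable" list is simply the set of mis-hearings from my screen capture earlier in this post.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Spelling similarity in [0, 1]; a rough proxy for sounding alike."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def worst_clash(phrase: str, confusables: list) -> float:
    """Highest similarity between the phrase and any confusable phrase."""
    return max((similarity(phrase, other) for other in confusables), default=0.0)

def most_distinctive(candidates: list, confusables: list) -> str:
    """Pick the candidate whose closest confusable phrase is furthest away."""
    return min(candidates, key=lambda c: worst_clash(c, confusables))

# Mis-hearings reported earlier in this post, used as the confusable set.
MISHEARD = ["hey siri", "is it", "yes he", "lucy"]

# Print each candidate's closest clash so the phrases can be compared by eye.
for phrase in ["music", "song", "run dj", "miaow"]:
    print(phrase, round(worst_clash(phrase, MISHEARD), 2))
```

A lower score means the phrase looks less like the things Siri has already confused it with. A real test would need to compare sounds rather than letters, so treat the numbers as a toy.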

Come to think of it, you don’t even need to use actual words. You could just try a unique sound. The more distinctive it is, the better. Hmm… let me try changing the sound for my Music shortcut to something really hard to mistake, like, say, ‘miaow.’ I go into the Music shortcut’s settings, tap on Siri Phrase, then tap on Re-record phrase, and say ‘miaow.’ I then test it out with Siri.

It seems to work, with a minor hitch. Siri sometimes misses the first syllable in a Siri phrase. Should I give Siri an extra second after I hold down the home button before I speak? Maybe she needs that extra time to get her act together. I will have to test it out for a couple of days to see how it works.

One thing is for certain. My reservations about using Siri are almost gone. I’m beginning to have fun with Shortcuts and Siri, and finally accepting that she has a role to play in my life. So what if she’s not that into intelligent conversations? I will always have a place in my life for anyone willing to listen to me miaow.

Finally, I don’t know if Shortcuts is changing how iPhone users view Siri, like it happened with me. But if it is, I have two words for Apple.

Well played.