December 5, 2019

Voice is the next frontier.

I write this text in 2019, and it's almost 2020. Growing up in the 80s, I looked forward to the year 2000 and how things would be different in a future dominated by robots. And yet during those years, the year 2000 was just 19, 18 or 15 years away.
Now we are 20 years past the year 2000, and technology has evolved in such a way that each person has turned into an island, a robot-like figure. An android.
Society is so fine-grained today, mainly because smartphones have concentrated so much of the functionality of our lives. It's as if we are extensions of our phones instead of the other way around. My phone provides music, video, news and glimpses into strangers' lives. It lets me sell and buy digital assets; write to, talk to and see people on the other side of the world; monitor my baby; be reminded when to put the bins out; pay for things; read books; listen to books and podcasts; make and edit videos and photos; write texts like this one; store my loyalty cards; pay for car parking based on my location; file an insurance claim; identify myself securely (replacing ID cards); check in for a flight; order food; buy a train ticket; see if it's going to rain; be guided while driving; make a doctor's appointment; check when my package is arriving by post... the list is almost infinite.

Does it even make sense to still call them phones?
With this rectangle of glass and metal you can travel the world and, as long as the device is connected, you can buy an airplane ticket; rent a room in Bologna, Santiago, Brisbane or Lisbon; order food for your wife in Amsterdam while you are sitting in an airport in Seattle; or order a taxi when you get lost walking through Vilnius. Smartphones have been delivering society-changing innovation, but they are still pretty much treated as edge devices, except for video and photography. In fact, there is not a lot of creation and building happening on smartphones if you exclude social media content and video streaming.

We all know they can do more. Way more. Much more. But do we have the tools today to do it?

So what about the next 20 years? What about the 20s and 30s of this century? Will "phones" even be the smart device that we interact with the most?

Let's step back and understand the trends:

1) Phones will remain for many years to come and will become more powerful, but the cloud computing systems that back their apps are where the magic will happen. This means phones will still be a sort of edge device between each person and the internet.

2) Voice interfaces are growing, but we have the problem that language is not the best interface. I mean, some very simple tasks take a lot of words in most languages. So context will be key for voice.

3) Context. This is where the phone and the voice interface come together as one. If I say the word "tea" whilst in the kitchen of my house, that could very well be the keyword to put my kettle on. But if I say "tea" inside my car, nothing happens, or an answer could pop up: "you are too far from home so for security reasons the kettle will not be turned on". But if I say "Bach", regardless of where I am, a selection of music composed by Johann Sebastian Bach can actually play inside the car, in the kitchen, or on my phone if I am on public transport using my Bluetooth earphones.
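
To make the "tea" and "Bach" example a little more concrete, here is a minimal sketch in Python of how one keyword could map to different actions depending on context. Everything in it is an assumption for illustration only: the `Context` fields, the `handle_keyword` function and the returned strings are hypothetical names, not the API of any real voice assistant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of where the speaker is when the word is heard."""
    location: str   # e.g. "kitchen", "car", "public_transport"
    at_home: bool   # whether the speaker is currently at home


def handle_keyword(word: str, ctx: Context) -> str:
    """Map a single spoken keyword to an action, depending on context."""
    word = word.strip().lower()

    if word == "tea":
        # The kettle only reacts when the word is spoken in the kitchen.
        if ctx.location == "kitchen":
            return "kettle: on"
        # Away from home, refuse for safety instead of acting silently.
        if not ctx.at_home:
            return "refused: too far from home, kettle stays off"
        return "ignored"

    if word == "bach":
        # Music plays everywhere; only the output device changes.
        outputs = {"car": "car speakers", "kitchen": "kitchen speaker"}
        device = outputs.get(ctx.location, "phone earphones")
        return f"playing Bach on {device}"

    return "ignored"


print(handle_keyword("tea", Context("kitchen", at_home=True)))
print(handle_keyword("tea", Context("car", at_home=False)))
print(handle_keyword("Bach", Context("public_transport", at_home=False)))
```

The point of the sketch is only that the spoken word stays tiny while the context carries the rest of the instruction. A real system would have to infer that context from sensors and history rather than receive it as a ready-made object.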

Whoever wins the voice interface technology battle will be the provider of the (cloud) backend and interface API that make this a reality. Until that happens, we already have lots of voice innovations, such as Amazon Alexa and Amazon Polly, Google Assistant or even Apple Siri, but they are all trying to mimic human voice communication skills, mainly language.
Where voice can play a massive role in future technology interfaces is when it is used to transmit instructions in a non-human fashion that is very much dependent on context.

It's like a married couple, or a pair of very good friends, who, when a third person says something to them, just look at each other and immediately communicate with nothing more than a facial expression, and sometimes not even that: the situation, in that particular context, triggers in both listeners the same thought process or the memory of a shared experience.

Coded voice communication is what I am talking about, and it is what will come once machines learn how to interpret context and even complex thoughts transmitted through voice.

But what is all this good for, you may ask? This will be a class of innovation at the abstraction layer, one that glues together all the things that scientists and technology engineers are working on today at the foundational level of cognitive systems. This means that I can go out to walk the dog, or just walk, have a series of ideas and thoughts, and very quickly my edge interface (aka "phone") is able not only to capture them but to act on them, without me having to articulate them in the language of humans.

Does this mean we need a new language for voice to become more ubiquitous as a technology interface? Maybe. We develop new programming languages every few years to pass instructions on to computers, so why not create another one that can encompass the same complexity but uses voice instead of lines of code?

Who wants to have a go?
