Behind the scenes: the early history of voice technologies in 6 short chapters

Photo by Paweł Czerwiński on Unsplash

Voice technology can work miracles these days: today’s digital assistants can be as small as tea cups, but the first voice recognition devices were almost the size of an entire room. Ever wonder how – and when – it all started? 

I got you covered with these six really short chapters that will help you get a better grasp of the history of voice recognition. 

Chapter 1: “Audrey” - The First-Ever Voice Recognition Machine

The first voice recognition machine, Audrey (which actually stood for “automatic digit recognizer”), was built in the 1950s by Bell Labs. It recognized the digits 0 through 9 with 90% accuracy.

Audrey was a big girl: its relay rack alone was six feet tall, it needed substantial power to run, and it trailed streams of cables – all that for just ten digits. But it worked! It recognized ten digits! Albeit... only in the voice of its creator. 

Chapter 2: Genesis of Japanese Voice Recognition Machines

Audrey's success didn't go unnoticed: in the 1960s, Japanese scientists presented their own devices that could recognize:

  • Vowels (Radio Research Lab, Tokyo)

  • Phonemes (Kyoto University): it could understand one hundred Japanese monosyllables 

  • Spoken digits (NEC Laboratories): it achieved 99.7% recognition accuracy on 1,000 utterances pronounced by 20 male speakers [1]

The vowel recognizer built by Suzuki and Nakata at the Radio Research Lab, Tokyo [1]

Chapter 3: IBM’s Shoebox

Also in the 1960s, IBM presented the "Shoebox" machine, so named because of its small size. 

It recognized and responded to 16 spoken English words, including the numbers from 0 to 9. When the speaker uttered an arithmetic problem (said a number and command words such as "plus," "minus" and "total"), Shoebox gave instructions to an adding machine to calculate and print answers. 
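To make that flow concrete, here’s a toy sketch in Python (purely illustrative – not IBM’s actual design, and the vocabulary handling is a guess) of how a stream of recognized words could drive a running total the way Shoebox drove its adding machine:

```python
# Toy simulation: recognized words arrive as tokens; digits adjust a running
# total, and command words like "plus", "minus" and "total" control it.
# This only mimics the behavior described above, not IBM's hardware.

DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

def shoebox(words):
    total, sign = 0, 1
    for word in words:
        if word in DIGITS:
            total += sign * DIGITS[word]
        elif word == "plus":
            sign = 1
        elif word == "minus":
            sign = -1
        elif word == "total":
            print(total)  # "print the answer"
            return total
    return total

shoebox(["five", "plus", "three", "minus", "two", "total"])  # prints 6
```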

Here’s a picture of Dr. E. A. Quade, IBM manager of the advanced technology group, demonstrating Shoebox [2]:

Dr. E. A. Quade and the Shoebox

If you want to see Shoebox in action, check out this YouTube video.

Chapter 4: A Big Leap in the 1970s with Harpy!

Something very important happened in voice recognition history in the 1970s: CMU, together with IBM and Stanford, built Harpy – a device that could understand entire sentences!

It recognized 1,011 words – the vocabulary of an average three-year-old.

The shift from words to phrases was no easy task. Here’s Alexander Waibel, a computer science professor at Carnegie Mellon [3]:

Alexander Waibel on the shift from words to phrases

Chapter 5: Tangora Marks the Beginning of Modern Speech Recognition

In the 1980s, IBM created Tangora, a voice-activated typewriter that could handle a vocabulary of 20,000 words.

It could predict the most likely phonemes to follow a particular phoneme. The secret? It used Hidden Markov Models to shift “from simple pattern recognition methods based on templates and a spectral distance measure, to a statistical method for speech processing”. [4] 
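To give a feel for that statistical idea, here’s a minimal sketch in Python – the phonemes and probabilities are invented for illustration and have nothing to do with IBM’s actual Tangora model – showing how transition probabilities let a system guess the phoneme most likely to follow the current one:

```python
# Invented transition probabilities P(next phoneme | current phoneme).
# A real system would estimate these from large amounts of speech data.
transitions = {
    "k":  {"ae": 0.6, "s": 0.3, "t": 0.1},
    "ae": {"t": 0.7, "n": 0.2, "k": 0.1},
    "t":  {"ae": 0.4, "s": 0.4, "n": 0.2},
}

def most_likely_next(phoneme):
    """Return the phoneme with the highest transition probability."""
    candidates = transitions[phoneme]
    return max(candidates, key=candidates.get)

print(most_likely_next("k"))   # -> "ae"
print(most_likely_next("ae"))  # -> "t"
```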

Chapter 6: The ’90s Saw the First Continuous Speech Recognition Tool - NaturallySpeaking

In the 1990s, Dragon Systems presented Dragon NaturallySpeaking – the first continuous speech recognition tool. That meant no more discrete dictation (you didn’t have to pause… after… every… word… anymore)!

The tool has come a long way since then: its more recent edition (Version 15) leverages advances in machine learning to achieve a recognition accuracy of 99%, as claimed by Nuance, which now partners with IBM on further ground-breaking research into advanced speech recognition.

What the Future Could Hold for Voice Recognition

These are the six chapters that led to voice recognition as we know it today. It now comes in all shapes and sizes. You have virtual assistants such as Apple’s Siri, Cortana, and Alexa that can handle your day-to-day requests. Speech-to-text tools built into text editors (Google Docs, Microsoft Word, etc.) make it possible to put words on the screen with speech rather than a physical keyboard, which improves accessibility.

At a more advanced level, we have full-fledged conversational AI that can automate voice communication to produce human-like conversations (which is what we do here at Dasha AI). By offering conversational AI as a service, we are making it possible for developers to build automated customer self-service solutions or embed realistic voice capabilities in their apps. And the fact that it took only half a century to get this far leaves us wondering what’s going to happen next. Don’t wonder too long though – we will let you know in the next installment: voice tech history in the new millennium.

And to recap what we’ve been talking about, here’s a timeline infographic of the history of speech recognition for you:

Early history of voice tech: the timeline

References

[1] Adami, Andre. (2020). Automatic Speech Recognition: From the Beginning to the Portuguese Language

[2] IBM Archives 

[3] BBC Future

[4] A brief history of ASR
