Behind the scenes: the early history of voice technologies in 6 short chapters
Dasha Smirnova · 3 minute read
Voice technology can work miracles these days: today’s digital assistants can be as small as tea cups, but the first voice recognition devices were almost the size of an entire room. Ever wonder how – and when – it all started?
I’ve got you covered with six really short chapters that will help you get a better grasp of the evolution of voice recognition.
Chapter 1. Audrey
The first voice recognition machine, Audrey (which actually stood for “automatic digit recognizer”), was built in the 1950s by Bell Labs. It recognized the numbers from 0 to 9 with 90% accuracy.
Audrey was a big girl: its relay rack alone was 6 feet tall, and it needed substantial power and streams of cables to run. All that for just ten digits. But it worked! Albeit... only in the voice of its creator.
Chapter 2. The Japanese rivals
Audrey's success didn't go unnoticed: in the 1960s, Japanese scientists presented their own devices that could recognize:
Vowels (Radio Research Lab, Tokyo)
Phonemes (Kyoto University): it could understand one hundred Japanese monosyllables
Spoken digits (NEC Laboratories): it achieved a recognition rate of 99.7% on 1,000 utterances pronounced by 20 male speakers
Chapter 3. The Shoebox
Also in the 1960s, IBM presented the "Shoebox" machine (named for its small size).
It recognized and responded to 16 spoken English words, including the numbers from 0 to 9. When the speaker uttered an arithmetic problem (said a number and command words such as "plus," "minus" and "total"), Shoebox gave instructions to an adding machine to calculate and print answers.
Here’s a picture of Dr. E. A. Quade, IBM manager of the advanced technology group, demonstrating Shoebox:
If you want to see Shoebox in action, check out this YouTube video.
Chapter 4. Harpy
Something very important happened in the 1970s: Carnegie Mellon University (CMU), together with IBM and Stanford, built Harpy – a device that could understand entire sentences!
It recognized 1,011 words – the vocabulary of an average three-year-old.
The shift from words to phrases was no easy task. Here’s Alexander Waibel, a computer science professor at Carnegie Mellon :
Chapter 5. Tangora
In the 1980s, IBM created Tangora, a voice-activated typewriter able to handle a vocabulary of 20,000 words.
It could predict the most likely phonemes to follow a particular phoneme. The secret? It used Hidden Markov Models to shift “from simple pattern recognition methods based on templates and a spectral distance measure, to a statistical method for speech processing”. 
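To make the "most likely phoneme to follow" idea a bit more concrete, here is a minimal Python sketch. It is not Tangora's actual model, and it is not a full Hidden Markov Model either: it only illustrates the transition part, estimating from a made-up toy corpus how often one phoneme follows another. The corpus, the phoneme labels and the function names are all assumptions invented for this illustration.

```python
from collections import Counter, defaultdict


def train_transitions(sequences):
    """Estimate P(next phoneme | current phoneme) from phoneme sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Turn raw counts into probabilities that sum to 1 per current phoneme.
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


def most_likely_next(transitions, phoneme):
    """Return the most probable follower of `phoneme`, or None if unseen."""
    followers = transitions.get(phoneme)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical toy corpus: a few words spelled out as rough phoneme labels.
TOY_CORPUS = [
    ["k", "ae", "t"],       # "cat"
    ["k", "ae", "p"],       # "cap"
    ["k", "ae", "t", "s"],  # "cats"
    ["b", "ae", "t"],       # "bat"
]

TRANSITIONS = train_transitions(TOY_CORPUS)

if __name__ == "__main__":
    print(most_likely_next(TRANSITIONS, "ae"))
    print(TRANSITIONS["ae"])
```

In this toy data, "ae" is followed by "t" three times and by "p" once, so the sketch picks "t". A real statistical recognizer would combine transition probabilities like these with acoustic evidence from the audio signal, which this sketch leaves out entirely.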
Chapter 6. NaturallySpeaking
In the 1990s, Dragon Systems presented Dragon NaturallySpeaking – the first continuous speech recognition tool, which meant no more discrete dictation (you didn’t have to pause… after… every… word… anymore)!
The software understood about 100 words per minute. It might come as a surprise, but it’s still used today: doctors in the UK and US use it to document medical records.
These are the six steps that led to voice recognition as we know it today. It now comes in all shapes and sizes: from a virtual assistant that can handle your day-to-day requests to full-fledged conversational AI that can automate voice communication (like what we do here at Dasha AI). And the fact that it took only half a century to get this far leaves us wondering what’s going to happen next. Don’t wonder too long, though: we will let you know in the next installment, which covers voice tech history in the new millennium.
And to recap what we’ve been talking about, here’s a timeline infographic for you:
Adami, Andre. (2020). Automatic Speech Recognition: From the Beginning to the Portuguese Language
IBM Archives
BBC Future