Behind the scenes: the early history of voice technologies in 6 short chapters

Photo by Paweł Czerwiński on Unsplash

Voice technology can work miracles these days: today’s digital assistants can be as small as teacups, but the first voice recognition devices were almost the size of an entire room. Ever wonder how, and when, it all started?

I’ve got you covered with six very short chapters that will give you a better grasp of how voice recognition evolved.

Chapter 1. Audrey

The first voice recognition machine, Audrey (which actually stood for “automatic digit recognizer”), was built in the 1950s by Bell Labs. It recognized the digits from 0 to 9 with 90% accuracy.

Audrey was a big girl: its relay rack alone was 6 feet tall, it needed substantial power to run, and it had streams of cables, all that for just ten digits. But it worked! Albeit... only in the voice of its creator.

Chapter 2. The Japanese rivals

Audrey's success didn't go unnoticed: in the 1960s, Japanese scientists presented their own devices that could recognize:

  • Vowels (Radio Research Lab, Tokyo)

  • Phonemes (Kyoto University): it could understand one hundred Japanese monosyllables 

  • Spoken digits (NEC Laboratories): it achieved a 99.7% recognition rate on 1,000 utterances pronounced by 20 male speakers [1]

The vowel recognizer built by Suzuki and Nakata at the Radio Research Lab, Tokyo [1]

Chapter 3. The Shoebox

Also in the 1960s, IBM presented the "Shoebox" machine, named for its small size.

It recognized and responded to 16 spoken English words, including the numbers from 0 to 9. When the speaker uttered an arithmetic problem (said a number and command words such as "plus," "minus" and "total"), Shoebox gave instructions to an adding machine to calculate and print answers. 

Here’s a picture of Dr. E. A. Quade, IBM manager of the advanced technology group, demonstrating Shoebox [2]:

Dr. E. A. Quade and the Shoebox

If you want to see Shoebox in action, check out this YouTube video.

Chapter 4. Harpy

Something very important happened in the 1970s: CMU, together with IBM and Stanford, built Harpy, a system that could understand entire sentences!

It recognized 1,011 words – the vocabulary of an average three-year-old.

The shift from words to phrases was no easy task. Here’s Alexander Waibel, a computer science professor at Carnegie Mellon [3]:

Alexander Waibel on the shift from words to phrases

Chapter 5. Tangora

In the 1980s, IBM created Tangora, a voice-activated typewriter able to handle a vocabulary of 20,000 words.

It could predict the most likely phonemes to follow a particular phoneme. The secret? It used Hidden Markov Models to shift “from simple pattern recognition methods based on templates and a spectral distance measure, to a statistical method for speech processing” [4].
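To make the "most likely next phoneme" idea concrete, here is a minimal Python sketch of a first-order Markov transition table, which is the part of a Hidden Markov Model that captures which sound tends to follow which. This is not Tangora's actual model: the phoneme labels, the probabilities, and the function names `most_likely_next` and `sequence_probability` are all made up for illustration, and a real HMM would also include hidden states and emission probabilities linking those states to the audio signal.

```python
# Toy transition probabilities between phonemes.
# All labels and numbers are invented for illustration only.
TRANSITIONS = {
    "k":  {"ae": 0.6, "ih": 0.3, "t": 0.1},
    "ae": {"t": 0.7, "k": 0.2, "ih": 0.1},
    "ih": {"t": 0.5, "k": 0.4, "ae": 0.1},
    "t":  {"ae": 0.4, "ih": 0.4, "k": 0.2},
}


def most_likely_next(phoneme, top_n=1):
    """Return the top_n most probable phonemes to follow `phoneme`."""
    followers = TRANSITIONS[phoneme]
    ranked = sorted(followers, key=followers.get, reverse=True)
    return ranked[:top_n]


def sequence_probability(phonemes):
    """Multiply the transition probabilities along a phoneme sequence."""
    prob = 1.0
    for current, following in zip(phonemes, phonemes[1:]):
        prob *= TRANSITIONS[current].get(following, 0.0)
    return prob


if __name__ == "__main__":
    print(most_likely_next("k"))
    print(sequence_probability(["k", "ae", "t"]))
```

With a table like this, a recognizer can rank competing guesses: a sequence such as k, ae, t scores higher than one that strings together unlikely transitions, which is the statistical shift away from matching each sound against a fixed template.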

Chapter 6. NaturallySpeaking

In the 1990s, Dragon Systems presented Dragon NaturallySpeaking, the first continuous speech recognition tool. That meant no more discrete dictation: you didn’t have to pause… after… every… word… anymore!

The software understood about 100 words per minute. It might come as a surprise, but it’s still used today: doctors in the UK and US use it to document medical records.

These are the six steps that led to voice recognition as we know it today. It now comes in all shapes and sizes, from virtual assistants that handle your day-to-day requests to full-fledged conversational AI that can automate voice communication (like what we do here at Dasha AI). The fact that it took only half a century to get this far leaves us wondering what will happen next. Don’t wonder too long, though: we’ll let you know in the next installment, on voice tech history in the new millennium.

And to recap what we’ve been talking about, here’s a timeline infographic for you:

Early history of voice tech: the timeline


[1] Adami, Andre. (2020). Automatic Speech Recognition: From the Beginning to the Portuguese Language

[2] IBM Archives 

[3] BBC Future

[4] A brief history of ASR
