Speech recognition


Speech recognition converts spoken language into text. It recognizes the words in an utterance and underpins speech-to-text (STT) transcription, virtual assistants, and other speech user interfaces.


Speech recognition is an interdisciplinary field spanning computer science and linguistics, with a history reaching back to the 1950s. Modern applications range from speech-to-text features in consumer products to use in military aircraft.

In the context of accessibility, speech recognition can be used to generate closed captions, operate virtual assistants, and drive other speech-controlled interfaces. By eliminating the need for a keyboard or mouse, it makes computing accessible to people who cannot use these devices or must avoid using them.

IBM introduced its first speech recognition system, “Shoebox”, in 1962. The machine could recognize 16 different words, building on the initial work done at Bell Labs in the 1950s. IBM continued to innovate over the years, launching the VoiceType Simply Speaking application in 1996.


Speech recognition, also known as automatic speech recognition (ASR), is a technology’s ability to convert human speech into text. The term voice recognition is sometimes used interchangeably with speech recognition, but the two differ: speech recognition transcribes what was said, whereas voice recognition identifies who is speaking.