A voice detector is a device designed to detect the sounds people make when they speak or sing. Computer scientists have been searching for ways to enable computers to record, interpret and understand human speech since the 1960s, and it has been a daunting task throughout the decades. Even the most rudimentary problem, such as sampling the voice, was a huge challenge in the early years, and it took until the 1980s before systems arrived that could actually decipher speech (Goel and Singh, 2014). Even so, as sound-processing techniques evolved, engineers had already built the first voice recognition system in the 1950s, although it could recognize only digits (Pinola, 2011). This system, “Audrey”, appeared in 1952 and was able to recognize spoken digits (Warren, 2014). In other words, “Audrey” could only distinguish between the ten digits from zero to nine. The IBM Shoebox, revealed at the Seattle World’s Fair in 1962, was the most advanced voice recognition machine of its time because it could understand 16 words spoken in English (Kane, 2015). Further improvement in voice recognition technology can be seen in the 1970s in the Harpy system. Harpy is a voice recognition system developed at Carnegie-Mellon University that resulted from analysing the performance of various design choices in two earlier speech recognition systems, the Hearsay-I system and the Dragon system (Lowerre, 1976).
According to Pinola (2011), the Harpy system could understand 1,011 words, approximately the vocabulary of a three-year-old child. In the 1980s, the Hidden Markov Model (HMM) marked a turning point, moving voice recognition towards voice prediction (Gales and Young, 2007). Rather than matching sounds against fixed templates, an HMM estimates the most probable sequence of words for a given sound input, which makes the conversion from speech to written text more accurate. In the 1990s, the first voice recognition product for consumers, Dragon Dictate, was released. Its successor, Dragon NaturallySpeaking, could recognize continuous speech at about 100 words per minute (Pinola, 2011). In the late 2000s, Google introduced voice recognition software that served as the foundation for the company’s later Voice Search product (Huang, Baker, and Reddy, 2014).
Besides that, according to Martins, Trancoso, Abad, and Meinedo (2009), current voice detector technology can recognize the gender of a speaker, which means the gender can be determined by analysing the detected voice. Voice detectors are also used in nursing systems to detect unusual vocal sounds (Wilson et al., 2009). Examples of such sounds include coughing, groaning, wheezing and crying. In addition, voice/non-voice (VNV) detection, which determines the regions of vocal fold activity in the speech signal, is widely used in speech processing applications such as speech enhancement, speech coding and speech recognition (Kumar and Rao, 2016). With the application of Artificial Intelligence (AI) to sound technology, interaction between humans and machines such as computers and smartphones has become possible. For instance, Siri on Apple smartphones and iPads, Google search and Cortana in Windows 10 allow users to interact with smart devices by voice.
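To make the idea of voice/non-voice (VNV) detection more concrete, the sketch below marks which frames of a signal look voiced. It is only an illustration and is not the method of Kumar and Rao (2016): it assumes a classic textbook heuristic in which voiced speech has high short-time energy and a low zero-crossing rate. The function names, frame sizes and thresholds are all hypothetical choices made for this example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of equal length."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_voiced(samples, frame_len=400, hop=200,
                  energy_thresh=0.01, zcr_thresh=0.25):
    """Return one boolean per frame: True when the frame looks voiced.

    Thresholds are illustrative assumptions, not tuned values.
    """
    return [short_time_energy(f) > energy_thresh
            and zero_crossing_rate(f) < zcr_thresh
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic demo: 0.1 s of silence followed by 0.1 s of a 150 Hz tone,
# sampled at 16 kHz. The tone stands in for a voiced sound.
rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 150 * n / rate) for n in range(1600)]
flags = detect_voiced(silence + tone)

# A signal that flips sign on every sample is loud but noise-like,
# so the zero-crossing test rejects it.
noisy = [0.5 if n % 2 == 0 else -0.5 for n in range(800)]
noisy_flags = detect_voiced(noisy)
```

In this demo the frames that lie entirely within the silence are flagged as non-voiced and the frames that lie entirely within the tone are flagged as voiced, while the rapidly alternating signal is rejected despite its high energy. Real speech would need thresholds adapted to the recording level and background noise.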