Microsoft’s speech recognition system reaches human-level accuracy

By Ali Leghari | 9 years ago

Microsoft’s 25-year-long wait is over: the company announced today that its speech recognition system has reached human-level accuracy. Last year, Microsoft’s researchers recorded a 5.9 percent word error rate for the system; with the latest improvements, it now stands at 5.1 percent.

Microsoft said that it lowered the error rate by adding a CNN-BLSTM (convolutional neural network combined with bidirectional long short-term memory) model to its system.

Researchers at Microsoft have also been working on neural-net-based acoustic and language models, which contributed to the reduction in the error rate. Microsoft’s speech recognition system is used in many of its services, such as Cortana, Presentation Translator, and Microsoft Cognitive Services.

The company said in a blog post:

“After our transcription system reached the 5.9 percent word error rate that we had measured for humans, other researchers conducted their own study, employing a more involved multi-transcriber process, which yielded a 5.1 human parity word error rate. This was consistent with prior research that showed that humans achieve higher levels of agreement on the precise words spoken as they expend more care and effort. Today, I’m excited to announce that our research team reached that 5.1 percent error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year.”
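For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not Microsoft's evaluation code; the function name `word_error_rate` and the sample sentences are made up for illustration, and real evaluations typically normalize text (case, punctuation, filler words) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not from any benchmark): one substitution
# ("jumps" -> "jumped") and one deletion (the second "the").
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"{word_error_rate(reference, hypothesis):.3f}")
```

On this scale, a 5.1 percent word error rate corresponds to roughly five word-level mistakes per hundred reference words.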

It is worth mentioning that last year the company formed a 5,000-person Artificial Intelligence and Research group to pursue research in artificial intelligence (AI) and to compete with other tech companies that are also researching AI and cloud technology. With this achievement, one can say that Microsoft’s decision to form the group has paid off.