Technology

Google develops automatic ‘active speaker’ switch for sign language in video calls

Written by TechJuice

With COVID becoming part and parcel of our lives, much of the industry has shifted to online platforms for its video conferencing needs. Most of these platforms have a feature that automatically switches the main feed to whoever is speaking, but that detection is based entirely on audio. Sign language, being visual rather than audible, unfortunately does not trigger that feature, which can leave deaf and hard-of-hearing participants out of the conversation.

Google researchers are trying to bridge this gap and have recently published research that might help. In a post on the Google AI blog, they explain how their proposed model detects sign language with very low latency, and how they designed a mechanism to present the signing user as the active speaker.

The model makes use of PoseNet, which estimates the pose of the person, reducing the whole image to a basic virtual skeleton. The sequence of skeleton frames is then passed to an LSTM network, which achieves an accuracy of about 91.5% with a delay of 3.5ms per frame on the German Sign Language corpus. When signing is detected, the system signals that the signing participant is "speaking", so the software can be coupled with existing video platforms.
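As a rough illustration of the pipeline described above (not Google's actual code), per-frame pose keypoints can be reduced to motion features and run through a recurrent classifier. The sketch below uses NumPy with a hand-rolled LSTM cell and random weights standing in for trained parameters; the keypoint layout, shoulder-based normalization, and feature design are all assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

KEYPOINTS = 17  # PoseNet-style body keypoints (assumed layout)
HIDDEN = 8      # LSTM hidden size (arbitrary for this sketch)

# Random weights stand in for a trained model.
Wx = rng.normal(scale=0.1, size=(4 * HIDDEN, KEYPOINTS))
Wh = rng.normal(scale=0.1, size=(4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
w_out = rng.normal(scale=0.1, size=HIDDEN)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_features(frames):
    """Per-keypoint frame-to-frame movement, normalized by shoulder width.

    frames: array of shape (T, KEYPOINTS, 2) with (x, y) coordinates.
    Returns an array of shape (T-1, KEYPOINTS).
    """
    deltas = np.linalg.norm(np.diff(frames, axis=0), axis=2)
    # Indices 5 and 6 are assumed to be the shoulders.
    shoulders = np.linalg.norm(frames[:-1, 5] - frames[:-1, 6], axis=1)
    return deltas / shoulders[:, None]

def lstm_step(x, h, c):
    """One step of a standard LSTM cell."""
    z = Wx @ x + Wh @ h + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def signing_probability(frames):
    """Run the motion features through the LSTM; output P(signing)."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x in motion_features(frames):
        h, c = lstm_step(x, h, c)
    return float(sigmoid(w_out @ h))

# 30 synthetic frames of 17 keypoints each.
frames = rng.normal(size=(30, KEYPOINTS, 2))
print(signing_probability(frames))
```

In a real deployment, the classifier's output would gate the "active speaker" signal each frame, which is where the low per-frame latency quoted above matters.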

We think this is a much-needed step toward addressing the problems that deaf and hard-of-hearing people have faced in this COVID era. Let us know what you think in the comments below!

Image Source: Rare

Written by TechJuice
Pakistan's premier website covering everything about Technology, Startups and Entrepreneurship! Email: editors@techjuice.pk.