Google AI experiment has you talking to books

Rodiano Bonacci
April 16, 2018

It's not hard to see how the tech could also work when applied to a pair of smart glasses, à la Google Glass, or to voice-amplifying earbuds. Users will also be able to quote a specific passage without knowing which book or author it came from, allowing them to search for a book or a passage no matter how abstract the query may seem. Researchers first trained the system to recognize the voice of a single person talking, which gave the system a baseline voice to focus on.

According to the blog post, the researchers developed this model by gathering 100,000 videos of "lectures and talks" on YouTube, extracting almost 2,000 hours' worth of segments featuring unobstructed speech from those videos, then mixing that audio to create a "synthetic cocktail party" with artificial background noise added.
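To make the "synthetic cocktail party" idea concrete, here is a minimal sketch of how two clean speech tracks might be summed and combined with background noise at a chosen signal-to-noise ratio. It is an illustration only, not Google's actual pipeline: the function names, the `snr_db` parameter and the sine-wave stand-ins for real speech are all assumptions made for the example.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a signal given as a list of floats."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_cocktail_party(speech_a, speech_b, noise, snr_db=10.0):
    """Sum two clean speech tracks and add noise scaled to a target SNR.

    snr_db is a hypothetical parameter: the level of the combined speech
    relative to the added noise, in decibels.
    """
    n = min(len(speech_a), len(speech_b), len(noise))
    speech = [speech_a[i] + speech_b[i] for i in range(n)]
    # Pick a gain so that 20 * log10(rms(speech) / rms(gain * noise)) == snr_db.
    gain = rms(speech) / (rms(noise[:n]) * 10 ** (snr_db / 20))
    mixture = [speech[i] + gain * noise[i] for i in range(n)]
    return mixture, gain


# Toy stand-ins for real recordings: two tones and Gaussian noise (assumed data).
sample_rate = 8000
t = [i / sample_rate for i in range(sample_rate)]
speaker_a = [math.sin(2 * math.pi * 220 * x) for x in t]
speaker_b = [0.5 * math.sin(2 * math.pi * 330 * x) for x in t]
rng = random.Random(0)
background = [rng.gauss(0.0, 1.0) for _ in t]

mixture, gain = mix_cocktail_party(speaker_a, speaker_b, background, snr_db=10.0)
```

Because the clean tracks are known before mixing, a model trained on such mixtures can be scored against the original, unmixed speech.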

In a clip from the Google team, two comedians talk over each other as they compete for attention. For smart home speakers, picking out distinct words in a crowd can be particularly hard.

Google said that its researchers have managed to overcome this obstacle by developing a deep learning model that takes into account a different type of information: visual input. Combining the visual signal with the audio, rather than relying on audio alone, helps the model separate the mixture into clean speech tracks, each associated with a particular visible speaker in the video.

Google singled out closed-captioning systems as one area where this system could be a boon, but the company says it envisions "a wide range of applications for this technology" and that it is "currently exploring opportunities for incorporating it into various Google products". The new AI feature could be used in a number of technologies, such as the video chat apps Hangouts and Duo. The AI tracks a person and their voice even when their face is obscured by a waving hand or a microphone. The feature could help enhance speech in video recordings, and it could even lead to camera-linked hearing aids that improve sound quality for hearing-impaired users. Separating overlapping voices has proved to be a major challenge in the field of speech recognition, which is among the main applications of neural networks today. The technology could also be used for eavesdropping in public.

