A new software program has been developed that can lip-read as well as humans. Developed by scientists at the University of Oxford, this breakthrough could offer some hope to people with hearing disabilities.
Known as ‘Watch, Attend and Spell’ (WAS), the AI software has been developed in collaboration with Google’s DeepMind. The system was trained by running it on thousands of hours of BBC news programs.
The system is said to use computer vision and machine learning methods to learn how to lip-read from a dataset that included more than 5,000 hours of TV footage from the BBC. The footage contained more than 118,000 sentences and a vocabulary of 17,500 words, spoken by more than 1,000 different people.
WAS was also tested by pitting it against an expert human lip-reader. The test was simple: work out what was being said in a silent video using only the person’s mouth movements. The results were quite surprising, as the AI system correctly lip-read 50 percent of the silent speech while the human managed only 12 percent. The test also showed that the machine made small mistakes, such as missing an “s” at the end of a word or single-letter misspellings.
Joon Son Chung, lead author of the study, said, “It is great to see research being conducted in this area, with new breakthroughs welcomed by Action on Hearing Loss by improving accessibility for people with a hearing loss. AI lip-reading technology would be able to enhance the accuracy and speed of speech-to-text especially in noisy environments and we encourage further research in this area and look forward to seeing new advances being made. Lip-reading is an impressive and challenging skill, so WAS can hopefully offer support to this task – for example, suggesting hypotheses for professional lip readers to verify using their expertise. There are also a host of other applications, such as dictating instructions to a phone in a noisy environment, dubbing archival silent films, resolving multi-talker simultaneous speech and improving the performance of automated speech recognition in general.”