tech2 News Staff | Nov 19, 2014 11:24:45 IST
Can software accurately spell out what a photo contains? Can it generate captions that describe the objects in a picture? It might sound like the stuff of science fiction, but Google's research team claims it is pretty close to cracking this problem.
According to the company's research blog, it has gotten closer to creating "a machine-learning system that can automatically produce captions to accurately describe images the first time it sees them." Google relied on recent advances in computer vision and machine translation, which help the software generate accurate captions.
The blog post notes that such a "system could eventually help visually impaired people understand pictures..."
According to this report in the New York Times, researchers at Google and at Stanford University were separately working on the problem, and in both cases the software was taught to describe complex scenes, such as "a group of young men playing Frisbee."
Getting software to write a caption for an image might look simple, but it isn't. As the Google research blog points out, producing an accurate caption "requires a deeper representation of what’s going on in the scene, capturing how the various objects relate to one another."
So how did Google work around this problem? It merged recent computer vision and language models into a single system. As the NYT report adds, the scientists at Google and Stanford combined two types of neural networks, "one focused on recognizing images and the other on human language." Once these two were combined, the software was taught to associate certain pictures with certain words and was then shown "unseen images." The report notes that even on these unseen images, "the programs were able to identify objects and actions with roughly double the accuracy of earlier efforts."
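The encoder-decoder idea described above can be illustrated with a toy NumPy sketch. This is purely a hypothetical illustration, not Google's actual architecture: the weights are random and untrained, the vocabulary is made up, and the "image features" would in practice come from a convolutional network rather than random numbers. It only shows the shape of the approach: an image vector seeds the hidden state of a recurrent language model, which then emits words one at a time.

```python
import numpy as np

# Toy sketch of an image-captioning encoder-decoder.
# All weights are random stand-ins for learned parameters.
rng = np.random.default_rng(0)
vocab = ["<start>", "<end>", "a", "group", "of", "young", "men", "playing", "frisbee"]
V, D, H = len(vocab), 16, 32   # vocab size, embedding dim, hidden dim

W_img = rng.normal(0, 0.1, (H, D))   # projects image features into the RNN state
W_emb = rng.normal(0, 0.1, (V, D))   # word embeddings
W_hh  = rng.normal(0, 0.1, (H, H))   # recurrent (hidden-to-hidden) weights
W_xh  = rng.normal(0, 0.1, (H, D))   # input-to-hidden weights
W_out = rng.normal(0, 0.1, (V, H))   # hidden-to-vocabulary logits

def caption(image_features, max_len=8):
    """Greedy decoding: the image vector initialises the hidden state,
    then each step feeds back the previously emitted word."""
    h = np.tanh(W_img @ image_features)     # "vision" output seeds the decoder
    word = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        x = W_emb[word]                     # embed the previous word
        h = np.tanh(W_hh @ h + W_xh @ x)    # recurrent update
        word = int(np.argmax(W_out @ h))    # greedily pick the next word
        if vocab[word] == "<end>":
            break
        words.append(vocab[word])
    return " ".join(words)

# In a real system these features would come from a trained vision network.
fake_features = rng.normal(size=D)
print(caption(fake_features))
```

With random weights the output is gibberish drawn from the toy vocabulary; training would tune all five weight matrices jointly so that captions match human descriptions of the training images.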
Google's paper is openly available here. Google has put out several pictures on its blog with the automatic captions beneath them. From "two pizzas on an oven" to "a bike rider on a dirt road," you can see that the captions are pretty accurate in some cases and can place the objects in the picture in a particular context.
The fact that the automatic captions describe colour and the kind of road is pretty cool. But as other captions show, the software gets it wrong at times as well. For instance, the chart shows that it could not describe the colour of a bike or identify what a school bus is.
While this software might have a long way to go, the breakthrough announced sounds quite exciting.