Facebook AI is getting more like humans at generating captions for images

One of the applications of image captioning is making photo and video content available to visually challenged audiences.


Facebook is investing in long-term research into various AI applications. One research area is teaching computers to understand images and automatically generate captions for them. Facebook is training AI to classify, understand and explain the elements in a photo. In the last few years, computers have gone from automatically drawing boxes around objects in a scene to working out who or what the objects in an image are, and what they are doing.


Facebook's AI can predict future movements, describe positions and describe the image in almost human terms. Wireframe overlays on the image allow the AI to figure out the poses of the people in it. Comparing those poses allows the machine to better describe the action. Best of all, Facebook can do all of this in real time. In a video of Facebook testing the pose identification, the system tracks the body through various situations and poses, and does not get thrown off no matter how hard it is pushed.
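To give a rough sense of what comparing poses can mean in practice, here is a minimal Python sketch. It is not Facebook's method, which the article does not describe. It assumes each pose is a set of named 2D keypoints taken from a wireframe overlay; the keypoint names, the coordinates and the functions `normalize` and `pose_distance` are all hypothetical and made up for illustration.

```python
import math


def normalize(pose):
    """Center keypoints on their mean and scale them to unit size.

    This makes the comparison ignore where the person stands in the
    frame and how large they appear.
    """
    xs = [x for x, _ in pose.values()]
    ys = [y for _, y in pose.values()]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    # Fall back to 1.0 if every keypoint sits on the same spot.
    scale = max(math.hypot(x - cx, y - cy) for x, y in pose.values()) or 1.0
    return {name: ((x - cx) / scale, (y - cy) / scale)
            for name, (x, y) in pose.items()}


def pose_distance(a, b):
    """Mean distance between matching keypoints of two normalized poses."""
    a, b = normalize(a), normalize(b)
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("poses have no keypoints in common")
    return sum(math.dist(a[k], b[k]) for k in shared) / len(shared)


# Hypothetical keypoints (x, y), invented for this example.
standing = {"head": (0, 10), "hip": (0, 5),
            "left_foot": (-1, 0), "right_foot": (1, 0)}
# The same pose, further from the origin and twice as large.
standing_far = {k: (2 * x + 3, 2 * y + 7) for k, (x, y) in standing.items()}
# A different pose: the right foot is raised.
kicking = {"head": (0, 10), "hip": (0, 5),
           "left_foot": (-1, 0), "right_foot": (4, 4)}

same = pose_distance(standing, standing_far)
different = pose_distance(standing, kicking)
```

Under these assumptions, the two standing poses come out as practically identical despite the shift and change in size, while the kicking pose scores a clearly larger distance. A real system would likely use many more keypoints and a learned model rather than a hand-written distance.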

The system is not foolproof, though. There are cases where Facebook's AI does not come up with an appropriate caption. Machines "see" images very differently from humans, and something that is trivially easy for humans may be very difficult for machines. Facebook engineers are studying how humans learn and integrating these lessons into artificial intelligences to make them more human-like. This is an example of an accurate caption next to one that is just wrong.


The wrong caption arises because machines lack a contextual understanding of the world. To help machines build that understanding, Facebook engineers feed in models that mimic human conceptual understanding. These are datasets in which facts are linked to concepts. However, not all the data in the world is available as models that can be fed into artificial intelligences, which is why Facebook is teaching the machines to harvest data from unstructured sources, such as Wikipedia articles. Progress in the field has accelerated recently, and even more rapid progress can be expected going forward.
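As an illustration of what a dataset linking facts to concepts might look like, here is a small Python sketch. It is an assumption about the general idea, not a description of Facebook's actual models. The facts, the relation names and the functions `build_index` and `shared_concepts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical facts as (subject, relation, concept) triples,
# invented for illustration only.
FACTS = [
    ("surfboard", "used_in", "ocean"),
    ("wave", "part_of", "ocean"),
    ("airplane", "found_at", "airport"),
    ("tarmac", "part_of", "airport"),
]


def build_index(facts):
    """Map each subject to the (relation, concept) pairs known about it."""
    index = defaultdict(set)
    for subject, relation, concept in facts:
        index[subject].add((relation, concept))
    return index


def shared_concepts(index, a, b):
    """Return the concepts that two detected objects have in common."""
    concepts_a = {concept for _, concept in index.get(a, ())}
    concepts_b = {concept for _, concept in index.get(b, ())}
    return concepts_a & concepts_b


index = build_index(FACTS)
plausible = shared_concepts(index, "airplane", "tarmac")
unlikely = shared_concepts(index, "surfboard", "tarmac")
```

In this toy version, an airplane and a tarmac share the concept "airport", while a surfboard and a tarmac share nothing, which a captioning system could treat as a hint that one of its detections deserves a second look.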

The Captionbot by Microsoft. The bot gets better with use.


Image captioning is an important exercise for tech companies training their artificial intelligences. Microsoft launched Captionbot, which provides captions for images. The service is available on the web and as a bot on Facebook Messenger. Using the bot is a hit-and-miss affair, but the captions attempt to describe the photo rather than just identify and state what it contains. At the Made By Google event in October, CEO Sundar Pichai demonstrated image captioning with TensorFlow. Captions written by machines are approaching those provided by humans.


The image captioning technology used by Google is open source and available on GitHub as a model for TensorFlow. Google began experimenting with automatic image captioning back in 2014. One application of image captioning is making photo and video content accessible to visually challenged audiences. While the user is surfing the web, a digital assistant can describe the images aloud.
