Google announces AVA, a finely labeled video dataset for human action understanding

Google's AVA consists of 57,000 video segments pulled from publicly available YouTube videos, with 96,000 labelled humans and 210,000 labels in total.

Google has announced a new labelled dataset of human actions taking place in videos, called AVA, an acronym for atomic visual actions. Properly labelled datasets are essential for video detection systems, autonomous cars, security systems and other machines to understand and learn human actions from video.


According to a report by TechCrunch, Google's AVA adds more detail to complex scenes by offering multiple labels for relevant scenes. Google explained in a blog post that human actions are difficult to classify because an action unfolds over several frames. For example, a single frame of someone running could also look like someone jumping; the full picture emerges only once a few more frames are added. The complexity increases further as more people and objects appear in the scene.

Google's AVA consists of 57,000 video segments pulled from publicly available YouTube videos, with 96,000 labelled humans and 210,000 labels in total. Each segment is three seconds long and is labelled manually using a list of 80 action types, which include walking, running, swimming and so on. Compared with other action datasets, AVA has three key characteristics: person-centric annotation, atomic visual actions, and realistic video material.
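To make that structure concrete, the sketch below shows how annotations of this kind could be loaded in Python. The column layout (video id, timestamp of the segment's middle frame, a normalised person bounding box, an action id and a person id) and the file name ava_annotations.csv are assumptions used for illustration, not the official AVA schema; consult Google's official release for the exact format.

```python
import csv
from dataclasses import dataclass

@dataclass
class AvaLabel:
    video_id: str      # YouTube video the 3-second segment was cut from
    timestamp: float   # time (in seconds) of the segment's middle frame
    box: tuple         # (x1, y1, x2, y2) person bounding box, normalised to [0, 1]
    action_id: int     # one of the ~80 atomic action classes (walk, run, swim, ...)
    person_id: int     # identifier linking the same person across segments

def load_labels(path):
    """Read an assumed CSV layout into a list of AvaLabel records."""
    labels = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            vid, ts, x1, y1, x2, y2, action, person = row
            labels.append(AvaLabel(
                video_id=vid,
                timestamp=float(ts),
                box=(float(x1), float(y1), float(x2), float(y2)),
                action_id=int(action),
                person_id=int(person),
            ))
    return labels

if __name__ == "__main__":
    # Print the first few records from a hypothetical annotation file.
    for label in load_labels("ava_annotations.csv")[:5]:
        print(label)
```

Because each person in a segment carries its own box and action labels (person-centric annotation), one video segment can yield several records, which is how the 57,000 segments produce 210,000 labels overall.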


Google had earlier described AVA in a paper published in May on arxiv.org. The paper was updated in July, after the dataset proved difficult for existing classification techniques.
