tech2 News Staff | Oct 20, 2017 18:45:11 IST
Google has announced a new labelled dataset of human actions in videos called AVA, short for "atomic visual actions". Properly labelled datasets are essential for video classifiers, autonomous cars, security systems and other machines to understand and learn about human actions in videos.
According to a report by TechCrunch, Google's AVA adds more detail to complex scenes by offering multiple labels where relevant. Google explained in a blog post that human actions are difficult to classify because an action unfolds over several frames. For example, a single frame of someone running could also look like someone jumping; the full picture emerges only across multiple frames. The complexity increases further as more people and objects appear in the scene.
Google's AVA consists of 57,000 video segments pulled from publicly available YouTube videos, with 96,000 labelled humans and 210,000 labels in total. Each segment is three seconds long and is labelled manually using a list of 80 action types, including walking, running, swimming and so on. Compared with other action datasets, AVA has three key characteristics: person-centric annotation, atomic visual actions and realistic video material.
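The person-centric annotation described above means each labelled person in a three-second segment gets their own action labels, so one segment can carry several labels at once. A minimal sketch of what such an annotation record might look like in Python follows; the field names and values here are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical sketch of one AVA-style annotation: each labelled person in a
# three-second segment gets a bounding box and one of the 80 atomic action
# types. Field names are illustrative, not the dataset's real file format.
@dataclass
class PersonAnnotation:
    video_id: str    # identifier of the source YouTube video
    timestamp: float # centre of the three-second segment, in seconds
    bbox: tuple      # (x1, y1, x2, y2) bounding box of the person
    action: str      # one of the 80 atomic action types, e.g. "walk"

def actions_in_segment(annotations, video_id, timestamp):
    """Collect all action labels for one segment. Because annotation is
    person-centric, the same segment can yield several labels, one per
    labelled person."""
    return [a.action for a in annotations
            if a.video_id == video_id and a.timestamp == timestamp]

annotations = [
    PersonAnnotation("vid01", 902.0, (0.1, 0.2, 0.4, 0.9), "walk"),
    PersonAnnotation("vid01", 902.0, (0.5, 0.1, 0.8, 0.9), "run"),
]
print(actions_in_segment(annotations, "vid01", 902.0))  # ['walk', 'run']
```

Two people in the same three-second segment produce two separate labels, which is how AVA reaches 210,000 labels from 96,000 labelled humans across 57,000 segments.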
Google had earlier described AVA in a paper published in May on arxiv.org. The paper was updated in July, after the dataset proved difficult for existing classification techniques.