
Google Releases ‘Objectron Dataset’ of Object-Centric Videos

Google AI yesterday released its Objectron dataset — a collection of short, object-centric video clips capturing a large set of common objects from different angles. Each video clip is accompanied by AR session metadata that includes both camera poses and sparse point clouds.
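Because each clip's AR metadata pairs camera poses with a sparse point cloud, the points can in principle be re-projected into any frame with a standard pinhole camera model. Below is a minimal NumPy sketch of that projection; the function name, matrix conventions, and toy values are illustrative assumptions, not the dataset's actual schema.

```python
import numpy as np

def project_points(points_world, cam_from_world, intrinsics):
    """Project 3D world points into pixel coordinates.

    points_world:   (N, 3) sparse point cloud in world coordinates
    cam_from_world: (4, 4) camera pose (world -> camera transform)
    intrinsics:     (3, 3) pinhole camera matrix
    """
    n = points_world.shape[0]
    homo = np.hstack([points_world, np.ones((n, 1))])  # homogeneous coords, (N, 4)
    cam = (cam_from_world @ homo.T).T[:, :3]           # transform into camera frame
    pix = (intrinsics @ cam.T).T                       # apply camera matrix
    return pix[:, :2] / pix[:, 2:3]                    # perspective divide

# Toy example: identity pose and a unit-focal-length camera
pts = np.array([[0.0, 0.0, 2.0], [0.5, -0.5, 4.0]])
uv = project_points(pts, np.eye(4), np.eye(3))
```

With the identity pose and camera matrix, a point at depth 4 with offset 0.5 lands at 0.125 in normalized image coordinates — the expected perspective shrinkage with distance.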

The Google researchers hope the dataset’s release will help the research community push the limits of 3D object-geometry understanding, which has the potential to power a wide range of applications such as augmented reality, robotics, autonomy, and image retrieval.

Google AI researchers earlier this year released their MediaPipe Objectron, a mobile real-time 3D object detection pipeline able to detect everyday objects in 2D images and estimate their poses and sizes through a machine learning (ML) model trained on a newly created 3D dataset.

Understanding objects in 3D remains challenging in large part due to the lack of large, real-world 3D datasets. The Google researchers believe the ML community has a strong need for object-centric video datasets that capture more of the 3D structure of an object while matching the data format used for many vision tasks, and so decided to release the Objectron dataset to aid in the training and benchmarking of ML models.

The Objectron dataset comprises 15,000 annotated video clips supplemented with over 4 million annotated images, collected from a geo-diverse sample covering 10 countries across five continents. Each object carries a manually annotated 3D bounding box describing its position, orientation, and dimensions.
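An oriented 3D bounding box of the kind described above is fully determined by a center position, a rotation, and per-axis dimensions. The following NumPy sketch recovers the box's eight corners from that decomposition; it is a hypothetical illustration of the geometry, not code from the Objectron release.

```python
import numpy as np
from itertools import product

def box_corners(translation, rotation, scale):
    """Return the 8 corners of an oriented 3D bounding box.

    translation: (3,) box center position
    rotation:    (3, 3) orientation matrix
    scale:       (3,) box dimensions along its local axes
    """
    # Unit cube corners centered at the origin: all sign combinations of +-0.5
    unit = np.array(list(product([-0.5, 0.5], repeat=3)))  # (8, 3)
    # Scale, rotate, then translate each corner into world coordinates
    return unit * np.asarray(scale) @ np.asarray(rotation).T + np.asarray(translation)

# Axis-aligned 2x2x2 box centered at (1, 2, 3)
corners = box_corners([1.0, 2.0, 3.0], np.eye(3), [2.0, 2.0, 2.0])
```

For the axis-aligned example, the corners span [0, 2] × [1, 3] × [2, 4]; a non-identity rotation matrix would tilt the same box in place around its center.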

The dataset currently includes bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes, and is stored in the Objectron bucket on Google Cloud Storage. An open-sourced data pipeline has been provided to parse the dataset in the TensorFlow, PyTorch, and JAX ML frameworks.

Along with the dataset, the researchers also shared a 3D object detection solution for the shoes, chairs, mugs, and cameras categories. The models are trained with the Objectron dataset and have been released in MediaPipe, Google’s open-source framework for cross-platform customizable ML solutions for live and streaming media.