Manipulating everyday objects without detailed prior models is still beyond the capabilities of existing robots. The difficulty stems from the diversity of objects: manipulation requires an understanding, and accurate models, of physical properties such as shape, mass, friction, and elasticity. Many objects are deformable, articulated, or even organic with no well-defined shape (e.g., plants), so a fixed model is insufficient. In addition, objects may be difficult to perceive, typically because of cluttered scenes or complex lighting and reflectance properties such as specularity or partial transparency. Creating such rich object representations is beyond current datasets and benchmarking practices for grasping and manipulation. In this project we will develop an automated interactive perception pipeline for building such rich digital representations.
More specifically, in IPALM we will develop methods for automatically digitizing objects and their physical properties through exploratory manipulation. These methods will be used to build a large collection of object models required for realistic grasping and manipulation experiments in robotics. Household objects such as tools, kitchenware, clothes, and food items are widely accessible and the focus of many practical applications, yet they also pose great challenges for robot perception and manipulation in realistic scenarios. We propose to advance the state of the art by including household objects that are deformable, articulated, interactive, specular, or transparent, as well as objects without a fixed shape, such as cloth and food items.
Our methods will learn the physical properties essential for perception and grasping simultaneously from several modalities: vision, touch, and audio, as well as text documents such as online manuals. The properties covered are the 3D model, texture, elasticity, friction, weight, size, and grasping techniques for intended use. At the core of our approach is a two-level model in which a category-level model provides priors for capturing the instance-level attributes of specific objects. We will exploit resources available online to build the prior category-level models, and a perception-action-learning loop will use the robot’s vision, audio, and touch to model instance-level object properties. In turn, knowledge acquired from each new instance will be used to improve the category-level knowledge. Our approach will allow us to efficiently create a large database of models for objects of diverse types, suitable, for example, for training neural-network-based methods or enhancing existing simulators. We will propose a benchmark and evaluation metrics for object grasping to enable comparison of results obtained with different robotic platforms on our database.
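The two-level idea described above can be illustrated with a minimal sketch. The abstract does not specify the mathematical form of the category-level priors, so everything here is an assumption made for illustration: a single scalar property (a friction coefficient), a Gaussian category prior, Gaussian measurement noise of known variance, and the hypothetical names `GaussianBelief`, `update_instance`, and `refit_category`. All numbers are invented.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class GaussianBelief:
    """Belief over one scalar property (assumed Gaussian for this sketch)."""
    mean: float
    var: float


def update_instance(prior, measurements, noise_var):
    """Conjugate normal update: category prior + noisy instance measurements.

    Assumes independent measurements with known noise variance `noise_var`.
    """
    n = len(measurements)
    if n == 0:
        return prior  # nothing observed yet, fall back to the category prior
    precision = 1.0 / prior.var + n / noise_var
    mean = (prior.mean / prior.var + sum(measurements) / noise_var) / precision
    return GaussianBelief(mean, 1.0 / precision)


def refit_category(instance_means):
    """Re-estimate the category-level prior from per-instance estimates.

    Uses the sample mean and unbiased sample variance; needs >= 2 instances.
    """
    m = fmean(instance_means)
    var = sum((x - m) ** 2 for x in instance_means) / (len(instance_means) - 1)
    return GaussianBelief(m, var)


# Hypothetical category prior for the friction coefficient of "mugs".
mug_prior = GaussianBelief(mean=0.5, var=0.04)

# Hypothetical friction estimates from three exploratory pushes on a new mug.
new_mug = update_instance(mug_prior, [0.30, 0.34, 0.32], noise_var=0.01)

# Feed the new instance back into the category-level knowledge.
updated_prior = refit_category([0.45, 0.52, new_mug.mean])
print(new_mug, updated_prior)
```

With few measurements the instance estimate stays close to the category prior; as measurements accumulate, the instance data dominate. A full system would handle many properties and modalities jointly, which this sketch does not attempt.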
The objectives we pursue target commercially relevant robotics technologies, as endorsed by support letters from several companies. We will pursue our goals with a consortium that brings together 5 world-class academic institutions from 5 EU countries: Imperial College London (UK), University of Bordeaux (France), Institut de Robòtica i Informàtica Industrial (Spain), Aalto University (Finland), and the Czech Technical University (Czech Republic). The consortium assembles a complementary research team with strong expertise in the acquisition, processing, and learning of multimodal information with applications in robotics.