The robotic manipulation of deformable objects, such as clothes and fabric, is known to be a complex task from both the perception and planning perspectives. Indeed, the stochastic nature of the underlying environment dynamics makes it an interesting research field for statistical learning approaches and neural policies.
In this work, we introduce a novel attention-based neural architecture capable of solving a smoothing task for such objects by means of a single robotic arm. To train our network, we leverage a heuristic policy, executed in simulation, which relies on a topological description of a mesh of points representing the object to be smoothed. In a second step, we transfer the resulting behavior to the real world with imitation learning, using instead, as decision support, the cloth point cloud captured from a single RGB-D camera mounted egocentrically on the wrist of the arm. This approach allows fast training of the real-world manipulation network and requires no scene reconstruction at test time, but solely a point cloud acquired from a single RGB-D camera. Our resulting policy first predicts the pick point from the given point cloud and then the displacement needed to achieve a smoothed cloth.
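As an illustration only, the two-stage decision described above (attention-based pick-point selection over the point cloud, followed by displacement regression) could be sketched as below. All shapes, names, and weights are hypothetical placeholders, not the architecture actually used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def policy(points, W_embed, q, W_disp):
    """points: (N, 3) cloth point cloud -> (pick_index, displacement of shape (3,))."""
    feats = np.tanh(points @ W_embed)   # (N, d) per-point embedding
    scores = feats @ q                  # (N,) attention logits over points
    attn = softmax(scores)              # (N,) attention weights
    pick = int(np.argmax(attn))         # stage 1: highest-scoring pick point
    pooled = attn @ feats               # (d,) attention-pooled feature
    displacement = pooled @ W_disp      # stage 2: regressed displacement
    return pick, displacement

# Random placeholder weights; a real policy would learn these by imitation.
d = 16
W_embed = rng.standard_normal((3, d))
q = rng.standard_normal(d)
W_disp = rng.standard_normal((d, 3))

cloud = rng.standard_normal((128, 3))   # stand-in for a sensed point cloud
pick, disp = policy(cloud, W_embed, q, W_disp)
```

The key design point this sketch reflects is that the pick point is chosen from the observed points themselves, so no scene reconstruction is needed at test time.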
Experimentally, we first assess our results in a simulation environment, comparing against an existing heuristic policy as well as several baseline attention architectures. We then validate the performance of our approach in a real-world scenario.