Jinyoung Choi, Christopher Dance, Jung-eun Kim, Seulbin Hwang, Kyung-sik Park
International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May - 5 June, 2021
Modern navigation algorithms based on deep reinforcement learning (RL) have proven to be efficient and robust. However, most deep RL algorithms operate in a risk-neutral manner, making no special attempt to shield users from outcomes that may hurt the most, even if such shielding might cause little loss of performance. Furthermore, such algorithms typically make no provisions to ensure safety in the presence of inaccuracies in the models on which they were trained, beyond adding a cost-of-collision and some domain randomization while training, in spite of the formidable complexity of the environments in which they operate. In this paper, we present a novel distributional RL algorithm that not only learns an uncertainty-aware policy, but can also change its risk measure without expensive fine-tuning or retraining. Our method shows superior performance and safety over baselines in partially-observed multi-agent navigation tasks. We also demonstrate that agents trained using our method can adapt their policies to a wide range of risk measures in a zero-shot manner.
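To make the idea of changing the risk measure without retraining concrete, the following is a minimal sketch and not the algorithm presented in the paper. It assumes that a distributional RL agent exposes, for each action, a set of equally weighted return quantiles, and it uses conditional value-at-risk (CVaR) as one example of a risk measure. The function names `cvar` and `select_action`, the parameter `alpha`, and the numbers in `fast` and `safe` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cvar(quantiles, alpha):
    """Mean of the worst alpha-fraction of equally weighted return quantiles.

    alpha = 1.0 recovers the ordinary mean (risk-neutral);
    smaller alpha focuses on the lower tail (more risk-averse).
    """
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * q.size)))  # number of tail samples to average
    return float(q[:k].mean())


def select_action(return_quantiles, alpha):
    """Return the index of the action with the highest CVaR at level alpha."""
    scores = [cvar(q, alpha) for q in return_quantiles]
    return int(np.argmax(scores))


# Hypothetical return quantiles for two actions (illustrative values only).
fast = [-10.0, 6.0, 6.0, 6.0]  # higher mean, but one very bad outcome
safe = [1.0, 1.0, 1.0, 1.0]    # lower mean, no bad outcome

risk_neutral_choice = select_action([fast, safe], alpha=1.0)
risk_averse_choice = select_action([fast, safe], alpha=0.25)
```

In this sketch the same return distributions are scored under two different risk levels, so the preferred action can change from `fast` to `safe` purely by changing `alpha`, with no update to the underlying distributions.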