Mobile robots need to navigate crowded environments to provide services to humans. Traditional approaches to crowd-aware navigation decouple the prediction of people's motion from robot motion planning, which leads to undesired robot behaviours. Recent deep learning-based methods integrate crowd forecasting into the planner but assume precise tracking of the agents in the scene. To achieve this, they require expensive LiDAR sensors and tracking algorithms that are complex and brittle. In this work, we use a two-step approach. The first step learns a robot navigation policy from privileged information about exact pedestrian locations, which is available in simulation. The second step distills the knowledge acquired by the first network into an adaptation network that uses only narrow field-of-view image data from the robot's sensor. Although the navigation policy is trained in simulation without any expert supervision, such as trajectories computed by a planner, it exhibits state-of-the-art performance on a broad range of dense crowd simulations and real-world experiments.
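
The two-step idea (a policy trained on privileged state, then distilled into a network that sees only degraded observations) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from this work: the teacher is a fixed random `tanh` policy rather than a learned one, the student is a linear map fitted by ridge regression rather than an image-based adaptation network, and the data are synthetic. The names `teacher_policy`, `distill`, and `run_demo` are hypothetical.

```python
import numpy as np


def teacher_policy(privileged_state, weights):
    """Hypothetical stand-in for the step-one policy.

    Maps exact (privileged) pedestrian positions to a bounded 2-D command.
    """
    return np.tanh(privileged_state @ weights)


def distill(observations, teacher_actions, ridge=1e-3):
    """Fit a linear student so that observations @ W approximates the teacher.

    This is plain ridge regression, used here only as a simple placeholder
    for the second (distillation) learning step.
    """
    dim = observations.shape[1]
    gram = observations.T @ observations + ridge * np.eye(dim)
    return np.linalg.solve(gram, observations.T @ teacher_actions)


def run_demo(seed=0, n_samples=2000):
    rng = np.random.default_rng(seed)

    # Privileged state: e.g. (x, y) of two pedestrians, known only in simulation.
    state = rng.normal(size=(n_samples, 4))
    teacher_weights = 0.3 * rng.normal(size=(4, 2))
    actions = teacher_policy(state, teacher_weights)

    # Degraded observation: a noisy projection of the state, standing in for
    # features extracted from the on-board sensor (an assumption).
    projection = rng.normal(size=(4, 6))
    observations = state @ projection + 0.05 * rng.normal(size=(n_samples, 6))

    student_weights = distill(observations, actions)
    student_mse = np.mean((observations @ student_weights - actions) ** 2)

    # Baseline: always output the mean teacher action.
    baseline_mse = np.mean((actions - actions.mean(axis=0)) ** 2)
    return student_mse, baseline_mse


if __name__ == "__main__":
    student_mse, baseline_mse = run_demo()
    print(f"student MSE: {student_mse:.4f}, baseline MSE: {baseline_mse:.4f}")
```

In this sketch the student never sees the privileged state; it only learns to imitate the teacher's outputs from the degraded observations, which is the core of the distillation step described above.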