Recent multi-teacher distillation methods have successfully unified the encoders of several foundation models into a single encoder that achieves competitive performance on core computer vision tasks such as classification, segmentation, and depth estimation. This led us to ask: could similar success be achieved when the pool of teachers also includes vision models specialized in diverse tasks across 2D and 3D perception? In this paper, we define and investigate the problem of heterogeneous teacher distillation, or co-distillation — a challenging multi-teacher distillation scenario in which teacher models vary significantly in both (a) their design objectives and (b) the data they were trained on. We explore strategies for data sharing and for encoding teacher-specific information, and as a result we obtain a single encoder that excels at challenging tasks spanning 3D understanding, 3D human perception, and 2D vision. The resulting model exhibits strong generalization capabilities and performs on par with its teachers, each of which is state-of-the-art for a specialized task. Notably, our model outperforms all known methods on the Map-free Visual Relocalization dataset while using a highly compact encoder.