Large Language Models (LLMs) for Robotics

Large Language Models (LLMs) are advanced AI models built on neural architectures such as transformers and trained on enormous amounts of text data to predict subsequent words from a given prompt. These general-purpose language generators exhibit several key strengths: they possess extensive world knowledge, including common sense; can learn new tasks in context; demonstrate reasoning and planning capabilities; support communication in multiple languages, including programming languages; can use tools such as data retrieval and API calls; follow instructions accurately when properly fine-tuned; and provide interpretable natural language outputs. These attributes make LLMs highly versatile and powerful across a wide range of applications.
Visual Language Models (VLMs) extend these capabilities with visual understanding, making it possible to ground the textual knowledge encoded in LLMs and bringing them closer to the physical world. VLMs enable multimodal reasoning, which opens up a range of interesting applications: we have worked on guiding image generation with language instructions (1,2), providing language-based explanations of complex visual scenes (3,4), and leveraging the reasoning capabilities of LLMs to develop new robotic skills (14).

We’re using our expertise in LLMs to support the development and deployment of robotic services in large organizations and buildings, for example by enhancing access to these services through dedicated chatbots or agents. We believe LLMs can accelerate the creation of new missions, ideally specified in natural language, bridging the gap between end-users and robotic hardware. However, LLMs often lack robustness and may produce inaccurate or harmful outputs, which poses a significant challenge in robotic applications where reliability is crucial. The key scientific challenges we’re currently addressing in this area are summarised below.

To improve contextual accuracy, we’ve been enhancing retrieval-augmented generation (RAG). This approach also reduces costs, since smaller models with stronger retrieval capabilities can match the performance of much larger ones (5,6,13).
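To illustrate the basic RAG pattern the paragraph above refers to, here is a minimal, self-contained sketch: a toy word-overlap retriever stands in for a real dense retriever, the corpus is hard-coded, and all names (`score`, `retrieve`, `build_prompt`) are illustrative rather than part of any actual system. In practice the assembled prompt would be sent to a language model.

```python
def score(query, passage):
    """Toy relevance score: number of words shared by query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query, corpus, k=2):
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query, passages):
    """Assemble the augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Illustrative mini-corpus of facts about robotic services in a building.
corpus = [
    "The cleaning robot operates on floors 1 to 3.",
    "The delivery robot can be booked through the reception desk.",
    "The cafeteria is open from 8am to 3pm.",
]

query = "How do I book the delivery robot?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

The retriever narrows the model's input to the passages most likely to contain the answer, which is what lets a smaller model compete with a larger one: the relevant knowledge arrives in the context rather than having to be memorised in the weights. Context pruning and compression (5,6,13) push this further by shrinking the retrieved context itself.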
We address the lack of quality guarantees in LLM outputs, which can undermine user trust and is especially critical when deploying robot-assisted services (7,8). We also tackle a practical yet challenging setting in which only a small number of preference annotations can be collected per user, with the goal of aligning the model to that individual user. This is a problem we define as Personalized Preference Alignment (9,10).
Recently we’ve been investigating LLMs for reasoning, exploring how standard tuning methods such as reinforcement learning affect response diversity (11); maintaining diversity is crucial for solving complex reasoning tasks. Additionally, we explore LLM-powered chatbots from an HCI perspective, including tools to support alignment and evaluation in deployment contexts (12).
Related publications
1: Bridging Environments and Language with Rendering Functions and Vision-Language Models, ICML 2024
2: PoseEmbroider: towards a 3D, visual, semantic-aware human pose representation, ECCV 2024
3: What could go wrong? Discovering and describing failure modes in computer vision, arXiv 2024
4: Weatherproofing retrieval for localization with generative AI and geometric consistency, ICLR 2024
5: Provence: efficient and robust context pruning for retrieval-augmented generation, ICLR 2025
6: PISCO: Pretty simple compression for retrieval-augmented generation, ACL 2025
7: Guaranteed Generation from Large Language Models, ICLR 2025
8: Compositional preference models for aligning LMs, ICLR 2024
9: FaST: Feature-aware Sampling and Tuning for personalized preference alignment with limited data, EMNLP 2025
10: Drift: Decoding-time personalized alignments with implicit user preferences, EMNLP 2025
11: Whatever remains must be true: filtering drives reasoning in LLMs, shaping diversity, ICLR 2026
12: Surfacing Governing Principles for Chatbots: A Workbench and Comparative Study, CHI 2026
13: XProvence: zero-cost multilingual context pruning for retrieval-augmented generation, ECIR 2026
14: Robust Skills, Brittle Grounding: Diagnosing Restricted Generalization in Vision-Language Action Policies via Multi-Object Picking, arXiv:2602.24143
