Large Language Models (LLMs) for Robotics


Large Language Models (LLMs) are advanced AI models built on neural architectures such as transformers and trained on enormous amounts of text data to predict subsequent words from a given prompt. These general-purpose language generators exhibit several key strengths: they possess extensive world knowledge, including common sense; can learn new tasks in context; demonstrate reasoning and planning capabilities; support communication in multiple languages, including programming languages; use tools such as data retrieval and API calls; follow instructions accurately when properly fine-tuned; and provide interpretable natural language outputs. These attributes make LLMs highly versatile and powerful across a wide range of applications.

Visual Language Models (VLMs) extend these capabilities with visual understanding, making it possible to ground the textual knowledge encoded in LLMs and bringing them closer to the physical world. VLMs enable multimodal reasoning, which opens up several interesting applications: we've worked on guiding image generation with language instructions (1,2), providing language-based explanations of complex visual scenes (3,4) and leveraging the reasoning capabilities of LLMs to develop new robotics skills (14).

Robot services in the NAVER 1784 building

We're using our expertise in LLMs to support the development and deployment of robotic services in large organizations and buildings, for example by enhancing access to these services through dedicated chatbots or agents. We believe LLMs can accelerate the creation of new missions, ideally specified in natural language, and so bridge the gap between end-users and robotic hardware. However, because LLMs often lack robustness and may produce inaccurate or harmful information, they pose a significant challenge in robotic applications where reliability is crucial. The key scientific challenges we're currently addressing in this area are summarised below.

LLMs for Robotics: scientific challenges

To improve contextual accuracy, we've been enhancing retrieval-augmented generation (RAG). This approach also reduces costs, as smaller models with better retrieval capabilities can achieve performance comparable to much larger ones (5,6,13).
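To illustrate the idea, here is a minimal sketch of the retrieval step in RAG: relevant documents are ranked against the query and prepended to the prompt so the model answers from retrieved context rather than from its parametric memory alone. The `embed`, `retrieve` and `build_prompt` helpers are hypothetical toy examples (a bag-of-words similarity standing in for the learned dense encoders and LLM call a real system would use), not the system described above.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; real RAG systems use learned dense encoders."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, documents, k=1):
    """Prepend retrieved context so the LLM grounds its answer in it."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The cafeteria on floor 2 is open from 11:30 to 14:00.",
    "Delivery robots dock at the charging bay overnight.",
]
print(build_prompt("When is the cafeteria open?", docs))
```

Because the retriever narrows the model's job to reading a handful of relevant passages, a smaller LLM paired with a strong retriever can match a much larger model answering from memory, which is where the cost savings mentioned above come from.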

We address the lack of quality guarantees in LLM outputs, which can undermine user trust and is especially critical when deploying robot-assisted services (7,8). We also tackle a practical yet challenging setting in which only a small number of preference annotations is collected per user in order to align the model at the level of the individual user, a problem we define as Personalized Preference Alignment (9,10).

Recently, we've been investigating LLMs for reasoning, exploring concerns that standard tuning methods such as Reinforcement Learning may reduce response diversity (11); maintaining diversity is crucial for solving complex reasoning tasks. Additionally, we explore LLM-powered chatbots from an HCI perspective, including tools to support alignment and evaluation in deployment contexts (12).
