Neural Combinatorial Optimization for Robot Fleet Management
Providing robotic services often involves the coordination and management of a fleet of robots to optimize how the tasks are assigned, the robots scheduled and the routes planned. These are challenging combinatorial optimization problems due to their large scale, many operational constraints and real-world uncertainties. The same optimization problems also need to be solved several times (e.g. every day, every hour) for a specific service with different inputs. Deep learning-based heuristics exploit the distribution of these inputs to provide tailored heuristics for each use-case. This is really not practical as not only do you need to collect enough data but if the service is even slightly updated (e.g. resulting in some additional constraints) then you need to re-learn a new heuristic from scratch. To circumvent this limitation of task-specific learning, we’re exploring foundation models for combinatorial optimization where the idea is to learn a model on a variety of optimization problems so that it can easily be adapted to new problems.
Towards a Foundation Model for Combinatorial Optimization
Many Combinatorial Optimization (CO) problems have been extensively studied in the Operations Research community resulting in some powerful solutions. Based on these problem-specific solutions we’re working on teaching a neural model to solve a wide range of CO problems. GOAL (Generalist combinatorial Optimization Agent Learning) consists of a single transformer-based backbone and light-weight problem-specific adapters, mostly for input and output processing. It’s the first neural model that is able to solve a variety of CO problems spanning routing, scheduling, packing and graph optimization problems.
GOAL is based on our Bisimulation Quotienting approach for efficient Neural Combinatorial Optimization which is a generic framework to design efficient Markov Decision Processes (MDP) for combinatorial optimization. The particularity of these MDPs is that they exploit the widespread recursive property of CO problems. The intuition is that, for many problems (e.g. all those solvable by dynamic programming), when we start constructing a partial solution, we’re faced with a remaining subproblem of the same nature as the original one. Here’s what the construction process looks like for some classic CO problems.
Multi-robot orchestration
We’re working on the orchestration of robots in NAVER’s latest datacenter, GAK Sejong, the largest in Asia with a capacity of 600,000 server units and one designed to use robots to do a lot of heavy lifting and for general efficiency.
In the warehouse, SeRo robots are responsible for fetching servers from the shelves and loading them onto GaRo robots which transport them to the server rooms. Task requests arrive as tickets, each specifying a set of assets to pick up and load onto a parked GaRo robot. These tickets are generated dynamically throughout the day, creating a stream of interdependent pick-up and delivery tasks for the SeRo robots. We were interested in how the fleet of SeRo robots could navigate collision-free paths to perform their tasks as efficiently as possible in the warehouse. While this problem is a variant of Multi-Agent Path Finding (MAPF) which has been extensively studied, our use-case presents unique challenges. The warehouse is a very constrained space and the SeRo robots are large, making it impossible for two robots to pass each other in most areas. Additionally, due to the heavy weight of the robots and the servers they transport, their precise dynamics must also be taken into account. Planning with the simple constant speed approximation commonly found in the MAPF literature would have resulted in real robot collisions.
To solve this MAPF problem, we’ve proposed an extension to the standard Prioritized Planning algorithm to deal with the inter-dependent tasks, Interleaved Prioritized Planning and a novel Via-Point Star (VP*) algorithm to compute an optimal dynamics-compliant robot trajectory to visit a sequence of goal locations while avoiding moving obstacles. Below is an illustration of our approach in simulation with 10 robots.
After testing our algorithms in the data center, we realized that even in such a controlled setting, with high-fidelity robots, unexpected variations often lead to significant discrepancies between plans and real robot behavior. We’re now focusing on refining our methods to explicitly account for the uncertainty. This will not only strengthen our plans in the data center but also enable us to apply our multi-robot coordination algorithms to a broader range of real-world environments.
Related publications
Multi-agent path finding with real robot dynamics and interdependent tasks for automated warehouses, ECAI 2024
GOAL: a Generalist Combinatorial Optimization Agent Learner, arXiv 2024
Bisimulation Quotienting approach for efficient Neural Combinatorial Optimization, NeurIPS 2023