Matthias Gallé
In case you’re unfamiliar with the Gartner hype cycle, it’s a framework used by industry analysts to situate a technology along the axes of maturity and visibility. The hype is at its highest when visibility peaks at a fairly immature stage, characterized by speculation about how the technology will evolve in the years to come. An example is the cycle below, where Intelligent Agents sit at the peak of inflated expectations. What would be your guess as to the year it was published?
When I came across it in my research, my guess was somewhere between 2004 and 2006.
I was therefore pretty surprised to learn that it came out in 1995, a decade earlier than I had imagined. That’s two years before Microsoft deployed its Office Assistant ‘Clippy’ (the paper clip) which, for those of you old enough to remember, most probably evokes a feeling of disgust and/or annoyance as it clumsily followed you around the screen, forever getting in the way and never really of much assistance at all.
Now, if you’ve been following the hype around Conversational Agents (also often called Intelligent Assistants) and the string of investments in that field over the last year, you might be inclined to say that the peak of inflated expectations for Intelligent Assistants is closer to 2016-2017 than to 1995.
So why the renewed popularity of something that was all the rage over 20 years ago? I see two main reasons behind this revival. The first is that, back in the 90s, the large majority of the population was simply not used to communicating through text-based chat. Since then, it has gradually spread from being a relatively niche, private channel with friends to one used to communicate with businesses and family as well as friends: convenient for ordering food, troubleshooting, organizing school activities and seeing how grandma and grandpa are doing.
The second reason for the interest is the number of tools that now allow people without technical expertise to very quickly create their own dialogue scenarios. Although the current state of technology doesn’t yet allow you to create fully open-domain dialogues (there are ongoing competitions pushing in that direction), people have realized that most dialogues where there’s money to be made can be viewed as simple transactional dialogues. This basically means that, after finding out what your intent is (book a hotel, order a pizza), the agent only has to fill a list of pre-defined slots (name, topping, card number, address, etc.) in order to execute a fulfillment (place an order, call a taxi). The machine learning and natural-language processing (NLP) techniques required to do this can be trained quickly, assuming your dialogue fits this constrained type. The list of tools to do this is constantly growing and includes api.ai (Google), wit.ai (Facebook), the Alexa Skills Kit (Amazon), kitt.ai (Baidu), Watson Conversation (IBM), Line (Naver), Snips* and recast.ai, among others.
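To make the slot-filling idea concrete, here is a minimal sketch in Python of how such a transactional dialogue can be driven. The intent names, slot lists and placeholder functions are hypothetical and not taken from any of the toolkits listed above; a real system would plug in trained intent and slot models.

```python
# Minimal sketch of a transactional (slot-filling) dialogue loop.
# Intent names, slot lists and the fulfillment step are illustrative only.

INTENTS = {
    "order_pizza": ["topping", "size", "address"],
    "book_hotel": ["city", "check_in", "nights", "name"],
}

def detect_intent(utterance: str) -> str:
    """Stand-in for a trained intent classifier."""
    return "order_pizza" if "pizza" in utterance.lower() else "book_hotel"

def extract_slots(utterance: str) -> dict:
    """Stand-in for a trained slot tagger (returns whatever it could find)."""
    return {}

def fulfill(intent: str, slots: dict) -> None:
    """Stand-in for the fulfillment call (place the order, call the taxi...)."""
    print(f"Executing '{intent}' with {slots}")

def run_dialogue(first_utterance: str) -> None:
    intent = detect_intent(first_utterance)
    slots = extract_slots(first_utterance)
    for slot in INTENTS[intent]:
        while slot not in slots or not slots[slot]:
            # Ask follow-up questions until every pre-defined slot is filled.
            slots[slot] = input(f"What {slot} would you like? ").strip()
    fulfill(intent, slots)

if __name__ == "__main__":
    run_dialogue("I'd like to order a pizza")
```

Once a dialogue fits this intent-plus-slots shape, the toolkits above essentially let a domain expert declare the intents, slots and prompts, and handle the language understanding behind the scenes.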
Now, just hold on to this picture of the state of the art in conversational agents for a second while we briefly consider social robots. Contrary to industrial robots in manufacturing, which typically operate out of the reach of workers for safety reasons, social robots are autonomous robots that interact with people and share the same physical space as them, such as the home.
Examples of these robots (such as Knightscope or Savioke) share a common trait: they discourage any natural-language interaction by removing any semblance of a head. Current attempts to deploy humanoid robots in a social environment (of which there are many) have had very limited success, mostly because every possible utterance a human could come up with has to be explicitly encoded. We’re pretty far from meeting that requirement: today, most human-robot dialogue systems are still at the stage of Interactive Voice Response (IVR), where you “Say ‘book’ if you want to book a flight”. These systems have been around for about 40 years.
Now let’s bring together the first thread of conversational agents with the second one of social robots. To allow humanoid robots to interact more easily with humans, I’m convinced we need to learn from the success of chatbots and develop toolkits that allow non-NLP experts to easily create, compile and deploy new dialogue scenarios. For this, we need to abstract away the technical complexity so that the experts in each domain (rather than experts in NLP) can quickly design their new use cases, much as is being done with business processes.
It’s also close to what conversational-agent toolkits have done for text-based interaction, but with all the added complexity of prosody and non-verbal behaviour. We’ve started contributing in this direction at NAVER LABS by proposing a framework that selects natural, contextualized conversational fillers to overcome inter-turn silences and make the interaction feel more natural [1]. We’ve also started to investigate the possibility of automatically generating non-verbal behaviour from an utterance [2]. These are just baby steps, but the widespread adoption of dialogue with humanoid robots will take many such baby steps. Each one should be appropriately packaged and made available to the community for use and experimentation. Only then will we be able to cover the breadth of topics for human-robot dialogue that will permit the long-term use of social robots in all sorts of everyday scenarios, with or without a talking head.
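As a rough illustration of the filler-selection idea mentioned above (a simplified sketch, not the actual framework described in [1]; the dialogue-act labels, silence threshold and filler inventory are invented for the example), the decision can be framed as picking, from a small multi-modal inventory, the filler that best matches the current dialogue context whenever the robot’s answer is delayed.

```python
import random

# Hypothetical inventory of multi-modal fillers: a spoken fragment plus a gesture.
FILLERS = {
    "lookup": [("Let me check that for you.", "tilt_head"),
               ("One moment, please.", "look_away_briefly")],
    "clarification": [("Hmm...", "nod"), ("Right...", "small_nod")],
    "generic": [("Uh-huh.", "idle"), ("Just a second.", "idle")],
}

def select_filler(dialogue_act: str, silence_ms: int) -> tuple:
    """Pick a contextualized filler once the inter-turn silence becomes noticeable."""
    if silence_ms < 600:  # short pauses (invented threshold) need no filler at all
        return ("", "idle")
    pool = FILLERS.get(dialogue_act, FILLERS["generic"])
    return random.choice(pool)

speech, gesture = select_filler("lookup", silence_ms=900)
print(speech, "/", gesture)
```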
——
References
[1] Matthias Gallé, Ekaterina Kynev, Nicolas Monet, Christophe Legras. Context-aware selection of multi-modal conversational fillers in human-robot dialogue. 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2017.
[2] Matthias Gallé, Ankuj Arora. BEAT-o-matic: a baseline for learning behavior expressions from utterances. ARMADA workshop at the 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2017.
——
Matthias Gallé recently gave a Meet Up presentation on Conversational Robots at Station F, the world’s biggest start-up campus in Paris, France.
Learn more about NAVER/LINE presence at Station F.
* Naver is an investor in Snips via K-Fund 1 of Korelya Capital.