Towards Intelligent Agents that Learn by Multimodal Communication
Sponsor: Machine Learning, Reasoning and Intelligence Program, Office of Naval Research
Principal Investigator: Kenneth D. Forbus
Project Summary: We propose to explore how to create intelligent agents that learn by multimodal communication in order to perform commonsense reasoning. People commonly communicate with each other using coordinated modalities, such as sketching and talking, or reading texts illustrated with diagrams. Our AI systems need the same capabilities. Specifically, we propose to explore fluent multimodal communication in the context of knowledge capture, to support commonsense reasoning. Commonsense reasoning is crucial for intelligent systems because it is part of the shared background assumed in working with human partners, and it provides a foundation for future learning. Our hypotheses are that (1) qualitative representations are a crucial part of commonsense knowledge and (2) analogical reasoning and learning provide robustness in reasoning as well as human-like learning of complex relational structures. Unlike deep learning systems, for example, analogical learning systems can handle relational structures such as arguments, proofs, and plans, while learning with orders of magnitude less data. This research should help pave the way for intelligent systems that can interact with, and learn from, people using natural modalities, as well as make progress on understanding the nature of human cognition.
Using the Companion cognitive architecture, we propose to explore the following ideas:
- Hybrid Primal Sketch. Our CogSketch system provides a model of high-level human vision that has been used both to model multiple human visual problem-solving tasks and in deployed sketch-based educational software. We propose to build a hybrid primal sketch processor, which combines CogSketch, off-the-shelf computer vision algorithms, and deep learning recognition systems to process images, especially diagrams (a hypothetical pipeline sketch appears after this list).
- Analogical Learning of Narrative Function. Our prior work on analogical question-answering has led to algorithms that provide competitive performance on several datasets while being more data-efficient than today's machine learning systems. In this project we propose to extend these ideas to learning narrative functions, i.e., the higher levels of semantic interpretation that ascribe purpose to pieces of text relative to larger tasks (a toy illustration of analogy-based labeling also appears after this list). Building on observations of how people learn to read, we plan to build dialogue models for natural annotation, i.e., ways that trainers can teach systems how to interpret multimodal materials, bootstrapping them in a data-efficient manner.
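To make the Hybrid Primal Sketch idea concrete, the following is a minimal, hypothetical sketch of how such a pipeline might be wired together: classical computer vision supplies low-level structure, a deep learning recognizer supplies labeled objects, and the two are merged into a symbolic entity list for downstream reasoning. The names (VisualEntity, run_edge_detector, run_recognizer, hybrid_primal_sketch) and the IoU-based merge policy are illustrative assumptions, not CogSketch's actual API.

```python
# Hypothetical hybrid primal sketch pipeline (illustrative names throughout).

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VisualEntity:
    kind: str                     # "edge", "region", or "object"
    bbox: tuple                   # (x1, y1, x2, y2) in pixels
    label: Optional[str] = None   # recognizer label, if any
    attributes: dict = field(default_factory=dict)

def run_edge_detector(image):
    """Stand-in for an off-the-shelf CV step (edge/contour extraction)."""
    return []  # replace with real CV output

def run_recognizer(image):
    """Stand-in for a deep learning detector returning labeled boxes."""
    return []  # replace with real model output

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def hybrid_primal_sketch(image, iou_threshold=0.5):
    """Merge CV structure and recognizer labels into one symbolic description."""
    structure = run_edge_detector(image)
    objects = run_recognizer(image)
    # Toy merge policy: label low-level structure with any overlapping object.
    for ent in structure:
        for obj in objects:
            if iou(ent.bbox, obj.bbox) >= iou_threshold:
                ent.label = obj.label
    # Unlabeled structure is kept: arrows and connectors in diagrams often
    # matter for reasoning even when no recognizer labels them.
    return structure + objects
```

The design point illustrated is simply that recognizer output enriches, rather than replaces, the low-level structural description that qualitative and analogical reasoning operate over.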
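For the narrative-function idea, the toy below shows analogy-based labeling in a stripped-down form: a new text's relational facts are compared against a small case library, and the most similar case's narrative-function label is transferred. The comparison here is only a cheap predicate-usage similarity in the spirit of first-stage analogical retrieval; the project's actual approach uses full structure-mapping over relational representations. The example facts, labels, and function names are invented for illustration.

```python
# Toy analogy-based labeling of narrative function (illustrative only).

from collections import Counter
from math import sqrt

def content_vector(facts):
    """Count predicate usage, e.g., Counter({'cause': 2, 'before': 1})."""
    return Counter(fact[0] for fact in facts)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def retrieve_and_label(new_facts, case_library):
    """Return (label, score) of the most similar stored case."""
    vec = content_vector(new_facts)
    best_label, best_score = None, -1.0
    for facts, label in case_library:
        score = cosine(vec, content_vector(facts))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Invented mini case library: (relational facts, narrative-function label).
library = [
    ([("cause", "spark", "fire"), ("before", "spark", "fire")], "explanation"),
    ([("goal", "agent", "open-door"), ("action", "agent", "push")], "procedure"),
]
new_text_facts = [("cause", "rain", "flood"), ("before", "rain", "flood")]
print(retrieve_and_label(new_text_facts, library))  # -> ('explanation', 1.0)
```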
Selected publications:
- Forbus, K. (2019). Qualitative Representations: How People Reason and Learn about the Continuous World. MIT Press.
- Forbus, K., Chang, M., Ribeiro, D., Hinrichs, T., Crouse, M., & Witbrock, M. (2019). Step Semantics: Representations for State Changes in Natural Language. Proceedings of the Reasoning for Complex Question-Answering Workshop, AAAI 2019, Honolulu, HI.
- Chen, K., Rabkina, I., McLure, M., & Forbus, K. (2019). Human-like Sketch Object Recognition via Analogical Learning. Proceedings of AAAI 2019.
- Chen, K., Forbus, K., Gentner, D., Hespos, S., & Anderson, E. (2020). Simulating Infant Visual Learning by Comparison: An Initial Model. Proceedings of CogSci 2020, Online.
- Chen, K., & Forbus, K. (2021). Visual Relation Detection using Hybrid Analogical Learning. Proceedings of AAAI 2021.
- Forbus, K., & Lovett, A. (2021). Same/different in visual reasoning. Current Opinion in Behavioral Sciences, 37, 63-68.