Professor Cynthia Matuszek has received a research award from the National Science Foundation to improve human-robot interaction by enabling robots to understand the world from natural language, so that they can take instructions and learn about their environments naturally and intuitively. The two-year award, Joint Models of Language and Context for Robotic Language Acquisition, will support Dr. Matuszek's Interactive Robotics and Language Lab, which focuses on how robots can flexibly learn from interactions with people and environments.

As robots become smaller, less expensive, and more capable, they are able to perform an increasing variety of tasks, leading to revolutionary improvements in domains such as automobile safety and manufacturing. However, their inflexibility makes them hard to deploy in human-centric environments, such as homes and schools, where their tasks and environments are constantly changing. Meanwhile, learning to understand language about the physical world is a growing research area in both robotics and natural language processing. The core problem her research addresses is how the meanings of words are grounded in the noisy, perceptual world in which a robot operates.

The ability of robots to follow spoken or written directions reduces the barrier to adoption in domains such as assistive technology, education, and caretaking, where interactions with non-specialists are crucial. Such robots have the potential to ultimately improve autonomy and independence for populations such as aging-in-place elders; for example, a manipulator arm that can learn from a user's explanation how to handle food or open unfamiliar containers would directly improve the independence of people with dexterity concerns such as advanced arthritis.

Matuszek's research will investigate how linguistic and perceptual models can be expanded during interaction, allowing robots to understand novel language about unanticipated domains. In particular, the focus is on developing new learning approaches that correctly induce joint models of language and perception, building data-driven language models that add new semantic representations over time. The work will combine semantic parser learning, which provides a distribution over possible interpretations of language, with perceptual representations of the underlying world. New concepts will be added on the fly as new words and new perceptual data are encountered, and a semantically meaningful model can be trained by maximizing the expected likelihood of the language and visual components. This integrated approach allows for effective model updates with no explicit labeling of words or percepts, and it will be combined with experiments on improving learning efficiency through active learning, which leverages a robot's ability to ask questions about objects in the world.
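To make the expected-likelihood idea concrete, the sketch below is a minimal, hypothetical Python example (not the project's implementation): a semantic parser supplies candidate interpretations of a sentence with probabilities, simple perceptual classifiers score each interpretation against features of the observed scene, and both are updated with weights proportional to their joint likelihood, with no per-word labels. The names GroundedConcept and joint_update, and the use of logistic classifiers, are illustrative assumptions.

    import numpy as np

    class GroundedConcept:
        """A word's perceptual meaning, modeled here as a logistic classifier over scene features."""

        def __init__(self, n_features):
            self.w = np.zeros(n_features)

        def prob(self, features):
            # Probability that this concept applies to the observed features.
            return 1.0 / (1.0 + np.exp(-self.w @ features))

        def update(self, features, weight, lr=0.1):
            # Gradient step toward "concept applies", scaled by the interpretation's weight.
            self.w += lr * weight * (1.0 - self.prob(features)) * features

    def joint_update(candidate_parses, scene_features, concepts):
        """One EM-style step: weight each candidate interpretation by its parse probability
        times its perceptual likelihood, then update the word classifiers with those weights.

        candidate_parses: list of (parse_probability, [grounded words]) from a semantic parser.
        scene_features:   feature vector describing the perceived object or scene.
        concepts:         dict mapping words to GroundedConcept instances.
        """
        scores = []
        for parse_prob, words in candidate_parses:
            for word in words:
                # Add a new concept on the fly the first time a word is encountered.
                concepts.setdefault(word, GroundedConcept(len(scene_features)))
            percept_prob = np.prod([concepts[w].prob(scene_features) for w in words])
            scores.append(parse_prob * percept_prob)

        total = sum(scores) or 1.0
        posteriors = [s / total for s in scores]  # distribution over interpretations

        for weight, (_, words) in zip(posteriors, candidate_parses):
            for word in words:
                concepts[word].update(scene_features, weight)
        return posteriors  # could likewise be used to re-weight the parser's rules

In the same spirit, an active-learning variant could select the object or utterance whose distribution over interpretations is most uncertain (for example, highest entropy) and have the robot ask a clarifying question before updating.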