Visual understanding of human actions

Dr. Hamed Pirsiavash

Postdoctoral Research Associate
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

12:00-1:00pm, Friday, 27 February 2015, ITE 325B

The aim of computer vision is to develop algorithms that let computers “see” the world as humans do. Central to this goal is understanding the behavior of humans as intelligent agents acting in the visual world. For instance, for a robot to interact with us, it must understand our actions in order to respond appropriately. My work explores several directions toward computationally representing and understanding human actions.

In this talk, I will focus on detecting actions and judging their quality. First, I will describe simple grammars for modeling long-range temporal structure in human actions. Real-world videos are typically composed of multiple action instances, where each instance is itself composed of sub-actions with variable durations and orderings. Our grammar models capture this hierarchical structure while admitting efficient, linear-time parsing algorithms for action detection. In the second part of the talk, I will describe algorithms that go beyond detecting actions to judging how well they are performed. Our learning-based framework gives performers feedback on how to improve the quality of their actions.
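For readers curious what linear-time parsing of action sequences can look like, the following is a minimal Python sketch, not the hierarchical segmental grammar models from the talk: it decodes one best-scoring action label per frame under a simple transition grammar (a regular-grammar special case) in time linear in video length. The names frame_scores, allowed, and viterbi_decode are illustrative assumptions, not code from the speaker's work.

    import numpy as np

    def viterbi_decode(frame_scores, allowed):
        # Hypothetical sketch, not the speaker's implementation.
        # frame_scores: (T, K) array; score of each of K action labels per frame.
        # allowed: (K, K) boolean array; allowed[i, j] is True when the grammar
        # permits label j to directly follow label i.
        T, K = frame_scores.shape
        dp = np.full((T, K), -np.inf)       # best score ending at frame t with label k
        back = np.zeros((T, K), dtype=int)  # backpointers for recovering the labeling
        dp[0] = frame_scores[0]
        for t in range(1, T):
            for k in range(K):
                # consider only predecessor labels the grammar allows
                prev = np.where(allowed[:, k], dp[t - 1], -np.inf)
                back[t, k] = int(np.argmax(prev))
                dp[t, k] = prev[back[t, k]] + frame_scores[t, k]
        labels = np.empty(T, dtype=int)
        labels[-1] = int(np.argmax(dp[-1]))
        for t in range(T - 1, 0, -1):       # trace backpointers from the last frame
            labels[t - 1] = back[t, labels[t]]
        return labels

Each frame is visited a constant number of times, so the decode runs in O(T·K²), i.e., linear in the video length T for a fixed label set.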

Host: Mohamed Younis