Thesis (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2018
In this thesis we describe novel computer vision approaches to observe and learn activities from human demonstration videos. We focus specifically on first-person and close-up videos for learning new activities, rather than traditional third-person videos with static, global fields of view. Since the objective of these studies is to build intelligent agents that interact with people, such videos are well suited to understanding human movements: first-person and close-up videos are generally goal-oriented and have viewpoints similar to those of intelligent agents. We present new Convolutional Neural Network (CNN) based approaches that learn the spatial/temporal structure of demonstrated human actions, and we use the learned structure and models to analyze human behavior in new videos. We then demonstrate intelligent systems based on the proposed approaches in two contexts: (i) collaborative robot systems that assist users with daily tasks, and (ii) an educational scenario in which a system gives learners feedback on their movements. Finally, we experimentally evaluate our approach in enabling intelligent systems to observe and learn from human demonstration videos.