Aayush Bansal (Carnegie Mellon University)
Tuesday, 19.5.2020, 11:30
Zoom Lecture: https://technion.zoom.us/j/92570206785
Licklider and Taylor (1968) envisioned computational machinery that could enable better communication between humans than face-to-face. In the last fifty years, we have used computing to develop various means of communication, such as email, messaging, video conversation, and virtual reality. These are, however, proxies for face-to-face communication that aim to encode words, expressions, emotions, and body language at the source and decode them reliably at the destination. The true potential of computing to improve social communication has not been realized. Computational machinery that can understand and create a four-dimensional audio-visual world can enable humans to describe their imagination and share it with others. In this talk, I will introduce the Computational Studio, which allows average humans to densely construct the 4D audio-visual world from sparse signals, thereby enabling everyone to relive old memories as a form of virtual time travel, automatically create new experiences, and share them with others using everyday computational devices.
There are three essential components of the Computational Studio: (1) How to capture the 4D audio-visual world? (2) How to synthesize the audio-visual world using examples? (3) How can a user interactively synthesize the audio-visual world? The first part of this talk introduces the work on capturing and browsing the in-the-wild 4D audio-visual world in a self-supervised manner, and efforts on building a multi-robot capture system. The applications of this work extend beyond virtualized reality to digitizing intangible cultural heritage, capturing tribal dances and wildlife in the natural environment, and understanding the social behavior of human beings. In the second part, I will talk about unsupervised, example-based audio-visual synthesis that allows us to express ourselves easily. Finally, I will talk about the work on interactive visual synthesis and on extracting scene cues that enable humans to express themselves. Here I will stress the importance of thinking about human users and computational devices when designing content creation applications.
Imagination, once confined to the human mind and dependent on the crutches of words and expressions, has the potential to be communicated audio-visually to others using the Computational Studio.
Aayush Bansal is a Ph.D. candidate at the Robotics Institute of Carnegie Mellon University, where he is advised by Prof. Deva Ramanan and Prof. Yaser Sheikh. He is a Presidential Fellow at CMU and a recipient of the Uber Presidential Fellowship (2016-17), the Qualcomm Fellowship (2017-18), and the Snap Fellowship (2019-20). Various national and international media such as NBC, CBS, France TV, and The Journalist have extensively covered his work. For more details, see http://www.cs.cmu.edu/~aayushb/.