May 2014: ECE Assistant Professor Dhruv Batra has earned a National Science Foundation (NSF) Faculty Early Career Development (CAREER) Award for his machine perception research in high-level, holistic scene understanding. The CAREER grant is the NSF's most prestigious award for junior faculty members who are expected to become academic leaders in their fields.
Batra has also been awarded a three-year, $150,000 Young Investigator Program (YIP) award from the Army Research Office to support his research.
Instead of enhancing any single aspect of machine perception (such as face recognition), Batra and his students are taking a fundamentally different approach: they plan to build a "holistic scene understanding" system. Their approach employs a "society of agents" that develops understanding from the interaction of multiple computer vision modules. Batra says that he draws inspiration from the work of pioneering AI researchers from the 1960s and 1970s, such as Minsky, McCarthy, Papert, and Marr, who were "simply ahead of their time, and had ideas that needed the computational and statistical tools of today (millions of images and thousands of CPUs and GPUs) to be brought to fruition."
Although there are computer vision systems for applications such as face recognition, handwriting recognition, and pedestrian detection, "these systems are inherently naive and limited in their understanding of an image," Batra says. For example, he notes, "a patch from an image may seem like a face, but may simply be an incidental arrangement of tree branches and shadows."
"A vision module operating in isolation often produces nonsensical results, such as faces floating in thin air, a mistake that no human observer of the scene will ever make," he continues. Using multiple vision modules, such as 3-D scene layout, object layout, and pose and activity recognition, Batra plans to create a vision system that holistically understands a scene well enough to realize that a human face is unlikely to be floating on a tree.
"My...goal is to develop models, algorithms, and large-scale implementations to enable the next generation of computer vision systems that understand the scene behind the image as well as humans do," says Batra. His proposed systems will attempt to answer questions such as "where is the ground" or "what is the person in the image doing." Additional capabilities may include interpreting intent or anticipating the future, answering questions such as "is the person paying attention?" or "is she headed for an accident?"
Batra will use several vision modules in conjunction to generate a small number of guesses, or "diverse plausible hypotheses," that can help interpret a scene. These modules might make their guesses by identifying flat surfaces, detecting and segmenting all objects in the image, estimating human poses, and categorizing the scene into a type such as natural, urban, or beach. Once each module has a guess for the scene, a "mediator" program will score the guesses and identify the one possibility that is most consistent across the modules. Batra's approach can also produce multiple possibilities that can then be sent to a human operator for feedback.
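The mediator idea can be illustrated with a toy sketch. This is not Batra's implementation: the module names, guesses, confidence values, consistency weights, and the scoring rule (product of confidences times pairwise consistency) are all assumptions chosen to mirror the "face floating on a tree" example. Each hypothetical module offers a few guesses, and the mediator ranks every combination of one guess per module.

```python
from itertools import combinations, product

# Hypothetical module outputs: a few (guess, confidence) pairs per module.
MODULE_GUESSES = {
    "face_detector": [("face", 0.7), ("no_face", 0.3)],
    "segmentation": [("tree", 0.8), ("person", 0.2)],
}

# Hypothetical pairwise consistency weights; unlisted pairs default to 1.0.
# A face on a tree is heavily penalized as implausible.
CONSISTENCY = {
    frozenset({"face", "tree"}): 0.05,
    frozenset({"no_face", "person"}): 0.3,
}


def score(joint):
    """Product of module confidences times pairwise consistency weights."""
    total = 1.0
    for _, confidence in joint:
        total *= confidence
    for (a, _), (b, _) in combinations(joint, 2):
        total *= CONSISTENCY.get(frozenset({a, b}), 1.0)
    return total


def mediate(module_guesses, top_k=2):
    """Rank every combination of one guess per module, best first."""
    joints = product(*module_guesses.values())
    ranked = sorted(joints, key=score, reverse=True)
    return [tuple(label for label, _ in joint) for joint in ranked[:top_k]]


ranked = mediate(MODULE_GUESSES)
best = ranked[0]  # the single most consistent interpretation
print(best, ranked)
```

In this toy setup the face detector alone prefers "face," but the mediator ranks "no face, tree" first because the combination "face, tree" is scored as inconsistent. The remaining entries in `ranked` play the role of the multiple possibilities that could be passed to a human operator.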
Computer vision systems that can understand the meaning of an image will improve a wide range of applications. As one example, Batra emphasizes how important it is for a pedestrian detector on an autonomous car to know the difference between a real person and a picture of a person on a billboard.
"Improved vision systems will fundamentally change our lives, from self-driving cars bringing mobility to the physically impaired, to unmanned aircraft helping law enforcement with search and rescue in disasters," writes Batra.