If you Google for "a peaceful picture of downtown Chicago," it will probably return pictures of peaceful protests in downtown Chicago, according to Devi Parikh, who joined the department as an assistant professor in January. "I'm interested in designing algorithms and systems that can take an image and answer questions the same way a human would," she says. She wants to bring computer image processing closer to how the human brain works. "People are the best vision systems we have."
Parikh uses crowdsourcing and human debugging to prioritize issues for computer vision research and to understand how humans process images. She then creates algorithms that use both contextual and descriptive reasoning to identify objects as a human would.
Parikh is developing a system to discover the order in which computer vision issues should be tackled. Several components must converge for an answer. "We don't have a good way of figuring out which components we should be working on in the first place," she explains. "And because some components are inherently solving ambiguous problems, we don't know how perfect we can expect them to be." Parikh wants to determine which components result in the highest payoff for even slight improvements.
Parikh applies what she calls "human debugging" to the process. Using a complete computer vision system, she systematically substitutes a human for one component at a time. This person sees a specific input, and answers a question about that input. That's it. To get enough people for her experiments, Parikh uses crowdsourcing services. "We put images up online and get hundreds of people to look at them and answer questions about those images."
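The idea behind human debugging can be sketched in a few lines. The pipeline, component names, and toy data below are all illustrative assumptions, not Parikh's actual system; the point is only the mechanics of swapping a machine component for human answers and measuring the difference:

```python
# Hypothetical two-stage vision pipeline: a detector feeds a classifier.
# "Human debugging" swaps one machine component for human answers
# (e.g. gathered via crowdsourcing) to estimate how much end-to-end
# accuracy would improve if that component were perfect.

def machine_detector(image):
    # Imperfect machine component: mislabels some images.
    return image["machine_label"]

def human_detector(image):
    # Human stand-in: assumed to give the correct intermediate answer.
    return image["true_label"]

def classifier(intermediate_label):
    # Downstream component; here it simply passes the label through.
    return intermediate_label

def pipeline_accuracy(images, detector):
    correct = sum(classifier(detector(img)) == img["true_label"]
                  for img in images)
    return correct / len(images)

# Toy evaluation set with two machine errors.
images = [
    {"machine_label": "dog", "true_label": "dog"},
    {"machine_label": "cat", "true_label": "dog"},  # machine error
    {"machine_label": "car", "true_label": "car"},
    {"machine_label": "bus", "true_label": "car"},  # machine error
]

machine_acc = pipeline_accuracy(images, machine_detector)
human_acc = pipeline_accuracy(images, human_detector)
# The gap between the two scores estimates the payoff of improving
# this one component while everything else stays fixed.
print(machine_acc, human_acc)  # 0.5 1.0
```

Repeating the substitution for each component in turn ranks them by how much a perfect version would help the full system.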
Teaching computers context
A related research area, for which Parikh also uses crowdsourcing to gather human responses, involves how computers and humans describe images. For example, given a picture of two children playing in a playground, a human will know that the context of the picture is the two children playing and not focus on trees in the background. Parikh explains that before computers can come to the same conclusion, we have to quantify how humans perceive and interpret different images.
Again, using Internet crowdsourcing, Parikh asked one set of people to look at and describe an image. A second set of people read these descriptions and created clipart scenes based on the descriptions. "We collected a huge dataset with two children on a playground, and can start building models to learn what in a scene can be ignored and what can't be," says Parikh.
Top: An example set of semantically similar scenes created by human subjects for the same given sentence, "Jenny just threw the beach ball angrily at Mike while the dog watches them both." Bottom: An illustration of the clip art used to create the children and the other available objects.
For her computer vision algorithms, Parikh uses both contextual reasoning and descriptions to help computers process visual input more like people. According to Parikh, contextual reasoning requires a computer to look at the area around the object of interest, and decide what the object is likely to be before looking at the object itself. "If you know that an object is in a corn field," she says, "even without looking you know that the object probably isn't a microwave. It reduces the possibilities without even looking at the object." Parikh is trying to effectively combine these kinds of contextual clues.
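One common way to combine a contextual clue with an appearance score is a simple Bayes-style product of a context prior and a detector's likelihood. The numbers and labels below are invented for illustration and are not Parikh's model:

```python
# Combine a context prior P(object | scene) with an appearance score
# from a detector, Bayes-style (unnormalized product).

# Assumed prior for a corn-field scene: "microwave" is nearly ruled out
# before the object itself is even examined.
context_prior = {"scarecrow": 0.30, "crow": 0.40, "microwave": 0.001}

# Assumed scores from a hypothetical appearance-only detector that finds
# all three labels visually plausible on their own.
appearance = {"scarecrow": 0.5, "crow": 0.2, "microwave": 0.6}

def best_label(prior, likelihood):
    # Score each candidate by prior * likelihood and keep the argmax.
    scores = {label: prior[label] * likelihood[label] for label in prior}
    return max(scores, key=scores.get)

print(best_label(context_prior, appearance))  # scarecrow
```

Even though the appearance score alone favors "microwave," the context prior shifts the decision, which is exactly the reduction in possibilities the quote describes.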
Computers with descriptive capabilities are useful any time humans need to interact with a computer, and they help with machine learning, semi-autonomous vehicles, and image searching. "We're trying to establish an effective means of communication between humans and machines," Parikh explains. "If we can figure out a way to teach machines, they will be more effective than just computers." If a semi-autonomous vehicle comes across an object that it can't identify, but knows to describe it as a "shiny sharp object," a human operator can more easily classify and interact with that object. For image search algorithms, such as facial recognition, this kind of descriptive ability is vital. When searching for a specific person, the computer may first respond with every brown-haired person in an image. If the user says, "now give me just the old ones with facial hair," the results can be narrowed.
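This kind of attribute-based refinement can be sketched as a simple filter over candidate faces. The attribute names and records below are hypothetical stand-ins for whatever a real face-search system would extract:

```python
# Hypothetical attribute-based relevance feedback for face search:
# start with a coarse match, then narrow using attributes the user
# supplies in plain terms.

faces = [
    {"id": 1, "hair": "brown", "age": "old",   "facial_hair": True},
    {"id": 2, "hair": "brown", "age": "young", "facial_hair": False},
    {"id": 3, "hair": "black", "age": "old",   "facial_hair": True},
    {"id": 4, "hair": "brown", "age": "old",   "facial_hair": False},
]

def search(candidates, **attributes):
    # Keep only candidates matching every requested attribute.
    return [f for f in candidates
            if all(f[k] == v for k, v in attributes.items())]

results = search(faces, hair="brown")                   # every brown-haired face
refined = search(results, age="old", facial_hair=True)  # "old ones with facial hair"
print([f["id"] for f in refined])  # [1]
```

Each round of feedback composes with the last, so a user can converge on one person with a few natural-language refinements.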
Parikh's image search algorithms make human-computer communication more natural. Facial recognition is an important application of her work.
For all her research, Parikh enjoys formulating new ways of solving the problems. She explains that methods like human debugging are unconventional in machine learning research. Now, computer, please find me a pair of snazzy high-heeled black shoes for the party next week...
New computer vision course
This spring, Devi Parikh is teaching Advanced Topics in Computer Vision. According to Parikh, "the goal is to get students up to date on state-of-the-art research." With a list of current hot topics in the field, students read a few different papers for each topic. During class, a student presents the findings of each paper and Parikh gives a high-level lecture about where computer vision is headed. "There's a lot of discussion," Parikh says. "Students read two or three papers each time."
The students are learning how to critique computer vision research, what has already been done in the field, and which important challenges remain. Even after just six weeks of class, Parikh noted that students have changed how they describe papers. "I can tell they're getting a feel for it." Students in the course have already taken the first computer vision course, and are using computer vision in their research.