ECE: Electrical & Computer Engineering
ECE News

Stocking the visual toolbox

Spotlight on New Faculty Members

Dhruv Batra joined ECE this semester from the Toyota Technological Institute at Chicago, where he served as a research assistant professor. He has co-authored one book and published 29 journal and conference papers. He earned his B.Tech. from Banaras Hindu University in 2005 and his M.S. and Ph.D. from Carnegie Mellon University in 2006 and 2010, respectively.

Big data is one of today’s buzzwords. We hear about big data in the fields of biology, marketing, finance, and robotics. Big data is too big for our traditional tools, and we look to experts in both software and hardware for the new tools we need to make sense of this data.

ECE assistant professor Dhruv Batra is using machine learning so that computers can help humans parse these huge datasets. “My particular focus is building algorithms that don’t necessarily replace humans, but assist them in extracting meaning from data,” he explains. “They might not perform perfectly, but will give reasonable choices to a human operator.”

Dhruv Batra

His specific research area is at the intersection of machine learning and computer vision. “I’m interested in helping machines understand the visual world around them,” he says. “This is particularly important for autonomous systems and image or video analysis.”

One application of Batra’s research is tracking people, their stance, and their activities in videos, which may be useful in the context of surveillance. With 72 hours of video uploaded to YouTube alone every minute, it’s impossible for a human operator to watch everything. According to Batra, “we just don’t have the algorithms today that can perform this task with a sufficiently high accuracy.” Batra is focusing on developing algorithms that will provide four or five potential detections of suspicious activity for a human operator to check. “The computer will come up with plausible hypotheses, and the human can make intelligent decisions. We can think of it as interactive intelligence, as opposed to autonomous intelligence.”
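The workflow Batra describes, in which the computer proposes a handful of candidates and a person makes the final call, can be illustrated with a short sketch. This is not Batra’s algorithm: the `Detection` record, its `score` field, and the sample values are hypothetical, and simply ranking by score is a stand-in for whatever method actually generates the hypotheses.

```python
import heapq
from typing import NamedTuple


class Detection(NamedTuple):
    """A hypothetical candidate event in a video (illustration only)."""
    start_sec: float
    end_sec: float
    score: float  # assumed model confidence; higher means more suspicious


def top_hypotheses(detections, k=5):
    """Return the k highest-scoring detections, best first."""
    return heapq.nlargest(k, detections, key=lambda d: d.score)


# Made-up scores for six ten-second clips.
candidates = [
    Detection(0, 10, 0.12),
    Detection(10, 20, 0.91),
    Detection(20, 30, 0.40),
    Detection(30, 40, 0.77),
    Detection(40, 50, 0.05),
    Detection(50, 60, 0.63),
]

# Only a short list reaches the human operator, who decides what to do.
shortlist = top_hypotheses(candidates, k=4)
for d in shortlist:
    print(f"review {d.start_sec:.0f}s to {d.end_sec:.0f}s (score {d.score:.2f})")
```

The operator reviews four clips instead of all six; at YouTube scale the same idea would reduce an unwatchable volume of video to a handful of candidates.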

Some of Batra’s previous algorithms help users sort through images to find what they seek. Selecting one image from a collection, the user scribbles two markings: a blue one on what he or she is looking for, a red one on the background. The scribbling is not precise, but if the computer can’t come up with something reasonable based on the user’s markings, it intelligently asks for more. The computer can then pick out the relevant parts from other images and show the results to the user.
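A toy version of the scribble idea might look like the following. It is a sketch under strong assumptions, not the method used in Batra’s software: the image is a small grayscale grid, each scribble is a list of pixel coordinates, pixels are labeled by whichever scribble’s average intensity they are closer to, and the `margin` and `max_ambiguous` thresholds are invented for illustration. Real systems use far richer models of color and shape.

```python
from statistics import mean


def segment(image, fg_scribble, bg_scribble, margin=0.1, max_ambiguous=0.25):
    """Label each pixel as foreground (True) or background (False).

    image: 2D list of grayscale values in [0, 1].
    fg_scribble, bg_scribble: lists of (row, col) pixels marked by the user.
    Returns (mask, needs_more), where needs_more is True when too many
    pixels sit about equally close to both scribbles.
    """
    fg_mean = mean(image[r][c] for r, c in fg_scribble)
    bg_mean = mean(image[r][c] for r, c in bg_scribble)

    mask = []
    ambiguous = 0
    total = 0
    for row in image:
        mask_row = []
        for value in row:
            d_fg = abs(value - fg_mean)
            d_bg = abs(value - bg_mean)
            # A pixel is ambiguous when the two distances barely differ.
            if abs(d_fg - d_bg) < margin:
                ambiguous += 1
            mask_row.append(d_fg < d_bg)
            total += 1
        mask.append(mask_row)

    needs_more = ambiguous / total > max_ambiguous
    return mask, needs_more


# A bright object in the top-left corner of a dark background.
image = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.15],
    [0.10, 0.10, 0.05],
]

# Clear scribbles: one on the object, one on the background.
mask, needs_more = segment(image, [(0, 0), (1, 0)], [(2, 2), (2, 0)])

# Poor scribbles: both land on the object, so the program asks for more.
_, ask_again = segment(image, [(0, 0)], [(1, 0)])
```

With the clear scribbles the three bright pixels and their bright neighbor are marked as foreground and `needs_more` is `False`. With both scribbles on the object, every pixel is ambiguous and `ask_again` is `True`, mirroring the step where the computer asks the user for more markings.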

Another of Batra’s programs can assemble a 3D model based on pictures from different angles. Again, the user scribbles red on the important parts, and the computer cuts them out and creates a 3D model.

A computer creates a 3D model of an object from 2D pictures.

These problems are difficult because a computer has trouble inferring and processing information that isn’t explicitly present in an image. As an example, Batra notes that although he can’t see the chair that a person behind a desk is sitting on, he doesn’t expect that person is hovering. A computer, however, only sees a series of pixel values. What our brains understand naturally, computers do not.

A new machine learning course

Batra is teaching Introduction to Machine Learning and Perception this spring, in which students learn how algorithms are used to identify patterns and make predictions from large quantities of data. The course is of wide interest, and the roster includes students from ECE, computer science, computational biology, biomedical engineering, mechanical engineering, and industrial and systems engineering.

The course draws inspiration from real-world applications of machine learning, including IBM’s Jeopardy-playing computer (Watson), Google’s self-driving car, and Microsoft’s game controller (Kinect).

Playing Jeopardy is a difficult task, according to Batra. “It’s not a clear question/answer system. Clues lead to some entity and the computer has to parse it, understand what the words mean, put it together, and do all this faster than a human.” IBM’s Watson, however, did all this and beat the human contestants.

The Google smart car has logged more than 300,000 miles of travel without driver intervention. “They do have a person sitting in the driver’s seat,” says Batra, “but my understanding is that he takes over only in the case of an emergency.” According to Batra, “this is the rise of machine learning. I want to convey this excitement to the students and teach them how to build these algorithms that deal with large quantities of data and improve performance.”

Batra also brings in a Microsoft Kinect sensor to help motivate students. “You start talking about video game controllers and you get their interest,” he says. The Kinect demonstrates how computers can understand human motion from visual data. Batra notes that although it’s not open source, “the team of researchers at Microsoft Research Cambridge, some of whom I collaborate with, have written a computer vision paper describing their approach. I teach my class the same techniques, and show them how the material they’re learning is implemented in the products they are using today.” Batra also mentions that the Kinect is such a sturdy and accurate sensor that researchers buy it as a research tool. Many undergraduates in Batra’s class are building demos based on Kinect.

The course is designed for senior undergraduates and graduate students, but Batra is interested in offering it earlier in the curriculum, for example as a similar class at the sophomore level. The only prerequisites are an understanding of linear algebra, probability, and programming.