Stocking the visual toolbox

Big data is one of today's buzzwords. We hear about big data in the fields of biology, marketing, finance, and robotics. Big data is too big for our traditional tools, and we look to experts in both software and hardware for the new tools we need to make sense of this data.

ECE assistant professor Dhruv Batra is using machine learning to help computers assist humans in parsing these huge datasets. "My particular focus is building algorithms that don't necessarily replace humans, but assist them in extracting meaning from data," he explains. "They might not perform perfectly, but will give reasonable choices to a human operator."

Dhruv Batra

His specific research area is at the intersection of machine learning and computer vision. "I'm interested in helping machines understand the visual world around them," he says. This is particularly important for autonomous systems and image or video analysis.

One application that Batra's research addresses is tracking people, their stance, and their activities in videos, which may be useful in the context of surveillance. With 72 hours of video being uploaded every minute to YouTube alone, it's impossible for a human operator to watch everything, and according to Batra, "we just don't have the algorithms today that can perform this task with a sufficiently high accuracy." Batra is focusing on developing algorithms that will provide four or five potential detections of suspicious activity for a human operator to check. "The computer will come up with plausible hypotheses, and the human can make intelligent decisions. We can think of it as interactive intelligence, as opposed to autonomous intelligence."
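The "interactive intelligence" idea of surfacing a handful of high-scoring hypotheses for a person to review, rather than committing to one automated answer, can be sketched in a few lines of Python. The detection labels and scores below are invented purely for illustration; they are not from Batra's system.

```python
# Minimal sketch of interactive intelligence: instead of acting on a
# single automated decision, rank the machine's hypotheses and hand
# the top few to a human operator. All labels/scores are made up.

def top_k_hypotheses(detections, k=4):
    """Return the k highest-scoring (label, score) pairs for human review."""
    return sorted(detections, key=lambda d: d[1], reverse=True)[:k]

detections = [
    ("person loitering near entrance", 0.81),
    ("bag left unattended", 0.74),
    ("normal foot traffic", 0.35),
    ("vehicle parked", 0.52),
    ("person climbing fence", 0.68),
]

for label, score in top_k_hypotheses(detections):
    print(f"{score:.2f}  {label}")
```

The human then confirms or rejects each candidate, so the algorithm needs only to be good enough that the right answer usually appears somewhere in its short list.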

Some of Batra's previous algorithms help users sort through images to find what they seek. Selecting one image from a collection, the user scribbles two markings: a blue one on what he or she is looking for, and a red one on the background. The scribbling is not precise, but if the computer can't come up with something reasonable based on the user's markings, it intelligently asks for more. The computer can then pick out the relevant parts from other images and show the results to the user.
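A toy version of this scribble-guided idea can be written with NumPy: treat the scribbled pixels as examples of "object" and "background" colors, then label every other pixel by whichever mean color it is closer to. This is only the color-model half of the idea; real interactive-segmentation systems (graph-cut methods, for instance) also enforce spatial smoothness, and nothing here reflects Batra's actual implementation.

```python
# Toy scribble-guided segmentation: the user marks a few foreground and
# background pixels; every remaining pixel is labeled by whichever
# scribble's mean color it is closer to. Illustration only.
import numpy as np

def segment_by_scribbles(image, fg_coords, bg_coords):
    """image: (H, W, 3) float array; *_coords: lists of (row, col) scribbled pixels."""
    fg_mean = np.mean([image[r, c] for r, c in fg_coords], axis=0)
    bg_mean = np.mean([image[r, c] for r, c in bg_coords], axis=0)
    d_fg = np.linalg.norm(image - fg_mean, axis=2)  # distance to foreground color
    d_bg = np.linalg.norm(image - bg_mean, axis=2)  # distance to background color
    return d_fg < d_bg  # True where the pixel looks like the object

# Tiny synthetic image: left half red (the object), right half gray.
img = np.zeros((4, 8, 3))
img[:, :4] = [1.0, 0.1, 0.1]
img[:, 4:] = [0.5, 0.5, 0.5]

mask = segment_by_scribbles(img, fg_coords=[(1, 1)], bg_coords=[(1, 6)])
print(mask[0, 0], mask[0, 7])  # True False
```

Because the user's scribbles only need to land roughly on the object and the background, even this crude color model recovers the red region; asking for more scribbles when the two color models overlap is the natural next step.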

Another of Batra's programs can assemble a 3D model based on pictures from different angles. Again, the user scribbles red on the important parts, and the computer cuts them out and creates a 3D model.

3D modeling process
A computer creates a 3D model of an object from 2D pictures.

These problems are difficult because a computer has trouble inferring and processing information that isn't explicitly present in an image. As an example, Batra notes that although he can't see the chair that a person behind a desk is sitting on, he doesn't expect that the person is hovering. A computer, however, sees only a series of pixel values. What our brains understand naturally, computers do not.
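That "series of pixel values" can be made concrete: to a program, even a whole scene is just a grid of numbers. The tiny grayscale array below is made up for illustration; nothing in it says "chair" or "person".

```python
# What a machine actually receives: a 2x3 grayscale "image" is just a
# grid of intensity values (0 = black, 255 = white). The numbers are
# arbitrary; no semantics come with them.
import numpy as np

img = np.array([[ 12,  80, 200],
                [ 15,  90, 210]], dtype=np.uint8)

print(img.shape)        # (2, 3)
print(int(img[0, 2]))   # 200 -- a bright pixel, but "bright" is all we know
```

Everything else, including objects, support, and occlusion, must be inferred from patterns in these numbers, which is exactly the gap computer vision tries to close.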

A new machine learning course

Batra is teaching Introduction to Machine Learning and Perception this spring, in which students learn how algorithms are used to identify patterns and make predictions from large quantities of data. The course is of wide interest, and the roster includes students from ECE, computer science, computational biology, biomedical engineering, mechanical engineering, and industrial and systems engineering.

The course draws inspiration from real-world applications of machine learning, including IBM's Jeopardy-playing computer (Watson), Google's self-driving car, and Microsoft's game controller (Kinect).

Playing Jeopardy is a difficult task, according to Batra. "It's not a clear question/answer system. Clues lead to some entity, and the computer has to parse it, understand what the words mean, put it together, and do all this faster than a human." IBM's Watson, however, did all this and beat the human contestants.

The Google smart car has logged more than 300,000 miles of travel without driver intervention. "They do have a person sitting in the driver's seat," says Batra, "but my understanding is that he takes over only in the case of an emergency." According to Batra, this is the rise of machine learning. "I want to convey this excitement to the students and teach them how to build these algorithms that deal with large quantities of data and improve performance."

Batra also brings in a Microsoft Kinect sensor to help motivate students. "You start talking about video game controllers and you get their interest," he says. The Kinect demonstrates how computers can understand human motion from visual data. Batra notes that although it's not open source, "the team of researchers at Microsoft Research Cambridge, some of whom I collaborate with, have written a computer vision paper describing their approach. I teach my class the same techniques, and show them how the material they're learning is implemented in the products they are using today." Batra also mentions that the Kinect is such a sturdy and accurate sensor that researchers buy it as a research tool. Many undergraduates in Batra's class are building demos based on the Kinect.

The course is designed for senior undergraduate and graduate students, but Batra is interested in moving it earlier in the curriculum by offering a similar class at the sophomore level, for example. The course is limited only by the need for students to understand linear algebra, probability, and programming.