Computer Science Department
University of California, Berkeley
Updated 06 Mar 2013
Our method for timely multi-class detection aims to give the best possible performance at any point between a start time and a deadline. We formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve.
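As a rough illustration of the timeliness measure described above (a minimal sketch with hypothetical names, not the project's actual evaluation code), the area under an AP-vs-time curve can be computed by trapezoidal integration and normalized over the interval between the start time and the deadline:

```python
def ap_vs_time_area(times, aps, t_start, t_deadline):
    """Trapezoidal area under an AP(t) curve, normalized by the
    length of the [t_start, t_deadline] evaluation interval."""
    area = 0.0
    for (t0, a0), (t1, a1) in zip(zip(times, aps),
                                  zip(times[1:], aps[1:])):
        # Trapezoid between consecutive (time, AP) samples.
        area += 0.5 * (a0 + a1) * (t1 - t0)
    return area / (t_deadline - t_start)
```

A policy that reaches high AP early in the interval scores close to that AP; one that only finishes at the deadline scores near zero.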
Using the Microsoft Kinect, we gather a large dataset of indoor crowded scenes. We investigate ways to unify state-of-the-art object detection systems and improve them with depth information.
Our method for additively decomposing local image patches, LDA-SIFT, achieves the best performance on a novel transparent object recognition dataset. We recursively extend the model to multiple layers and successfully apply it to general object classification.
We present an open-source system for quickly searching large image collections by multiple colors given as a palette, or by color similarity to a query image.
We present a mobile web app to match users who request similar trips and would like to share a cab. The application is hosted on Amazon’s EC2 service and combines several open-source frameworks (Django, PostgreSQL, Redis, Node.js) with social networking and mapping APIs. The modularity of our design allows the service to easily scale in the cloud as the user base grows. The service is live.
Sparse coding as applied to natural image patches learns Gabor-like components that resemble those found in V1. This biological motivation for sparse coding would also suggest that the learned receptive field elements be organized spatially by their response properties. We investigate ways of enforcing a topography over the learned codes in a locally self-organizing map approach.
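For context, the sparse coding objective underlying this line of work balances reconstruction error against an L1 sparsity penalty on the code. The sketch below is illustrative only (the names `D`, `a`, and `lam` are assumptions, not taken from the project's code):

```python
import numpy as np

def sparse_coding_cost(x, D, a, lam=0.1):
    """Standard sparse coding cost for a patch x, dictionary D,
    and code a: ||x - D a||^2 + lam * ||a||_1."""
    residual = x - D @ a
    return float(residual @ residual + lam * np.abs(a).sum())
```

Topographic variants add a further term that penalizes dissimilar response properties between neighboring dictionary elements, which is what organizes the learned receptive fields spatially.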
High-level computer vision and natural language processing are thoroughly intertwined, with the potential to jointly improve performance. We propose a well-defined subset of this underexplored overlap of problems, centered around improving grounded parsing of text and object recognition in images for pairs of images and their descriptions.
My senior Honors thesis, advised by Steve Seitz.
Abstract: The world around us is photographed millions of times a day, and many of those images find their way online. We present a way to use this data to augment reality through a mobile phone. With our application, the user can zoom in on a distant landmark using other people’s photographs. Our system relies on a 3D scene modeling back end that computes the viewpoint of each photograph in a large, unordered photo collection. We present and discuss the overall system architecture, our implementation of the client application on the iPhone, our approach to picking the best views to offer a zoom path, and the complexities and limitations associated with mobile platforms.
Video | Thesis
A larger, searchable, and generally more useful course evaluations catalog for the University of Washington. The official catalog only lists the last three quarters' worth of data, is not searchable, and does not group evaluations by course or instructor. I sought to remedy all three of these downsides, and to explore Ruby on Rails while I was at it. My favorite part of this project was not the front end but the data scraping backing it up. The site is now closed, but you can read a more detailed write-up, or get the source code.
A quarter-long research project under Steve Seitz. Prompted by the presence of a webcam in a newly purchased laptop, my goal was to explore ways to interact with the computer through a video stream. I explored generally applicable motion tracking algorithms, with some elaborations and additions. Details and a video of it in action here.