<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Sergey Karayev | Blog</title>
  <link href="http://sergeykarayev.com/atom.xml" rel="self"/>
  <link href="http://sergeykarayev.com/"/>
  <updated>2013-03-06T00:02:50-08:00</updated>
  <id>http://sergeykarayev.com/</id>
  <author>
    <name>Sergey Karayev</name>
    <email>sergeykarayev@gmail.com</email>
  </author>
  
  <entry>
    <title>Setting up a development environment on Mac OS X 10.8 Mountain Lion</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2012-08-08T00:00:00-07:00</updated>
    <id>http://sergeykarayev.com/work/2012-08-08/setting-up-mountain-lion</id>
    <content type="html">&lt;p&gt;Do you want a good modern development setup? Ruby and Node for all the web goodness and Python with a beautiful ipython console, Numpy, and Scipy for math and statistical computing?&lt;/p&gt;

&lt;p&gt;This is the gospel. This is what you will do.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3 id='building_blocks_homebrew_xcode_x11_and_git'&gt;Building blocks: Homebrew, XCode, X11, and Git&lt;/h3&gt;

&lt;p&gt;This guide is for starting from a fresh Mountain Lion install, with nothing else installed. You can approximate that state by getting rid of all your macports and finks and what have you. Delete your &lt;code&gt;/usr/local&lt;/code&gt;. Uninstall all XCodes and their developer tools.&lt;/p&gt;

&lt;p&gt;Install &lt;a href='http://mxcl.github.com/homebrew/'&gt;homebrew&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ruby &amp;lt;(curl -fsSk https://raw.github.com/mxcl/homebrew/go)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Homebrew lets us effortlessly install things from source. It is good, but it needs some help: Mountain Lion doesn&amp;#8217;t come with developer tools, so homebrew is not able to do much until we install them.&lt;/p&gt;

&lt;p&gt;Go to the App Store and download XCode (I am writing this in early August 2012, and the current version is 4.4). This can take a little bit, so while it&amp;#8217;s downloading, let&amp;#8217;s install X11 libraries that Mountain Lion stripped out.&lt;/p&gt;

&lt;p&gt;Go &lt;a href='http://xquartz.macosforge.org/trac/wiki'&gt;here&lt;/a&gt; and download 2.7.2+. Install it, and after it&amp;#8217;s done, fix the symlink it makes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ln -s /opt/X11 /usr/X11&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When the XCode download is done, launch it, go to Preferences, open the Downloads tab, and install the Command Line Tools. When it finishes, we are almost ready to brew. Before we do, let&amp;#8217;s have XCode tell everyone where the tools are (this tip is from &lt;a href='https://gist.github.com/1860902'&gt;Get Mountain Lion and Homebrew to Be Happy&lt;/a&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo xcode-select -switch /Applications/Xcode.app/Contents/Developer&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Open up a new shell to make sure everything is loaded from scratch, and check that homebrew is good to go:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;brew doctor&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Fix the stuff it complains about until it doesn&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s get git:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;brew install git&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And we are off to the races!&lt;/p&gt;
&lt;hr /&gt;
&lt;h3 id='science_stuff_python_and_the_scipy_stack'&gt;Science stuff: Python and the SciPy stack&lt;/h3&gt;

&lt;p&gt;We are going to take advantage of a nice man&amp;#8217;s labor of love&amp;#8212;the &lt;a href='https://github.com/fonnesbeck/ScipySuperpack'&gt;Scipy Superpack&lt;/a&gt;&amp;#8212;to dramatically cut down the time and effort it will take us to get from nothing to a full Matlab and R replacement.&lt;/p&gt;

&lt;p&gt;To use it most effectively, we are going to base everything on the system Python, which right now is version 2.7.2. This is 0.0.1 versions behind the latest and greatest, but I think we&amp;#8217;ll survive.&lt;/p&gt;

&lt;p&gt;Python is best installed through an environment manager.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;easy_install pip
pip install virtualenv
mkdir ~/.virtual_envs
virtualenv ~/.virtual_envs/system
source ~/.virtual_envs/system/bin/activate&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This downloads a package manager, installs virtualenv, duplicates the system environment to a directory in your home directory, and activates this environment.&lt;/p&gt;

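&lt;p&gt;Doing &lt;code&gt;which python&lt;/code&gt; should now show a &lt;code&gt;.virtual_envs&lt;/code&gt;-containing path. Make sure you add the last line of the block above (the &lt;code&gt;source&lt;/code&gt; command) to your &lt;code&gt;~/.bashrc&lt;/code&gt;, so that every new shell activates the environment.&lt;/p&gt;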

&lt;p&gt;Now install the superpack. Since we may want to keep up to date on the exciting scientific python developments, let&amp;#8217;s check out the git repository so that we can update faster in the future.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;mkdir ~/local &amp;amp;&amp;amp; cd ~/local
git clone git://github.com/fonnesbeck/ScipySuperpack.git
cd ScipySuperpack
sh install_superpack.sh&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Select &lt;code&gt;y&lt;/code&gt; at the prompt, and that&amp;#8217;s it! The script will install gfortran and binary builds of the latest development versions of Numpy, Scipy, Matplotlib, IPython, Pandas, Statsmodels, Scikit-learn, and PyMC, as well as their dependencies.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s get IPython to &lt;a href='http://stronginference.com/post/innovations-in-ipython'&gt;look beautiful using qtconsole&lt;/a&gt;. Download &lt;a href='http://get.qt.nokia.com/qt/source/qt-mac-opensource-4.7.4.dmg'&gt;Qt 4.7.4 libraries&lt;/a&gt; and &lt;a href='http://pyside.markus-ullmann.de/pyside-1.1.0-qt47-py27apple.pkg'&gt;PySide libraries&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, the PySide package installs its stuff into the system python site-packages directory, and our virtualenv ipython doesn&amp;#8217;t see it. We could try building PySide from source, but instead we are just going to symlink the relevant stuff from the system to our virtualenv folder. This isn&amp;#8217;t very clean, but it works for me.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ln -s /Library/Python/2.7/site-packages/pysideuic $HOME/.virtual_envs/system/lib/python2.7/site-packages
ln -s /Library/Python/2.7/site-packages/PySide $HOME/.virtual_envs/system/lib/python2.7/site-packages&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Install some remaining dependencies (for some reason, the DateUtils package that the Superpack installs doesn&amp;#8217;t work right for me):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install pygments
pip install dateutils&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now try it out:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ipython qtconsole --pylab=inline&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the qtconsole, try&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;plot(randn(500),rand(500),&amp;#39;o&amp;#39;,alpha=0.2)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And enjoy the inline goodness.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3 id='web_stuff_ruby_node'&gt;Web stuff: Ruby, Node&lt;/h3&gt;

&lt;p&gt;Ruby is also best installed using an environment manager. Install RVM:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl -L https://get.rvm.io | bash -s stable&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Open up a new shell and let RVM install itself into your bashrc:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;source ~/.rvm/scripts/rvm&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Open up another shell and test that it works:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;type rvm | head -n 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;should give &lt;code&gt;rvm is a function&lt;/code&gt;. Great.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s install a version of Ruby. 1.9.3 (the latest version) works fine with the compilers provided by XCode 4.4, so:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rvm install 1.9.3
rvm use 1.9.3 --default&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You should be all set now. For sanity, check that &lt;code&gt;which bundle&lt;/code&gt; shows some &lt;code&gt;.rvm&lt;/code&gt;-derived path. If there are problems, consult the &lt;a href='https://rvm.io/rvm/install/#explained'&gt;detailed installation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For Node and its package manager, simply&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;brew install node&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Lastly, you may want to install Heroku at some point. I ran into a problem when installing from the &lt;a href='http://toolbelt.heroku.com'&gt;Heroku Toolbelt&lt;/a&gt; package. Instead, simply&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;gem install heroku
gem install foreman&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And you&amp;#8217;re done. Good stuff.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>CabFriendly -- a cloud-based mobile web app.</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2011-12-13T00:00:00-08:00</updated>
    <id>http://sergeykarayev.com/work/2011-12-13/cabfriendly</id>
    <content type="html">&lt;p&gt;(Joint work with &lt;a href='http://www.cs.berkeley.edu/~adarob/'&gt;Adam Roberts&lt;/a&gt; and &lt;a href='http://www.eecs.berkeley.edu/~pimentel/'&gt;Harold Pimentel&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;We have developed a cloud-based mobile web application to match users who request similar trips and would like to share a cab. The application is hosted on Amazon&amp;#8217;s EC2 service and combines several open-source frameworks (Django, PostgreSQL, Redis, Node.js) with social networking (Facebook), mapping, and location-awareness (Google) APIs. The modularity of our design allows the service to easily scale in the cloud as the user base grows.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://cabfriendly.com'&gt;Use it!&lt;/a&gt;&lt;/p&gt;

&lt;h3 id='architecture'&gt;Architecture&lt;/h3&gt;

&lt;p&gt;&lt;img alt='Architecture of our application.' src='/work/images/cabfriendly/architecture.png' /&gt;&lt;/p&gt;

&lt;h3 id='use_case_scenario'&gt;Use case scenario&lt;/h3&gt;

&lt;p&gt;&lt;img alt='A use case scenario in screenshots of the app.' src='/work/images/cabfriendly/use_case.png' /&gt;&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Attentional Object Detection -- introductory slides.</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2011-03-24T00:00:00-07:00</updated>
    <id>http://sergeykarayev.com/work/2011-03-24/attentional-object-detection</id>
    <content type="html">&lt;p&gt;For the Berkeley computer vision retreat, I made a little presentation outlining my case for using ideas from sequential decision making in object detection. It is meant partly to start conversation on the subject, and partly to summarize four interesting papers in this vein. Some notable things are omitted, such as region-based proposal approaches.&lt;/p&gt;
&lt;div style=&quot;width:595px&quot; id=&quot;__ss_7393371&quot;&gt; &lt;strong style=&quot;display:block;margin:12px 0 4px&quot;&gt;&lt;a href=&quot;http://www.slideshare.net/sergeykarayev/attentional-object-detection-introductory-slides&quot; title=&quot;Attentional Object Detection - introductory slides.&quot;&gt;Attentional Object Detection - introductory slides.&lt;/a&gt;&lt;/strong&gt; &lt;object id=&quot;__sse7393371&quot; width=&quot;595&quot; height=&quot;497&quot;&gt; &lt;param name=&quot;movie&quot; value=&quot;http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=attentionalobjectdetectionretreat2011-110325215026-phpapp02&amp;rel=0&amp;stripped_title=attentional-object-detection-introductory-slides&amp;userName=sergeykarayev&quot; /&gt; &lt;param name=&quot;allowFullScreen&quot; value=&quot;true&quot;/&gt; &lt;param name=&quot;allowScriptAccess&quot; value=&quot;always&quot;/&gt; &lt;embed name=&quot;__sse7393371&quot; src=&quot;http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=attentionalobjectdetectionretreat2011-110325215026-phpapp02&amp;rel=0&amp;stripped_title=attentional-object-detection-introductory-slides&amp;userName=sergeykarayev&quot; type=&quot;application/x-shockwave-flash&quot; allowscriptaccess=&quot;always&quot; allowfullscreen=&quot;true&quot; width=&quot;595&quot; height=&quot;497&quot;&gt;&lt;/embed&gt; &lt;/object&gt; &lt;div style=&quot;padding:5px 0 12px&quot;&gt; View more &lt;a href=&quot;http://www.slideshare.net/&quot;&gt;presentations&lt;/a&gt; from &lt;a href=&quot;http://www.slideshare.net/sergeykarayev&quot;&gt;sergeykarayev&lt;/a&gt; &lt;/div&gt; &lt;/div&gt;
&lt;p&gt;Download &lt;a href=&quot;/work/files/attentional_object_detection_retreat_2011.pdf&quot;&gt;pdf&lt;/a&gt;, or Keynote &lt;a href=&quot;/work/files/attentional_object_detection_retreat_2011.key&quot;&gt;source&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Review of Kanan and Cottrell, Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics, CVPR 2010.</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2011-01-24T00:00:00-08:00</updated>
    <id>http://sergeykarayev.com/work/2011-01-24/kanan-cvpr2010</id>
    <content type="html">&lt;h2&gt;Review of Kanan and Cottrell, Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics, &lt;span class=&quot;caps&quot;&gt;CVPR&lt;/span&gt; 2010.&lt;/h2&gt;
&lt;p&gt;The paper&amp;#8217;s approach has three parts. The first is using an &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt;-based spatial pyramid feature; the second is computing a saliency map to sample interest points; and the third is using Naive Bayes Nearest Neighbor (&lt;span class=&quot;caps&quot;&gt;NBNN&lt;/span&gt;) for classification. The approach is evaluated on single-object datasets: Caltech-101 and Caltech-256, the Aleix and Robert (AR) faces dataset of 120 individuals with 26 images each, and 102 Flowers (8200 images). The results are the best yet published for single-feature approaches on Caltech-101, where they also match the best multiple-feature results; they are comparable to the state of the art on Caltech-256, match it on AR Faces, and beat the single previously published result on the Flowers dataset.&lt;/p&gt;
&lt;h3&gt;&lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt;-based Local Features and Saliency&lt;/h3&gt;
&lt;p&gt;The images are first pre-processed by resizing to a standard size, converting to the &lt;span class=&quot;caps&quot;&gt;LMS&lt;/span&gt; color space (designed to match human color receptor distributions), normalizing, and then applying a nonlinear transform (a logarithmic compression) inspired by how photoreceptors respond to luminance. [Note: It would be interesting to see the effects of not performing this mapping.]&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; filters of size $b \times b$ ($b$ tuned on a Butterfly and Bird dataset to 24 pixels) are learned on about 5000 color image patches from the McGill color image dataset. To learn $d$ &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; features, the authors first run &lt;span class=&quot;caps&quot;&gt;PCA&lt;/span&gt; on the patches, discard the first principal component, retain the next $d$ principal components, and then learn the &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; decomposition. I&amp;#8217;m not quite sure how this works&amp;#8212;I guess &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; is then only able to learn $d$ non-garbage bases?&lt;/p&gt;
&lt;h4&gt;Saliency Map&lt;/h4&gt;
&lt;p&gt;The &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; bases are used to compute a saliency map over the image following the Saliency Using Natural statistics (&lt;span class=&quot;caps&quot;&gt;SUN&lt;/span&gt;) framework (Zhang et al., 2008). The basic idea is that the saliency of a point is the inverse $P(F)^{-1}$ of the probability of its filter responses under the &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; model $P(F=\mathbf{f})=\prod_i P(\mathbf{f}_i)$. Each unidimensional distribution is fit with a generalized Gaussian distribution:&lt;br /&gt;
\[ P(\mathbf{f}_i) = \frac{\theta_i}{2 \sigma_i \Gamma(\theta_i^{-1})} \exp\left(-\left|\frac{\mathbf{f}_i}{\sigma_i}\right|^{\theta_i}\right) \]&lt;br /&gt;
where $\sigma_i$ is the scale and $\theta_i$ the shape parameter of the $i$-th response. The parameters are again fit on the McGill color image dataset. A further strange nonlinear weighting of the dimensions of $\mathbf{f}$ is then done to weight rarer responses more heavily.&lt;/p&gt;
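&lt;p&gt;To make the saliency computation concrete, here is a minimal sketch of my own (not the authors&amp;#8217; code). It evaluates the generalized Gaussian density above and scores a response vector by its negative log-probability, which is a monotone function of $P(F)^{-1}$. The parameter values are made up for illustration, and the extra nonlinear weighting step is left out.&lt;/p&gt;

```python
import math


def generalized_gaussian_pdf(f, sigma, theta):
    """Density of a zero-mean generalized Gaussian with scale sigma and shape theta."""
    norm = theta / (2.0 * sigma * math.gamma(1.0 / theta))
    return norm * math.exp(-abs(f / sigma) ** theta)


def sun_saliency(responses, sigmas, thetas):
    """Negative log-probability of a response vector under independent
    generalized Gaussians; rarer responses get a larger score."""
    log_p = 0.0
    for f, s, t in zip(responses, sigmas, thetas):
        log_p += math.log(generalized_gaussian_pdf(f, s, t))
    return -log_p


# Hypothetical parameters for d = 3 filters (not values from the paper).
sigmas = [1.0, 0.5, 2.0]
thetas = [1.0, 0.8, 2.0]
common = sun_saliency([0.0, 0.0, 0.0], sigmas, thetas)
rare = sun_saliency([3.0, 2.0, 5.0], sigmas, thetas)
print(common, rare)  # the second, rarer response vector scores higher
```

&lt;p&gt;With $\theta_i=1$ the density reduces to a Laplace distribution and with $\theta_i=2$ to a Gaussian, which is a quick sanity check on the normalization constant.&lt;/p&gt;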
&lt;h3&gt;Fixations&lt;/h3&gt;
&lt;p&gt;The saliency map is normalized to a probability distribution, and &amp;#8220;fixations&amp;#8221; are sampled from it $T$ times. At each location $\ell_t$, an interesting fixation feature is extracted. It is a spatial pyramid over a $w \times w$ window with $w=51$ pixels, using average pooling. So, the initial window of size $w \times w \times d$ is represented by a vector of size $21d$, where $21 = 4 \times 4 + 2 \times 2 + 1 \times 1$ is the number of cells in the three levels of the spatial pyramid. Importantly, the normalized location $\ell_t$ of the fixation is also stored. To cast &lt;span class=&quot;caps&quot;&gt;SIFT&lt;/span&gt; in this framework, we would set $w=17$, $d=8$, and the spatial aggregation would be a flat $4 \times 4$ grid.&lt;/p&gt;
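&lt;p&gt;A rough sketch of these two steps, again my own and under simplifying assumptions: fixations are drawn with probability proportional to saliency, and one channel of a window is average-pooled over 4x4, 2x2, and 1x1 grids, giving 16, 4, and 1 values, so 21 per channel. The function names and the toy inputs are hypothetical, and how the paper handles cell boundaries when the grid does not divide $w$ is my guess.&lt;/p&gt;

```python
import random


def sample_fixations(saliency, T, rng):
    """Draw T (row, col) locations with probability proportional to saliency."""
    cells = [(r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))]
    weights = [saliency[r][c] for r, c in cells]
    return rng.choices(cells, weights=weights, k=T)


def pyramid_pool(window, levels=(4, 2, 1)):
    """Average-pool a square single-channel window over 4x4, 2x2 and 1x1 grids."""
    w = len(window)
    pooled = []
    for g in levels:
        # Cell boundaries; cells may differ by one pixel when g does not divide w.
        edges = [round(i * w / g) for i in range(g + 1)]
        for i in range(g):
            for j in range(g):
                vals = [window[r][c]
                        for r in range(edges[i], edges[i + 1])
                        for c in range(edges[j], edges[j + 1])]
                pooled.append(sum(vals) / len(vals))
    return pooled


rng = random.Random(0)
saliency = [[1.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 1.0]]
fixations = sample_fixations(saliency, T=5, rng=rng)

window = [[float(r + c) for c in range(51)] for r in range(51)]
feature = pyramid_pool(window)
print(len(fixations), len(feature))  # 5 fixations, 21 pooled values for one channel
```

&lt;p&gt;Concatenating the pooled values of all $d$ channels gives the $21d$-dimensional vector described above.&lt;/p&gt;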
&lt;p&gt;After gathering $T$ fixations on every image in the training set, the unit-normalized SP vectors are then additionally processed by retaining only the first 500 &lt;span class=&quot;caps&quot;&gt;PCA&lt;/span&gt; components and whitening them. The chain of re-normalizations in this paper is quite long and I would appreciate theoretical justifications for these decisions.&lt;/p&gt;
&lt;h3&gt;Classification&lt;/h3&gt;
&lt;p&gt;The paper uses Kernel Density Estimation (&lt;span class=&quot;caps&quot;&gt;KDE&lt;/span&gt;) to model $P(\mathbf{g}_t|C=k)$, where $\mathbf{g}_t$ is the vector of fixation features. A Naive Bayes assumption is made, such that each fixation contributes independently to the total probability. The posterior is estimated with Bayes rule, assuming uniform class priors. 1-nearest neighbor &lt;span class=&quot;caps&quot;&gt;KDE&lt;/span&gt; is used, and the Euclidean distance between the fixation locations is considered in addition to the feature-to-exemplar distance. The resulting class-conditional likelihood of a fixation is: &lt;br /&gt;
\[ P(\mathbf{g}_t | C=k) \propto \max_i \frac{1}{\|\mathbf{w}_{k,i}-\mathbf{g}_t\|^2_2 + \alpha \|\mathbf{v}_{k,i}-\ell_t\|^2_2 + \epsilon} \]&lt;br /&gt;
where $\mathbf{w}_{k,i}$ is a vector representing the $i$&amp;#8217;th exemplar of a fixation from class $k$, $\mathbf{v}_{k,i}$ is the location at which that exemplar was recorded, $\alpha$ weights the location term, and $\epsilon$ is a small constant that keeps the ratio finite.&lt;/p&gt;
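&lt;p&gt;The classification rule can be sketched in a few lines. This is my own toy version, not the paper&amp;#8217;s implementation: each class stores (feature, location) exemplars, a fixation is scored against its best-matching exemplar, and log-scores are summed over fixations under the Naive Bayes assumption with uniform priors. The value $\alpha=0.5$ is the one the paper uses; the value of $\epsilon$ and the toy exemplars are assumptions.&lt;/p&gt;

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fixation_likelihood(g, loc, exemplars, alpha=0.5, eps=1e-6):
    """Unnormalized P(g | class): best match over (feature, location) exemplars."""
    return max(1.0 / (sq_dist(w, g) + alpha * sq_dist(v, loc) + eps)
               for w, v in exemplars)


def classify(fixations, exemplars_by_class, alpha=0.5, eps=1e-6):
    """Naive Bayes over fixations with uniform priors: sum the log-likelihoods."""
    scores = {}
    for k, exemplars in exemplars_by_class.items():
        scores[k] = sum(math.log(fixation_likelihood(g, loc, exemplars, alpha, eps))
                        for g, loc in fixations)
    return max(scores, key=scores.get), scores


# Toy exemplars: 2-d features with normalized (x, y) locations.
exemplars_by_class = {
    "face": [([1.0, 0.0], (0.5, 0.5)), ([0.9, 0.1], (0.4, 0.6))],
    "flower": [([0.0, 1.0], (0.5, 0.5)), ([0.1, 0.9], (0.2, 0.2))],
}
fixations = [([0.95, 0.05], (0.5, 0.5)), ([0.8, 0.2], (0.45, 0.55))]
label, scores = classify(fixations, exemplars_by_class)
print(label)  # face
```

&lt;p&gt;The brute-force maximum over exemplars is fine for a toy example; with many fixations per training image one would presumably want an approximate nearest-neighbor index instead.&lt;/p&gt;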
&lt;h3&gt;Discussion&lt;/h3&gt;
&lt;p&gt;The authors attribute the strength of their approach largely to the exemplar-based classifier. Their approach does outperform the comparable single-descriptor version of the Boiman and Irani &lt;span class=&quot;caps&quot;&gt;NBNN&lt;/span&gt; classifier (Boiman et al., 2008); that could be due to a number of factors:&lt;/p&gt;
&lt;p&gt;1. &lt;del&gt;They also use location information in their comparison of fixations.&lt;/del&gt; &lt;span class=&quot;caps&quot;&gt;EDIT&lt;/span&gt;: &lt;span class=&quot;caps&quot;&gt;NBNN&lt;/span&gt; paper also appends location to the feature vector.&lt;br /&gt;
2. They sample features from a saliency map (vs. densely for &lt;span class=&quot;caps&quot;&gt;NBNN&lt;/span&gt;)&lt;br /&gt;
3. They use their &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; feature instead of &lt;span class=&quot;caps&quot;&gt;SIFT&lt;/span&gt; and other standard descriptors.&lt;/p&gt;
&lt;p&gt;It would be excellent to see a controlled evaluation of each of these factors. The paper as it is presents a very specific and unorthodox approach, and does not justify many of its design decisions.&lt;/p&gt;
&lt;p&gt;My questions:&lt;/p&gt;
&lt;p&gt;1. What is the contribution of the saliency map? How would the performance change under a random sampling scheme? What about an interest-point sampling scheme?&lt;br /&gt;
2. Why is the saliency computed at a single scale? Presumably this works well only because the datasets are single-object and fixed-scale.&lt;br /&gt;
3. How would performance change if a standard feature, for example &lt;span class=&quot;caps&quot;&gt;SIFT&lt;/span&gt;, were extracted instead of the &lt;span class=&quot;caps&quot;&gt;ICA&lt;/span&gt; SP feature?&lt;br /&gt;
4. What is the contribution of the location feature? Why is it weighted at $\alpha=0.5$; what would cross-validation tune it to?&lt;/p&gt;
&lt;p&gt;In my mind, the most important part of the approach is NN classification. It would be interesting to re-implement this framework with a different bottom-up saliency map, for example the multi-scale one used by Alexe et al. (2010), and traditional &lt;span class=&quot;caps&quot;&gt;SIFT&lt;/span&gt; features.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Review of Itti and Koch, Computational Modeling of Visual Attention, Nature Neuroscience 2001.</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2011-01-01T00:00:00-08:00</updated>
    <id>http://sergeykarayev.com/work/2011-01-01/Attention</id>
    <content type="html">&lt;h2&gt;Review of Itti and Koch, Computational Modeling of Visual Attention, Nature Neuroscience 2001.&lt;/h2&gt;
&lt;h3&gt;Why Attention?&lt;/h3&gt;
&lt;p&gt;The reigning paradigm for object detection is a scanning window approach, where object-specific windows are considered over different scales and locations of an image. This is an expensive and inelegant approach, and assuredly not how humans perform visual search. While of course we do not know how that is done, some observations can be grouped under a loose term of &lt;em&gt;attention&lt;/em&gt;; for example, search time will be lowered if people are told a distinguishing characteristic, such as color, of an object they are looking for in a cluttered scene.&lt;/p&gt;
&lt;p&gt;The main view of attention in this sense is by analogy to a spotlight. At least in our visual systems, not every part of a scene can be processed at a high level at once. Attention is the bottleneck through which high level processing flows. If we want to use intuitions and experimental observations about visual attention in computational models of vision, we should focus on replacing exhaustive window search with a more intelligent spotlight.&lt;/p&gt;
&lt;p&gt;Here I will summarize an influential synthesis of attentional models by Itti and Koch. In a follow-up post, I will go over a newly published work from Chikkerur &lt;em&gt;et al.&lt;/em&gt; outlining a highly promising Bayesian model of attention.&lt;/p&gt;
&lt;h3&gt;The Review&lt;/h3&gt;
&lt;p&gt;The authors of this 2001 paper synthesize prior work on computational models of attention and list five essential components of a model for bottom-up attention:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Perceptual saliency depends on surrounding context of stimuli. This is pre-attentive.&lt;/li&gt;
	&lt;li&gt;Bottom-up processing seems to culminate in a single &amp;#8220;saliency map.&amp;#8221;&lt;/li&gt;
	&lt;li&gt;Sequential nature of attention needs to be explained by something like inhibition of return.&lt;/li&gt;
	&lt;li&gt;Implicit (covert) and eye movement (overt) attentional deployments are coupled, posing coordinate system challenges to computational models.&lt;/li&gt;
	&lt;li&gt;Scene understanding and object recognition influence attention.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The authors first review the two-component framework for attention. Bottom-up attention is driven by saliency cues in the image and is on the order of 50ms. Top-down attention is driven by high-level cues and is on the order of 200ms, which is also around the time it takes to re-fixate the eyes.&lt;/p&gt;
&lt;p&gt;Interesting fact: visual sensory input is estimated to be $10^7$-$10^8$ bits per second at the optic nerve. The authors make the bottleneck analogy: &amp;#8220;Attention allows us to break down the problem of understanding a visual scene into a rapid series of computationally less demanding, localized visual analysis problems.&amp;#8221;&lt;/p&gt;
&lt;p&gt;I am not that interested in the brain areas involved, but a basic view never hurts. From the visual cortex, processing proceeds through the &lt;em&gt;dorsal stream&lt;/em&gt; of the posterior parietal cortex and the &lt;em&gt;ventral stream&lt;/em&gt; of the inferotemporal cortex. The prefrontal cortex is commonly viewed as the seat of attention. The processing streams converge there. Motor systems and high-level cognition are controlled by the &lt;span class=&quot;caps&quot;&gt;PFC&lt;/span&gt;, including eye movement through the superior colliculus.&lt;/p&gt;
&lt;h4&gt;Pre-attentive computation of visual features&lt;/h4&gt;
&lt;p&gt;Early vision proceeds in parallel across the entire visual field. It is not purely feedforward. Attention can modulate early processing, virtually increasing the stimulus strength. Different features contribute with different weights. There is little evidence for cross-modality interactions at a given visual area, and we lack the ability to efficiently detect conjunctive targets.&lt;/p&gt;
&lt;p&gt;Most importantly, feature contrast, not absolute strength, is what matters. Contrast extends past neuronal receptive fields (and contour completion could be due to long-range connections). My aside: our best current models of local feature descriptors attempt to achieve this effect, rather inelegantly.&lt;/p&gt;
&lt;h4&gt;Saliency map&lt;/h4&gt;
&lt;p&gt;Although there are multiple feature representations of the visual field, there is only one attentional focus. Most computational models hypothesize that the feature maps feed into a unique saliency map, which then controls attentional deployment. There is not a lot of biological evidence given for this hypothesis, but models with that assumption sometimes match human performance.&lt;/p&gt;
&lt;p&gt;For example, the authors&amp;#8217; previously published computational model that uses non-classical surround modulation effects (as mentioned above) seems to reproduce human behavior in some visual search tasks, as well as some automatic salient object detection in real color images.&lt;/p&gt;
&lt;h4&gt;Attentional selection; interplay with eye movements&lt;/h4&gt;
&lt;p&gt;Given a saliency map, attention is deployed to the most salient point in the visual scene. There needs to be a mechanism for sequential selection of points of lower saliency, however&amp;#8212;otherwise, attention would remain fixed. An efficient computational strategy is transient inhibition of the currently attended location, or &lt;em&gt;inhibition of return (&lt;span class=&quot;caps&quot;&gt;IOR&lt;/span&gt;)&lt;/em&gt;.&lt;/p&gt;
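&lt;p&gt;As a toy illustration of this selection loop (my own sketch, not the authors&amp;#8217; model): repeatedly pick the most salient cell of a grid, then suppress a small neighborhood around it so that the next fixation lands elsewhere. The square neighborhood, the &lt;code&gt;radius&lt;/code&gt; and &lt;code&gt;inhibition&lt;/code&gt; parameters, and the fact that inhibition here never decays are all simplifying assumptions.&lt;/p&gt;

```python
def scanpath(saliency, n_fixations, radius=1, inhibition=0.0):
    """Repeatedly attend to the most salient cell, then suppress its neighborhood."""
    s = [row[:] for row in saliency]  # work on a copy
    rows, cols = len(s), len(s[0])
    path = []
    for _ in range(n_fixations):
        r, c = max(((i, j) for i in range(rows) for j in range(cols)),
                   key=lambda rc: s[rc[0]][rc[1]])
        path.append((r, c))
        # Inhibition of return: scale down saliency around the attended location.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                s[i][j] *= inhibition
    return path


saliency = [
    [0.1, 0.2, 0.1, 0.9],
    [0.2, 0.8, 0.1, 0.3],
    [0.1, 0.2, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]
print(scanpath(saliency, 3))                  # [(0, 3), (1, 1), (3, 0)]
print(scanpath(saliency, 3, inhibition=1.0))  # no inhibition: [(0, 3), (0, 3), (0, 3)]
```

&lt;p&gt;Setting &lt;code&gt;inhibition=1.0&lt;/code&gt; switches the mechanism off, and attention stays stuck on the global maximum, which is exactly the failure the paragraph above describes.&lt;/p&gt;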
&lt;p&gt;The paper does not mention that the patterns of sequences of attentional deployments may carry information useful to object recognition, which is the hypothesis of a paper I plan on reviewing &lt;sup class=&quot;footnote&quot; id=&quot;fnr1&quot;&gt;&lt;a href=&quot;#fn1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;caps&quot;&gt;IOR&lt;/span&gt; seems simple but shows complex behavior. For example, it seems to be object-bound, tracking moving objects and compensating for motion of the observer. More basically, covert attention must interplay with overt attention (eye movements), which poses challenges to the coordinate system for &lt;span class=&quot;caps&quot;&gt;IOR&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;On the subject of covert vs. overt attention, there is evidence that covert attention is deployed to the endpoint of an upcoming saccade.&lt;/p&gt;
&lt;h4&gt;Recognition&lt;/h4&gt;
&lt;p&gt;The previous four points describe a bottom-up attentional system, accounting for the first few hundred milliseconds after stimulus presentation. After that, top-down effects can strongly affect attentional deployment. The authors outline several models of top-down influence, focusing in particular on a model by Schill &lt;em&gt;et al.&lt;/em&gt; &lt;sup class=&quot;footnote&quot; id=&quot;fnr2&quot;&gt;&lt;a href=&quot;#fn2&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which will be the subject of a later review. In a nutshell, the model&amp;#8217;s assumption is that objects are recognized iteratively from coarse to fine, with eye movements that maximize information gain.&lt;/p&gt;
&lt;p&gt;Another model that I will review is the &amp;#8220;scanpath theory&amp;#8221; &lt;sup class=&quot;footnote&quot; id=&quot;fnr3&quot;&gt;&lt;a href=&quot;#fn3&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; of Lawrence Stark, the late Berkeley optometry professor, who was a strong proponent of a prior-driven perceptual system. In this view, eye movements over a scene are mostly due to our cognitive model of what we expect to see.&lt;/p&gt;
&lt;hr /&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn1&quot;&gt;&lt;a href=&quot;#fnr1&quot;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; Paletta et al. Q-Learning of Sequential Attention for Visual Object Recognition from Informative Local Descriptors. &lt;span class=&quot;caps&quot;&gt;ICML&lt;/span&gt; (2005)&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn2&quot;&gt;&lt;a href=&quot;#fnr2&quot;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; Schill et al. Scene analysis with saccadic eye movements: Top-down and bottom-up modeling. J. Electron. Imaging (2001) vol. 10 (1) pp. 152&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn3&quot;&gt;&lt;a href=&quot;#fnr3&quot;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; Stark et al. Representation of human vision in the brain: How does human perception recognize images?. J. Electron. Imaging (2001) vol. 10 (1) pp. 123&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Jupiter album art recreation</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2010-08-16T00:00:00-07:00</updated>
    <id>http://sergeykarayev.com/fun/2010-08-16/jupiter</id>
    <content type="html">&lt;p&gt;I really like the cover of the album Jupiter by the band &lt;a href='http://www.myspace.com/strfkrmusic'&gt;Starfucker&lt;/a&gt; (their music is awesome as well).&lt;/p&gt;
&lt;img src='/fun/jupiter/actual.jpg' /&gt;
&lt;p&gt;I thought it would look cool as a huge print on my wall. Unfortunately, I could not find a large enough image. So, I decided to recreate it using &lt;a href='http://processing.org'&gt;Processing&lt;/a&gt;. The basic idea was surprisingly easy to match. The hardest part to match was the color distribution. After a little HSV space histogramming (suggested by &lt;a href='http://www.cs.berkeley.edu/~mfritz/'&gt;Mario&lt;/a&gt;), I finally settled on manually picking a palette of just six hues inspired by the original image, and then randomly (with some caveats) picking the saturation and brightness values. Pro tip: the slightly non-uniform widths of the stripes were modeled with a draw from the Dirichlet distribution.&lt;/p&gt;
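&lt;p&gt;For the curious, the Dirichlet trick looks roughly like this. The sketch below is in Python rather than Processing and is not the code linked from the animated version; the number of stripes, the total width, and the concentration value are made-up illustration values.&lt;/p&gt;

```python
import random


def dirichlet(alphas, rng):
    """Sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def stripe_widths(n_stripes, total_width, concentration, rng):
    """Widths that sum to total_width; a higher concentration gives more uniform stripes."""
    fractions = dirichlet([concentration] * n_stripes, rng)
    return [f * total_width for f in fractions]


rng = random.Random(42)
widths = stripe_widths(n_stripes=12, total_width=1200.0, concentration=50.0, rng=rng)
print([round(w, 1) for w in widths])
```

&lt;p&gt;Because the draw is normalized, the stripes always fill the canvas exactly, and the single concentration parameter controls how far the widths stray from uniform.&lt;/p&gt;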

&lt;p&gt;The result looks pretty good to me, and can be generated at any resolution. Pictures of the HUGE poster forthcoming.&lt;/p&gt;
&lt;a href='/fun/jupiter_full_screen.html'&gt; &lt;img src='/fun/jupiter/jupiter30.jpg' /&gt; &lt;/a&gt;
&lt;p&gt;If your browser can handle HTML5 (most modern browsers can), check out an &lt;a href='/fun/jupiter_full_screen.html'&gt;animated version&lt;/a&gt;, with a link to the source code.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Data collection photograph</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2010-08-02T00:00:00-07:00</updated>
    <id>http://sergeykarayev.com/fun/2010-08-02/data-collection-image</id>
    <content type="html">&lt;img src=&quot;/fun/images/data_collection.png&quot; alt=&quot;Image of me collecting data for object recognition&quot; /&gt;
</content>
  </entry>
  
  <entry>
    <title>My friend and I rode bikes from Victoria, B.C. to San Francisco, along the Pacific coast. A little blog pictorially <a href="http://bestcoast.posterous.com">documented</a> the trip.</title>
    <link href="http://sergeykarayev.com"/>
    <updated>2010-07-29T00:00:00-07:00</updated>
    <id>http://sergeykarayev.com/fun/2010-07-29/bike-trip-blog</id>
    <content type="html"></content>
  </entry>
  
</feed>