Welcome to Rayleigh’s documentation!

Installation

Rayleigh has only been tested with Python 2.7, on OS X 10.8 and Ubuntu 12.04.

First, install FLANN from source, making sure to compile it with Python support. Test that you can import pyflann from a Python console.

Now, go into the Rayleigh directory that you cloned from the GitHub repository, and run

pip install -r requirements.txt

Quick start

To get Rayleigh running locally, you must (a) populate the database with image information and (b) load a SearchableImageCollection object.

The easiest way to get started is to download the .zip file here, which contains everything you need for a demo. Unzip it in the repository directory so that it populates a subfolder called data/.

Let’s get the MongoDB server running:

cd rayleigh_repo_dir
mongod --config mongo.conf
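Before restoring, you may want to check that mongod is accepting connections. The sketch below uses only the standard library. mongod_is_listening is a hypothetical helper, and the default port 27666 is taken from the mongorestore command below (we assume mongo.conf configures the same port). A successful connection only shows that something is listening on that port, not that it is mongod.

```python
import socket


def mongod_is_listening(host="127.0.0.1", port=27666, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Raises socket.error (an alias of OSError on Python 3) if nothing accepts.
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        return False
    conn.close()
    return True


if __name__ == "__main__":
    print("mongod reachable: %s" % mongod_is_listening())
```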

Now load the data from the zip file you downloaded:

mongorestore --port 27666 data/flickr_100k

We should now be all ready to run the server. In another shell tab:

python rayleigh/client/app.py

The website should now be up at http://127.0.0.1:5000/

Datasets

You can download more pickled SearchableImageCollections from https://s3.amazonaws.com/rayleigh/.

To construct your own dataset from scratch, run

nosetests test/collection.py:TestFlickrCollection -s

This uses the file data/flickr_1M.json.gz, which lists a million images from Flickr fetched by the “interestingness” API query over the last few years.

Running this will download and process 100K images (fewer or more, if you modify the code). The data is stored in the MongoDB database. Processing is faster with multiple cores, so in a separate tab, run

ipcluster start -n 8

This relies on the IPython parallel framework.

If you want, you can regenerate the list of Flickr URLs by running

python rayleigh/assemble_flickr_dataset.py

Have fun!

rayleigh Package

Rayleigh is an open-source system for quickly searching medium-sized image collections by multiple colors given as a palette or derived from a query image.

assemble_flickr_dataset Module

Assemble a list of URLs to Flickr images fetched by repeated calls to the API method flickr.interestingness.getList.

To use it, place your [API key](http://www.flickr.com/services/apps/72157632345167838/key/) in a file named ‘flickr_api.key’ in the same directory as this file.

There is a limit of 500 images per day, so to obtain more images than that, we iterate backwards from the current date until the desired number of images is obtained.
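The backwards-by-date strategy can be sketched as follows. This is not the module's code: fetch_photos_for_date is a hypothetical stand-in for a per-date call such as get_photos_list(api_key, date), photos are assumed to be dicts with an 'id' key, and the max_days guard is an addition that stops the loop if the API keeps returning too few images.

```python
import datetime


def collect_backwards(fetch_photos_for_date, num_images, start_date=None, max_days=3650):
    """Collect up to num_images unique photos, walking back one day at a time.

    fetch_photos_for_date is a hypothetical callable standing in for a
    per-date API call; it must return a list of dicts with an 'id' key.
    """
    date = start_date or datetime.date.today()
    seen_ids = set()
    photos = []
    for _ in range(max_days):
        if len(photos) >= num_images:
            break
        for photo in fetch_photos_for_date(date):
            # Skip ids already collected, in case a photo is listed on more than one day.
            if photo["id"] not in seen_ids:
                seen_ids.add(photo["id"])
                photos.append(photo)
        date -= datetime.timedelta(days=1)
    return photos[:num_images]
```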

rayleigh.assemble_flickr_dataset.assemble_flickr_dataset(api_filename, data_filename, num_images_to_load)

Assemble dataset containing the specified number of images using Flickr ‘interestingness’ API calls. Returns nothing; writes data to file.

Parameters :

api_filename : string

File should contain only one line, with your Flickr API key.

data_filename : string

Gzipped JSON file that will contain the dataset. If it already exists, we will load the data in it and not repeat the work done.

num_images_to_load : int

Number of images to include in the dataset.

rayleigh.assemble_flickr_dataset.get_photos_list(api_key, date)
rayleigh.assemble_flickr_dataset.get_url(photo)
rayleigh.assemble_flickr_dataset.ids_and_urls_from_dataset(data_filename, num_images)

Load the data in the given file and return the first num_images URLs (or all of them, if num_images exceeds the total number).

Parameters :

data_filename : string

JSON file containing the dataset.

num_images : int

Returns :

ids : list

the Flickr IDs of the photos

urls : list

URLs of the photos
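A minimal sketch of what ids_and_urls_from_dataset describes, written for Python 3. The on-disk format is an assumption: we suppose the gzipped file holds a JSON list of objects with 'id' and 'url' keys, and load_ids_and_urls is a hypothetical name.

```python
import gzip
import json


def load_ids_and_urls(data_filename, num_images):
    """Return (ids, urls) for the first num_images photos in a gzipped JSON file.

    Assumes, hypothetically, a JSON list of objects with 'id' and 'url' keys.
    """
    with gzip.open(data_filename, "rt") as f:
        photos = json.load(f)
    # Slicing past the end returns every entry, so no bounds check is needed.
    photos = photos[:num_images]
    ids = [photo["id"] for photo in photos]
    urls = [photo["url"] for photo in photos]
    return ids, urls
```

Because slicing past the end of a list is safe in Python, asking for more images than the file holds simply returns all of them.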

collection Module

ImageCollection stores color information about images and exposes a method to add images to it, with support for parallel processing. The datastore is MongoDB, so a server must be running (launch it with the settings in mongo.conf).

class rayleigh.collection.ImageCollection(palette)

Bases: object

Initialize an empty ImageCollection with a color palette that will be used to extract color information from images.

Parameters :

palette : Palette

Palette object representing the accepted colors.

Methods

add_images(image_urls, image_ids=None)

Add all images in a list of URLs. If ipcluster is running, load images in parallel.

Parameters :

image_urls : list

image_ids : list, optional

If given, images are stored with the given ids. If None, the index of the image in the dataset is its id.

get_hists()

Return histograms of all images as a single numpy array.

Returns :

hists : (N,K) ndarray

where N is the number of images in the database and K is the number of colors in the palette.

get_id_ind_map()

Return dict of id to index and index to id.
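The two mappings can be built in one pass over an ordered list of ids. The docs do not say whether get_id_ind_map returns one combined dict or two; this hypothetical build_id_ind_maps returns two, and it assumes an image's index is its position in that list.

```python
def build_id_ind_maps(image_ids):
    """Return (id_to_ind, ind_to_id) for an ordered sequence of image ids."""
    # An image's index is assumed to be its position in image_ids.
    id_to_ind = dict((image_id, ind) for ind, image_id in enumerate(image_ids))
    ind_to_id = dict(enumerate(image_ids))
    return id_to_ind, ind_to_id
```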

get_image(image_id, no_hist=False)

Return information about the image with the given id, or None if it doesn’t exist.

Parameters :

image_id : string

no_hist : boolean

If True, does not return the histogram, only the image metadata.

Returns :

image : dict, or None

information in database for this image id.

static load(filename)

Load ImageCollection from filename.

save(filename)

Save self to filename.

rayleigh.collection.get_mongodb_collection()

Establish connection to MongoDB and return the relevant collection.

Returns :

collection : pymongo.Collection

Pymongo Collection of images and their histograms.

rayleigh.collection.process_image(args)

Returns :

success : boolean

image Module

class rayleigh.image.Image(url, _id=None)

Bases: object

Read the image at the URL in RGB format, downsample if needed, and convert to Lab colorspace. Store original dimensions, resize_factor, and the filename of the image.

Image dimensions are resized independently so that neither width nor height exceeds the maximum allowed dimension, MAX_DIMENSION.

Parameters :

url : string

URL or file path of the image to load.

_id : string, optional

Name or some other id of the image. For example, the Flickr ID.
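One way to compute the downsampled size. This sketch assumes a single resize_factor is applied to both sides (as the stored resize_factor suggests) and that images already within the limit are not enlarged; fit_within is a hypothetical helper, and 241 is the MAX_DIMENSION value listed for this class.

```python
def fit_within(width, height, max_dimension=241):
    """Return (new_width, new_height, resize_factor) with both sides <= max_dimension.

    Assumes one scale factor for both sides and no upscaling of small images.
    """
    resize_factor = min(1.0, float(max_dimension) / max(width, height))
    new_width = int(round(width * resize_factor))
    new_height = int(round(height * resize_factor))
    return new_width, new_height, resize_factor
```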

Methods

MAX_DIMENSION = 241
as_dict()

Return relevant info about self in a dict.

output_quantized_to_palette(palette, filename)

Save to filename a version of the image with all colors quantized to the nearest color in the given palette.

Parameters :

palette : rayleigh.Palette

Containing K colors.

filename : string

Where image will be written.

class rayleigh.image.PaletteQuery(palette_query)

Bases: object

Extract an L*a*b* color array from a dict representation of a palette query. The array can then be used to histogram colors, output a palette image, etc.

Parameters :

palette_query : dict

A mapping of hex colors to unnormalized values, representing proportion in the palette (e.g. {'#ffffff': 20, '#cc3300': 0.5}).
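Normalizing such a query turns the unnormalized values into proportions. The sketch below, with the hypothetical name normalize_palette_query, stops at RGB in [0, 1]; PaletteQuery itself goes on to L*a*b*, which is not shown here.

```python
def normalize_palette_query(palette_query):
    """Turn {hex_color: value} into a list of (rgb, proportion) pairs.

    rgb channels are floats in [0, 1]; proportions sum to 1. Colors are
    sorted by hex string only to make the output order deterministic.
    """
    total = float(sum(palette_query.values()))
    if total <= 0:
        raise ValueError("palette query values must sum to a positive number")
    result = []
    for hex_color, value in sorted(palette_query.items()):
        h = hex_color.lstrip("#")
        rgb = tuple(int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
        result.append((rgb, value / total))
    return result
```

For the example query above, '#ffffff' receives 20/20.5 of the weight and '#cc3300' the remaining 0.5/20.5.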

palette Module

Encapsulate the hex-color list and Lab-array representations of a palette (codebook) of colors.

Provide methods to work with color conversion and the Palette class.

Provide a parametrized method to generate a palette that covers the range of colors.

class rayleigh.palette.Palette(num_hues=8, sat_range=2, light_range=2)

Bases: object

Create a color palette (codebook) in the form of a 2D grid of colors, as described in the parameters list below. Further, the rightmost column has num_hues gradations from black to white.

Parameters :

num_hues : int

number of hues, shown with full lightness and saturation in the middle row.

sat_range : int

number of rows above the middle row that show the same hues with decreasing saturation.

light_range : int

number of rows below the middle row that show the same hues with decreasing lightness.

Returns :

palette : rayleigh.Palette
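The grid described for Palette can be approximated in HSV space with the standard colorsys module. This is an illustrative sketch, not Rayleigh's construction: the evenly spaced hues, the linear saturation and brightness steps, and the name palette_grid are all assumptions.

```python
import colorsys


def palette_grid(num_hues=8, sat_range=2, light_range=2):
    """Return (rows, grays) of hex colors laid out like the Palette description.

    rows[sat_range] is the middle row of fully saturated, fully bright hues;
    rows above it lose saturation, rows below it lose brightness (HSV value).
    grays holds num_hues steps from black to white.
    """
    hues = [i / float(num_hues) for i in range(num_hues)]

    def to_hex(r, g, b):
        return "#%02x%02x%02x" % tuple(int(round(c * 255)) for c in (r, g, b))

    rows = []
    # Rows above the middle: saturation decreases towards the top.
    for k in range(sat_range, 0, -1):
        s = 1.0 - k / float(sat_range + 1)
        rows.append([to_hex(*colorsys.hsv_to_rgb(h, s, 1.0)) for h in hues])
    # Middle row: full saturation and full value.
    rows.append([to_hex(*colorsys.hsv_to_rgb(h, 1.0, 1.0)) for h in hues])
    # Rows below the middle: value (brightness) decreases towards the bottom.
    for k in range(1, light_range + 1):
        v = 1.0 - k / float(light_range + 1)
        rows.append([to_hex(*colorsys.hsv_to_rgb(h, 1.0, v)) for h in hues])
    steps = max(num_hues - 1, 1)
    grays = [to_hex(g, g, g) for g in (i / float(steps) for i in range(num_hues))]
    return rows, grays
```

With the defaults this yields five rows of eight hues plus eight grays; a real palette would then be converted to Lab for distance computations.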

Methods

output(dirname)

Output an image of the palette, a JSON list of the hex colors, and an HTML color picker for it.

Parameters :

dirname : string

directory for the files to be output

searchable_collection Module

Methods to search an ImageCollection, either exhaustively (brute force) or with approximate nearest-neighbor indexes.

class rayleigh.searchable_collection.SearchableImageCollection(image_collection, dist_metric, sigma, num_dimensions)

Bases: object

Initialize with a rayleigh.ImageCollection, a distance metric (dist_metric), a smoothing amount (sigma), and the number of dimensions to reduce the histograms to (num_dimensions).

Parameters :

image_collection : rayleigh.ImageCollection

dist_metric : string

must be in self.DISTANCE_METRICS

sigma : nonnegative float

Amount of smoothing applied to histograms. If 0, none.

num_dimensions : int

number of dimensions to reduce the histograms to, using PCA. If 0, do not reduce dimensions.

Methods

get_image_hist(img_id)

Return the smoothed image histogram of the image with the given id.

Parameters :

img_id : string

Returns :

color_hist : ndarray

static load(filename)

Load SearchableImageCollection from filename.

nn_ind(color_hist, num)

Return the num nearest neighbors (potentially approximate) to the query color_hist, and the distances to them.

Override this search method in extending classes.

Parameters :

color_hist : (K,) ndarray

histogram over the color palette

num : int

number of nearest neighbors to return.

Returns :

nn_ind : (num,) ndarray

Indices of the neighbors in the dataset.

nn_dists : (num,) ndarray

Distances to the neighbors returned.

reduce_dimensionality()

Compute and store PCA dimensionality-reduced histograms.

save(filename)

Save self to filename.

search_by_color_hist(color_hist, num=20, reduced=False)

Search images in database by color similarity to the given histogram.

Parameters :

color_hist : (K,) ndarray

histogram over the color palette

num : int, optional

number of nearest neighbors to return

reduced : boolean, optional

is the given color_hist already reduced in dimensionality?

Returns :

query_img : dict

info about the query image

results : list

list of dicts of nearest neighbors to query

search_by_image(image_filename, num=20)

Search images in database by color similarity to image.

See search_by_color_hist().

search_by_image_in_dataset(img_id, num=20)

Search images in database for similarity to the image with img_id in the database.

See search_by_color_hist() for implementation.

Parameters :

img_id : string

num : int, optional

Returns :

query_img_data : dict

results : list

list of dicts of nearest neighbors to query

smooth_histograms()

Smooth histograms with a Gaussian.

class rayleigh.searchable_collection.SearchableImageCollectionCKDTree(image_collection, distance_metric, sigma, dimensions)

Bases: rayleigh.searchable_collection.SearchableImageCollection

Use the cKDTree data structure from scipy.spatial for the index.

Parameters:
  • LEAF_SIZE (int): The number of points at which the algorithm switches over to brute-force.
  • EPSILON (non-negative float): Parameter eps for query(), such that the k-th returned value is guaranteed to be no further than (1 + eps) times the distance to the real k-th nearest neighbor.

NOTE: These parameters have not been tuned.
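The Ps attribute listed for this class maps each metric name to the exponent p of a Minkowski distance (the convention of the p argument to scipy's cKDTree.query), and EPSILON bounds the approximation error. Writing q for the query histogram, n_k for its true k-th nearest neighbor, and n̂_k for the k-th point returned, with the sum running over the histogram bins (or the reduced dimensions, if PCA is used):

```latex
d_p(\mathbf{x}, \mathbf{y}) = \Big( \sum_{i=1}^{K} \lvert x_i - y_i \rvert^{p} \Big)^{1/p},
\qquad p = 1 \ (\text{manhattan}), \quad p = 2 \ (\text{euclidean})

d_p(\mathbf{q}, \hat{\mathbf{n}}_k) \le (1 + \varepsilon)\, d_p(\mathbf{q}, \mathbf{n}_k),
\qquad \varepsilon = 1 \;\Rightarrow\; d_p(\mathbf{q}, \hat{\mathbf{n}}_k) \le 2\, d_p(\mathbf{q}, \mathbf{n}_k)
```

So with EPSILON = 1, a returned neighbor may be up to twice as far from the query as the true neighbor of the same rank.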

Methods

DISTANCE_METRICS = ['manhattan', 'euclidean']
EPSILON = 1
LEAF_SIZE = 5
Ps = {'euclidean': 2, 'manhattan': 1}
build_index()
static load(filename)
nn_ind(color_hist, num)
class rayleigh.searchable_collection.SearchableImageCollectionExact(image_collection, dist_metric, sigma, num_dimensions)

Bases: rayleigh.searchable_collection.SearchableImageCollection

Search the image collection exhaustively (mainly through np.dot).

Methods

DISTANCE_METRICS = ['manhattan', 'euclidean', 'chi_square']
nn_ind(color_hist, num)

Exact nearest-neighbor search through exhaustive comparison.
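A pure-Python sketch of exhaustive search over the three DISTANCE_METRICS. The class itself works on NumPy arrays; exact_nn and its helpers are hypothetical, and the chi-square form used here (half the sum of squared differences over sums, skipping empty bins) is one common definition that may differ from Rayleigh's.

```python
import math


def _manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def _euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def _chi_square(x, y):
    # Assumed chi-square form; bins that are empty in both histograms are skipped.
    return 0.5 * sum((a - b) ** 2 / float(a + b) for a, b in zip(x, y) if a + b > 0)


DISTANCES = {"manhattan": _manhattan, "euclidean": _euclidean, "chi_square": _chi_square}


def exact_nn(hists, query, num, metric="euclidean"):
    """Return (indices, distances) of the num histograms closest to query."""
    dist = DISTANCES[metric]
    # Compare the query against every histogram, then keep the num smallest.
    dists = [dist(h, query) for h in hists]
    order = sorted(range(len(hists)), key=lambda i: dists[i])[:num]
    return order, [dists[i] for i in order]
```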

class rayleigh.searchable_collection.SearchableImageCollectionFLANN(image_collection, distance_metric, sigma, dimensions)

Bases: rayleigh.searchable_collection.SearchableImageCollection

Search the image collection using the FLANN library for approximate nearest-neighbor (ANN) indexing.

The FLANN index is built with automatic tuning of the search algorithm, which can take a while (~90s on 25K images).

Methods

DISTANCE_METRICS = ['manhattan', 'euclidean', 'chi_square']
build_index(index_filename=None)
static load(filename)
nn_ind(color_hist, num)
save(filename)

tictoc Module

class rayleigh.tictoc.TicToc

MATLAB-like tic/toc.

Methods

qtoc(label=None)

Call toc(label, quiet=True).

running(label=None, msg=None, interval=1)

Print <msg> every <interval> seconds, running the timer for <label>.

Args:

label (string): [optional] label for the timer

msg (string): [optional] message to print

interval (int): [optional] print every <interval> seconds

Returns:
self
Raises:
none
tic(label=None)

Start timer for given label.

Args:
label (string): optional label for the timer.
Returns:
self
toc(label=None, quiet=False)

Return elapsed time for given label.

Args:

label (string): [optional] label for the timer.

quiet (boolean): [optional] if False, print the elapsed time

Returns:
elapsed (float): time elapsed
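A minimal labeled timer covering tic, toc, and qtoc. SimpleTicToc is a hypothetical re-sketch, not rayleigh.tictoc; it omits running() and uses wall-clock time.time().

```python
import time


class SimpleTicToc(object):
    """Minimal labeled tic/toc timer (hypothetical sketch)."""

    def __init__(self):
        self._starts = {}

    def tic(self, label=None):
        # Record the start time for this label; None is a valid default label.
        self._starts[label] = time.time()
        return self

    def toc(self, label=None, quiet=False):
        # Raises KeyError if tic() was never called for this label.
        elapsed = time.time() - self._starts[label]
        if not quiet:
            print("%s: %.3f s" % (label or "elapsed", elapsed))
        return elapsed

    def qtoc(self, label=None):
        return self.toc(label, quiet=True)
```

Returning self from tic() allows chaining, e.g. SimpleTicToc().tic("load").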

util Module

rayleigh.util.color_hist_to_palette_image(color_hist, palette, percentile=90, width=200, height=50, filename=None)

Output the main colors in the histogram to a “palette image.”

Parameters :

color_hist : (K,) ndarray

palette : rayleigh.Palette

percentile : int, optional

Output only colors above this percentile of prevalence in the histogram.

filename : string, optional

If given, save the resulting image to file.

Returns :

rgb_image : ndarray
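Selecting the colors to draw amounts to thresholding the histogram at a percentile. This hypothetical colors_above_percentile helper computes the threshold with linear interpolation (NumPy's default for np.percentile) and keeps bins strictly above it; whether Rayleigh uses > or >= is an assumption.

```python
def colors_above_percentile(color_hist, percentile=90):
    """Return indices of histogram bins above the given percentile, largest first."""
    values = sorted(color_hist)
    # Linear interpolation between the two nearest ranks (NumPy's default method).
    pos = (len(values) - 1) * percentile / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(values) - 1)
    threshold = values[lo] + (values[hi] - values[lo]) * (pos - lo)
    keep = [i for i, v in enumerate(color_hist) if v > threshold]
    return sorted(keep, key=lambda i: -color_hist[i])
```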

rayleigh.util.hex2rgb(hexcolor_str)
Args:
  • hexcolor_str (string): e.g. '#ffffff' or '33cc00'
Returns:
  • rgb_color (sequence of floats): e.g. (0.2, 0.8, 0.0) for '33cc00'
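A minimal sketch of the conversion, accepting the string with or without a leading '#'. hex_to_rgb is a hypothetical name, not the module's function.

```python
def hex_to_rgb(hexcolor_str):
    """Convert '#ffffff' or '33cc00' to a tuple of floats in [0, 1]."""
    h = hexcolor_str.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected 6 hex digits, got %r" % hexcolor_str)
    # Each pair of hex digits is one 8-bit channel: rr, gg, bb.
    return tuple(int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
```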
rayleigh.util.histogram_colors_smoothed(lab_array, palette, sigma=10, plot_filename=None, direct=True)

Return a palette histogram of colors in the image, smoothed with a Gaussian. Smoothing can be applied directly per pixel, or after computing a strict histogram.

Parameters :

lab_array : (N,3) ndarray

The L*a*b color of each of N pixels.

palette : rayleigh.Palette

Containing K colors.

sigma : float

Variance of the smoothing Gaussian.

direct : bool, optional

If True, constructs a smoothed histogram directly from pixels. If False, constructs a nearest-color histogram and then smooths it.

Returns :

color_hist : (K,) ndarray

rayleigh.util.histogram_colors_strict(lab_array, palette, plot_filename=None)

Return a palette histogram of colors in the image.

Parameters :

lab_array : (N,3) ndarray

The L*a*b color of each of N pixels.

palette : rayleigh.Palette

Containing K colors.

plot_filename : string, optional

If given, save histogram to this filename.

Returns :

color_hist : (K,) ndarray
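The strict histogram assigns each pixel to its single nearest palette color. This pure-Python sketch assumes Euclidean distance in Lab and a palette given as a list of Lab triples; strict_histogram is hypothetical, whereas the documented function takes an (N,3) ndarray and a rayleigh.Palette.

```python
def strict_histogram(lab_pixels, palette_lab):
    """Fraction of pixels whose nearest palette color (Euclidean, in Lab) is each entry."""
    counts = [0] * len(palette_lab)
    for pixel in lab_pixels:
        # Squared distance is enough to find the nearest color.
        sq_dists = [sum((p - c) ** 2 for p, c in zip(pixel, color)) for color in palette_lab]
        counts[sq_dists.index(min(sq_dists))] += 1
    if not lab_pixels:
        return counts
    total = float(len(lab_pixels))
    return [c / total for c in counts]
```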

rayleigh.util.histogram_colors_with_smoothing(lab_array, palette, sigma=10)

Assign colors in the image to nearby colors in the palette, weighted by distance in Lab color space.

Parameters :

lab_array : (N,3) ndarray

N is the number of data points, columns are L, a, b values.

palette : rayleigh.Palette

containing K colors.

sigma : float

Controls the steepness of the exponential falloff; larger values fall off more slowly. To see the effect:

>>> from pylab import *
>>> ds = linspace(0, 5000)  # squared distance
>>> sigma = 10; plot(ds, exp(-ds / (2 * sigma**2)), label=r'$\sigma=%.1f$' % sigma)
>>> sigma = 20; plot(ds, exp(-ds / (2 * sigma**2)), label=r'$\sigma=%.1f$' % sigma)
>>> sigma = 40; plot(ds, exp(-ds / (2 * sigma**2)), label=r'$\sigma=%.1f$' % sigma)
>>> ylim([0, 1]); legend()
>>> xlabel('Squared distance'); ylabel('Weight')
>>> title('Exponential smoothing')
>>> # savefig('exponential_smoothing.png', dpi=300)

sigma=20 seems reasonable: its weight falls to nearly 0 (exp(-5) ≈ 0.007) around a squared distance of 4000.

Returns :

color_hist : (K,) ndarray

the normalized, smooth histogram of colors.
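Soft assignment with the Gaussian weight exp(-d²/(2σ²)) from the plot above can be sketched as follows. smoothed_histogram is hypothetical; it normalizes once over all pixels so the result sums to 1, which is one reading of "normalized" here.

```python
import math


def smoothed_histogram(lab_pixels, palette_lab, sigma=10.0):
    """Soft-assign each pixel to every palette color with weight exp(-d^2 / (2 sigma^2))."""
    hist = [0.0] * len(palette_lab)
    for pixel in lab_pixels:
        for k, color in enumerate(palette_lab):
            sq_dist = sum((p - c) ** 2 for p, c in zip(pixel, color))
            hist[k] += math.exp(-sq_dist / (2.0 * sigma ** 2))
    total = sum(hist)
    # Normalize so the histogram sums to 1 (if any weight survived).
    return [h / total for h in hist] if total > 0 else hist
```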

rayleigh.util.makedirs(dirname)

Do what mkdir -p does, and return dirname.
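A sketch of mkdir -p behavior that runs on both Python 2.7 and 3 (on Python 3 alone, os.makedirs(dirname, exist_ok=True) does the same). makedirs_p is a hypothetical name.

```python
import errno
import os


def makedirs_p(dirname):
    """Create dirname and any missing parents, like mkdir -p; return dirname."""
    try:
        os.makedirs(dirname)
    except OSError as e:
        # An existing directory is fine; re-raise anything else.
        if e.errno != errno.EEXIST or not os.path.isdir(dirname):
            raise
    return dirname
```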

rayleigh.util.output_histogram_base64(color_hist, palette)

Return base64-encoded image containing the color palette histogram.

Args:
  • color_hist ((K,) ndarray)
  • palette (Palette)
Returns:
  • data_uri (base64 encoded string)
rayleigh.util.output_plot_for_flask(color_hist, palette)

Return an object suitable to be sent as an image by Flask, containing the color palette histogram.

Args:
  • color_hist ((K,) ndarray)
  • palette (Palette)
Returns:
  • png_output (StringIO)
rayleigh.util.palette_query_to_rgb_image(palette_query, width=200, height=50)

Convert a dict of hex colors and their values to an RGB image of the given width and height.

Args:
  • palette_query (dict):

    a dictionary of hex colors to unnormalized values, e.g. {'#ffffff': 20, '#33cc00': 0.4}.
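Drawing such an image mainly requires splitting the width among the colors in proportion to their values. The hypothetical stripe_widths below uses largest-remainder rounding so the stripes always add up to exactly width pixels; ordering stripes by decreasing value is an assumption. Each stripe would then be filled with its color across all height rows.

```python
def stripe_widths(palette_query, width=200):
    """Split width pixels among colors in proportion to their values.

    Returns a list of (hex_color, pixels) ordered by decreasing value.
    """
    items = sorted(palette_query.items(), key=lambda kv: -kv[1])
    total = float(sum(v for _, v in items))
    exact = [(color, width * v / total) for color, v in items]
    widths = dict((color, int(w)) for color, w in exact)
    leftover = width - sum(widths.values())
    # Give the remaining pixels to the colors with the largest fractional parts.
    for color, w in sorted(exact, key=lambda cw: -(cw[1] - int(cw[1])))[:leftover]:
        widths[color] += 1
    return [(color, widths[color]) for color, _ in items]
```

For the example query above and the default width of 200, this gives 196 pixels to '#ffffff' and 4 to '#33cc00'.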

rayleigh.util.plot_histogram(color_hist, palette, plot_filename=None)

Return Figure containing the color palette histogram.

Args:
  • color_hist ((K,) ndarray)

  • palette (Palette)

  • plot_filename (string) [default=None]:

    Save histogram to this file, if given.

Returns:
  • fig (Figure)
rayleigh.util.rgb2hex(rgb_number)
Args:
  • rgb_number (sequence of float)
Returns:
  • hex_number (string)
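The reverse conversion, as a sketch: rgb_to_hex is a hypothetical name, it clamps each channel to [0, 1], and whether Rayleigh's rgb2hex includes the leading '#' is not stated in these docs.

```python
def rgb_to_hex(rgb_number):
    """Convert a sequence of floats in [0, 1] to a '#rrggbb' string."""
    # Clamp to [0, 1] and round to the nearest 8-bit channel value.
    channels = [int(round(min(max(c, 0.0), 1.0) * 255)) for c in rgb_number]
    return "#%02x%02x%02x" % tuple(channels)
```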
rayleigh.util.smooth_histogram(color_hist, palette, sigma=10)

Smooth the given palette histogram with a Gaussian of variance sigma.

Parameters :

color_hist : (K,) ndarray

palette : rayleigh.Palette

containing K colors.

Returns :

color_hist_smooth : (K,) ndarray