Summing over image pixels: numba is fast.

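A silly little comparison of a few different methods for computing the average pixel color of a masked image (summing over the selected pixel values and dividing by their count). With Numba's @autojit, a plain nested for loop runs roughly 8x faster than the best pure-NumPy version (170 µs versus 1.44 ms).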

In [1]:
import numba
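import numpy as np
from numpy import sum  # the cells below rely on NumPy's sum (axis and dtype arguments), not the builtin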
from skimage.io import imread, imshow
img = imread('http://i.telegraph.co.uk/multimedia/archive/01428/darwin1_1428734c.jpg')
print(img.shape)
imshow(img)
(288, 460, 3)
In [2]:
mask = np.zeros(img.shape[:2], dtype=bool)
print(mask.shape)
h, w = mask.shape[:2]
mask[h // 5 : 3 * h // 5, w // 3 : 2 * w // 3] = True  # integer division: slice indices must be ints
N = sum(mask, dtype=float)
imshow(mask)
(288, 460)
In [3]:
# The first method sums only the pixels of img that the mask selects (where mask is True).

def average_pixel_1(img, mask):
    return sum(img[mask], 0) / N

@numba.autojit
def numba_average_pixel_1(img, mask):
    return sum(img[mask], 0)

print(average_pixel_1(img, mask))
%timeit -n 100 average_pixel_1(img, mask)

print(numba_average_pixel_1(img, mask) / N)
%timeit -n 100 numba_average_pixel_1(img, mask) / N
[ 158.8438761   150.35078147  110.22443876]
100 loops, best of 3: 1.84 ms per loop
[ 158.8438761   150.35078147  110.22443876]
100 loops, best of 3: 1.79 ms per loop
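
To make the boolean indexing in the first method concrete, here is a minimal sketch on a made-up 2x2 image (tiny_img and tiny_mask are hypothetical stand-ins, not the Darwin photo). Indexing with the mask gives one row per selected pixel, so a single sum over axis 0 yields the per-channel totals.

```python
import numpy as np

# Hypothetical 2x2 RGB image and mask, made up for illustration.
tiny_img = np.array([[[10, 20, 30], [40, 50, 60]],
                     [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)
tiny_mask = np.array([[True, False],
                      [False, True]])

# Boolean indexing keeps only the pixels where the mask is True.
selected = tiny_img[tiny_mask]            # shape (2, 3): one row per selected pixel
n_selected = np.sum(tiny_mask, dtype=float)

# Sum each channel over the selected pixels, then divide by their count.
average = np.sum(selected, 0) / n_selected
print(selected.shape)
print(average)
```

np.sum accumulates uint8 input in a wider integer type, so the channel totals do not wrap around at 255.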
In [4]:
# The second method multiplies img by the mask first, so it sums over the whole image, with 0's wherever the mask is False.

def average_pixel_2(img, mask):
    return sum(sum(img * mask[:, :, np.newaxis], 0), 0) / N

@numba.autojit
def numba_average_pixel_2(img, mask):
    # Cannot divide by sum of mask inside of an autojit
    # function for some reason.
    return sum(sum(img * mask[:, :, np.newaxis], 0), 0)

print(average_pixel_2(img, mask))
%timeit -n 100 average_pixel_2(img, mask)

print(numba_average_pixel_2(img, mask) / N)
%timeit -n 100 numba_average_pixel_2(img, mask) / N
[ 158.8438761   150.35078147  110.22443876]
100 loops, best of 3: 1.44 ms per loop
[ 158.8438761   150.35078147  110.22443876]
100 loops, best of 3: 1.02 ms per loop
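
The mask[:, :, np.newaxis] trick in the second method is easier to see on a small array. This is a sketch with made-up data (tiny_img and tiny_mask are hypothetical): the extra axis lets the 2-D mask broadcast across the three color channels, and the result matches the boolean-indexing approach.

```python
import numpy as np

# Hypothetical 2x4 RGB image with pixel values 0..23, plus a mask over the middle columns.
tiny_img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
tiny_mask = np.zeros((2, 4), dtype=bool)
tiny_mask[:, 1:3] = True

# Adding a trailing axis turns the (2, 4) mask into (2, 4, 1),
# which broadcasts against the (2, 4, 3) image.
mask3 = tiny_mask[:, :, np.newaxis]
weighted = tiny_img * mask3               # pixels outside the mask become 0

# Sum over rows, then over columns, leaving one total per channel.
by_product = np.sum(np.sum(weighted, 0), 0)

# Same totals via boolean indexing, for comparison.
by_indexing = np.sum(tiny_img[tiny_mask], 0)
print(by_product, by_indexing)
```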
In [5]:
# The third method reshapes the arrays so that only one sum() needs to be done.
# reshape() itself returns views when the arrays are contiguous, but the
# multiplication still allocates a full-size temporary array.

def average_pixel_3(img, mask):
    return sum(img.reshape(-1, 3) * mask.reshape(-1)[:, np.newaxis], 0) / N

@numba.autojit
def numba_average_pixel_3(img, mask):
    return sum(img.reshape(-1, 3) * mask.reshape(-1)[:, np.newaxis], 0)

print(average_pixel_3(img, mask))
%timeit -n 100 average_pixel_3(img, mask)

print(numba_average_pixel_3(img, mask) / N)
%timeit -n 100 numba_average_pixel_3(img, mask) / N
[ 158.8438761   150.35078147  110.22443876]
100 loops, best of 3: 3.11 ms per loop
[ 158.8438761   150.35078147  110.22443876]
100 loops, best of 3: 3.11 ms per loop
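
A quick check of where the third method actually allocates memory, as a sketch on hypothetical stand-in arrays (img_like and mask_like), assuming the real img and mask are C-contiguous like these. np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

# Stand-in arrays with the same layout as an RGB image and its 2-D mask.
img_like = np.zeros((4, 5, 3), dtype=np.uint8)
mask_like = np.zeros((4, 5), dtype=bool)

flat_img = img_like.reshape(-1, 3)        # (20, 3)
flat_mask = mask_like.reshape(-1)         # (20,)
product = flat_img * flat_mask[:, np.newaxis]

# For contiguous inputs the reshapes are views onto the original buffers.
print(np.shares_memory(flat_img, img_like))
print(np.shares_memory(flat_mask, mask_like))

# The elementwise product is a freshly allocated array.
print(np.shares_memory(product, img_like))
```

So the copy in this method comes from the multiplication, which the second method pays for as well.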

These are hardly speedups! Let's code stupider.

In [6]:
# The following two functions are identical apart from the decorator.
# Both return the per-channel sums; divide by N to get the average.

def average_pixel_for_loop(img, mask):
    h, w = mask.shape
    pixels = np.zeros(3, dtype=int)
    for r in range(h):
        for c in range(w):
            if mask[r, c]:
                pixels[0] += img[r, c, 0]
                pixels[1] += img[r, c, 1]
                pixels[2] += img[r, c, 2]
    return pixels

@numba.autojit
def numba_average_pixel_for_loop(img, mask):
    h, w = mask.shape
    pixels = np.zeros(3, dtype=int)
    for r in range(h):
        for c in range(w):
            if mask[r, c]:
                #pixels += img[r, c, :]
                # The above line actually leads to a time of 23 ms: very slow.
                pixels[0] += img[r, c, 0]
                pixels[1] += img[r, c, 1]
                pixels[2] += img[r, c, 2]
    return pixels

print(average_pixel_for_loop(img, mask) / N)
%timeit -n 3 average_pixel_for_loop(img, mask) / N

print(numba_average_pixel_for_loop(img, mask) / N)
%timeit -n 1000 numba_average_pixel_for_loop(img, mask) / N
[ 158.8438761   150.35078147  110.22443876]
3 loops, best of 3: 57.3 ms per loop
[ 158.8438761   150.35078147  110.22443876]
1000 loops, best of 3: 170 µs per loop

Summary: the best non-Numba method is sum(sum(img * mask[:, :, np.newaxis], 0), 0) at 1.44 ms. The same exact thing is sped up to 1.02 ms by Numba. A stupid nested for loop with Numba takes 170 µs, which is still about 6 times as fast as that (and the same loop without Numba takes 57.3 ms).
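
For anyone rerunning this today: as far as I know, newer Numba releases no longer ship autojit, and numba.njit is the usual spelling. Below is a sketch of the same nested loop under that assumption. The function name masked_channel_sums, the random stand-in image, and the pure-Python fallback decorator are all made up for illustration; no timings are claimed for it.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(func):
        return func


@njit
def masked_channel_sums(img, mask):
    # Per-channel sums over the pixels where mask is True.
    h, w = mask.shape
    sums = np.zeros(3, dtype=np.int64)
    for r in range(h):
        for c in range(w):
            if mask[r, c]:
                sums[0] += img[r, c, 0]
                sums[1] += img[r, c, 1]
                sums[2] += img[r, c, 2]
    return sums


# Random stand-in with the same shape as the image above (not the real photo).
rng = np.random.RandomState(0)
demo_img = rng.randint(0, 256, size=(288, 460, 3)).astype(np.uint8)
demo_mask = np.zeros((288, 460), dtype=bool)
demo_mask[288 // 5 : 3 * 288 // 5, 460 // 3 : 2 * 460 // 3] = True

# Compare the loop result against plain NumPy boolean indexing.
loop_average = masked_channel_sums(demo_img, demo_mask) / demo_mask.sum()
numpy_average = demo_img[demo_mask].mean(axis=0)
print(np.allclose(loop_average, numpy_average))
```

The first call includes compilation time when Numba is present, so time a second call (for example with %timeit) to compare against the numbers above.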