border shadow

Online Image Management Platform - Image Similarity (2004)

For the Online Image Management Platform of my employer futureLAB, I implemented a Image Similarity and Fingerprinting Algorithm. This algorithm extracts features from an image and uses them to calculate a fingerprint. These fingerprints allow for comparing two images for similarity as well as for fast searching in a database.

This way functions like "search my image collection and find similar images to this one here" (similarity) can be provided or questions like "do I have this particular image somewhere within my collection already?" (matching) can be answered. Meta information like title, description and keywords are not needed for this (classification).

Image Similarity: the first image is the reference image. The second one is very similar (distance = 3.24). The third one (island) has a simlar image composition with water, land and sky and similar colors (distance = 72) while the fourth (yellow flowers) has a different image composition and completely different colors, hence a hight distance value of 265.

Features of the Algorithm

Performance: Comparing images of a large collection is a processing power and bandwidth intensive task. With the help of the fingerprints however, this is not the case anymore, since the fingerprint has to be calculated only once for each image and then can be stored in a database for fast searching and comparing.

Compare flexibly: If desired, images can be compared independent of rotation or vertical and horizontal mirroring as well as grayscale only, depending on which difference function is taken.

Problems

What makes two images similar? This is a difficult question for many reasons. Two images can be perceived as similar because of similar depicted objects (e.g. a house, a girl, a portrait of Albert Einstein) or because of a similar look (same colors, shapes, textures). As of today it is extremely difficult to recognize and classify objects reliably with computers. My algorithm therefore analyzes the image composition in terms of colors, shapes and textures.

The comparison algorithm has to judge differences in images in a similar way as a human would. This is easier said as done, because of some special properties of the human eye and brain.

Fingerprinting Algorithm

The two dimensional DCT: the upper left corner of the matrix represents the DC value, the ones to the right and below are first order values and so on.

The finger print (or feature vector) is composed of three parts: a grayscale DCT part, a color DCT part and an EDH part. First the image is scaled to a standard size. Since color differences can be calculated only very poorly with most color spaces, especially the common RGB color space, it is transformed into the Luv (see the Color Space FAQ).

Now a 2D-DCT is calculated over the L-channel (Luminosity) of the image. The first coefficient stands for the DC value, or average luminosity of the image. The next coefficients represent the higher order values with increasing frequency. A number of these coefficients were taken and normalized for the grayscale fingerprint part (the first 10 coefficients seemed to be a good number). This part of the fingerprints represents the basic composition of the image.

Then a 2D-DCT of the two color components were calculated and and used for the color part of the fingerprint. Here only the first three of each where taken, since the human eye is much more sensitive to luminosity than to color. This part of the fingerprint represents the color composition of the image with reduced spatial resolution compared to the gray scale part.

On the luminosity channel of the image a EDH (Edge Direction Histogram) is calculated using a Sobel kernel. The histogram consists of eight equal bins N, NE, E, SE, S, SW, W and NW. This part of the fingerprint represents the "flow directions" of the image.

Similarity

Identical images yield the same fingerprint, similar images yield fingerprints similar to each other (according to a distance function) and unequal images yield a totally different fingerprints (of course). The distance function compares the fingerprints by weighting the coefficients and calculating the Euclidean Distance. For example, if the general brightness of the image should not be considered when comparing, the DC components weight can simply be set to zero. For less detail and a more tolerant search the higher coefficients can be made smaller or set to zero. For grayscale searching simply set the color components weights to zero. For rotation or mirror invariant searching shuffle the components accordingly and compare again.

This weight vector can be used for a lot of tuning and the values should be optimized with care.