Ornaments Clusterization PART 2

Recap of Part 1

From part 1, we cluterized the 8000 images of ornaments into 50 clusters with different number of images; the clusters have been partitioned using the size of the images (number of pixels on the x-axis and on the y-axis) as the feature involved in the clusterization process. Using a k-means algorithm (full description), we can obtain the desired number of clusters.

Clusterize the Clusters

This was used to partition the images goups of images of similar size. Inside a given group, the following step was to compute the CK1 distance (description) between each image, and proceed to a clusterization by successive mergings : 

During the first step, each image is in a cluster of its own.
At each iteration, the two closest cluters are merged. We influence the process by choosing a distance function as well as a criterion to determine the distance between two clusters.
We chose the CK1 distance since it discriminates between similar and dissimilar images. (link to paper)
Matlab Function computing CK1 distance
Matlab Function computing CK1 distance
 
CK1 matrix illustrating hierarchical clustering
CK1 matrix illustrating hierarchical clustering

In the above matrix illustration, one can see the CK1 distances used to compute the hierarchical clustering graph displayed below. Indeed, one has 8 different images, labeled 1 to 8 in the below graph.

As for the determining the distance between two clusters, we chose the maximum distance between each element of two given clusters.
 We have implemented this method on a cluster containing the biggest available images, that are all different from each other.
 This resulted in the following tree : 
High CK1 Values
High CK1 Values
We notice that all distances are greater than 0.85, and the images with the lowest CK1-distance are still very dissimilar.
 If we were to find a threshold T under which two images can be considered similar, then T is likely to be less than 0.85.
 

We now use 5 images that we consider to be very similar to build a new tree and see what distances we get, as well as what hierarchy we obtain using the same process.

Image 1
Image 1
Image 2
Image 2
000026-3
Image 3

 

Image 4
Image 4
Image 5
Image 5

 

 

Lower CK1 values
Lower CK1 values

Using the information from this tre, we find that our threshold T is probably higher than 0.6.

 

 The next step is to repeat this inside every group of images with similar size. Hopefully, using an optimal threshold T, we should be able to form groups of similar images.
 To explain why this is necessary, we took the same five images as above (their sizes ranged from 484×28 to 516×30), and added 2 images(606×35 and 670×37)  that were similar on their own but different from the first 5 images.
Image 6
Image 6
Image 7
Image 7

This allowed to build a new tree that yielded surprising results :

New tree
New tree

In this tree, image 7 is grouped with images 1 to 5 before being grouped with image 6, which is undesirable. We think this is due to the fact that the sizes of images 1-5 and 6-7 are too different and the resizing has a negative effect.

The last objective is to find a method that allows us to distinguish between original ornaments and counterfeits.

Reference

http://www.cs.ucr.edu/~bhu002/IL/InitialLetter_SDM_cameraReady.pdf