# [BlogPost 2]: Construction of a Similarity Measure between arbitrary Shapes

In the preceding blog post, we introduced and implemented a shape matching procedure adapted from [1]. In this blog post, we exploit this procedure to compute different distances between two arbitrary shapes, as suggested in [1]. Then, we construct a similarity measure out of these distances and construct a fully connected graph on a set of ten words and look for clusters of words in this graph.

#### Construction of shape distances

Given two arbitrary shapes, we can obtain potential correspondences between points on these two shapes by using the shape matching procedure previously introduced. For example, correspondences between points on the two words in figure 1 a) are given by the best bipartite matching in figure 1 b).

 a) The two words on which the shape matching procedure is performed b) Best bipartite matching between these two shapes and cost of this matching (see first blog post).

Then, we can use this finite set of correspondences to estimate a transformation of the plane $T:\mathbb{R}^2\rightarrow\mathbb{R}^2$, allowing us to map any point on the first shape to a point on the second shape, and therefore extending the discrete set of correspondences to the continuous level.

To this end, we use two thin plate spline transformations (see [1]) for each coordinates:

$T(x,y)=(f_x(x,y),f_y(x,y)).$

More precisely, we choose a source and a target between the two shapes (the estimation procedure is symmetric in the choice of the source and the target) and then we solve two 1D interpolation problems. For example, the thin plate spline transformation $f_x(x,y)$ is estimated by choosing in the family of thin plate spline transformations the one minimizing the quantity:

$H=\sum_{i=1}^n(v_i-f_x(x_i,y_i))^2+\lambda I_f,$

where $v_i$ denotes the target function values at the locations $(x_i,y_i)$ (here $v_i$ is the x-coordinate of the point on the target shape that forms a correspondence pair with the point $(x_i,y_i)$ on the source shape).

This is nothing else but a relaxed least squares estimation, with a penalty term $\lambda I_f$, with $I_f$ the bending energy:

$I_f=\int\int_{\mathbb{R}^2}\left(\frac{\partial^2 f_x}{\partial x^2}\right)^2+2\left(\frac{\partial^2 f_x}{\partial x\partial y}\right)^2+\left(\frac{\partial^2 f_x}{\partial y^2}\right)^2 dxdy,$

and $\lambda$ is a real parameter that controls the importance of the penalty (for $\lambda=0$ this a just a simple least squares estimation).

Therefore, roughly speaking, we seek for the transformation that minimizes the bending energy necessary to map points on the source shape to their corresponding matches on the target shape.

In the particular example of the two words in figure 1, we can see on the subsequent video the effect of applying the transformation $T$ to the first word.

With this mapping, we can already imagine two distances between two arbitrary shapes $\mathcal{P}$ and $\mathcal{Q}$. The first one is a generalization  to the continuous level of the total cost matching $H$ introduced in the first blog post:

$D_{mc}(\mathcal{P},\mathcal{Q})=\frac{1}{n}\sum_{p\in\mathcal{P}}\mbox{argmin}_{q\in \mathcal{Q}}C(p,T(q))+\frac{1}{m}\sum_{q\in \mathcal{Q}}\mbox{argmin}_{p\in \mathcal{P}}C(p,T(q)),$

Figure 2: Top: Source image, Middle: Source image warped with thin plate spline transformation, Bottom: Target image.

with $T(\cdot)$ the thin plate spline transformation and $C(\cdot,\cdot)$ the cost of matching two points introduced in the previous post. We can also compute another distance, $D_{bend}$ corresponding to the amount of deformation necessary to align the two shapes. This distance is naturally set equal to the bending energy $I_f$.

Finally, we introduce a last distance $D_{ac}$ capturing the appearance information (texture and intensity information of the two grayscale images). To this end, we apply the thin plate spline transformation to the source image (see figure 2) and compute the difference (pixels by pixels, see figure 3) between the resulting warped image and the target image.

Then, we compute the squared mean and the standard deviation of the resulting image (interpreted as a matrix), and set the third distance $D_{ac}$  equal to the sum of these two quantities.

Figure 3: Top: Difference between the two original images,                                                            Bottom: Difference between the two images after transformation                                                              of the source image

#### Construction of a Similarity measure

From the three distances previously introduced, we wish to design a similarity measure, expressing how close is a shape from another. To this end, we first build a single distance as a linear combination of the three distances $D_{mc},D_{bend},D_{ac}$:

$D=0.2\times \mbox{log}(D_{mc})+0.9\times \mbox{log}(D_{bend})+0.3\times \mbox{log}(D_{ac}),$

where we chose the coefficients by performing a principal components analysis on a training set of words.

From this unified notion of distance, we introduce the similarity between two shapes $\mathcal{P}$ and $\mathcal{Q}$:

$S(\mathcal{P},\mathcal{Q})=\mbox{exp}(-\mathcal{D}(\mathcal{P},\mathcal{Q})^2).$

Then, if two shapes are very close, then their distance will be close to 0, and the similarity close to 1. Conversely, shapes very different will have a similarity close to 0.

This measure of similarity will allow us to perform clustering analysis using the $k$-nearest neighbors methodology. To this end, we build an undirected graph: each node correspond to a shape and edges between the nodes are weighted by the similarity function between the two nodes (shapes). Theoretically, this is a fully connected graph, but to save computational time, we can define a threshold of minimal similarity. Then, all pair of shapes whose similarity is under this threshold will be considered as not similar enough, and therefore will not be linked in the graph. Once this graph is constructed, we perform  $k$-nearest neighbors on it to identify clusters.

On figure 4, we constructed this graph for a set of ten handwritten words . We can clearly observe the formation of three clusters of words, corresponding to words semantically identical.

##### Conclusion and milestones for the rest of the semester

The shape matching procedure previously introduced allowed us to design three notions of distances between two arbitrary shapes. By considering a well chosen linear combinations of these three distances, we were able to construct a similarity measure. Then, we formulated the problem in a graph  theoretical framework, which allowed us to perform a clustering analysis on a set of 10 words. The results were very encouraging. However, this task required a lot of computational time. Therefore, we will try to rewrite the Matlab code we used for this study in C++, hoping for an improvement of the performances, before applying the methodology to a bigger dataset.

If the performances increase as expected, then we will try to validate our methodology by testing its accuracy on a training set, before applying it to a very large database of ornaments.

SIMEONI Matthieu

#### References

[1] S. Belongie,J. malik, J. Puzicha, Shape Matching an Object Recognition using Shape Contexts, IEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, April 2002