Our work aims to provide a tool that retrieves complete copies of a query painting. The tool was developed in Matlab and is based on HOG features. We also compare its performance across different parameter settings.
When you run into a painting on the internet or in a document, how do you know whether it is the original or a copy? How do you even know whether any copy of this artwork exists? Art historians usually try to contextualize the painting and look for information on other artworks, so a tool able to automatically link a painting to other ones would be very useful. They could, for example, find out whether identical or partial copies of the work exist, or whether other paintings share features or composition with it, revealing influences and artistic trends. Currently, this recognition relies entirely on expert assessment by specialists; this is where automatic recognition using image processing would be of great help to art historians. Automatic retrieval would also make it possible to build a rich database, so that historians do not miss similar paintings they might not have thought of.
Various techniques can be used to approach an image retrieval problem. We worked on several of them in order to discard the less efficient ones and keep only the best-performing.
Image retrieval in photographs is nowadays very common (Google, TinEye, …) and efficient algorithms have already been implemented. The main problem is that most of them do not work very well on paintings. For example, in a photograph, the regions with no texture (the sky, the skin, …) are homogeneous. In a painting, on the contrary, these regions may show small brush strokes that we, as humans, are able to ignore in order to see the whole picture, but that the algorithm will interpret as meaningful information. Also imagine your camera’s face detection algorithm trying to detect The Weeping Woman! These are some of the considerations that led us to deal only with Renaissance paintings, since these are the ones most similar to photographs.
Another important point to consider is the relevance of the objects we look at in a painting. We generally look at faces and people first, which makes them the most important parts of an image. We therefore use this fact to retrieve copies where the characters are arranged in a similar way, even if the background changes.
The first technique we explored was inspired by the Kinect’s system for detecting people and is based on skeletons. The main idea is to decompose the human body into several key points (for example head, hands, joints…), forming a ‘skeleton’. The Kinect constructs this skeleton from the image and the depth map, and is then able to estimate the position of each limb and reconstruct the shape of the person in front of the camera.
In our case we didn’t have depth maps of the paintings (they would have to be acquired directly from a 3D scene), so we couldn’t generate the skeletons automatically. We would have had to annotate every painting manually, which would have been time-consuming and inconvenient for the user. Moreover, the cases where characters were partially occluded were delicate to handle. So, in order to have a more automatic tool, we decided to focus more on the field of image processing.
3.2 Object Segmentation
Next, our idea was to use object segmentation to isolate objects of interest (mainly people and faces) from the background, and then use different metrics to compare them with the characters extracted from other paintings. To do this we first had to detect where the regions of interest (ROI) were and then extract them. We tried a face detection algorithm, but since most of the characters were not facing the viewer and the illumination was never ideal, the results were poor. We then tried pedestrian detection algorithms and had better results, but only for standing characters. To overcome this problem, we added the possibility to manually select ROI that were not detected. On these ROI, we used a standard image segmentation technique called active contours. The idea of this technique is to start from a rough contour (the rectangle that defines the ROI) and, by adding some constraints, let the algorithm evolve the contour until it fits the real contour of the object.
The problem with this approach was that detection and extraction depended on too many parameters that were different for each painting. Despite the encouraging results, this variability prevented the method from being generic, because we would have had to tune the parameters for each image.
3.3 HOG Features
Since we had quite good results with the pedestrian detection algorithm, we looked a bit deeper into it and found out that it uses HOG features. HOG (Histograms of Oriented Gradients) are good global descriptors of an image. They represent the distribution of intensity gradients, or edge directions, in an image. The image is divided into cells, and for each cell a histogram of gradient orientations is computed over all the pixels it contains. The cell size sets the precision of the descriptor (coarse description of the shape or fine details). To improve the results and be more robust to illumination variations or shadows, the local histograms can be normalized with respect to the surrounding intensity: an intensity measure is computed over larger regions, called blocks, and all the cells contained in a block are normalized by this value.
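The cell and block mechanism described above can be illustrated with a minimal sketch. It is written in plain Python rather than Matlab and is not our implementation: the 9 unsigned orientation bins, the 2×2-cell blocks shifted by one cell, and the L2 norm used as the block "intensity" are assumptions borrowed from the usual HOG formulation, and the function names `hog_cells` and `hog_descriptor` are hypothetical.

```python
import math

def hog_cells(image, cell=8, bins=9):
    """Return one orientation histogram per cell of a 2-D grayscale image.

    `image` is a list of rows of numbers whose height and width are
    multiples of `cell`. Orientations are unsigned (0 to 180 degrees).
    """
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle / (180.0 / bins)) % bins
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[y // cell][x // cell][b] += mag
    return hist

def hog_descriptor(image, cell=8, bins=9, block=2, eps=1e-6):
    """Concatenate block-normalized cell histograms into one feature vector."""
    hist = hog_cells(image, cell, bins)
    rows, cols = len(hist), len(hist[0])
    features = []
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            vec = [v
                   for dr in range(block)
                   for dc in range(block)
                   for v in hist[r + dr][c + dc]]
            # Assumed normalization: L2 norm of the whole block.
            norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
            features.extend(v / norm for v in vec)
    return features
```

For instance, a 16×16 image made of a dark left half and a bright right half only produces votes in the first orientation bin of each cell, since all of its gradients are horizontal.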
Each HOG feature is a vector whose size depends on the image size and the cell size. In our case, we described each image by a single HOG descriptor, so we ended up with one feature vector per image. To compare the feature vector of a query image with all those in a given database, we first normalized every feature vector by subtracting its mean. We then used a score function that computes the scalar product between two feature vectors, and we kept the highest scores.
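The comparison step can be sketched as follows, again in plain Python rather than Matlab. The function names `center`, `score` and `rank`, and the dictionary layout of the database, are hypothetical; only the mean subtraction followed by a scalar product comes from the description above.

```python
def center(vec):
    """Subtract the mean of a feature vector from each of its components."""
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]

def score(a, b):
    """Scalar product between two mean-centered feature vectors."""
    return sum(x * y for x, y in zip(center(a), center(b)))

def rank(query, database, k=10):
    """Return the names of the k database entries with the highest scores.

    `database` maps a painting name to its feature vector (assumed layout).
    """
    ordered = sorted(database.items(),
                     key=lambda item: score(query, item[1]),
                     reverse=True)
    return [name for name, _ in ordered[:k]]
```

With a query `[1, 2, 3]`, a vector that follows the same trend gets a positive score, a constant vector gets a score of zero, and a vector with the opposite trend gets a negative score, so the three are ranked in that order.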
This method gave good results on a few test paintings, but to confirm its performance we tested it with different parameters on databases of different sizes. The parameters were the cell size and the size of the image on which we computed the HOG features (which should be a multiple of the cell size). As for the databases, we had around one hundred known complete copies that we had to retrieve from a ‘noisy’ set of paintings of varying size. We used the standard cell sizes of 8×8 and 16×16 pixels and four image sizes: 128×128, 256×256, 384×384 and 512×512. The combination of cell size and image size changes the level of detail captured by the HOG features, and therefore how much they say about the global shape of the painting. The ‘noisy’ sets consist of paintings from the Web Gallery of Art that are not copies; they are there to disturb the retrieval of copies and to simulate large databases in which we would like to retrieve a given copy. We used ‘noisy’ sets of 100, 1000 and 2000 paintings to assess the stability of our descriptors as the database grows.
We tested the effects of these parameters on the number of copies retrieved. We considered the 10 best matches and counted how many copies appeared among them. The figures below show the percentage of copies retrieved (for example, 4 copies out of 8 appearing in the 10 best matches would mean 50%).
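To make the parameter space concrete, the sketch below enumerates the combinations listed above and counts how many cells each one produces. It is an illustration in Python, not part of our Matlab tool; the names `cells_per_side` and `experiment_grid` are hypothetical.

```python
from itertools import product

CELL_SIZES = (8, 16)                 # cell side, in pixels
IMAGE_SIZES = (128, 256, 384, 512)   # image side, in pixels
NOISE_SIZES = (100, 1000, 2000)      # number of 'noisy' paintings

def cells_per_side(image_size, cell_size):
    """Number of cells along one side; the image must be a multiple of the cell."""
    if image_size % cell_size:
        raise ValueError("image size must be a multiple of the cell size")
    return image_size // cell_size

def experiment_grid():
    """List every (image size, cell size, noisy set size) combination."""
    return [
        {"image": s, "cell": c, "noise": n,
         "cells": cells_per_side(s, c) ** 2}
        for s, c, n in product(IMAGE_SIZES, CELL_SIZES, NOISE_SIZES)
    ]
```

This gives 24 configurations. A 128×128 image holds 16×16 = 256 cells of 8×8 pixels but only 8×8 = 64 cells of 16×16 pixels, while a 512×512 image holds 4096 cells of 8×8 pixels, which gives a sense of how differently each combination samples the painting.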
- The above image is obtained by summing all the individual performances for different original paintings (9 in total).
- We can observe two things:
  - The combination of cell size and image size that gives the best result is the 128×128 image with a cell of 8×8 pixels.
  - In general, the performance decreases as the image size increases.
This can be interpreted as follows: an image that is too large provides many details but loses information about the general shape, whereas a smaller image describes general patterns better but captures fewer details. For example, a cell size of 16×16 performs badly on a 128×128 image because the cell is too big and covers too much of the image to be meaningful, while the same cell size on a 512×512 image is slightly better than 8×8 since it gives a more general description. We therefore need a compromise between the amount of detail and the global shape, which is found with the 128×128 image and a cell size of 8×8.
- Overall, the performance decreases as the database grows, but the results remain good (around 85% for the best combination). This shows that the HOG descriptors are quite robust and stable with respect to the size of the database.
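The percentage reported in these results (known copies found among the 10 best matches) can be written down as a short sketch. It is given in Python for illustration only; the function name `retrieval_rate` and the use of painting names as identifiers are assumptions.

```python
def retrieval_rate(ranked, copies, k=10):
    """Percentage of the known copies that appear in the first k matches.

    `ranked` is the list of painting names sorted by decreasing score and
    `copies` is the collection of names known to be copies of the query.
    """
    copies = set(copies)
    if not copies:
        raise ValueError("at least one known copy is required")
    found = copies.intersection(ranked[:k])
    return 100.0 * len(found) / len(copies)
```

With 8 known copies of which 4 appear in the 10 best matches, the function returns 50.0, matching the example given earlier.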
These are the global performances but they may vary depending on the content of the painting. Below are two extreme cases.
Retrieved images for a database of 2000 images, cell size of 8×8 and image size of 128×128
- For this image, the best image size seems to be 128×128 pixels, and this performance is quite stable across database sizes. We also see a drop in performance when the image becomes too big, which confirms the decrease we noted in the general case.
- For a database of 100 ‘noisy’ paintings, the image size doesn’t seem to be an important parameter, while the cell size has a bigger influence. For bigger databases, however, it’s the opposite: the image size has more influence than the cell size.
Retrieved images for a database of 2000 images, cell size of 8×8 and image size of 128×128
- On this image, the HOG descriptors are very stable: we retrieve all the copies every time except for the combination 128×128 / 16×16, which reveals the same problem as discussed in the general case: the cell size is too big compared to the image size.
- There is no performance decrease over the database sizes for this painting.
With these two extreme cases, we see that the performance still depends on the content of the paintings. This is the limit of global descriptors, which have difficulty discriminating between paintings that have similar compositions but are not copies. However, this can be used to find common points between different paintings and reveal trends or inspirations. Indeed, during our trials, the algorithm sometimes returned different paintings by the same author, or the same scene represented a bit differently (for example, the parodies of La Gioconda were found when querying the original one).
We were able to implement an algorithm that, with the optimal parameters we found, retrieves copies of paintings most of the time for most query paintings. Moreover, we saw that it not only retrieves copies in a database but can also reveal influences and artistic trends.
Further work could increase the number of returned copies by handling the cases where the image is mirrored, or by retrieving partial copies (only one or two characters copied, for example). For partial copy retrieval, we would suggest dividing the original image into a convenient number of patches and choosing their dimensions so that they cover the entire image. These patches would overlap, to be sure that every region of the image is described.
However, these improvements would require more computational resources. Indeed, with our limited resources (a laptop), testing the retrieval of partial copies within a significant ‘noisy’ database was very time-consuming, because computing HOG features for all possible patches requires many more calculations. It would also be necessary to handle a much bigger database (several thousand paintings at least). And even though we saw that reducing the size of the image (and therefore the HOG computation time and memory footprint) gave better results, computing and handling all the HOG descriptors will always be the limiting factor for the size of the database.
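The patch layout suggested here was not implemented, so the following is only a sketch of one possible way to do it, in Python. Square patches, a fixed stride smaller than the patch size, and the names `patch_starts`, `patch_grid` and `mirror` are all assumptions made for illustration.

```python
def patch_starts(length, patch, stride):
    """Start offsets of patches of size `patch` that cover [0, length)."""
    if patch > length or stride <= 0 or stride > patch:
        raise ValueError("need 0 < stride <= patch <= length")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # Add a last patch flush with the border so nothing is left uncovered.
        starts.append(length - patch)
    return starts

def patch_grid(width, height, patch, stride):
    """Return (x, y, width, height) rectangles covering the whole image."""
    return [(x, y, patch, patch)
            for y in patch_starts(height, patch, stride)
            for x in patch_starts(width, patch, stride)]

def mirror(image):
    """Horizontally flip an image given as a list of rows."""
    return [row[::-1] for row in image]
```

For a 128×128 image, 64×64 patches with a stride of 32 pixels give a 3×3 grid of 9 patches, each overlapping its neighbours by half. A HOG descriptor would then be computed per patch, and once more on the mirrored query, which multiplies the number of descriptors to compute and compare.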
5. Matlab code
We will provide our Matlab code for the algorithm and the graphical interface after the demo of the 27th of May.