This work takes inspiration from a common situation. Imagine you are walking in a corridor of a museum, and you get stuck in front of a painting, thinking that you have already seen it somewhere. However, as human beings, we cannot remember all the paintings we have seen in our life. It is here that modern tools could come into play and help us.
The Bilderatlas is a collection of tables made by Aby Warburg, based on fourteen different themes. Warburg was a German art historian who started working on this atlas in the last two years of his life. His work, however, remained incomplete.
For each table, he pinned to a wooden panel several pictures and paintings that share a common theme. Even with his vast knowledge of the subject, the number of artworks he considered is very small compared to today’s available image databases. So, by complementing his wide knowledge with computing power, what could Warburg have done using modern tools?
The aim of our project is to find an answer to this question. We try to create a sort of continuation of his work and to explore patterns in the tables with the help of today’s technologies.
We chose some tables from the Bilderatlas in which images have strong visual similarities, and some where a pattern is not markedly evident. The four chosen tables are listed here, with a brief explanation of why we selected them:
- Table 2. It is rather a conceptual table, with little or no visual similarity among the paintings.
- Table 45. This table contains several paintings of buildings that have a similar internal architecture, with scenes in the foreground. The theme of the table contrasts dynamic, violent scenes with calm, static ones.
- Table 46. The theme of the Nymph can clearly be observed by the human eye in several of these paintings, but it is interesting to see whether the CNN is able to recognize the figure starting from the woman in La nascita di San Giovanni Battista by Domenico Ghirlandaio (top right in the image below).
- Table 25. It is a strictly visual table with stone and pillar reliefs.
In some tables there is no apparent connection among the images, and visual patterns are difficult to discern. As a consequence, it is more difficult to relate pictures in the same table to their theme by using computational methods based on visual features. In any case, we are not interested in finding the same images that Warburg connected in a certain table; our aim is rather to continue his work on a large database of images.
Since we are dealing with pictures connected by patterns that are not strictly visual, we need to use a search method based on a high-level representation of the images. This means that the pictures must not be read just as a collection of pixels, but rather in a more informative way. Therefore, we base our analysis on deep learning techniques. In particular, Convolutional Neural Networks (CNNs) seem to be the most appropriate choice in approaching this task.
A neural network is a model that tries to emulate the human brain and its way of processing. In the image-processing context, the algorithm takes an image as input and extracts a feature array that characterizes it. The array is a high-level representation of the image, which allows us to recognize patterns that are not strictly visual.
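To make the idea of a feature array more concrete, here is a minimal sketch of how two images can be compared once each has been reduced to an array of numbers. The vectors, the image names, and the choice of cosine similarity are all assumptions made for illustration; they are not taken from DH Replica, whose internal scoring we do not describe here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature arrays (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query_name, features):
    """Rank every other image by similarity to the query, best match first."""
    query = features[query_name]
    scores = [
        (name, cosine_similarity(query, vector))
        for name, vector in features.items()
        if name != query_name
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Invented 4-dimensional feature arrays (assumption); a real CNN produces far longer ones.
features = {
    "relief_a": [0.9, 0.1, 0.0, 0.2],
    "relief_b": [0.8, 0.2, 0.1, 0.3],
    "miniature": [0.1, 0.9, 0.7, 0.0],
}

ranking = rank_by_similarity("relief_a", features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting the second relief ranks above the miniature because its array points in nearly the same direction as the query, regardless of what the individual pixels look like.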
To launch the queries, we use DH Replica, a web server developed at the DH Lab that allows us to perform CNN analysis on a database of more than 40,000 images (still a small database compared to today’s possibilities). Thanks to DH Replica, we can easily select multiple images in the database and launch a query, directly visualizing the results (Figure 1).
In DH Replica, the images can be set to positive, which means that the pattern we are looking for is present in the image, or to negative, which means that the pattern is not present. However, at the time of our project, searches with negative images did not produce reasonable results. We thus performed queries with positive images only.
As we will see below, the strength of this search method comes from the computation of a common pattern when querying with multiple images; the CNNs are often able to extract the pattern and find it in the database. For instance, after launching a single image, the desired pattern can be strengthened by performing a new query together with an image from the results that shows the pattern we are searching for.
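The effect of querying with several images can be illustrated with a small sketch. We assume here that the query is simply the average of the feature arrays of the selected images; this is one plausible way to compute a common pattern, not necessarily what DH Replica does. The three dimensions (open arms, paper background, stone texture) and all values are invented for the example.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    """Component-wise average: traits shared by all inputs stay strong."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def query(positive_names, database):
    """Rank the images not used in the query against the averaged query array."""
    target = mean_vector([database[name] for name in positive_names])
    candidates = [
        (name, cosine_similarity(target, vector))
        for name, vector in database.items()
        if name not in positive_names
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)

# Hypothetical features: [open arms, paper background, stone texture]
database = {
    "drawing_open_arms": [0.6, 0.9, 0.0],
    "statue_open_arms": [0.6, 0.0, 0.9],
    "plain_drawing": [0.0, 1.0, 0.0],
    "painting_open_arms": [1.0, 0.1, 0.1],
}

single = query(["drawing_open_arms"], database)
combined = query(["drawing_open_arms", "statue_open_arms"], database)
print("single image query, best match:", single[0][0])
print("two image query, best match:", combined[0][0])
```

With a single drawing, the paper background dominates and another drawing comes first. Averaging it with a statue that shares only the open arms weakens the background and the texture, so the painting with open arms moves to the top.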
We have also written a bot whose role is to launch queries. It can accomplish the same tasks as DH Replica. However, since the latter considerably simplifies the visualization of the results, the purpose of this bot is mainly to receive the scores for the query, together with the annotations associated with the images, thus allowing a deeper analysis.
As far as visual similarities are concerned, the CNN algorithm used by DH Replica works well. We also had some interesting results concerning more complex patterns. We now present a few of the queries that we have performed, focusing on the more surprising results.
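As an illustration of the kind of analysis that scores and annotations make possible, the sketch below processes a query response. The response format, the field names (`image_id`, `score`, `author`), and the values are all hypothetical; the actual format returned by the server may be entirely different.

```python
import json
from collections import defaultdict

# Hypothetical response body, invented for this example.
SAMPLE_RESPONSE = """
{"results": [
  {"image_id": "img-001", "score": 0.91, "author": "Ghirlandaio"},
  {"image_id": "img-002", "score": 0.84, "author": "Fra Filippo Lippi"},
  {"image_id": "img-003", "score": 0.79, "author": "Ghirlandaio"}
]}
"""

def top_results(response_text, k):
    """Return the k results with the highest score."""
    items = json.loads(response_text)["results"]
    return sorted(items, key=lambda item: item["score"], reverse=True)[:k]

def scores_by_author(response_text):
    """Group the scores by the (assumed) 'author' annotation."""
    grouped = defaultdict(list)
    for item in json.loads(response_text)["results"]:
        grouped[item["author"]].append(item["score"])
    return dict(grouped)

for author, scores in scores_by_author(SAMPLE_RESPONSE).items():
    print(author, scores)
```

Grouping scores by an annotation such as the author is one simple way to check, for instance, whether the best matches cluster around artists of the same period.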
Table 25: Figures in movement and frame pattern
Among the different themes represented in table 25, we have the Muses. It is interesting to note that this theme is also present in table 2. However, here, they are part of the more global pattern representing figures in movement.
The CNNs were able to find the pattern of the figures in motion (Figure 2). In fact, if we query Apollon, and Angels Playing a Lute and Tambourine, we obtain different sculptures showing people in non-static positions.
This is a very impressive result for the CNN algorithm. It captured the transitory movements of hair and garments present in these reliefs, which is a non-trivial pattern that is investigated in today’s research studies.
If we only query Apollon, by Agostino Di Duccio, which is a relief representing him between two columns, we obtain as results other sculptures contained in a frame. Not only do we have figures between two columns, but also other statues contained in square boxes, in some cases topped with an arch. This means that the CNN algorithm, given this relief, extrapolated the frame feature (Figure 3).
By adding The Moon, another relief by Agostino Di Duccio, we can confine our research to the column pattern, another main visual theme in this table (Figure 4).
In particular, among the returned artworks, we notice St Peter by Michelangelo, which contains the same column style as in Apollon and The Moon. This is a very interesting outcome since it shows how a given pattern might be repeated in different centuries (fifteenth and sixteenth centuries, respectively).
This table also contains some pictures of temples with arches in their façades (in particular, the Malatestian Temple, shown in figure 5), representing the theme of power, which is recurrent in the entire Bilderatlas collection.
The power of pattern extraction from multiple images
In Figure 6 we report the images from table 2 that we have used for our queries.
An interesting query is that of the miniatures in table 2. By searching for the four miniatures (7a, 7c, 7d, 7f), the CNNs find several other miniatures, although the results also include (quite low in the rankings) some paintings that are not actually miniatures (Figure 7). It is interesting to point out how the CNN recognized the drawing on paper, which is the only common feature among the miniatures found. However, when launching a single miniature, only one of the other three is found. This is due to the importance of the background, as the background of 7c is similar to 7d’s, while 7a’s is similar to 7f’s. Few miniatures are found by launching a single miniature, meaning that the CNN needs to compute a common feature (obtained only through multiple images) to find other miniatures.
From table 46, if we launch the miniature Book of Hours of Étienne Chevalier: Birth of John Baptist by Fouquet (Figure 8), we also get a few miniatures. However, their style is totally different from the miniatures found in table 2. In this query we see the dominance of blue and red colors, which appear in the first results. This gives us a taste of how important colors are when the CNNs compute the characterization vector of the images.
An additional example of a strong visual pattern is that of Madonna with the Child, by Fra Filippo Lippi (Figure 9). When launching this query, we astonishingly get a series of results showing women with children on their laps, in several positions (Figure 9). We observe that the sixth result is an image that has nothing to do with the pattern we are searching for. To get rid of this result, we just need to strengthen the pattern by adding another image to the initial query, for example Madonna with the Child and two angels and the other detail from Madonna with the Child by Filippo Lippi (Figure 9). In this case, the undesired painting has a much lower score (we find it at the twentieth position). It is interesting to notice that in the top-ranked paintings the child’s arms are raised, just like in the queried painting (among the lower scores we do not necessarily see this feature).
The pattern of the raised arms is also found in table 2 (Figure 6). When querying the Andromeda (7a), incredibly the Farnese Atlas (6a) shows up, even though it has no other visual similarity. The only thing that can explain its appearance is the fact that its arms are spread in a similar way to those of the Andromeda. By adding the Cepheus (7f), Perseus (7c), and Farnese Atlas (6a) to the query, the theme of open arms becomes evident. This means that the CNNs were able to recognize and extract the only common pattern that is visible among the four images: the pattern of open arms (Figure 10).
It is surprising to see that the theme of open arms was recognised in such a variety of images: we have paintings, statues, and drawings. Here we have another example of the power of feature extraction from multiple images. The user can really guide the CNNs to the desired pattern by querying multiple images associated with it at the same time.
Other visual patterns
When launching The Presentation of the Virgin in the Temple (Figure 11), the CNNs recognize the arch and report a series of images containing arches. What is surprising is that they even recognize arches that are not parallel to the observer, capturing the perspective of the arch (such as those in figure 11). In addition, we see the recurrence of the architectural feature of the arch, which is a common pattern in many images throughout Warburg’s Bilderatlas.
In table 46 we find the Giovanna degli Albizzi Tornabuoni Medallion, by Niccolò Fiorentino: in our database we have both the single medallion with Tornabuoni’s face and the photo with both sides of the medallion. When launching the single medallion with Tornabuoni’s face, we find only medallions in the first 40 results (Figure 12). Only among the lower scores do we find images with double medallions, including the same medal. This indicates that the CNN does not focus on single details, so it does not see that the female figure is the same.
In Figure 12 we see that the first results are all very similar to the medal because they come from artists contemporary to Niccolò Spinelli (Niccolò Fiorentino), who was active around 1450, such as Pietro Da Fano and Matteo de’ Pasti, whose medallion is extremely similar, but turned the opposite way with respect to Spinelli’s. It is interesting that the algorithm reported this medallion as the most similar one, even if the represented figure is flipped. We also find medals from other epochs, which share a visible common pattern: a head enclosed in a medallion, with some text along the border.
When launching the double medallion of Tornabuoni, we get only double medallions (Figure 13). Here, medallions similar to Spinelli’s are more numerous, probably because most of the medallions in the database are double. The first result is a medal by Niccolò Spinelli himself, and the next five results are medallions by Pisanello, his contemporary. The styles are extremely similar.
Table 45: Static vs dynamic scenes
In this table we find the most interesting results. The theme of the table can be explained by simply putting together the two central paintings: Appearance of the Angel to Zacharia (Figure 14), and Slaughter of the Innocents (Figure 15), both by Ghirlandaio. They are similar, but with one basic difference: both show the same architectonic element (the arch) in the background, and a scene in front; however, while one scene is static and calm, the other is dynamic and agitated. One indeed represents purity, while the other represents a war scene.
One might imagine that the architectural figure of the arch would be the predominant feature, and thus the results of the two queries would be extremely similar. Surprisingly, the CNNs are able to distinguish between the two types of scenes! While for one query we find dynamic scenes, mostly violent, for the other we find very static, calm scenes, mostly religious. It is remarkable that the CNNs recognized this difference. This is evidence that the characterization array computed from the images really describes the painting as a whole.
By launching Appearance of the Angel to Zacharia, in the first ten results we find Herod’s Banquet (also present in table 45). Therefore, we strengthen the pattern with Herod’s Banquet, and launch a new query. The pattern becomes clear: an arch in the background that overlooks a scene with several (mostly static) figures (Figure 14). Interestingly, there are also several paintings containing a static scene with only a few figures (mostly two figures), probably due to the architectonic element in the background.
We find several paintings by Fra Filippo Lippi and Ghirlandaio. Those by Ghirlandaio are all from Santa Maria Novella in Florence and thus present some clear visual similarities: the same style for both the architectonic elements and the figures.
When launching the Slaughter of the Innocents, the results show a great number of violent scenes in the foreground, with an arch in the background. Only a small number of static scenes are reported, and with lower scores. We find the Slaughter of the Innocents, by Matteo Di Giovanni, which is also present in table 45.
In table 45 we see the feature of the arch in most of the images. We thus see the recurrence of the theme of power across the Bilderatlas.
Table 46: The fruit-bearing nymph
In table 46, the most representative image is a detail from The Birth of St John the Baptist, by Ghirlandaio. We search for this theme in the database.
We launched it together with a detail from the Three Temptations of Christ by Botticelli, since the two women carrying flowers resemble each other. It is interesting to observe that a few paintings that have very similar figures come out (Figure 16). These paintings are not in the table, so they can nicely represent a continuation: Moses’s Journey into Egypt and the Circumcision of His Son Eliezer, by Pietro Perugino, and Madonna with the Child and Scenes from the Life of St Anne, by Fra Filippo Lippi. The woman represented in the latter painting is extremely similar. We find that Fra Filippo was active some years before Ghirlandaio, so the latter was probably inspired by the former for the figure of the nymph, although with a different interpretation. In the row just below the reported scores, we also find Michelangelo’s Judith and Holofernes (Figure 17).
In this query, the nymph-like woman in Three Temptations of Christ was in the middle of a crowd. We thus get many paintings with groups of figures in movement where the light-dark contrast of the clothes is present on many characters (for example, St John the Evangelist Resuscitating Drusiana, by Filippo Lippi, in Figure 17). However, if we launch only the details with the single women (Figure 18), we find paintings like The Beguiling of Merlin, by Edward Burne-Jones. It is interesting to notice that this painting was reported despite its totally different style. Indeed, the figure is clearly nymph-like. Michelangelo’s Judith and Holofernes is also present in the results of this query. Among the lower scores, together with it, we found Botticelli’s The Trial and Calling of Moses (Figure 19). We noticed that the Allegory of August, by Cosmé Tura (Figure 19), was reported. This fresco belonged to the pattern of open arms, seen in table 2, and it can thus represent a link between tables 2 and 46.
Also in this query we can see that the light-dark contrast of the clothes has a major importance when launching the single nymph. Several statues are reported due to this, and they are not necessarily in movement.
Overall, we can say that CNNs are probably not the best solution to find single details like this in a large image, since the computed feature array describes the painting as a whole. We cannot launch the whole painting because the resulting pattern will not describe the single woman. CNNs can however be useful to find a figure in the cases where it covers most of the painting, because the computed feature would then describe the woman. Therefore, to find the single nymph, one has to extract the detail from the painting and launch it.
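The reasoning above can be sketched numerically. We assume, purely for illustration, that the descriptor of a whole painting behaves like an average of descriptors of its regions, so a figure occupying one region out of four contributes only a fraction of the final array. The regions, the three dimensions (nymph, crowd, architecture), and all values are invented; real CNN features are not this interpretable.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

# Hypothetical per-region features of one painting: [nymph, crowd, architecture]
regions = [
    [0.0, 0.9, 0.3],  # crowd on the left
    [0.0, 0.8, 0.4],  # crowd in the centre
    [0.9, 0.1, 0.1],  # the nymph-like figure
    [0.0, 0.2, 0.9],  # architecture in the background
]

nymph_query = [1.0, 0.0, 0.0]

whole_painting = mean_vector(regions)  # descriptor of the full image (assumed averaging)
detail_only = regions[2]               # descriptor of the extracted detail

whole_score = cosine_similarity(nymph_query, whole_painting)
detail_score = cosine_similarity(nymph_query, detail_only)
print(f"whole painting: {whole_score:.3f}")
print(f"extracted detail: {detail_score:.3f}")
```

In this toy example the whole painting scores far lower against the nymph query than the extracted detail does, which mirrors why we crop the figure before launching it.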
In table 46 we find two other nymph-like figures (Figure 20). They are two drawings of women carrying vases. Despite the strong similarities with the nymph in The Birth of St John the Baptist, no other nymph-like figures were reported when querying the two drawings (both together and separately). This is probably because paintings and drawings are too different from the CNN standpoint. The background indeed has a strong influence when querying an image, and we see evidence of this when we query one drawing at a time: we mainly find other drawings whose background is similar. The other Woman carrying vase is reported, but with a considerably lower score.
We saw such a strong influence of the background also when querying the miniatures from table 2.
To conclude, the strengths of this algorithm originate from the extraction of a high-level characterization of the images. The possibility of constructing a common feature by launching several images at the same time allows the user to guide the algorithm towards the desired pattern.
The algorithm proved to be extremely successful as far as strong visual patterns are concerned. In some cases, images with extremely similar styles were reported, and the artists were indeed found to have been active in the same period.
Some surprising results were achieved with the pattern of the open arms in table 2, and the two contrasting paintings in table 45. Several different patterns were explored, and the CNN found a high number of other images that could fit well in Warburg’s tables. The algorithm was also able to find connections between different tables, such as in the case of Tura’s Allegory of August.
With the help of “negative” images, this algorithm can become even more powerful.
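One classical way to take negative images into account is Rocchio-style relevance feedback: the query becomes the average of the positive arrays minus a weighted average of the negative ones. The sketch below illustrates that idea only; it is not how DH Replica handles negative images, and the weight `beta`, the two dimensions (shared background, wanted pattern), and the values are all assumptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def build_query(positives, negatives=(), beta=0.75):
    """Average the positives, then subtract beta times the average of the negatives."""
    query = mean_vector(positives)
    if negatives:
        penalty = mean_vector(negatives)
        query = [q - beta * p for q, p in zip(query, penalty)]
    return query

def best_match(query, candidates):
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))

# Hypothetical features: [shared background, wanted pattern]
positive_example = [0.9, 0.4]
negative_example = [0.9, 0.0]  # same background, but without the wanted pattern
candidates = {
    "background_only": [1.0, 0.1],
    "wanted_pattern": [0.5, 0.9],
}

without_negative = best_match(build_query([positive_example]), candidates)
with_negative = best_match(build_query([positive_example], [negative_example]), candidates)
print("positives only:", without_negative)
print("with a negative image:", with_negative)
```

Here the negative image cancels most of the shared background from the query, so the candidate that actually shows the wanted pattern overtakes the one that merely shares the background.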
NB: All the work we have carried out (including all the scripts) is open source. It is freely available at https://github.com/e-bug/bilderatlas.
 Isabella di Lenardo, Benoit Seguin, Frédéric Kaplan. Visual Patterns Discovery in Large Databases of Paintings.
 L’Atlas Mnémosyne, Aby Warburg, 2012, L’écarquillé.