Modelling the Circulation of Paintings


There are many levels on which a person can enjoy and appreciate a painting. Some people simply enjoy the beauty of the painting itself and walk away. Others are also interested in the story behind it, asking questions such as “Why was it painted?” or “What inspired the artist?”. However, that is not the only story behind a painting. Although paintings are objects, they rarely stay in a single place for their whole life. In fact, they may travel around the world far more than one might imagine. Our project tells this hidden story not through text alone but by visualizing the movement of paintings on an interactive web-based map. The map opens up a new way to enjoy art. With it, one can trace the geographical history of a single painting, or of a group of paintings that have something in common. It can also serve as a visual tool for researchers looking for statistical data on paintings.

The Project

Data Gathering

The first and foremost task is to gather the data for the project. At this stage it is extremely difficult to collect data on all the paintings in the world, not only because of the size of such a database but also because universal digitized data is scarce. Static data for a painting (such as artist, materials and date of creation) are easy to find. Provenance data, on the other hand, are not easily available. We therefore limit the size of our dataset based both on its availability and on our ability to derive meaningful statistical results from it. This is a reasonable approach: once we have a concrete way of processing and visualizing the limited data we have, a larger database should not require drastic changes to our code or algorithms.

After carefully investigating the available sources of painting provenance, we concluded that the Getty Provenance Index (GPI) is the most suitable source for our project. Although its data is not fully organized, it is the largest digitized source available and is very complete with respect to the provenance of each painting. The data covers paintings by artists born before 1900 that are held by public institutions in Great Britain and the United States. This means that the trace of every painting in our data ends in the UK or the US. One problem with the source is that it does not allow users to download more than 10,000 records (each record corresponds to one location of a painting), and its search engine is also very limited. After contacting the site's maintainers, we were able to obtain the full data for all paintings by Italian artists. This data is especially meaningful for “The Venice Atlas”, because most of these paintings originated in Italy.

Data Processing

Our objective here was to convert every location record (a string) into its geographical coordinates (latitude and longitude). The data is first stored in a PostgreSQL database, which allows us to perform some basic parsing. To parse a record, we first split it into tokens, using commas and full stops as delimiters. One example of a location record is: “Thompson, Henry Yates. London, England, UK (bought at Denison sale)”. The tokens are then cross-checked against a list of cities whose latitudes and longitudes are already known. For the example above, the result is London and its geographical coordinates. One problem is that the location information is poorly and inconsistently formatted. The data therefore had to be cleaned both before and after parsing, partly programmatically and partly by hand. At the end of this step, most mistakes have been ironed out and the data is ready to be used.
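The tokenize-and-look-up step can be sketched in a few lines of Python. This is a minimal illustration rather than the project's code (which ran against PostgreSQL): the `GAZETTEER` dictionary with its approximate coordinates and the functions `tokenize` and `parse_location` are hypothetical, and we assume a token counts as a match only if it equals a city name exactly.

```python
import re
from typing import Optional

# Hypothetical gazetteer: city name -> (latitude, longitude), approximate values.
# The project's actual city list is not reproduced here.
GAZETTEER = {
    "London": (51.5074, -0.1278),
    "Rome": (41.9028, 12.4964),
    "Venice": (45.4408, 12.3155),
    "Paris": (48.8566, 2.3522),
}


def tokenize(record: str) -> list[str]:
    """Split a raw location string on commas and full stops and trim each token."""
    return [tok.strip() for tok in re.split(r"[,.]", record) if tok.strip()]


def parse_location(
    record: str, gazetteer: dict[str, tuple[float, float]] = GAZETTEER
) -> Optional[tuple[str, float, float]]:
    """Return (city, lat, lon) for the first token that is a known city, else None."""
    for token in tokenize(record):
        if token in gazetteer:  # exact match only (an assumption)
            lat, lon = gazetteer[token]
            return token, lat, lon
    return None  # unmatched records are left for manual cleaning
```

With the example record above, `parse_location` returns `('London', 51.5074, -0.1278)`. Taking the first matching token is a simplification: a personal name that happens to equal a city name would be matched wrongly, which is one reason manual cleaning remains necessary.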

Because we want to show the flow of paintings and not just isolated points, we also sorted the data by date. Furthermore, many points fall in the same city, but we have only one pair of coordinates per city. Instead of stacking all of these points on one spot, we decided to disperse them in a matrix around the city's coordinates, as shown in the next part.
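One way to disperse the points of a city into a matrix is to lay them out on a near-square grid centred on the city's coordinates. The sketch below is an illustration under assumptions: the function name `disperse` and the default spacing of 0.01 degrees are hypothetical, not the project's actual layout parameters.

```python
import math


def disperse(lat: float, lon: float, n: int, spacing: float = 0.01) -> list[tuple[float, float]]:
    """Spread n points that share one city coordinate over a near-square grid.

    The grid is centred on (lat, lon); spacing is the gap between neighbours in
    degrees (0.01 is a hypothetical default, not the project's setting).
    """
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))  # near-square: cols * rows >= n
    rows = math.ceil(n / cols)
    points = []
    for i in range(n):
        r, c = divmod(i, cols)  # fill the grid row by row
        # Subtract half the grid extent so the grid's centre sits on the city.
        dlat = (r - (rows - 1) / 2) * spacing
        dlon = (c - (cols - 1) / 2) * spacing
        points.append((lat + dlat, lon + dlon))
    return points
```

Because a Web Mercator map stretches latitude relative to longitude away from the equator, a refinement could divide the longitude offset by the cosine of the latitude so that the grid cells look square on screen.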


With the organized database built and stored in PostgreSQL, the next step is to visualize everything on a map with a web-based interface. To do this, we wrote a JavaScript script that uses Leaflet, a library for interactive maps. This allows us to present the data on an interactive web-based map. The data is loaded from PostgreSQL through PHP and AJAX.

The imported data is structured as an array in which each sub-array holds the information for one painting. Each of those sub-arrays in turn contains smaller arrays, one per location of the painting (name of painting, name of artist, time period, name of place, latitude, longitude). This structure lets us visualize two important kinds of information. First, all the stopping points of all the paintings are shown on the interactive map as markers, each with a pop-up containing the painting's information. Second, the map shows the traces of paintings: lines connecting the markers of the same painting in chronological order. The final result is a map containing the traces of all the paintings, on which a single click on any marker shows the information about that stopping point of the corresponding painting.
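The nested structure and the traces derived from it can be sketched as follows. The row layout follows the fields listed above, but representing the time period as a single integer year, keying paintings on the (painting, artist) pair, the sample rows and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# One location record: (painting, artist, year, place, latitude, longitude).
# Using a single integer year for the "time period" is a simplifying assumption.
Row = tuple[str, str, int, str, float, float]

# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS: list[Row] = [
    ("Painting A", "Artist X", 1850, "Venice", 45.44, 12.32),
    ("Painting A", "Artist X", 1600, "Rome", 41.90, 12.50),
    ("Painting A", "Artist X", 1900, "London", 51.51, -0.13),
    ("Painting B", "Artist Y", 1760, "Venice", 45.44, 12.32),
]


def group_by_painting(rows: list[Row]) -> list[list[Row]]:
    """Build the nested array: one list of location records per painting, oldest first."""
    paintings: dict[tuple[str, str], list[Row]] = defaultdict(list)
    for row in rows:
        # Key on (painting, artist) so two works with the same title stay separate.
        paintings[(row[0], row[1])].append(row)
    return [sorted(stops, key=lambda r: r[2]) for stops in paintings.values()]


def traces(grouped: list[list[Row]]) -> list[list[tuple[float, float]]]:
    """One polyline of (lat, lon) per painting; paintings with a single stop get no line."""
    return [[(r[4], r[5]) for r in stops] for stops in grouped if len(stops) >= 2]
```

Each inner list returned by `traces` is an ordered sequence of coordinates, which is the shape Leaflet's `L.polyline` accepts on the JavaScript side. A painting with only one recorded stop produces a marker but no line.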

Map with all markers and lines visible
Points in the same city are dispersed as a matrix to avoid overlap

We have around 20,000 points, which Leaflet struggles to display at once. We solved this problem with the Marker Cluster plugin by Dave Leaver. It groups markers that are relatively close together into a single cluster labelled with the number of points it contains. Zooming into a cluster reveals smaller clusters, and at the maximum zoom level the individual markers become accessible. This makes the page load much more easily and quickly: for example, with nearly 20,000 markers and around 4,000 connecting lines, the map needed less than 4 seconds to load.
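To give an idea of how clustering reduces the number of objects drawn, here is a simplified grid-based clustering in Python. It is not the Marker Cluster plugin's actual algorithm, which clusters by on-screen pixel distance at each zoom level. The cell sizes, `base_deg` and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cluster(points: list[tuple[float, float]], cell_deg: float) -> list[tuple[float, float, int]]:
    """Group (lat, lon) points into square grid cells cell_deg degrees wide.

    Returns one (centroid_lat, centroid_lon, count) tuple per non-empty cell.
    """
    cells: dict[tuple[int, int], list[tuple[float, float]]] = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))
    result = []
    for members in cells.values():
        n = len(members)
        result.append((sum(p[0] for p in members) / n, sum(p[1] for p in members) / n, n))
    return result


def cell_size_for_zoom(zoom: int, base_deg: float = 40.0) -> float:
    """Halve the cell size at each zoom level (base_deg is a hypothetical choice)."""
    return base_deg / (2 ** zoom)
```

At zoom level 0, two nearby London points fall into one cluster while a point in Rome stays separate. Shrinking the cell size as the zoom level increases splits clusters apart, which mirrors the zoom-in behaviour described above.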

Marker Cluster makes it easier to handle large numbers of points

With the map in place, we also let users query the data. For example, one can choose to show only the traces of paintings that have been in Rome at least once, or only those of paintings by a given artist. This is where PHP and AJAX come in handy, acting as a two-way bridge between JavaScript and PostgreSQL. In addition, clicking anywhere on the map displays the latitude and longitude of the clicked point.
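On the server side, such a query reduces to a parameterised SQL statement. The sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server. The `locations` table, its columns, the sample rows and the function names are hypothetical. In PHP, PDO prepared statements would play the role of the `?` placeholders.

```python
import sqlite3

# Hypothetical schema: one row per stopping point of a painting.
SCHEMA = """
CREATE TABLE locations (
    painting_id INTEGER, painting TEXT, artist TEXT,
    year INTEGER, place TEXT, lat REAL, lon REAL
)
"""

# Every stop of each painting that was in the given place at least once, in time order.
VISITED_PLACE = """
SELECT painting, artist, year, place, lat, lon
FROM locations
WHERE painting_id IN (SELECT painting_id FROM locations WHERE place = ?)
ORDER BY painting_id, year
"""

# Every stop of each painting by the given artist, in time order.
BY_ARTIST = """
SELECT painting, artist, year, place, lat, lon
FROM locations
WHERE artist = ?
ORDER BY painting_id, year
"""


def run_query(conn: sqlite3.Connection, sql: str, value: str) -> list[tuple]:
    """Run a parameterised query; the ? placeholder keeps user input out of the SQL text."""
    return conn.execute(sql, (value,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database filled with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO locations VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "Painting A", "Artist X", 1600, "Rome", 41.90, 12.50),
            (1, "Painting A", "Artist X", 1900, "London", 51.51, -0.13),
            (2, "Painting B", "Artist Y", 1760, "Venice", 45.44, 12.32),
        ],
    )
    return conn
```

Passing the place or artist as a parameter, rather than pasting it into the SQL string, keeps user input from altering the query. The resulting rows could then be returned to the browser (for example as JSON over AJAX) and drawn as markers and traces.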


We have successfully built an interactive map on which users can observe the flow of paintings according to the filters they choose. This not only opens a new way to enjoy works of art but also gives researchers access to visualized statistical data on paintings. Although our data is limited by the lack of a complete digitized database, what matters is that the interactive map we have created can handle a database much larger than the one we have. Future work could improve the precision of the querying logic and give users more ways to interact with the map.



Getty Provenance Index (

Leaflet JS Library (

Marker Cluster Plugin (