In this blogpost, we summarize our progress in gathering painting data since the last blogpost. We also describe some changes to the scope of the project and to its future schedule.
As indicated in the previous blogpost, our project calendar has been affected by difficulties in the data gathering process, namely where to look for the data and how to collect it. Since the first blogpost we have been focusing on these matters, and after looking deeper into the “Getty Provenance Index”, we have found out a few things about this source.
Firstly, the source, though it seems immense, only offers a limited number of collections, each with a specific purpose. Those collections are:
1) The works of art from private collections in France, Italy, the Netherlands, and Spain (without provenance),
2) The works of art from auction catalogs that were sold in Belgium, France, Germany, Great Britain, the Netherlands, and Scandinavia (also without provenance),
3) The description and provenance of paintings from American and British institutions by artists born before 1900.
Because our project focuses on the flow of paintings, we have decided to use number 3 as the main source. This means that we will have to limit ourselves to paintings that are currently held in American and British institutions. However, if we can find a way to link this data to Venice or Italy, it can still become a good database for our project.
The search engine of the website is also limited for this collection; a snapshot of it is attached below.
Searching by artist name only is not a good idea because it would shrink the dataset too much. Hence, we decided to search for all paintings whose artist is from Italy. This returns around 23,000 results. However, the number of results is not the same as the number of paintings, because each result corresponds to one location where a painting stayed during its history. For instance, if a painting stayed in 4 different locations before it reached its final destination in the US, then the number of results for that painting alone will be 5. Below is an example of the full data of a painting that we obtained (short version).
Right now we can only download 10,000 results due to the rules of the website, but that is already enough for us to work with: based on our observation, a painting stayed in about 5 places on average, which puts the number of paintings we can obtain at around 2,000. At first we will implement the project on this number of paintings, keeping in mind that if the project works well then it should work for larger data as well.
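To make the difference between results and paintings concrete, here is a minimal sketch in Python of how the number of paintings could be counted from the downloaded rows. The field names `painting_id` and `location` and the sample rows are made up for illustration; the real export may use different column names.

```python
from collections import Counter

# Hypothetical rows: one row per (painting, location) pair, as in the search results.
rows = [
    {"painting_id": "P1", "location": "Venice"},
    {"painting_id": "P1", "location": "Paris"},
    {"painting_id": "P1", "location": "London"},
    {"painting_id": "P2", "location": "Florence"},
    {"painting_id": "P2", "location": "New York"},
]


def count_paintings(rows):
    """Return the number of distinct paintings and the number of rows per painting."""
    stops = Counter(row["painting_id"] for row in rows)
    return len(stops), stops


n_paintings, stops_per_painting = count_paintings(rows)
average_stops = len(rows) / n_paintings

# The same reasoning gives our rough estimate: 10,000 results at about
# 5 locations per painting is about 2,000 paintings.
estimated_paintings = 10_000 // 5
```

Here 5 rows describe only 2 paintings, which is why the number of results overstates the number of paintings.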
As mentioned above, the spreadsheet is divided into rows and columns, and each row corresponds to one place where a specific painting stayed in the past. Moreover, each painting has an ID number. This will make it easier for us to create the database and to use, for instance, SQL to filter and group the data the way we want.
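As a rough illustration of the kind of filtering and grouping we have in mind, the sketch below loads a few rows into an in-memory SQLite database using Python's built-in `sqlite3` module. The table name `provenance`, its columns (`painting_id`, `step`, `location`) and the sample rows are assumptions for this example, not the actual structure of the exported spreadsheet.

```python
import sqlite3

# In-memory database; the schema is a hypothetical stand-in for the spreadsheet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE provenance (painting_id TEXT, step INTEGER, location TEXT)"
)
conn.executemany(
    "INSERT INTO provenance VALUES (?, ?, ?)",
    [
        ("P1", 1, "Venice"),
        ("P1", 2, "Paris"),
        ("P1", 3, "London"),
        ("P2", 1, "Florence"),
        ("P2", 2, "New York"),
    ],
)

# Group: how many locations does each painting have?
stops = conn.execute(
    "SELECT painting_id, COUNT(*) FROM provenance "
    "GROUP BY painting_id ORDER BY painting_id"
).fetchall()

# Filter: which paintings passed through Venice at some point?
venice_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT painting_id FROM provenance "
        "WHERE location = ? ORDER BY painting_id",
        ("Venice",),
    )
]
conn.close()
```

Because every row carries the painting ID, a single `GROUP BY` is enough to go from rows back to paintings.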
For the mapping and visualization part of the project, we have decided to show not only data about a single painting but also statistics on the paths and flows of paintings, so that users can identify the trends they are interested in. The detailed methods, procedures and platforms will be researched and decided after this blogpost.
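Since the method is not decided yet, the following is only one possible sketch of a flow statistic: counting how many paintings moved directly from one location to the next. It assumes each row can be ordered within a painting's history by a hypothetical `step` number; the sample rows are invented.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (painting_id, step, location), where step gives the order
# of the locations within one painting's history.
rows = [
    ("P1", 1, "Venice"),
    ("P1", 2, "Paris"),
    ("P1", 3, "London"),
    ("P2", 1, "Florence"),
    ("P2", 2, "Paris"),
    ("P2", 3, "London"),
]


def flow_counts(rows):
    """Count direct moves (origin, destination) over all paintings."""
    paths = defaultdict(list)
    for painting_id, step, location in rows:
        paths[painting_id].append((step, location))

    flows = Counter()
    for stops in paths.values():
        # Sort each painting's stops by step, then pair consecutive locations.
        ordered = [location for _, location in sorted(stops)]
        for origin, destination in zip(ordered, ordered[1:]):
            flows[(origin, destination)] += 1
    return flows


flows = flow_counts(rows)
```

A table like this could later be drawn on a map as arrows whose thickness depends on the count, but that choice is still open.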
As stated above, the plan from the first blogpost has been followed. We have narrowed down the scope of our project based on the size and availability of the data (limited but sufficient). The work from now on is to create the database, choose and apply a suitable query language to it, and finally put everything on a map and visualize it in a way that is convenient for users.