In this blog post, we summarize what we accomplished during the first 3 weeks of the schedule and describe some changes to the plan in light of the current situation.
As indicated in the project blog post, we used the first 3 weeks to gather data and narrow down the scope. The data include the list of art museums in Venice, a map of those museums, and the list of paintings that have stayed in Venice together with their provenance.
Because it is hard to decide objectively which places count as art museums, we took the list of museums from the webpage http://www.visitmuve.it/, which lists 11 art museums around Venice. We also created a Google custom map to show their locations. These museums will be our focus, although given the size of the data we will likely narrow the scope further in the near future. (Map link: https://www.google.com/maps/d/edit?mid=z97GIO_qBJy0.khETdmxyNadM )
For the paintings data, we first looked at several online sources. This turned out to be one of the most challenging parts of our project. In short, data about paintings are scattered all over the internet, and it took us a while to identify the most promising sources. Our constraint is that a painting must have stayed in Venice for at least some period of time. The most promising sites to date are the “Fondazione Musei Civici Venezia” (http://www.archiviodellacomunicazione.it/ ) and the “Getty Provenance Index” (http://piprod.getty.edu/starweb/pi/servlet.starweb?path=pi/pi.web). At this step, we encountered two main challenges. First, full information about a painting can only be assembled by combining multiple sources. For instance, the first site has all the information about the paintings that are now in Venice, except where these paintings stayed in the past. The second source is better at letting users search thoroughly with customizable filters, and it also lets users export their search results as a spreadsheet for further use. This source does contain the provenance of paintings, but only for paintings that are currently located in the US. Hence the second challenge: finding a new source that can give us the provenance of paintings in Venice.
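To illustrate what combining information from multiple sources could look like, here is a minimal Python sketch. It assumes each source has already been exported to a list of records, and that a painting can be matched across sources by its artist and title. The field names (`artist`, `title`, `museum`, `location`) and the function names are hypothetical and do not reflect the actual structure of either site.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents and collapse whitespace so that keys
    coming from different sites can be compared."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def merge_records(catalogue, provenance):
    """Attach past locations from `provenance` to each `catalogue` record
    that shares the same (artist, title) key. Field names are assumptions."""
    locations_by_key = {}
    for row in provenance:
        key = (normalize(row["artist"]), normalize(row["title"]))
        locations_by_key.setdefault(key, []).append(row["location"])

    merged = []
    for row in catalogue:
        key = (normalize(row["artist"]), normalize(row["title"]))
        # Records without a match keep an empty list, so gaps stay visible.
        merged.append({**row, "past_locations": locations_by_key.get(key, [])})
    return merged
```

Matching on artist and title alone is a simplification; titles are often translated or abbreviated differently between catalogues, so a real matching step would likely need manual checks.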
Given this situation, rather than focusing on the size of the data, our project needs to focus mostly on where to collect data and how to connect data from different sources in order to complete the information about the paintings. Because the amount of data to sort and look through is rather large, we will look for ways to automate the process. However, since we have no direct interface to the databases, the mining technique will probably need to parse web pages, which makes it more complex. As more challenges arise from obtaining large amounts of data, a change to the future schedule is unavoidable. Since a method for gathering data is what we need most at the moment, we will spend the next 4 weeks both creating an efficient method and narrowing down the scope according to the size and availability of the data. After that, we will focus on integrating the data and applying them to the map.
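As a rough idea of what parsing a web page could involve, the sketch below uses Python's standard `html.parser` module to pull the rows out of an HTML table. It assumes the results page presents its records in a plain `<table>`, which we have not verified for any of the sites above; the class and function names are our own.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `parse_table` on the text of a downloaded page returns one list per table row, which can then be written to a spreadsheet or matched against records from another source.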