Previously on the Facebook of the Venetian Elite blog, we discussed the tools we chose to work with, as well as some problems that arose while working on the project. We also mentioned that Wikipedia and DBpedia are not very good at showing connections among individuals, so we needed a way to gather this type of information automatically.
Because of some restrictions we ran into along the way, we decided to change track. In this second post we present the new approach we will follow, mention a couple of features it will include, and conclude with the next steps.
Data gathering – quality and type
Thanks to DBpedia and a small program that one of us wrote in Python, we were able to download data on approximately 500 Venetians. The program retrieved all the DBpedia links related to a person and let the user decide whether to keep or drop each one. Some categories were stored automatically; for the rest, the decision was made manually. As some people turned out to be out of context, the program also included a command to drop all of their content.
The data on these people includes names, dates and places of birth and death, titles, the abstracts written for their Wikipedia pages, and other Wikipedia-specific data such as the subcategories they belong to. Among these subcategories we find artists, writers, doges and others, sometimes as specific as Baroque opera composers. For some of the painters, we also have data labeled “influenced by”, which lists the painters who influenced that painter. On the other hand, there is also data that we do not need, such as Wikipedia redirects and Wikipedia page ids, and data that tells us what we already know, such as the subject being a person or being from Venice.
Using our little data-gathering program, we discovered that our main category is the one called artists, for two main reasons. First, it contains a large number of people, in fact more than half of our sample, since it is made up of subcategories such as writers, painters and sculptors, among others. Second, the labels related to this category go beyond birth/death dates or family relationships.
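To give an idea of the keep-or-drop step, here is a minimal sketch in Python. It is not the program we actually used: the predicate names in AUTO_KEEP, the function names and the (predicate, value) representation of a link are all assumptions made for illustration.

```python
# Hypothetical predicates that are stored automatically, without asking.
AUTO_KEEP = {"birthDate", "deathDate", "birthPlace"}


def triage_links(links, decide, auto_keep=AUTO_KEEP):
    """Return the (predicate, value) pairs to keep for one person.

    `links` is an iterable of (predicate, value) pairs.
    `decide` is a callback that returns True to keep a pair; it is only
    consulted for predicates that are not in `auto_keep`.
    """
    kept = []
    for predicate, value in links:
        if predicate in auto_keep or decide(predicate, value):
            kept.append((predicate, value))
    return kept


def ask_user(predicate, value):
    """Interactive callback: ask on the console whether to keep a link."""
    answer = input(f"Keep {predicate} = {value}? [y/n] ")
    return answer.strip().lower().startswith("y")
```

In interactive use one would call triage_links(links, ask_user). Passing the decision as a callback keeps the filtering logic separate from the console prompt, which also makes it easy to replace the manual step with a rule later on.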
To take advantage of this category, we decided that the most useful features we can extract are the groups people belong to and their lifespans, and we will continue the project in this direction. We will focus on how these groups evolved over time: we want to create an interactive timeline where one can focus on a specific group to see its progress, or on the era of a specific doge to see the trends of his time and the people who lived under his reign. For example, the user will be able to see, through interactive arrows, the connections among artists based on the “influenced by” label, or filter people by group based on the “period of artistic style” label.
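The two timeline queries described above can be sketched as follows. This is only an illustration under our own assumptions: the Person record, its field names and the two helper functions are hypothetical, and years are treated as plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical record for one person on the timeline."""
    name: str
    born: int
    died: int
    influenced_by: list = field(default_factory=list)


def alive_during(people, start, end):
    """Return the people whose lifespan overlaps the years [start, end]."""
    return [p for p in people if p.born <= end and p.died >= start]


def influence_edges(people):
    """Return (influencer, influenced) pairs, one per arrow to draw.

    Influencers that are not themselves in `people` are skipped,
    since there would be nothing on the timeline to point the arrow from.
    """
    known = {p.name for p in people}
    return [
        (source, p.name)
        for p in people
        for source in p.influenced_by
        if source in known
    ]
```

For the era of a doge, one would call alive_during with the first and last year of his reign. The overlap test keeps anyone who was alive at any point in that interval, not only people who lived through all of it.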
The following steps
At this point, the next step is to extract the groups from the data we have and represent them in a way that is more suitable for our goals. For this purpose, we will again use Python to build a program that helps us identify the groups and change the representation of the data. We need the number of groups, as well as their sizes, in order to decide which groups to include in the application. Once we have decided in full detail what type of timeline we want to create, we will move on to the visualisation part of the interactive timeline.
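As a rough sketch of what counting the groups could look like, assuming the data has already been reduced to a mapping from each person to the group labels attached to them (the mapping, the function names and the size threshold are all assumptions, not part of our current code):

```python
from collections import Counter


def group_sizes(memberships):
    """Count how many people belong to each group.

    `memberships` maps a person's name to an iterable of group labels.
    Labels repeated for the same person are counted only once.
    """
    counts = Counter()
    for groups in memberships.values():
        counts.update(set(groups))
    return counts


def groups_to_include(counts, min_size):
    """Return, in alphabetical order, the groups with at least `min_size` members."""
    return sorted(group for group, size in counts.items() if size >= min_size)
```

Here len(group_sizes(memberships)) gives the number of groups, and the threshold passed to groups_to_include is the knob we would tune when deciding which groups are large enough to show in the application.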