A Facebook approach for Data – Venetian Elite

In recent years, social networks have become popular across the web thanks to the wide range of benefits they provide. To better organize and display its information, Facebook, the most popular of them, makes use of different structures. Some examples are the Wall, News Feed, Timeline, Notifications and Messages, among others [1]. Not only do these structures generate value for users, they also represent new ways to categorize and visualize information that could be applied to other collections of data, with the aim of offering the user a different point of view.

As part of the Venice Atlas and the Digital Humanities course at the École Polytechnique Fédérale de Lausanne, during the winter semester of 2013 we were given the task of producing a sort of Facebook to represent the relationships among the Venetian Elite. The purpose of this final post is to describe the process we followed to implement the project. It is important to note that even though the outcome of the project changed due to data constraints, we tried to stick as closely as possible to the original plan. Similarly, we tried to automate each task as much as possible. Some key steps of the project can be seen in the following graphic:

Key aspects of the Facebook of the Venetian Elite Project

Data Gathering

The project plan that we presented at the end of the first semester stated that the first phase of our project would be information extraction, starting with mining DBpedia.org [2], where we could get structured information from Wikipedia. Therefore, we started by querying DBpedia for people who might be related to Venice by birth, by death or in some other way we could think of. At this point, we noticed that DBpedia data is organized into categories. One of the categories we came across was “People from Venice”, exactly the group of people we were looking for. We also found subcategories such as families, doges, princes, mayors, patriarchs, merchants, composers, painters, architects, explorers and so on. The “People from Venice” category also includes people who do not fall under any of these subcategories, though.

We decided that a good approach would be to automate the whole information extraction process as much as possible. In this way, we would be able to proceed more efficiently and the categories should emerge more naturally. Using the categories regarding Venetian people, we were able to download data on approximately 500 Venetians.
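To illustrate what such an automated category query could look like, here is a minimal sketch. It is not the code we ran: the endpoint URL, the `dct:subject` property, the `format` parameter and all function names are assumptions about how the public DBpedia SPARQL service can be queried.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_category_query(category, limit=1000):
    """Return a SPARQL query listing resources filed under a DBpedia category."""
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?person WHERE {\n"
        f"  ?person dct:subject <http://dbpedia.org/resource/Category:{category}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query):
    """Encode the query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{DBPEDIA_ENDPOINT}?{params}"


def fetch_people(category):
    """Run the query (needs network access) and return the resource URIs."""
    url = build_request_url(build_category_query(category))
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return [row["person"]["value"] for row in data["results"]["bindings"]]
```

Calling `fetch_people("People_from_Venice")` would return one URI per person in the category. Subcategories would need one call each, or a query that also follows the category hierarchy.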

The data about these people includes names, dates and places of birth and death, titles, the abstracts written for their Wikipedia pages, and other Wikipedia-specific data such as the subcategories they belong to. Among these subcategories we find artists, writers, doges and others, sometimes as specific as Baroque opera composers. For some of the painters, we also have data labelled “influenced by”, listing the painters who influenced them. On the other hand, we also got data that we do not need, such as Wikipedia redirects and Wikipedia page ids, and data that we already know, such as the fact that the subject is a person, that they are from Venice and so on.

Data Analysis

We wrote a program in Python to keep or drop the attributes of these people. Some attributes were kept automatically, and the decision for the rest was made manually. As some people were out of context, the program also included a command to drop their records entirely.

Program in Python
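The keep-or-drop logic can be sketched roughly as follows. This is a simplified illustration rather than our actual program: the attribute names in `AUTO_KEEP` and `AUTO_DROP`, the function names and the `decide` callback (standing in for the manual decision) are all hypothetical.

```python
# Hypothetical attribute names; the real DBpedia keys may differ.
AUTO_KEEP = {"name", "birthDate", "deathDate", "birthPlace", "deathPlace",
             "abstract", "subject", "influencedBy"}
AUTO_DROP = {"wikiPageID", "wikiPageRedirects"}


def filter_attributes(record, decide=None):
    """Keep known-useful attributes, drop known-useless ones.

    Anything else is passed to `decide(key, value)`, which stands in for
    the manual decision; without it, unknown attributes are dropped.
    """
    kept = {}
    for key, value in record.items():
        if key in AUTO_KEEP:
            kept[key] = value
        elif key in AUTO_DROP:
            continue
        elif decide is not None and decide(key, value):
            kept[key] = value
    return kept


def clean_dataset(people, out_of_context=(), decide=None):
    """Filter every record and drop out-of-context people entirely."""
    return {
        name: filter_attributes(record, decide)
        for name, record in people.items()
        if name not in out_of_context
    }
```

In an interactive session, `decide` could simply prompt the user with `input()` for each attribute the two sets do not cover.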

To get the most out of the data we have, we decided that the most important features we could use were the groups and the lifespans of the people, so we continued the project in that direction. We focused on how these groups evolved over time. To that end, we wanted to create a timeline where one can choose the era of a specific Doge to see the trends of his time and the people who lived under his reign.
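Selecting the people who lived under a given reign comes down to an interval overlap test between a lifespan and the reign. A minimal sketch, assuming lifespans are reduced to birth and death years; the `Person` class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    born: int  # year of birth
    died: int  # year of death


def alive_during(person, start, end):
    """True if the person's lifespan overlaps the period [start, end]."""
    return person.born <= end and person.died >= start


def contemporaries(people, reign_start, reign_end):
    """Return the people whose lives overlap a given reign."""
    return [p for p in people if alive_during(p, reign_start, reign_end)]
```

Entries with a missing birth or death date would need separate handling, since the test above assumes both years are known.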

Thanks to our little Python program, we discovered that the largest category we have is “Artists”: more than half of our sample falls into it. However, we saw that 83% of the artists were “Painters”, and people belonging to the other groups were rare. Therefore, with the guidance of our supervisor, we decided to focus on painters in order to provide a tool that is appropriate for research.
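Figures like these can be obtained by counting how many people fall into each category and dividing one count by another. A small sketch, assuming each person is mapped to a list of category names; the data layout and function names are illustrative, not those of our program:

```python
from collections import Counter


def category_counts(people):
    """Count how many people belong to each category.

    `people` maps a person's name to a list of category names; each
    person is counted at most once per category.
    """
    counts = Counter()
    for categories in people.values():
        counts.update(set(categories))
    return counts


def share(counts, part, whole):
    """Fraction of the `whole` category that also belongs to `part`.

    Only meaningful when `part` is a subcategory of `whole`.
    """
    return counts[part] / counts[whole] if counts[whole] else 0.0
```

With such counts, `share(counts, "Painters", "Artists")` gives the proportion of artists who are painters.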

The visualization

After analysing the quantities and attributes of the categories, we had to decide how to present the data in a way that would be useful and practical. This meant both looking for tools and deciding among the different possibilities. To choose the proper tool for data visualization we considered not only how the data was presented and distributed across the screen but also its usability. We found three tools that appeared to meet our requirements: d3js.org [3], Beedocs [4] and Timeline JS [5]. Even though Beedocs appeared to be the best because of the impressive timelines we could have built with it, we quickly had to dismiss it, since the software is paid and our purpose is to leave the project open to improvements. Between d3js.org and Timeline JS we chose the latter, because the former would not have offered many options, and the ones available were very constrained in almost every aspect.

Different options for Timeline

Once the tool was chosen, we picked the “Doges” category to make an example timeline, so that we could familiarize ourselves with the tool and better visualize what the outcome would look like. As previously mentioned, our final timeline focuses on “Painters”, since this category is very important in Venetian history and is also very large, with a considerable gap between it and the rest.
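Timeline tools of this kind are usually fed a JSON description of the entries. The sketch below shows how people with a name and two dates could be turned into such a file. It assumes the TimelineJS 3 JSON layout (an `events` list whose items carry `start_date`, `end_date` and `text`), which may differ from the format of the version we used; the function names and the input dictionaries are hypothetical.

```python
import json


def person_to_event(name, born, died, summary="", group=None):
    """Build one timeline event from a person's name and lifespan."""
    event = {
        "start_date": {"year": born},
        "end_date": {"year": died},
        "text": {"headline": name, "text": summary},
    }
    if group:
        # Assumed optional field that places related events on the same row.
        event["group"] = group
    return event


def build_timeline(title, people):
    """Assemble a timeline dictionary, ordered by year of birth."""
    ordered = sorted(people, key=lambda p: p["born"])
    return {
        "title": {"text": {"headline": title}},
        "events": [person_to_event(**p) for p in ordered],
    }


def save_timeline(timeline, path):
    """Write the timeline to a JSON file for the visualization tool."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(timeline, handle, ensure_ascii=False, indent=2)
```

Each person is passed as a dictionary with `name`, `born` and `died` keys, plus optional `summary` and `group` entries.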

For clarity and better visuals we decided to present the timeline with 6 rows per page. We planned to arrange the entries in a sinusoidal layout, but when implementing it we ran into overlapping issues, so we decided to present the entities in a modular way instead. The result can be seen in the following figures:

Example of Giovanni Bellini in Timeline

Timeline of Painters from Venice

Conclusion

Working with large amounts of data is very interesting, since one has to decide which of all the possible approaches to take, normalize the data according to the restrictions that choice imposes, and pay attention not to lose important data in the process.

Facebook is a vast social network, so trying to imitate all of its structures and features in this project was not feasible, since the data had several constraints. The main limitation of the data was its irregularity: we could easily find many attributes for some entities and just a couple for others, and the format varied from one entry to another. Therefore we decided to categorize the data and analyse in depth which parts would be most relevant. Instead of trying to imitate a very limited Facebook, we decided that a timeline would be well suited to visualizing it, since the main attributes we had were names and dates. Moreover, we decided to keep only the Painters category due to its size and relevance in Venice’s history.

In conclusion, deciding how to present data from a large dataset is a difficult task due to differences in quantity and format. Decisions about which data to present, and how to present it, have to be made carefully so that one can get the most out of it.

Link to the project

References:

[1] http://en.wikipedia.org/wiki/Facebook_features#Facebook_structure

[2] http://dbpedia.org/About

[3] http://d3js.org/

[4] http://www.beedocs.com/

[5] http://timeline.knightlab.com/