In the first post we stopped at the very end of the digitization process, when all the required information from the CSV file still had to be cleaned and fixed. As previously stated, this was done partly by manually comparing the OCR results with the input images, and partly with automated Python scripts that cleaned values, fixed names and dates, and handled other simple tasks.
The step that followed was the analysis of the dataset, which falls into two parts: on one side, a simple consistency analysis of the database, looking at values and their validity; on the other, a deeper statistical analysis of the data, from its distribution to trend lines and so on. This part of the project can be considered connected (once again) to the digitization process, since it was very helpful in cleaning up and fixing our data even further.
On top of that, we moved from the syntactic level to the semantic one, that is, to the meaning of the records, distinguishing more important records from less important ones, which will help in subsequent steps.
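To give an idea of what the simple consistency analysis can look like, here is a minimal Python sketch. It is not the script we actually ran: the field names `year` and `destination` and the sample rows are invented for illustration, and only the year range 1262 to 1443 comes from the dataset itself.

```python
# Assumption: each record is a dict with "year" and "destination" keys.
# These field names are hypothetical; the real CSV layout may differ.
YEAR_MIN, YEAR_MAX = 1262, 1443  # year range covered by the dataset


def check_record(record):
    """Return a list of readable problems found in one record."""
    problems = []
    year_text = (record.get("year") or "").strip()
    if not year_text.isdecimal():
        problems.append(f"year is not a number: {year_text!r}")
    elif not YEAR_MIN <= int(year_text) <= YEAR_MAX:
        problems.append(f"year out of range: {year_text}")
    if not (record.get("destination") or "").strip():
        problems.append("missing destination")
    return problems


# Invented sample rows, only to exercise the checks.
records = [
    {"year": "1301", "destination": "Port A"},
    {"year": "13O1", "destination": "Port B"},  # letter O instead of zero, a typical OCR slip
    {"year": "1501", "destination": ""},
]

# Map each row index to the problems found in that row.
report = {index: check_record(row) for index, row in enumerate(records)}
for index, problems in report.items():
    if problems:
        print(index, problems)
```

Rows that come back with a non-empty list are the ones worth comparing against the input images again.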
Analysis on data distribution over the years
One of the first things we wanted to dive into was the distribution of the dataset over the years. Since the dataset holds about 1500 records spanning the years 1262 to 1443, it was important to understand whether we could reduce it or recognize any kind of pattern.
From the image it is in fact possible to see that, while the number of records per year is always very small given how little data we have, there is some sort of average growth over the years.
It would also be interesting to understand whether the spikes were related to specific events in the history of Venice or whether there is no particular reason for them, but again this is a more historical type of research.
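The per-year counts and the average growth discussed above can be computed along these lines. This is a sketch under assumptions, not our actual analysis code: it takes a plain list of years (one per record), treats years without records as zero, and fits an ordinary least-squares line. The sample years are made up.

```python
from collections import Counter


def records_per_year(years):
    """Count how many records fall in each year."""
    return Counter(years)


def linear_trend(counts, first_year, last_year):
    """Least-squares slope and intercept of records-per-year against year.

    Years with no records count as 0. Needs at least two distinct years.
    """
    xs = list(range(first_year, last_year + 1))
    ys = [counts.get(x, 0) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up years, one entry per record.
sample_years = [1300, 1301, 1301, 1302, 1302, 1302]
counts = records_per_year(sample_years)
slope, intercept = linear_trend(counts, 1300, 1302)
print(f"records per year grow by about {slope:.2f} each year")
```

A positive slope corresponds to the average growth visible in the plot. Spikes could then be flagged as years whose count lies far above the fitted line.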
Analysis on Route/Destinations
Together with the growth of its economic power, the Republic of Venice extended its territory and reached farther and farther European markets. In the timespan covered by the database, roughly 180 years (1262 to 1443), the changes in routes are evident.
Continuing from the previous analysis, another example concerns routes and destinations.
Most of the registered entries list a specific set of mandatory stops that the ships had to make on their journeys from Venice. It is not clear whether these stops were due to commercial reasons, political reasons, or the lack of better paths from Venice to these destinations (considering, for example, sea currents and wind).
The analysis will then help in recognizing patterns and changes, while manual research in books will be required to understand what led the routes and/or destinations to change (even in frequency or costs).
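One way to look for recurring stops is to group voyages by destination and keep the stops that appear in every voyage to that destination. The sketch below assumes each voyage is a dict with `year`, `destination` and `stops` keys. These names and the sample voyages are hypothetical and do not reflect the real structure of our database.

```python
from collections import defaultdict


def common_stops(voyages):
    """Map each destination to the stops present in every voyage to it."""
    by_destination = defaultdict(list)
    for voyage in voyages:
        by_destination[voyage["destination"]].append(set(voyage["stops"]))
    return {
        destination: set.intersection(*stop_sets)
        for destination, stop_sets in by_destination.items()
    }


# Invented voyages with placeholder names.
voyages = [
    {"year": 1300, "destination": "Port X", "stops": ["Stop A", "Stop B"]},
    {"year": 1350, "destination": "Port X", "stops": ["Stop A", "Stop C"]},
    {"year": 1350, "destination": "Port Y", "stops": ["Stop B"]},
]

common = common_stops(voyages)
# Restricting the input to a period shows how the recurring stops change over time.
early = common_stops([v for v in voyages if v["year"] < 1350])
print(common)
print(early)
```

Running the same function on different periods and comparing the results is a cheap way to spot when a route changed. Why it changed still has to come from the historical sources.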
While analyzing the available data, the biggest problem that arose was the lack of connection with a historical background that could explain what the data shows.
A little research with the help of an Italian history student and the “Dizionario Biografico degli Italiani” in the Treccani encyclopedia taught us, for example, that the travel costs specified in the documents suggest we are dealing with commercial trips to exchange goods rather than public expeditions, which were more often related to military and political matters.
The next steps will continue the analysis and keep associating the results with the historical background in order to explain the observed behavior.
We will also analyze the set of ship owners (known by surname only) and compare it with the nobility of Venice, trying to recognize relationships and deals between owners who share the same trips.
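Since only surnames are available, the comparison with the nobility will probably start from a simple normalized match. The sketch below is only an assumption about how this could be done: it lowercases names, strips accents and extra spaces, and intersects the two sets. Both lists are invented placeholders, and a real list of noble families would have to come from a historical source.

```python
import unicodedata


def normalize(surname):
    """Lowercase a surname, drop accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", surname)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def match_owners(owner_surnames, noble_surnames):
    """Return the normalized surnames that appear in both lists, sorted."""
    owners = {normalize(name) for name in owner_surnames}
    nobles = {normalize(name) for name in noble_surnames}
    return sorted(owners & nobles)


# Placeholder names, not taken from the dataset.
owners = ["Alfà", " beta ", "Gamma"]
nobles = ["ALFA", "Beta"]
matches = match_owners(owners, nobles)
print(matches)
```

An exact match after normalization will miss spelling variants, which are common in medieval records, so a fuzzy comparison may be needed later.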
Of course, there is still a lot of work to do to better understand the data and to point this project in the right direction, and this will remain a substantial part of the project until the very end.