Globally renowned for its peculiar urban layout and extraordinary artistic heritage, Venice was also one of the most powerful maritime republics of the Late Middle Ages, the Renaissance and the Baroque period.
Focused largely on this maritime side, the project “Route Planner” addresses a broad and interesting topic that can be developed in many different forms and directions.
These first weeks of work have been characterized by three interconnected and overlapping activities:
- the definition of a more concrete goal;
- the collection and consultation of material;
- the digitization of the ship information and dataset, from paper to digital format.
The first step was to define the project, taking into account several factors: the number of people working on it, the available information, the data format and more. The aim was to clarify what the result should look like and which direction to take, not only in general terms but through concrete examples.
While it is important to have an idea of what to do and where to go, it is not always easy to commit to a direction before the first sources have been consulted. In our case this means collecting the material and digitizing it, a process that is currently in progress.
This first period of work is therefore characterized by collection and analysis. On one side there is a more statistically oriented analysis of the dataset over multiple dimensions. On the other there is a more analytical-historical research, whose purpose is to give meaning to the data and to find the correct interpretation and connections.
Working on a large dataset is not an easy job and might take hours or even days if done manually. With the support of new technologies, automating the work on digital information has become common practice as well as an efficient way of working.
The available information on Venetian ships is extensive. It consists of large records of voyages containing the type of ship, owner, route, destination and many other details about the whole maritime traffic. Basically, a database on paper.
The digitization process followed these steps:
- conversion of images into text through OCR;
- fixing of OCR results;
- conversion of the corrected text into a more suitable format, an SQL database.
Far from being a precise and simple process, OCR can be considered the most delicate and difficult part of the whole digitization. In fact, the result can be either very good or quite bad.
The initial material was given to us already in textual format, obtained from the OCR conversion, but many errors had clearly been introduced in the process.
After analysing the types of errors introduced by the conversion, we wrote a Python script to fix the most common ones and to reparse the whole document into a more coherent format.
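To give an idea of what such a clean-up pass might look like, here is a minimal sketch. The substitution rules are hypothetical examples of typical OCR confusions (a letter read in place of a digit, irregular spacing) and are not the rules of the actual script, which depend on the errors found in the real records. The names `OCR_FIXES`, `clean_line` and `clean_text` are ours, chosen for illustration.

```python
from __future__ import annotations

import re

# Hypothetical rules: typical OCR confusions, NOT the project's actual rule set.
OCR_FIXES = [
    (re.compile(r"(?<=\d)[lI](?=\d)"), "1"),  # letter l or I between two digits -> 1
    (re.compile(r"(?<=\d)O(?=\d)"), "0"),     # capital O between two digits -> 0
    (re.compile(r"[ \t]+"), " "),             # collapse runs of spaces and tabs
]


def clean_line(line: str) -> str:
    """Apply every substitution rule in order, then trim the line."""
    for pattern, replacement in OCR_FIXES:
        line = pattern.sub(replacement, line)
    return line.strip()


def clean_text(text: str) -> list[str]:
    """Clean each line of the OCR output and drop the lines left empty."""
    cleaned = (clean_line(line) for line in text.splitlines())
    return [line for line in cleaned if line]
```

Keeping the rules in a single list makes it easy to add a new substitution whenever another recurring error is spotted in the text.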
Current state and future work
Starting from a textual and (mostly) correct version of the records, the second big part of the digitization process has been the creation of a database. This was also automated, with a Python script that generates the SQL statements for defining the tables and inserting the data.
More precisely, the current outcome of the digitization process is a SQLite file containing the given dataset.
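The step from cleaned text to a SQLite file could be sketched as follows. This is only an illustration under assumptions: the table name `voyages` and its columns are hypothetical, loosely based on the fields mentioned above (type of ship, owner, route, destination), and the sample rows are placeholders, not real records. The sketch also inserts rows directly through Python's standard `sqlite3` module instead of first writing SQL statements to a file.

```python
import sqlite3

# Hypothetical schema: the real tables may have different names and columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS voyages (
    id          INTEGER PRIMARY KEY,
    ship_type   TEXT,
    owner       TEXT,
    route       TEXT,
    destination TEXT
)
"""


def load_voyages(db_path, records):
    """Create the table if needed and insert one row per parsed record."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO voyages (ship_type, owner, route, destination) "
            "VALUES (?, ?, ?, ?)",
            records,
        )
    return conn


# Placeholder rows for illustration only, not data from the archive.
sample = [
    ("galley", "Owner A", "Route 1", "Port A"),
    ("cog", "Owner B", "Route 2", "Port B"),
]
conn = load_voyages(":memory:", sample)
count = conn.execute("SELECT COUNT(*) FROM voyages").fetchone()[0]
```

Passing `":memory:"` keeps the example self-contained; a file path such as `ships.sqlite` (again a made-up name) would produce a database file on disk.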
The next step is the analysis of the data, again through automated Python scripts, over a small set of ideas, in order to find a good direction to follow that is supported by the information within our reach.
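As an example of the kind of question such a script might ask, the sketch below counts voyages per destination. It is only one possible first analysis, not a decided direction. It assumes a hypothetical `voyages` table with a `destination` column and uses placeholder rows so that it can run on its own.

```python
import sqlite3


def voyages_per_destination(conn):
    """Return (destination, number of voyages) pairs, most frequent first."""
    query = """
        SELECT destination, COUNT(*) AS n
        FROM voyages
        GROUP BY destination
        ORDER BY n DESC, destination ASC
    """
    return conn.execute(query).fetchall()


# Hypothetical table and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voyages (ship_type TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO voyages VALUES (?, ?)",
    [("galley", "Port A"), ("cog", "Port B"), ("galley", "Port A")],
)
result = voyages_per_destination(conn)
```

The same grouping pattern could be applied to other columns, for instance ship type or owner, to compare several dimensions of the dataset.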