Garzoni font dataset’s profession profiling and descriptive statistics – Project progress report II

Previous progress report: http://veniceatlas.epfl.ch/garzoni-font-datasets-profession-profiling-and-descriptive-statistics-project-progress-report-i/

During the last three weeks, we worked on finding interesting correlations. We also started building a website where all our results will be presented interactively.

The main problem we encountered was the syntax and specifics of the R language. We are learning R with this project, and it took us a little longer than planned to understand how more complex things work in it, beyond histograms and map plots.

So far, we have completed three steps in our project:

I. Choosing the most meaningful years to work with

As you may remember from our previous progress report, the distribution of enrollment years is far from uniform:

[Figure: distribution of enrollment years]

We decided to work only with the years that together contain 80% of the records in the entire dataset. The years that remain after this filtering are:

1582-1584, 1591, 1592, 1596-1598, 1620-1622, 1625, 1626, 1632, 1645, 1653, 1654, 1656, 1657, 1658, 1664

Taken together, these years contain 80% of all records, regardless of the feature we are interested in. We also implemented a simple function that selects the years containing 80% of the available data for a specific feature.
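The selection function itself is not shown in this report (and ours is written in R). As an illustration only, here is a minimal Python sketch, assuming that "containing 80% of the data" means the smallest set of years whose combined record counts reach 80% of the total. The function names `select_top_years` and `select_top_years_for_feature`, and the one-dict-per-record input format, are hypothetical.

```python
from collections import Counter


def select_top_years(years, share=0.8):
    """Return the smallest set of years covering at least `share` of the records.

    `years` holds one enrollment year per record; None marks a missing year.
    Years are taken greedily from the most to the least populated one,
    which minimizes the number of years needed to reach the threshold.
    """
    counts = Counter(y for y in years if y is not None)
    total = sum(counts.values())
    selected, covered = [], 0
    for year, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append(year)
        covered += n
    return sorted(selected)


def select_top_years_for_feature(records, feature, share=0.8):
    """Same selection, but counting only records where `feature` is present.

    Each record is assumed (hypothetically) to be a dict with a "year" key.
    """
    years = [rec["year"] for rec in records if rec.get(feature) is not None]
    return select_top_years(years, share)
```

Counting per feature matters because a year with many enrollments may still have few usable values for, say, the salary.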

II. Computing basic correlations

We were mostly interested in the correlations between an apprentice's annual salary and the following factors: the apprentice's profession, the master's profession, the length of the apprenticeship, and the apprentice's age. We did not compute correlations for the other features because they are binary or categorical (factors).

Here are the correlations with annual salary computed over the entire dataset, regardless of profession or enrollment year:

Correlation with annual salary (entire dataset):
Apprentice profession: 0.2032462
Master profession: 0.2090282
Apprenticeship length: -0.6205411
Apprentice age: 0.438719

The results are not surprising:

  • Practically no correlation between profession (the apprentice's or the master's) and salary
  • A slight dependence of salary on the apprentice's age: older apprentices tend to earn more, but the correlation is not strong and may simply be noise
  • A strong inverse relationship with apprenticeship length: apprentices whose apprenticeship lasts longer are paid less
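For readers who want to reproduce coefficients like these, here is a minimal Python sketch of the Pearson correlation coefficient with pairwise removal of missing values (in R, `cor(x, y, use = "complete.obs")` behaves similarly). The report does not state which coefficient or missing-value handling we used, so Pearson on complete pairs is an assumption here, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation computed over the pairs where both values are present.

    Returns None when fewer than two complete pairs remain or when one of
    the variables is constant (the coefficient is undefined in that case).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough data
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # constant variable: correlation undefined
    return cov / math.sqrt(var_x * var_y)
```

Dropping incomplete pairs matters for this dataset, since many records lack a salary, an age, or a length.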

We were also interested in the same correlations computed separately for each apprentice profession. Because our data is very noisy (misspelled profession names, many missing or corrupted values), correlation values could be computed for only some professions.

Profession, then the correlation of salary with (1) apprenticeship length and (2) apprentice age:
marzer -0.665209 -0.03494544
marangon -0.9589266 0.3664476
spechier -0.8427758 0.207238
orese -0.8018631 0.5155677
murer -0.8879052 0.9425416
tagiapiera -0.9977928 0.8834522
cuori d’oro marangon 1
desegnador -1
stampador al torcollo -1
dai colori -0.7857143
depentor da casse -1
tiraoro -0.9576685 0.4282824

The values for the professions “cuori d’oro marangon”, “desegnador”, “stampador al torcollo” and “depentor da casse” are probably just noise: on such noisy data, it is extremely unlikely that a correlation would be exactly 1 or -1. The other values are consistent with the overall correlations shown above.

III. Building a web platform to visualize our findings

In parallel with the work described above, we started building a website to visualize our results. It is still at a very early stage of development, so there is nothing to show yet, but we can say a few words about the technologies behind it.

On the back end, we use io.js as the server runtime with Express.js as the web framework. The server sends the cleaned data to the client, where it is filtered and plotted with Chart.js. The website will be hosted on Heroku at garzoni.herokuapp.com. Nothing is deployed there yet, so please do not be surprised by a “No such app” error.
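On the site itself, the filtering runs in JavaScript in the browser. To keep one language across the examples in this post, here is a Python sketch of that filtering step only: it counts enrollments per year for one profession and shapes the result as the `labels`/`datasets` structure that Chart.js uses for chart data. The function name and the record fields `year` and `profession` are hypothetical.

```python
import json
from collections import Counter


def chart_data_for_profession(records, profession, years):
    """Enrollments per year for one profession, shaped like Chart.js chart data.

    `years` is the list of years to keep (for example, the 80% selection).
    Years with no matching record get a count of 0.
    """
    wanted = set(years)
    counts = Counter(
        rec["year"]
        for rec in records
        if rec.get("profession") == profession and rec.get("year") in wanted
    )
    labels = sorted(wanted)
    return {
        "labels": [str(y) for y in labels],
        "datasets": [{
            "label": f"Enrollments: {profession}",
            "data": [counts.get(y, 0) for y in labels],
        }],
    }


# The dict serializes directly to JSON for the client.
print(json.dumps(chart_data_for_profession(
    [{"year": 1582, "profession": "marangon"}], "marangon", [1582, 1583])))
```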

Future work

We will continue working on the correlations and on the visualization website.

To conclude this progress report, here is a small GIF animation showing how the apprentices' origins evolved year by year:

[Animation: apprentice origins over the years]