Inference system based on Venetian extraction data – Third progress report

The aim of this post is to present some of the inference results obtained with the Shiny app [1] on the Garzoni data set. The web application sends SPARQL queries to the SPARQL endpoint of the Garzoni data and returns the results as a data frame, which can then be analysed in R. More details on the app can be found in our previous blog post [2].

It is very important to ensure the consistency of the data set before moving on to more complex analyses. Even a few faulty entries can greatly influence the results of test statistics and lead to incorrect hypotheses and conclusions.

Below, we present two inference examples whose aim is to check the logical consistency of the Garzoni data set. This is done by selecting the RDF entries that satisfy given properties and checking whether the inclusion and intersection relations between the resulting sets follow the expected logic.
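The inclusion check described above amounts to a set difference between two query results. The following is a minimal sketch of that idea, written in Python for illustration only (the app itself works in R). The function name and the toy rows are hypothetical and are not taken from the Garzoni data; it also assumes that each relation is returned as an ordered pair of person identifiers, with the same ordering in both result sets.

```python
def inconsistent_entries(subset_rows, superset_rows):
    """Return the rows of subset_rows that are missing from superset_rows.

    An empty result means the expected inclusion holds.
    """
    return set(subset_rows) - set(superset_rows)


# Hypothetical toy rows standing in for two SPARQL result sets.
acquaintances = {("person_1", "person_2"), ("person_1", "person_3"), ("person_2", "person_3")}
business_relations = {("person_1", "person_2"), ("person_2", "person_3")}

violations = inconsistent_entries(business_relations, acquaintances)
print(len(violations), "business relation(s) not recorded as acquaintances")
```

If acquaintance were stored as an undirected relation, the pairs would need to be normalised (for example by sorting each pair) before taking the difference.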

All business relations should be acquaintances

We select all acquaintances between 1604, as well as all the business relations in the same time period. We expect to find that all business relations are acquaintances. A summary of the head of the results table, as given by R, is displayed below.

Summary of a table head.

Indeed, there are no inconsistencies between these two rules in the Garzoni data set.

All relations of a person should be acquaintances

Here, we select all acquaintances between 1604, as well as all the relations of a person in the same time period. We expect to find that all the relations are acquaintances.

The SPARQL query in the Shiny app.

Indeed, there are no inconsistencies between these two rules in the Garzoni data set.

Further developments

The previous examples correspond to queries whose results match the expected outcome. Whenever possible logical inconsistencies are detected, we write the results to a file so that the corresponding RDF entries can be verified and, if necessary, corrected by hand. The analysis runs much faster than under our initial idea of working directly with the inferential tool of the RDF graphs, which justifies the new approach.
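As an illustration of this export step, the sketch below writes flagged entries to a CSV file for manual review. It is again in Python rather than R. The function name, the column names and the file name are assumptions made for the example, not the format actually used in the project.

```python
import csv
import os
import tempfile


def write_violations(rows, path):
    """Write flagged (subject, object) pairs to a CSV file and return the row count."""
    ordered = sorted(rows)  # stable order makes manual review easier
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["subject", "object"])  # hypothetical column names
        writer.writerows(ordered)
    return len(ordered)


# Hypothetical flagged entries, written to a temporary directory.
flagged = {("person_4", "person_5")}
output_path = os.path.join(tempfile.mkdtemp(), "inconsistencies.csv")
written = write_violations(flagged, output_path)
print(written, "row(s) written to", output_path)
```

Keeping the flagged entries in a plain file separates detection from correction, so the entries can be checked against the source before any RDF statement is modified.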

During the following weeks, we will continue to test logical queries and proceed to analysing more complex relations. The aim is to come up with queries that test new relationships and to find links between the existing connections. The exact directions of our analysis will be guided by the results found at each step.

References

[1] Shiny Apps RStudio http://shiny.rstudio.com/

[2] Inference system based on Venetian extraction data – Second progress report http://veniceatlas.epfl.ch/inference-system-based-on-venetian-extraction-data-second-progress-report/

Authors: Lavinia Ghita, Loris Michel