History of woodblock printing
The technique of woodblock printing appeared as early as the 3rd century AD in China, where ornaments and pictures carved in wood were used to print on cloth and, later, on paper. It made its way to Europe much later; the exact date is unknown, but it was probably around the 1390s, when paper mills began producing paper. Interestingly, the technique spread thanks to the underground printing of playing cards, which was illegal at the time.
At first, European woodblock prints covered a whole page and usually depicted saints. Then block books appeared, in which text was carved next to the picture to add a story to the images. The pictures still dominated, with text taking up only a small part of the page.
The next step came with the invention of the movable-type printing press by Johannes Gutenberg around 1440. From then on, the ratio of text to pictures was reversed.
Now, with the digitization of the Venetian archives, we have a huge number of books that contain woodblock ornaments. Since every ornament was handcrafted and unique, identifying and comparing them lets us trace the publisher of a book and its geographical origin, and place it in time.
One peculiar detail is that dishonest publishers would try to copy the ornaments of other publishers, and the number of fake editions is estimated to be almost as large as the number of official ones!
Objectives and deliverables
Our goal, as part of “The Venice Atlas” project, is to extract the ornaments from scanned documents in the Venetian archive and classify them, so that users can submit a picture of any ornament as a query and get back the books that contain this ornament, along with their publisher. The workflow is a three-stage process:
- identifying ornaments in the books
- grouping “look-alike” ornaments together and identifying the ornaments that belong to one publisher
- separating each group into “counterfeit” and “original” subgroups
In the end, we plan to create a GUI that allows a user to upload an image as input to the pipeline described above and get all the information from the classification of the extracted ornament.
- 09.03.2014: finish the ornament identification algorithm and get a database of ornament images
- 06.04.2014: design the first classifier to obtain clusters of “look-alike” ornaments, i.e. the ornaments belonging to one publisher
- 04.05.2014: design the second classifier to separate counterfeit ornaments from original ones
- 25.05.2014: create a GUI for ornament query
To use ornaments as a unique signature of a given Venetian printer, we first need to address the issue of identifying them within the body text. The digitization, enhancement and normalization of the images have already been successfully carried out by the researchers (we have approx. 1 million ornaments). We can therefore proceed to design a classifier that splits the output data into two categories: ornaments and words (for the sake of simplicity, we shall assume that these are the only two types of objects that can be found in manuscripts).
One approach to such a classification is to use a machine learning algorithm. The idea is to train the software to distinguish ornaments from words on a training set of pre-identified ornaments and words, and then to generalize this classification to unknown objects. More specifically, each object will be characterized by a large number of carefully selected features, which will then be used to design a choice model that relies on statistical inference to classify any given sample of digitized objects from manuscripts.
Identification of such features will be a key aspect of this approach, as poorly chosen features might lead to misleading classifications.
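As a minimal sketch of this train-then-generalize idea, the Python snippet below describes each object by three hand-picked features and assigns it to the class with the nearest feature centroid. Everything in it is an assumption made for illustration: the binary-image input format, the three features (ink density, aspect ratio, stroke transitions), the nearest-centroid rule and all function names are hypothetical, and the actual project may rely on different features and a different learning algorithm.

```python
from statistics import mean


def make_image(row_pattern, n_rows):
    """Toy helper (hypothetical): build a binary image by repeating one row."""
    return [[int(c) for c in row_pattern] for _ in range(n_rows)]


def features(img):
    """Map a binary image (rows of 0/1, 1 = ink) to a small feature vector."""
    height, width = len(img), len(img[0])
    ink = sum(map(sum, img)) / (height * width)   # fraction of inked pixels
    aspect = width / height                       # width-to-height ratio
    # Average number of ink/background changes per row, relative to the width.
    changes = mean(sum(a != b for a, b in zip(row, row[1:])) for row in img)
    return (ink, aspect, changes / width)


def train(samples):
    """Compute one feature centroid per label from (image, label) pairs."""
    by_label = {}
    for img, label in samples:
        by_label.setdefault(label, []).append(features(img))
    return {label: tuple(mean(col) for col in zip(*vectors))
            for label, vectors in by_label.items()}


def classify(img, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    f = features(img)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(f, centroids[label])))


# Tiny synthetic training set: compact blocks stand in for ornaments,
# wide striped strips stand in for words.
training = [
    (make_image("111111", 6), "ornament"),
    (make_image("01111110", 8), "ornament"),
    (make_image("101010101010", 3), "word"),
    (make_image("1100110011001100", 4), "word"),
]
centroids = train(training)
label = classify(make_image("1010101010", 2), centroids)
```

In this toy setting the aspect ratio dominates the distance, which illustrates why the choice and scaling of features matter: a real pipeline would need to normalize the features to comparable ranges before comparing them.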
Once this pre-classification has been performed and the ornaments isolated, we want to address both of the following problems:
- Associate a set of ornaments with a Venetian printer,
- Detect ornaments faking the style of another printer.
To answer these questions, we need to design two independent procedures. To answer the first question, one may want to identify the attributes shared by a set of ornaments from a given Venetian printer and interpret those shared attributes as that printer’s style. To this end, different approaches might be investigated: machine learning techniques, or multivariate statistics and clustering techniques. Some classification techniques based on comparing the compression rates of images already exist and could be used as a first, very naive filter. To obtain greater accuracy, additional features (such as those chosen for the extraction of the ornaments) might be used to describe the ornaments in more detail. Those features could be used either to train the machine learning algorithm or as a vector representation of the ornaments. Principal component analysis could then be used to select the directions of maximum spread of the data, which helps the clustering process. The classifications obtained by both processes would then be compared to understand their strengths and weaknesses.
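One possible form of the compression-based filter is the normalized compression distance (NCD), sketched below in Python with the standard zlib module. This is an illustrative assumption rather than the method of any cited work: the choice of NCD, the use of zlib as the compressor and the random byte strings standing in for serialized ornament images are all hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar data, near 1 otherwise."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic stand-ins for serialized ornament images (assumption: real input
# would be the raw pixel bytes of normalized scans).
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(2000))

near_copy = bytearray(original)
for position in (10, 400, 900, 1500, 1990):      # alter a few "pixels"
    near_copy[position] ^= 0xFF
near_copy = bytes(near_copy)

unrelated = bytes(rng.randrange(256) for _ in range(2000))

d_copy = ncd(original, near_copy)        # small: the copy compresses well given the original
d_unrelated = ncd(original, unrelated)   # close to 1: nothing is shared
```

Pairs with a small distance would be passed on to the finer feature-based comparison, while clearly unrelated pairs could be discarded early. Note that zlib only finds repetitions within a 32 KB window, so larger images would need a different compressor or a tiled comparison.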
However, the clustering procedure described above is very likely to be fooled by ornaments faking the style of another printer, as only a few details might differ between the two images.
Thus, we need an additional procedure able to detect such counterfeits. To this end, one could exploit historical details of the block printing process described earlier and try to detect blocks in ornaments using OpenSURF feature extraction.
This would provide a natural decomposition of the images. Two seemingly very similar images could then be decomposed into blocks and analyzed with more precision.
For example, one could use tools from shape theory to compare shapes in two different blocks efficiently. The first step is to extract the shape of the object by means of NURBS (non-uniform rational B-splines); the second is to use statistical tools to compare the shapes with one another.
NURBS allow a very convenient representation of curves, surfaces and solids with only a few control points. The shape is then identified with these control points and therefore lies in a Euclidean vector space whose dimension depends on the number of control points. As shape is invariant under rotation, translation and rescaling, we take the quotient of our Euclidean space by these similarity transformations (a subgroup of the affine transformations), so that our shape representation is consistent with the intuitive notion of a shape. It can then be shown that the resulting space of shapes is a Riemannian manifold, on which one can define a natural distance that makes it possible to extend classical statistical tools, compare shapes efficiently and understand variations in a sample of shapes. This will help to determine whether two shapes are significantly different or not, and hence to detect counterfeits.
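For planar shapes, one classical instance of such a distance is the full Procrustes distance between landmark configurations, sketched below in Python. The sketch assumes that the control points of two curves are in one-to-one correspondence and can be treated as 2D landmarks. Both assumptions are hypothetical, as are the function names, and a real pipeline would first need to fit the NURBS and match their control points.

```python
import cmath
import math


def preshape(points):
    """Remove translation and scale: centre the landmarks and give them unit norm.

    Each (x, y) landmark is stored as the complex number x + iy.
    """
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]


def procrustes_distance(a, b):
    """Full Procrustes distance between two planar landmark configurations.

    Taking the modulus of the complex inner product removes the rotation,
    so the result is invariant under translation, rescaling and rotation.
    """
    za, zb = preshape(a), preshape(b)
    inner = sum(p * q.conjugate() for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def similarity(points, scale, angle, shift):
    """Apply a rotation, a rescaling and a translation to the landmarks."""
    t = scale * cmath.exp(1j * angle)
    moved = [complex(x, y) * t + complex(*shift) for x, y in points]
    return [(p.real, p.imag) for p in moved]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved_square = similarity(square, scale=2.5, angle=0.7, shift=(3.0, -1.0))
rectangle = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]

d_same = procrustes_distance(square, moved_square)   # numerically zero
d_diff = procrustes_distance(square, rectangle)      # clearly positive
```

The moved square is at distance zero (up to rounding) from the original because the two differ only by a similarity transformation, whereas the rectangle is a genuinely different shape. Deciding whether an observed distance is significant would still require a statistical model of the variation between prints of the same block.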
- Etienne Baudrier, Sébastien Busson, Silvio Corsini, Mathieu Delalandre, Jérôme Landre, and Frédéric Morain-Nicolier. “Retrieval of the Ornaments from the Hand-Press Period: An Overview.” 496–500. Washington, DC, USA: IEEE Computer Society, 2009. doi:10.1109/ICDAR.2009.211.
- Woodblock Prints, http://www.artelino.com/articles/woodblock-prints.asp