In our previous presentation, we described our plans and milestones for decrypting five-century-old Venetian secret communications. The main steps were:
- acquisition of metadata
- transcription of the text
- algorithmic processing of the transcribed text
- identification of the encryption method and the appropriate decryption method
- construction of an automated decryption tool
We currently have about 70 pages of scans of handwritten letters from one Venetian ambassador in Constantinople during the 1560s. Some of the letters are written in plain Venetian, others contain ciphered parts, and some are completely ciphered.
We could extract some contextual information, i.e. the precise dates of the letters, their location and their author. The historical context can be found in books on Mediterranean history. To summarize briefly: Constantinople had been conquered by the Turks about a century earlier, in 1453, which strengthened the position of Islam relative to Christianity. The West felt threatened by the progress of the Turkish army, which was now aiming for La Goletta (Tunis) and Oran (Algeria), both occupied by Spanish forces, and for Sicily. This meant that the Turkish conquest would rely mainly on naval battles, which implies that wealthy coastal cities would play a key role in the upcoming battles. Venice was a secular city and, as such, maintained commercial relations with both the Turks and the West.
The dates also help us guess which ciphering techniques could have been used. The time period of the letters allows for the following hypotheses: Trithemius, Alberti, or early precursors of the Vigenère cipher.
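To make the first of these hypotheses concrete, here is a minimal sketch of a Trithemius-style progressive shift, in which each successive letter of the message is shifted one position further along the alphabet. The 26-letter alphabet and the function names are assumptions made for illustration; we do not yet know which alphabet, if any, the letters in our dataset actually use.

```python
import string

# Assumption: a modern 26-letter alphabet, used only for illustration.
ALPHABET = string.ascii_lowercase


def trithemius_encrypt(plaintext, alphabet=ALPHABET):
    """Shift the n-th letter of the message by n positions (n starts at 0)."""
    out = []
    position = 0
    for ch in plaintext.lower():
        if ch in alphabet:
            shifted = (alphabet.index(ch) + position) % len(alphabet)
            out.append(alphabet[shifted])
            position += 1  # only letters advance the shift
        else:
            out.append(ch)  # spaces and punctuation are left untouched
    return "".join(out)


def trithemius_decrypt(ciphertext, alphabet=ALPHABET):
    """Undo the progressive shift applied by trithemius_encrypt."""
    out = []
    position = 0
    for ch in ciphertext.lower():
        if ch in alphabet:
            shifted = (alphabet.index(ch) - position) % len(alphabet)
            out.append(alphabet[shifted])
            position += 1
        else:
            out.append(ch)
    return "".join(out)


encrypted = trithemius_encrypt("aaaa")
roundtrip = trithemius_decrypt(trithemius_encrypt("venezia"))
```

Because the shift changes with every letter, a repeated plaintext letter produces different ciphertext letters, which is why a single global frequency count is not enough against this family of ciphers.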
The documents we were provided with have no further contextual information attached to them.
In the case of Alberti, we do not have any sample Alberti cipher disk or any external element that would help us better understand the cryptosystem.
The letters in our possession have already been studied by a French team of historians, but we do not have access to their results yet.
For the next few weeks of the project, we will mainly focus on two parts: (1) digitization of the plaintext and ciphertext documents, and (2) cryptanalysis (frequency analysis) of the ciphertext. Digitization has been an issue since the beginning. We first need to sample text in the Venetian language of the period to obtain baseline letter frequencies, then tune them to the kind of text we are working on, in this case letters. Next, we need to digitize the ciphertext from the documents, remapping the symbols used if necessary, so that we can perform cryptanalysis and better understand the situation.
Concerning the transcription of the data, its size and state prevent us from using any OCR method in a practical way: the handwriting varies (three different hands are present in our dataset) and the page quality is poor (the paper is transparent), so the dataset is too sparse and too varied to train an OCR model.
There are two types of cipher used in the letters. The first appears in the form of indexed letters: each letter is associated with another one placed at its top left. This is a strong sign of a Trithemius, Alberti or other polyalphabetic cipher. The second seems to be a monoalphabetic mapping between the standard Venetian alphabet and a made-up one using esoteric and exotic symbols.
Our first wave of attack will consist of frequency analysis, which could solve both decryption problems at once. As the second cryptosystem seems to be a monoalphabetic substitution, frequency analysis should give us a straightforward way to break the encryption. The first cryptosystem is trickier, but one way to tackle the polyalphabetic decryption would be to group the letters by index, compute the frequencies within each group, and match each group against the letter frequencies of the Venetian language of the period. If we have enough data per index, plain frequency analysis could suffice to break the encryption. If not, we will have to use more advanced and convoluted methods.
As stated earlier, however, our main concern is to obtain enough text in the Venetian language of the period to derive a reliable set of frequencies. We would also like to transcribe the whole dataset so that we can tune the frequencies to our data and obtain a better correlation between the reference frequencies and our dataset.
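The grouping idea for the indexed cipher can be sketched as follows. This is only an illustration under assumptions: the transcription format (a list of `(index, symbol)` pairs), the function names and the reference frequencies are all hypothetical, and the naive rank matching shown here is a first guess that would need manual correction, not a complete attack.

```python
from collections import Counter, defaultdict


def frequencies_by_index(pairs):
    """Group symbols by their index and return relative frequencies per group.

    pairs: iterable of (index, symbol) tuples (hypothetical transcription format).
    """
    counts = defaultdict(Counter)
    for index, symbol in pairs:
        counts[index][symbol] += 1
    result = {}
    for index, counter in counts.items():
        total = sum(counter.values())
        result[index] = {symbol: n / total for symbol, n in counter.items()}
    return result


def rank_match(cipher_freqs, reference_freqs):
    """Pair the most frequent cipher symbol with the most frequent reference
    letter, the second with the second, and so on (ties broken alphabetically)."""
    cipher_ranked = sorted(cipher_freqs, key=lambda s: (-cipher_freqs[s], s))
    reference_ranked = sorted(reference_freqs, key=lambda s: (-reference_freqs[s], s))
    return dict(zip(cipher_ranked, reference_ranked))


# Toy data: made-up symbols and made-up reference frequencies.
pairs = [(1, "x"), (1, "x"), (1, "y"), (2, "q"), (2, "q"), (2, "q"), (2, "r")]
reference = {"a": 0.5, "e": 0.3, "i": 0.2}

groups = frequencies_by_index(pairs)
guesses = {index: rank_match(freqs, reference) for index, freqs in groups.items()}
```

The same `rank_match` step applies to the monoalphabetic symbol cipher by treating the whole ciphertext as a single group. With little data per index, the ranking becomes unreliable, which is the situation where more advanced methods would be needed.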
Concerning deliverables, we have produced a Python script that harvests and computes letter frequencies and is ready to analyse large chunks of data.
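As an illustration of what such a frequency computation can look like, here is a minimal sketch. It is not the script itself: the function name and the choice to count every alphabetic character after lowercasing are assumptions. It accepts any iterable of lines, so an open file can be passed in directly and large corpora are processed line by line.

```python
from collections import Counter


def letter_frequencies(lines):
    """Return the relative frequency of each alphabetic character.

    lines: any iterable of strings, e.g. an open text file.
    """
    counts = Counter()
    for line in lines:
        # Lowercase so that 'A' and 'a' are counted together; skip
        # digits, spaces and punctuation.
        counts.update(ch for ch in line.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from most to least frequent.
    return {ch: n / total for ch, n in counts.most_common()}


sample_freqs = letter_frequencies(["Aab!", "b"])
```

With a real corpus, the call would look like `letter_frequencies(open(path, encoding="utf-8"))`, where `path` points to a transcribed text file.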
Fernand Braudel (1995). The Mediterranean and the Mediterranean World in the Age of Philip II. University of California Press. pp. 986–1055. ISBN 978-0-520-20330-3.