Communication security is an essential feature of contemporary exchanges. With the growing volume of data exchanged across the world, cryptography has become omnipresent, now accounting for 4.95% of network traffic in Western countries (Europe and the US). This prevalence is the outcome of a long history of advances in hidden writing, ranging from pre-antiquity to modern times. More precisely, the question of secure communication arose together with writing, which appeared around the 16th century BCE. The first known occurrence of ciphered communication dates from between 600 and 500 BCE and consists of a monoalphabetic substitution based on the Hebrew alphabet. In fact, during antiquity, when writing was the only means of long-distance communication, alphabetic substitution ciphers were used extensively. A famous historical example is the Caesar cipher, intended for military security. However, cryptanalysis inevitably and quickly followed encryption. Muslim mathematicians left the first traces of cryptanalytic methods in the 9th century CE, including methods as powerful as frequency analysis. Facing insightful adversaries, correspondents pushed encryption as far as polyalphabetic substitution and transposition ciphers before modern times. At the end of the 18th century, new means of communication emerged that relied on visual signals rather than writing. These so-called semaphore telegraphs paved the way for the electrical telegraph and contemporary communications, heralding faster communication and prefiguring the dematerialization achieved later. Still in the pre-computer era, electromechanical cipher machines such as the famous Enigma further increased the potential complexity of ciphers, although reverse-engineering the machines could give insight into the cipher itself. Ultimately, the computer era revolutionized the field, as massive computational power enabled both fast algorithmic solving and brute-force techniques. From then on, encryption complexity has sought to stay ahead of raw computational power.
As discussed above, ciphered communication was already widely used during the Middle Ages and usually consisted of substitution ciphers. Throughout its rich history, Venice maintained numerous embassies around the world, and their sensitive diplomatic communications were ciphered to prevent espionage. Over a period of three centuries, a polyalphabetic substitution cipher known as the Alberti cipher was commonly used. It was performed with an Alberti cipher disk, originally described by Leon Battista Alberti in 1467.
Within the framework of a Digital Humanities project, we aim, through a carefully planned and sound methodology, to unveil the hidden secrets of Venetian foreign policy and relations, and hopefully to make a new side of secret history available to the general public. To this end, historical ciphered documents from Venetian embassies will be provided as working material. Here we present the methodology and planned milestones of the project.
Tree representation of the deciphering procedure. The decryption process can be separated into three cases, depending on the material available, as shown in the tree.
- If we have both the ciphertext and the plaintext, we can easily recover the disk used to encrypt the text and the rotation pattern used for the encryption.
- If we have the ciphertext and the corresponding disk, we can easily decipher it once we find the rotation pattern, which is often indicated in the ciphertext.
  - If the rotation pattern is not indicated, we can still decipher the text with the help of raw computational power and the information collected.
- If we only have the ciphertext, the task will be more difficult, depending on how the text was encrypted.
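The first case (ciphertext and plaintext both available) can be sketched in Python, the language we plan to use. This is a minimal sketch assuming the two texts are already transcribed, preprocessed to the same symbol set and aligned character by character; the function name `recover_mapping` is our own. It rebuilds the plaintext-to-ciphertext correspondence implied by one disk alignment and reports the first position that contradicts it, which is presumably where the inner ring was rotated.

```python
from typing import Optional


def recover_mapping(plaintext: str, ciphertext: str) -> tuple[dict[str, str], Optional[int]]:
    """Rebuild the plain -> cipher correspondence implied by an aligned text pair.

    Returns the mapping learned so far and the index of the first position that
    contradicts it (a hint that the disk was rotated there), or None if the
    whole pair is consistent with a single alignment.
    """
    if len(plaintext) != len(ciphertext):
        raise ValueError("plaintext and ciphertext must have the same length")
    mapping: dict[str, str] = {}
    inverse: dict[str, str] = {}
    for i, (p, c) in enumerate(zip(plaintext, ciphertext)):
        # Under one alignment, a plaintext letter always gives the same cipher
        # letter, and a cipher letter always comes from the same plaintext letter.
        if mapping.get(p, c) != c or inverse.get(c, p) != p:
            return mapping, i
        mapping[p] = c
        inverse[c] = p
    return mapping, None


mapping, rotation_at = recover_mapping("ABBA", "xyzx")
print(mapping, rotation_at)  # {'A': 'x', 'B': 'y'} 2
```

Repeating the procedure from each reported position onwards would give the successive alignments, and hence the rotation pattern.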
The first stage of our project will be data transcription. We already know that our documents will be provided in image format. Depending on the amount of text we receive, we have several options for the transcription process. For a small set of texts, we can transcribe by hand: first becoming familiar with the handwriting, then typing the text and carefully proofreading it. Otherwise, we will need help from external experts to determine the best path to transcription.
The second step, the nerve center of our project, will be the collection of metadata and other information. Extensive knowledge of the data itself and of all the metadata surrounding it can indeed make decryption much smoother. Timelines, locations, seals, the type of correspondence, the historical context, and the names and positions of the correspondents can all help enormously in the decryption process. For example, our knowledge of commonly used opening or closing formulas can make deciphering a document much easier: such formulas leak through the encryption and therefore reveal information about its secrets (i.e., the key, the disk used, etc.). Finally, we should also try to find as many cipher disks as possible, for a reason we will explain in the decryption part of this report.
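To make the role of opening formulas concrete, here is a minimal sketch of crib matching by repetition pattern. It assumes the formula falls within a single disk alignment, so that the encryption acts there as a monoalphabetic substitution, which preserves which letters repeat. The crib `SERENISSIMO` and the helper names `pattern` and `find_crib` are hypothetical illustrations, not formulas confirmed in our documents.

```python
def pattern(text: str) -> tuple[int, ...]:
    """Repetition pattern of a string, e.g. 'ABBA' -> (0, 1, 1, 0)."""
    first_seen: dict[str, int] = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)


def find_crib(ciphertext: str, crib: str) -> list[int]:
    """Offsets where a ciphertext window repeats letters exactly as the crib does."""
    target = pattern(crib)
    n = len(crib)
    return [
        i
        for i in range(len(ciphertext) - n + 1)
        if pattern(ciphertext[i : i + n]) == target
    ]


# Hypothetical crib: a formal opening we might expect in a dispatch.
print(find_crib("qxvhuhqlvvlpr", "SERENISSIMO"))  # [2]
```

Each matching offset yields a partial substitution table for free, which can then be tested against the rest of the document.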
We also need to become familiar with historical Venetian in several respects, such as its grammar, the (formal) register used in ambassadorial communication, and statistical information about the language itself. With a bit of luck, researchers in Digital Humanities could provide us with this data.
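If such statistics turn out not to exist, we may have to compute them ourselves from whatever plaintext we collect. A minimal sketch, assuming the corpus is plain text and that only the unaccented letters A to Z matter (accented characters are simply ignored here); the function name is our own.

```python
from collections import Counter


def letter_frequencies(corpus: str) -> dict[str, float]:
    """Relative frequency of each letter A-Z in a corpus; other characters are ignored."""
    letters = [ch for ch in corpus.upper() if "A" <= ch <= "Z"]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: counts[ch] / len(letters) for ch in sorted(counts)}


print(letter_frequencies("Aab!"))  # A: 2/3, B: 1/3
```

The resulting dictionary can later serve as the expected distribution in frequency-based attacks.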
This whole information-collection process will require multinational and interdisciplinary cooperation, hopefully provided by the Digital Humanities community.
The Alberti cipher is one of the first polyalphabetic ciphers in the world; it was invented by Leon Battista Alberti in the 15th century. The cipher implements polyalphabetic substitution with mixed alphabets and takes the form of a device called the Alberti cipher disk. The disk consists of two concentric rings that rotate relative to each other, each divided into 24 cells. The outer ring holds 20 letters and the 4 digits 1 to 4 for the plaintext; the inner ring contains the ciphertext letters in random order.
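The disk itself can be modelled as two strings. In this minimal sketch we assume the outer ring carries the 20 letters commonly cited for Alberti's disk (A to Z without H, J, K, U, W and Y) followed by the digits 1 to 4; the inner ring is a hypothetical random arrangement of 24 lowercase symbols, since the actual Venetian disks are still to be found. The sketch shows that, for any fixed rotation, the disk acts as a plain monoalphabetic substitution.

```python
import random

# Assumed outer (plaintext) ring: 20 letters, then the digits 1-4.
OUTER = "ABCDEFGILMNOPQRSTVXZ1234"


def make_inner_ring(seed: int) -> str:
    """Hypothetical inner (ciphertext) ring: 24 lowercase symbols in random order."""
    symbols = list("abcdefghiklmnopqrstvxyz&")
    random.Random(seed).shuffle(symbols)
    return "".join(symbols)


def encipher_symbol(ch: str, inner: str, shift: int) -> str:
    """Inner-ring symbol facing outer-ring symbol `ch` once the inner ring is rotated by `shift` cells."""
    return inner[(OUTER.index(ch) + shift) % len(OUTER)]


def decipher_symbol(sym: str, inner: str, shift: int) -> str:
    """Outer-ring symbol facing inner-ring symbol `sym` for the same rotation."""
    return OUTER[(inner.index(sym) - shift) % len(OUTER)]


inner = make_inner_ring(seed=7)
# With the ring held at one rotation, the disk is a one-to-one substitution table.
table = {ch: encipher_symbol(ch, inner, shift=5) for ch in OUTER}
print(len(set(table.values())))  # 24
```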
Several methods of encipherment were described in Alberti's treatise of 1467. Here we introduce two popular methods of encoding and decoding with the Alberti cipher. The first method proceeds as follows.
- Text preprocessing. Letters that do not appear on the outer ring (only 20 letters are shown) are either removed or replaced, which adds randomness to the ciphertext.
- Initial shift. The inner and outer rings do not always start aligned at a default position: the inner ring is first shifted by a variable number of cells.
- Encoding. Each letter of the plaintext is then substituted by the letter facing it on the inner ring.
- Period. After a period of variable length (either a fixed number of characters or a number of encoded words), the inner ring is rotated again to a new alignment.
- Period increment size. How far the inner ring is shifted after each encoding period.
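The steps of the first method can be sketched as follows. This is a minimal sketch under several assumptions of ours: the period is counted in characters (one of the two options above), preprocessing replaces U with V and J with I and drops every other symbol absent from the outer ring, and the inner ring is a hypothetical shuffle. All function names are illustrative.

```python
import random

OUTER = "ABCDEFGILMNOPQRSTVXZ1234"  # assumed outer ring: 20 letters + digits 1-4
N = len(OUTER)
REPLACE = {"U": "V", "J": "I"}  # assumed replacements; other missing letters are dropped


def preprocess(text: str) -> str:
    """Upper-case, apply replacements, and keep only outer-ring symbols."""
    text = "".join(REPLACE.get(ch, ch) for ch in text.upper())
    return "".join(ch for ch in text if ch in OUTER)


def rotation(i: int, initial_shift: int, period: int, increment: int) -> int:
    """Inner-ring rotation in force for the i-th symbol."""
    return initial_shift + (i // period) * increment


def encrypt_periodic(plaintext: str, inner: str, initial_shift: int, period: int, increment: int) -> str:
    text = preprocess(plaintext)
    return "".join(
        inner[(OUTER.index(ch) + rotation(i, initial_shift, period, increment)) % N]
        for i, ch in enumerate(text)
    )


def decrypt_periodic(ciphertext: str, inner: str, initial_shift: int, period: int, increment: int) -> str:
    return "".join(
        OUTER[(inner.index(sym) - rotation(i, initial_shift, period, increment)) % N]
        for i, sym in enumerate(ciphertext)
    )


symbols = list("abcdefghiklmnopqrstvxyz&")
random.Random(3).shuffle(symbols)
INNER = "".join(symbols)  # hypothetical disk

ct = encrypt_periodic("Serenissimo Principe", INNER, initial_shift=3, period=5, increment=2)
print(decrypt_periodic(ct, INNER, initial_shift=3, period=5, increment=2))  # SERENISSIMOPRINCIPE
```

The key of this method is thus the tuple (disk, initial shift, period, increment), which is what an attacker has to recover.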
In the second encryption method, the encoding period and increment size are embedded in the text itself. For example, since the outer ring contains four digits, whenever the encryption reaches a digit, the inner ring is rotated by the number of cells given by that digit before the following characters are encrypted.
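The second method can be sketched the same way. As a simplification, we assume that a digit is itself enciphered with the current alignment and that the rotation by its value applies from the next symbol on; the plaintext is assumed to be already preprocessed to outer-ring symbols, and the disk is again hypothetical. Because the digits reappear after decryption, the receiver recovers the rotations without knowing them in advance.

```python
import random

OUTER = "ABCDEFGILMNOPQRSTVXZ1234"  # assumed outer ring
N = len(OUTER)


def encrypt_digit_keyed(plaintext: str, inner: str, initial_shift: int = 0) -> str:
    shift, out = initial_shift, []
    for ch in plaintext:
        out.append(inner[(OUTER.index(ch) + shift) % N])
        if ch in "1234":  # a digit rotates the ring for the following symbols
            shift += int(ch)
    return "".join(out)


def decrypt_digit_keyed(ciphertext: str, inner: str, initial_shift: int = 0) -> str:
    shift, out = initial_shift, []
    for sym in ciphertext:
        ch = OUTER[(inner.index(sym) - shift) % N]
        out.append(ch)
        if ch in "1234":  # the recovered digit tells us how far the ring moved
            shift += int(ch)
    return "".join(out)


symbols = list("abcdefghiklmnopqrstvxyz&")
random.Random(11).shuffle(symbols)
INNER = "".join(symbols)  # hypothetical disk

ct = encrypt_digit_keyed("AA3AA", INNER)
print(ct[0] == ct[1], ct[1] == ct[3])  # True False
```

The same letter A is enciphered identically before the digit and differently after it, which is what makes this variant polyalphabetic.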
The decryption process will be the final phase of our project and will hopefully allow us to produce the plaintext of the ciphertexts we obtain from the archives.
The first part of this process is to determine which cryptosystem we are facing. According to our contacts and the historical context, we are very likely to face instances of the Alberti cipher described in another part of this report. But as careful and meticulous cryptographers, we must keep Murphy's law in mind from the beginning to the end of our project. We must therefore consider other encryption methods that may have been used to produce the ciphertexts. This is why the metadata collection and analysis phase of our project is so important: by extracting as much information as possible from the documents and their underlying sources, we can derive a lot of useful knowledge.
For example, we could discover the existence of shared (symmetric) keys or keywords between embassies, which would suggest instances of the Vigenère cipher, in which a keyword determines which shifted alphabet encrypts each letter of the document. This cipher is contemporary with the period our documents originate from and was believed to be unbreakable for more than three centuries, so the odds of facing it should not be ignored. Fortunately, today's computational power and cryptographic knowledge allow us to break this encryption fairly easily.
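For reference, the Vigenère cipher over the 26-letter Latin alphabet shifts each letter by the amount given by the matching letter of a repeating keyword. A minimal sketch (the historical documents may of course use a reduced alphabet; non-letters are simply dropped here):

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter A-Z by the matching keyword letter; the keyword repeats."""
    sign = -1 if decrypt else 1
    shifts = [ord(k) - ord("A") for k in key.upper()]
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]  # drop non-letters
    return "".join(
        chr((ord(ch) - ord("A") + sign * shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, ch in enumerate(letters)
    )


print(vigenere("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
```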
Briefly, our attack would rely on known methods such as the Kasiski examination, which guesses the key length from the distances between repeated patterns in the ciphertext. Knowing the key length reduces the problem to plain instances of the Caesar cipher, one per key position. We can then perform statistical analysis to determine the right key and therefore the plaintext. This last step introduces a new variable into our project: after some research, we found that few if any statistical analyses are available for Venetian. We will therefore have to apply a range of analytic methods to obtain useful and sound statistical data about this language. In this respect, if our data collection phase yields some plaintext (i.e., decoded versions of the ciphers), we could not only retrieve the corresponding Alberti disk but also derive exclusive statistical data about this particular kind of document, which may help us in subsequent decryption attempts.
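The two steps of this attack can be sketched as below, assuming an upper-case ciphertext over A to Z. `repeat_distances` implements the Kasiski examination; because spurious repeats can blur a plain gcd, `factor_tally` counts how many distances each candidate length divides instead. `best_shift` solves each Caesar column with a chi-squared comparison against an expected letter distribution passed in as a parameter, precisely because a reliable table for Venetian does not exist yet. All function names are our own.

```python
from collections import Counter, defaultdict
from functools import reduce
from math import gcd


def repeat_distances(ciphertext: str, size: int = 3) -> list[int]:
    """Distances between successive occurrences of each repeated n-gram (Kasiski)."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - size + 1):
        positions[ciphertext[i : i + size]].append(i)
    return [b - a for pos in positions.values() for a, b in zip(pos, pos[1:])]


def factor_tally(distances: list[int], max_len: int = 20) -> Counter:
    """How many distances each candidate key length divides."""
    return Counter(k for d in distances for k in range(2, max_len + 1) if d % k == 0)


def best_shift(column: str, expected: dict[str, float]) -> int:
    """Caesar shift whose decryption of `column` best fits `expected` (chi-squared)."""
    n = len(column)
    if n == 0:
        return 0

    def chi_squared(shift: int) -> float:
        counts = Counter(chr((ord(c) - 65 - shift) % 26 + 65) for c in column)
        return sum(
            (counts[ch] - n * p) ** 2 / (n * p) for ch, p in expected.items() if p > 0
        )

    return min(range(26), key=chi_squared)


def recover_key(ciphertext: str, key_len: int, expected: dict[str, float]) -> str:
    """Split the text into key_len Caesar columns and solve each one independently."""
    return "".join(
        chr(65 + best_shift(ciphertext[i::key_len], expected)) for i in range(key_len)
    )


distances = repeat_distances("ABCXYZABCQRSABC")
print(distances, reduce(gcd, distances))  # [6, 6] 6
```

On real documents the columns are short, so the quality of the expected distribution will largely decide whether `recover_key` succeeds.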
In the case of Alberti-ciphered documents, we would first have to recognize that this method was used. This can be achieved by looking at the index of coincidence (which should be that of a polyalphabetic cipher, i.e., lower than that of natural-language plaintext) and by carefully looking for unused letters. For the decryption itself, we derived several approaches, shown in the graph provided. Once again, we cannot overemphasize the data collection part: retrieving precious documents such as plaintext decryptions, or even Alberti disks, would speed up our attacks a great deal. For example, brute-force attacks over a set of known Alberti disks found during our research seem a good strategy for fast and efficient decryption. Indeed, knowing both the encryption scheme (and its variant, as seen in the part devoted to Alberti's cipher) and the disk makes decryption a simple, automatable task, consisting essentially of permutations and rotations of the alphabet. Deriving the disk from the ciphertext alone can prove rather cumbersome: we could again run an exhaustive search over the set of known Alberti disks, or otherwise rely on repetitive patterns such as opening or closing formulas or names to derive a partial disk and then brute-force the rest of it. Also, if consecutive documents have been encrypted with the same key (i.e., the same frequency and magnitude of disk shifts), the problem reduces to solving monoalphabetic ciphers.
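Two of these ideas can be sketched briefly. This minimal sketch assumes a ciphertext already transcribed to inner-ring symbols, the first (periodic) Alberti method with the period counted in characters, and a hypothetical list of candidate disks; `index_of_coincidence` follows the usual definition, while the crib `SERENISSIMO` and all function names are our own illustrations. For a polyalphabetic ciphertext the index tends towards that of uniformly random text (1/24 for a 24-symbol disk). The search itself costs only disks × 24 initial shifts × periods × increments decryptions.

```python
import random
from collections import Counter
from itertools import product

OUTER = "ABCDEFGILMNOPQRSTVXZ1234"  # assumed outer ring
N = len(OUTER)


def index_of_coincidence(text: str) -> float:
    """Probability that two symbols drawn without replacement are identical."""
    counts, n = Counter(text), len(text)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def periodic(text: str, inner: str, shift: int, period: int, increment: int,
             decrypt: bool = False) -> str:
    """First Alberti method: the rotation grows by `increment` every `period` symbols."""
    out = []
    for i, ch in enumerate(text):
        r = shift + (i // period) * increment
        out.append(OUTER[(inner.index(ch) - r) % N] if decrypt else inner[(OUTER.index(ch) + r) % N])
    return "".join(out)


def brute_force_known_disks(ciphertext, disks, crib, periods, increments):
    """Every (disk index, shift, period, increment) whose decryption contains the crib."""
    return [
        (d, shift, period, inc)
        for (d, inner), shift, period, inc in product(enumerate(disks), range(N), periods, increments)
        if crib in periodic(ciphertext, inner, shift, period, inc, decrypt=True)
    ]


def make_disk(seed: int) -> str:
    symbols = list("abcdefghiklmnopqrstvxyz&")
    random.Random(seed).shuffle(symbols)
    return "".join(symbols)


DISKS = [make_disk(seed) for seed in range(5)]  # hypothetical "known" disks
ct = periodic("SERENISSIMOPRINCIPE", DISKS[2], shift=5, period=4, increment=3)
hits = brute_force_known_disks(ct, DISKS, "SERENISSIMO", range(1, 6), range(N))
print((2, 5, 4, 3) in hits)  # True
```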
All of this technical approach will be addressed more exhaustively and precisely once we get our hands on the documents. For the algorithmic and computer-science part of our project, we will focus on Python, which is very useful and convenient for this kind of data processing and manipulation. Python is also fairly easy to use and understand, which matters since our team comes from mixed domains.
- Brute force attack: trying and checking every possible combination of keys or passwords in order to break the encryption. In the Alberti case, it means trying every possible wheel, which is not a practical approach since there are 26! permutations of the alphabet (about an 88-bit key).
- Plaintext: the human-readable text before encryption or after decryption; our goal.
- Ciphertext: the plaintext after it has gone through the encryption process, and therefore most likely incomprehensible. Our goal is to turn it back into plaintext by using or breaking the cryptosystem that produced it.
- Cryptosystem: the set of methods and algorithms used to perform encryption and decryption.
- Permutations on the alphabet: rearrangements, or mappings, of the letters: (a->c) (b->u) (c->b) etc.
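The key-space figure in the brute-force entry can be checked quickly. In this small sketch the rate of 10^9 trials per second is an arbitrary assumption of ours; it also gives the figure for the 24-cell inner ring of the disk described earlier.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace_bits(n: int) -> float:
    """Size, in bits, of the set of all permutations of an n-symbol alphabet."""
    return math.log2(math.factorial(n))


def years_to_exhaust(n: int, trials_per_second: float = 1e9) -> float:
    """Time needed to try every permutation at an assumed trial rate."""
    return math.factorial(n) / trials_per_second / SECONDS_PER_YEAR


print(f"26 letters: {keyspace_bits(26):.1f} bits")  # 26 letters: 88.4 bits
print(f"24 cells: {keyspace_bits(24):.1f} bits")  # 24 cells: 79.0 bits
print(f"{years_to_exhaust(26):.1e} years at 1e9 trials per second")
```

Both figures are far out of reach, which is why a search restricted to known disks, or guided by cribs, is the realistic option.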
In short, the project can be summarized in terms of development stages and deliverables as follows.
Development stages:
- Document familiarization and transcription.
- Data collection.
- Decryption process.
- Automatic and sound decryption method and production of deliverables.
Deliverables:
- Plaintext of the ciphered documents.
- Extensive information surrounding the plaintext.
- Decryption methodology and tools.