Friday, 11 July 2014

Network security - Cryptography - Substitution cipher

Substitution cipher:

The first encrypted messages were developed in ancient Egypt as a series of disordered hieroglyphics. This means of encryption was very simple and used a method called simple substitution: each letter (or picture) of the original message, or plaintext, was replaced by another letter of the alphabet, resulting in the encoded message, or cipher text.
For example:
The message, or plaintext, is "ATTACK".
It could be encrypted as "BUUBDL".
In this example, each letter of the plaintext is replaced with the next letter in the alphabet; that is, the key is +1. This is a special form of substitution cipher known as a Caesar cipher, attributed to Julius Caesar.
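The shift can be sketched in a few lines of Python. This is only an illustration: the function names caesar_encrypt and caesar_decrypt are made up for this post, and the sketch assumes the 26-letter English alphabet in upper case.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z"

def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter A-Z forward by `key` positions, wrapping past Z."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation unchanged
    return "".join(out)

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Undo the shift by applying the negated key."""
    return caesar_encrypt(ciphertext, -key)

print(caesar_encrypt("ATTACK", 1))  # BUUBDL
print(caesar_decrypt("BUUBDL", 1))  # ATTACK
```

With only 26 possible keys, a Caesar cipher can be broken simply by trying every shift.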

An alphabet is an ordered set of symbols. For example, the normal English alphabet consists of the symbols {A, B, C,..., Z}. A simple substitution is one in which each letter of the plaintext is always replaced by the same cipher text symbol. In other words, there is a 1-1 relationship between the letters of the plaintext alphabet and the cipher text alphabet.
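A minimal sketch of such a 1-1 mapping is given here, using Python's built-in translation tables. The cipher alphabet (the keyboard order QWERTY...) is an arbitrary key chosen only for illustration, and the names PLAIN, CIPHER and substitute are hypothetical.

```python
import string

PLAIN = string.ascii_uppercase
# Hypothetical key: the keyboard order, chosen only for illustration.
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"

# A valid key is a permutation of the alphabet (this is the 1-1 relationship).
assert sorted(CIPHER) == list(PLAIN)

ENCRYPT = str.maketrans(PLAIN, CIPHER)
DECRYPT = str.maketrans(CIPHER, PLAIN)

def substitute(text: str, table: dict) -> str:
    """Replace every letter using the given translation table."""
    return text.upper().translate(table)

ciphertext = substitute("ATTACK", ENCRYPT)
print(ciphertext)                       # QZZQEA
print(substitute(ciphertext, DECRYPT))  # ATTACK
```

Because the mapping is 1-1, decryption is just the same table read in the other direction.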

For the normal English alphabet, how many different cipher text alphabets can we get if we use the same letters? In other words, in how many different ways can we permute or rearrange the English alphabet? The answer is 26!, which is exactly 403291461126605635584000000, or roughly 4 x 10^26. To understand how we got that number, imagine that you are given the task of making an arbitrary permutation of the English alphabet. You have to make 26 choices.
On the first choice you can choose any one of the 26 letters in the alphabet. On the second choice you can choose any one of the remaining 25 letters. On the third choice you can choose any one of the remaining 24 letters. And so on. On the last choice, there is just one letter remaining. So, in all there are 26! = 26 x 25 x 24 x ... x 1 different ways to make these choices.
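The counting argument can be checked directly. The short sketch below multiplies the number of remaining choices at each step and compares the result with the standard library's math.factorial.

```python
import math

choices = 1
# 26 letters for the first choice, 25 for the second, ... 1 for the last.
for remaining in range(26, 0, -1):
    choices *= remaining

assert choices == math.factorial(26)
print(choices)           # 403291461126605635584000000
print(f"{choices:.3e}")  # 4.033e+26
```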

Although there are 26! possible cipher text alphabets, any fan of puzzle books or newspaper cryptograms knows that simple substitution ciphers are relatively easy to break by hand by analyzing letter frequencies and guessing at common words.

The nine most frequent letters in English are E, T, N, A, O, R, I, S, and H. The five letters that occur least often are J, K, Q, X, and Z. Generally, we need a message of considerable length in order to make good use of our knowledge of letter frequencies.
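Counting letters is the first step of this kind of analysis. The sketch below is one possible way to do it with collections.Counter; the function name letter_frequencies is made up for this post, and only the letters A to Z are counted.

```python
from collections import Counter

def letter_frequencies(text: str):
    """Return (letter, count, share) tuples, most frequent letter first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    return [(ch, n, n / total) for ch, n in Counter(letters).most_common()]

for ch, n, share in letter_frequencies("BUUBDL"):
    print(f"{ch}: {n} ({share:.0%})")
```

On a message this short the counts say very little, which is why a long cipher text is needed before the most frequent cipher text letters can be matched against E, T, N and so on.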

The most common two-letter combinations, or digrams, are th, in, er, re, and an.
The most common three-letter combinations, or trigrams, are the, ing, and, and ion.
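Digrams and trigrams can be counted in the same way as single letters. This is a small sketch with a hypothetical helper named ngram_counts; it assumes spaces carry no meaning (as in cipher text written in groups of five), so it strips them and lets combinations run across word boundaries.

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive letters, ignoring non-letters."""
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

print(ngram_counts("THE THEN", 3).most_common(1))  # [('THE', 2)]
```

The most frequent digrams and trigrams of a long cipher text are good candidates for th, in, the, ing and the other common combinations.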

For example, consider the following cipher text message from an accounting firm, arranged into groups of five letters:
CTBMN BYCTC BTJDS QXBNS GSTJC BTSWX CTQTZ CQVUJ
QJSGS TJQZZ MNQJS VLNSX VSZJU JDSTS JQUUS JUBXJ
DSKSU JSNTK BGAQJ ZBGYQ TLCTZ BNYBN QJSW

A likely word in a message from an accounting firm is "financial".
The word financial has a repeated letter (i), with four other letters between the two occurrences. We therefore look for repeated letters in the cipher text at this spacing.
We get hits at positions 6, 15, 27, 31, 42, 48, 56, 66, 70, 71, and 82.
The letter after i is n, which is also repeated, with one letter between its two occurrences.

Only two of these hits, 31 and 42, also have the next letter (the n) repeated in the proper place. Of these two, only 31 also has the letter a correctly positioned. The first i of financial therefore sits at position 31, so the word begins at position 30.
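The search described above can be reproduced with a short script. This is a sketch, not a general-purpose tool: the helper repeats and the variable names are made up for this post, positions are counted from 1 as in the text, and the three filters are hard-coded for the letter pattern of financial (i repeats 5 positions later, n repeats 2 positions later, a repeats 4 positions later).

```python
CIPHERTEXT = (
    "CTBMN BYCTC BTJDS QXBNS GSTJC BTSWX CTQTZ CQVUJ "
    "QJSGS TJQZZ MNQJS VLNSX VSZJU JDSTS JQUUS JUBXJ "
    "DSKSU JSNTK BGAQJ ZBGYQ TLCTZ BNYBN QJSW"
).replace(" ", "")

def repeats(text: str, pos: int, gap: int) -> bool:
    """True if the letters at 1-based positions pos and pos + gap match."""
    return pos + gap <= len(text) and text[pos - 1] == text[pos + gap - 1]

# Step 1: the two i's of "financial" are 5 positions apart.
i_hits = [p for p in range(1, len(CIPHERTEXT) + 1) if repeats(CIPHERTEXT, p, 5)]
# Step 2: n comes right after the first i and repeats 2 positions later.
n_hits = [p for p in i_hits if repeats(CIPHERTEXT, p + 1, 2)]
# Step 3: a comes 2 after the first i and repeats 4 positions later.
a_hits = [p for p in n_hits if repeats(CIPHERTEXT, p + 2, 4)]

start = a_hits[0] - 1  # the f sits one position before the first i
window = CIPHERTEXT[start - 1:start - 1 + len("FINANCIAL")]
partial_key = dict(zip(window, "FINANCIAL"))  # cipher letter -> plain letter

print(i_hits)
print(n_hits)       # [31, 42]
print(a_hits)       # [31]
print(start)        # 30
print(partial_key)
```

The resulting partial key already gives six cipher text letters (X, C, T, Q, Z and V stand for f, i, n, a, c and l), which can be substituted back into the rest of the message to expose further words.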
A substitution cipher preserves the order of the plaintext symbols and only disguises them. From this point on, deducing the rest of the key is easy using the frequency statistics for English text, or for whatever language the plaintext is written in.
