This is the final project for the course, "Math Methods for Physicists" in National Tsing Hua University. The goal of this project is to study the entropy of encrypted data.
In followings, I will introduce some basic concepts used in this project.
Details about important properties in information theory
Information
Information is a measure of the uncertainty of an outcome. It is related to the amount of data that is required to specify the outcome of an event. The more uncertain an outcome is, the more information is required to resolve uncertainty of the outcome.
The information is calculated using the Shannon information. Shannon information is defined as:
where
Entropy
Entropy is a fundamental concept in information theory that quantifies the uncertainty or randomness associated with a random variable. It measures the average amount of information required to describe or encode an event or a set of outcomes.
The entropy is calculated using the Shannon entropy. Shannon entropy is defined as:
where
Mutual Information
Mutual information is a fundamental concept in information theory and statistics that measures the amount of information that two random variables share. It provides a quantitative measure of the dependence or association between the variables, revealing how much knowing the value of one variable can reduce uncertainty about the other.
The mutual information is calculated using the Kullback-Leibler divergence. Kullback-Leibler divergence is defined as:
where
And the mutual information is defined as:
where
Conditional Entropy
Conditional entropy is a measure of the amount of information needed to describe the outcome of a random variable
The conditional entropy is defined as:
where
There is a relation between entropy, mutual information, and conditional entropy:
There are three elements in encryption: plaintext, ciphertext, and key. Plaintext is the original message. Ciphertext is the encrypted message. Key is the secret used to encrypt the plaintext. The encryption process is a function that maps the plaintext and key to the ciphertext. The decryption process is a function that maps the ciphertext and key to the plaintext.
Firstly, we consider the single-bit encryption. To make sure that once we know the ciphertext and the key, we can recover the plaintext, the encryption function must be a bijection. The possible encryption functions are:
Before we take account into multi-bits encryption, there are two important properties of the operation of a secure cipher identified by Shannon: confusion and diffusion. Confusion means that the relationship between the ciphertext and the key must be complex and involved. Diffusion means that the statistical structure of the plaintext must be dissipated into long-range statistics of the ciphertext.
There are two common ways to achieve confusion and diffusion which are Substitution-Permutation Network (SP-Network) and Feistel Cipher.
Details about SP-Network and Feistel Cipher
- SP-Network contains two main features: substitution and permutation. Substitution means that the plaintext is replaced by the ciphertext using a substitution table. Permutation means that the order of the bits in the ciphertext is changed. Finally, at each round, the round key is combined using XOR encryption and sends the results to next round. The SP-Network is illustrated as follows:
A sketch of a SP-Network with 3 rounds. Cited from wiki.
- Feistel Cipher use another way to realize confusion and diffusion. It splits the plaintext into two halves, and each round, the left half is XORed with the round key and then the result is sent to the round function. The output of the round function is XORed with the right half and then the result is sent to the next round. The Feistel Cipher is illustrated as follows:
A sketch of a Feistel Cipher. Cited from wiki.
Based on these concepts, we have many kinds of algorithm to encrypt the data. We have an introduction about the most common encryption algorithm which is written with the help of GPT: ClickMe!
All we have discussed above is that the encryption is about bit or binary system. Now, if we have a basis set of the data, e.g.:
including:
- Simple Substitution Cipher
- Transposition
- Vigenère cipher
- N-grams Substitution Cipher
- Single Mixed Alphabet Vigenère Cipher: A simple substitution followed by a Vigenère cipher.
- Hill Cipher
- Playfair Cipher
- Multiple Mixed Alphabet Substitution: In this cipher there are a set of d simple substitutions which are used in sequence.
- Autokey Cipher
- Fractional Ciphers
- Codes: In codes words (or sometimes syllables) are replaced by substitute letter groups. Sometimes a cipher of one kind or another is applied to the result.
To make it clear, we can construct a function mapping the plaintext with the key to ciphertext as shown in the following form:
which
-
$f$ is a bijection. - Ciphertext must correspond to every element in the basis set. That without the key, we cannot recover the plaintext.
Given a 3-elements basis set
And a general form of encryption function can be represented as:
where
where we can see that
To realized the perfect secrecy in this way, we need to have a one-time pad (OTP), which has been proved mathematically that it can achieve perfect secrecy and quantum resistance. However, it still have restrictions in the practical application, such as pre-shared key, key length, and key distribution (P.S. Quantum Key Distribution (QKD) may be helpful in OTP.)
Nowadays, what we usually use in daily communication is the asymmetric encryption (as known as Public-key cryptography), such like RSA, Elliptic Curve Cryptography (ECC) and so on. In this case, we have two keys, one is public key, and the other is private key. The public key is used to encrypt the data, and the private key is used to decrypt the data. The public key is usually published to the public, and the private key is only known by the owner.
The encryption function and decryption function now will no longer be a linear function as symmetric one, but a nonlinear function. The encryption function is usually a trapdoor function, which is easy to compute in one direction, but hard to compute in the opposite direction without the private key. Therefore, we can use asymmetric encryption to exchange the symmetric key, and then use the symmetric encryption to encrypt the data.
However, it has been proved that RSA is not quantum resistant, which means that the quantum computer can break the RSA in polynomial time with Shor's algorithm. And now, we have Post-quantum cryptography (PQC) to resist the attack from both classical and quantum computer.
We constructed a 4-bits XOR cipher as an example to show how to calculate the entropy of the encrypted data, mutual information and conditional entropy to show the relation of plaintext, key and ciphertext.
In addition, to show the fractal structure of XOR encryption, we also constructed 10-bits and 8-bits XOR cipher.
The details of the methods can be found in the entropy_of_encrypted.ipynb.
Note: the base of logarithm we used is 2 which represents the unit of entropy is bit.
To start with the 4-bits cipher, the following figure shows the ciphertext as the result of XOR operation between plaintext and key. We can see that the ciphertext is equally distributed to every element in the basis set. And the result is like Sudoku that the number of ciphertext is not showing again on its row and column.
And to calculate the properties in information theory later, we firstly made a bar plot of the entropy in 4-bits basis set to have more clear view. In the figure, we can see that when the numbers of 0 and 1 are equal, the entropy reaches the maximum. And the entropy is at the minimum when all the bits are 0 or 1.
The entropy of the ciphertext is shown in the following figure. It turns into a four-fold symmetry instead of a fractal structure.
Then the mutual information between C, P and C, K are shown in the followings. We can see that they are a quarter turn to each other. And also, the mutual information between C, P and C, K are small in most of the part, which means that the corelation between the ciphertext and the plaintext or key is weak.
With Eqn. (1), we can calculate the conditional entropy in the 4 kinds of relation. Let's see when the plaintext or key is known and how the conditional entropy that ciphertexts have. Based on the mutual information we get, there are cross-like pattern on it and have a quarter turn to each other.
And when the ciphertext is known, the conditional entropy of plaintext and key are shown in the followings. We can see that the conditional entropy of plaintext and key are the same, which means that the plaintext and the key are as well as hard to recover.
And we can compare the average entropy of 4-bits cipher and 8-bits cipher. We assume that the probability of each states (bits combination) is the same, and we know that the entropy of binary distribution is in the interval of 0 and 1 bit. The result is shown in below table.
Bits |
|
||
---|---|---|---|
4 | 0.7806 | 0.7806 | 0.5434 |
8 | 0.9024 | 0.9024 | 0.7788 |
We found that for equidistributed keys, the average entropy of ciphertexts is same as the average entropy of basis set. However, once we know the ciphertext, the average entropy (conditional entropy) of plaintexts and keys will be lowered down.
That sounds weird, since we encrypt the data, it should be more random and the entropy should be increased. But the result shows that the entropy of plaintext and key are decreased. The reason is that the plaintext should NOT be equidistributed! The plaintext we want to send must be meaningful and full of "information".
On the other hand, what we discussed here is (multi-bits) single-char-encryption. In real case, we would send a sequence of chars instead of a single char. The average entropy we calculated here only represents the case that if we send a random plaintext with a random key in all the same probability.
Therefore, if we implement the randomly distributed key on the "full-of-info" plaintext, it will make the ciphertext random instead keeping the regularity of plaintext, which results in the increase of the entropy of ciphertext from plaintext.
Based on the results, we can conclude that the properties of a good encryption are as follows:
- The transformation map from plaintext and key to ciphertext must be equidistributed.
Since if the transformation map is not equidistributed, based on char frequency in each languages statistically, the eavesdropper can guess the plaintext from the ciphertext with frequency characteristic. - The key must be randomly distributed.
Since we want to make the conditional entropy between ciphertext and plaintext reach the maximum, and the mutual information between ciphertext and plaintext reach the minimum. As a result, the eavesdropper cannot get useful information from ciphertext without key. - The order of the sequence of chars from plaintext to ciphertext must have a random permutation, and we also have to make the length of ciphertext into a given length instead of original length.
To satisfy confusion and diffusion, it can make the mutual information between ciphertext and plaintext lower, and the conditional entropy between ciphertext and plaintext higher. As a result, it will make the eavesdropper even harder to get useful information in the case that we use the same key and communicate multiple times.
- C. E. Shannon, "Communication theory of secrecy systems," in The Bell System Technical Journal, vol. 28, no. 4, pp. 656-715, Oct. 1949, doi: 10.1002/j.1538-7305.1949.tb00928.x.
- C. E. Shannon, "A mathematical theory of communication," in The Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, July 1948, doi: 10.1002/j.1538-7305.1948.tb01338.x.
This project is licensed under the MIT License - see the LICENSE file for details.
@misc{Entropy2023,
author = "Jiun-Cheng Jiang",
title = "Entropy of Encrypted Data",
howpublished = "\url{https://github.com/Jim137/Entropy}",
year = "2023",
month = "May"
}