Skip to content

This program cracks a substitution cipher using Metropolis-Hastings

Notifications You must be signed in to change notification settings

timurka43/cipher_cracker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 

Repository files navigation

GitHub contributors GitHub stars GitHub forks

cipher_cracker

This program cracks a substitution cipher using the Metropolis-Hastings Algorithm - one of Markov Chain Monte Carlo (MCMC) methods.

Algorithm Idea:

The general idea of how the metropolis-hastings algorithm works in this setting is as follows: The algorithm repeatedly generates a reverse substitution cipher, deciphers the input text and evalutes to what extent the new deciphered text appears to be in proper English. When the deciphered attempt is more likely to come from the proper Englihs than the previous attempt, it is accepted, thus getting us closer to the proper english - original text. If the new attempt is worse than the previous attempt (lessl likely to be proper English), then that attempt is usually rejected, but still can be accepted with a small probability specified by how likely the deciphered text is in proper English. We can think of all the possibilities for the decipher being mapped into some n-dimensional space, where n is the number of letters in the alphabet, and the algorithm attempting to find the absolute maximum of the mapping, where the maximum represents the decipher that generates the original text, where the original text has the highest probablity of being written in proper English.

Defining "Proper English" with Tolstoy's War and Peace

A common way to quantify whether a string of text comes from a natural English language is to calculate the proportion of all possible character transitions in the text. (e.g the percent of transions from t to h will be relatively high as words like this, that, the are very common in english. On the other hand, transitions from k to t are very rare if at all present in English (I can't think of one). While it may appear counter-intuitive to use the translation of a Russian text to model English language's character transitions, it somehow has become a common practice to use War and Peace for this purpose. The vast volume of the text lets us use a single file, while also providing a large enough sample size of characters to model our population distribution of character transitions in English.

Potential improvements and optimizations:

Currently the program deciphers the text to the point where oftentimes where a small number of letters are still switched (e.g. all w's are replaced with t's and vice versa). While it usually works well enough so the user can understand the original message, it can be modified to almost always produce the exact original text with no substitions left. To implement this improvement, one simply needs to add multiple threads to this algorithm and then find the best result across all threads. The idea behind this is simple: the quality of "proper English" generated by a random reverse cipher can be mapped by some polynomial function with multiple local maxima and minima. As any one thread starts randomly anywhere on the function and is more likely to move in the direction of the most local critical point, any one thread will likely record that local maximum/minimum as its result, which may be good enough to understand the initial message but still contain mistakes. However, if we generate multiple threads, their starting point are will be spread out across the function's domain such that the threads will likely find all critical points. Thus, by comparing the results of multiple threads, we can almost always guarantee that we found a global minimum/maximum.

About

This program cracks a substitution cipher using Metropolis-Hastings

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages