-
Notifications
You must be signed in to change notification settings - Fork 1
rlucioni/Fountain-Codes
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
README CS 51 FINAL PROJECT: FOUNTAIN CODES This is an OCaml implementation of a Luby transform (LT) code, the first class of practical fountain codes that are near optimal erasure correcting codes. The LT code was invented by Michael Luby in 1998 and published in 2002. The distinguishing characteristic of LT codes is in employing an encoding algorithm based on the XOR (exclusive or) operation to encode and decode a message. LT codes are called "rateless" because the encoding algorithm can in principle produce an infinite number of message packets, which for the purposes of this project have been named "droplets." ******************************************************************************** FILE MANIFEST droplet.ml - defines the Droplet, containing a seed, the XOR'd data, and the total number of pieces fountain.ml - defines the Fountain, produces Droplets according to the fountain code implementation chosen goblet.ml - defines a Goblet, used to collect Droplets and reconstruct the original data. This takes a droplet (seed, the data to be decoded, and the total number of pieces) operations.ml - allows encoding and decoding of an arbitrary file (message); the file is passed as an argument along with an output destination (like test_dump), a piece size, and a maximum number of pieces; compile to run from the command line probability.ml - takes a filename as a command line argument, then initializes and uses a fountain and goblet with uniform, poisson, or normal distribution (see distribution.ml below) then takes additional arguments based on the probability distribution selected; also allows running multiple times to generate multiple samples of transmission; compile to run from the command line distribution.ml- contains implementations of fountain and goblet, with the rand_droplet_pieces and get_num_chunks methods overridden with new PRNG, which generate values according to either the poisson or normal distributions; the parameters to these distributions are given as class arguments; when droplets are generated, the number of pieces XOR'd into each piece varies based on the distribution and its parameters md5test.sh - used for testing; compiles operations, creates a test file with 1000 bytes pulled from /dev/random, runs that file through operations, places the output in a file named test_dump, generates and compares MD5 hashes of the input and output files, and informs you of the result; test is passed if MD5 hashes match, failed if they do not match NOTE: If you are in the appliance, you MUST change calls to md5 in this script to calls to md5sum. As it is, md5test.sh will run natively on Mac OS X, but not in the appliance. After making this change, you can ignore warnings about the use of the quiet flag (-q) when you run the script. image_test.sh - same as md5test.sh, but uses the included lenna.jpg image instead of a randomly generated file; the output file (processed image) can be viewed by opening test_dump NOTE: If you are in the appliance, you MUST change calls to md5 in this script to calls to md5sum. As it is, image_test.sh will run natively on Mac OS X, but not in the appliance. After making this change, you can ignore warnings about the use of the quiet flag (-q) when you run the script. test_dump, test_file - files created, used, and/or overwritten by md5test.sh and image_test.sh during testing; test_dump can (and should) also be used as a convenient output destination when running operations lenna.jpg - image for testing file I/O bear.gif - a gif used for testing file I/O on larger, animated files sentence.txt - A text file which contains the sentence "This is a test." Provided for convenience when testing operations. prince_ch1.txt - A text file which contains Chapter 1 of The Prince. Provided for convenience when testing operations. OPERATING INSTRUCTIONS To encode a file, transmit its droplets to a goblet, and decode the file, first compile operations: $ make operations Then, from the command line: $ ./operations filename output_destination piece_size max_pieces For example, to run the sentence contained in sentence.txt through our LT code (i.e., encode, transmit droplets to a Goblet, and then decode), you could type: $ ./operations sentence.txt test_dump 5 4 This will cut up the data contained in sentence.txt into 5 character pieces and allow at most 4 pieces to be XORed together in a droplet. The output (the reconstructed file) will be in test_dump. We have included 2 scripts used for testing which also serve to demonstrate out project in action: md5test.sh and image_test.sh. Both are described above in the File Manifest. These scripts are run from the command line. To run md5test.sh, type: $ ./md5test.sh To run image_test.sh, type: $ ./image_test.sh To run probability, type: $ ./probability filename The executable will then prompt you for additional arguments, including a choice of distribution, the parameters relevant for that distribution, and the number of times to attempt the file decoding. The probability executable does not output the file anywhere; it simply generates a new fountain and goblet a certain number of times, and then prints out the number of droplets that were consumed before the file could be decoded, along with the number of pieces the original file was made of. KNOWN BUGS/INEFFICIENCIES The math currently being used in Goblet is sub-optimal and stands to be improved by allowing the program the recognize when it can XOR immediately, instead of waiting for a droplet containing a single piece of data. We tried but were ultimately unable to implement this feature using matrices and Gaussian elimination. RESOLVED - Operations is known to overflow when used on relatively large images. This is likely the result of failing to use a recursive tail call in Goblet. RESOLVED - io_operations corrupts input data RESOLVED - Duplicate pieces may appear in a single droplet. These duplicate do not provide any additional information. RESOLVED - io_operations compiles but overflows on execution RESOLVED - When you designated a piece size larger than the input message, spaces will be added to your message until it divides evenly by the piece size you have given. This also happens any time the message does not divide cleanly by the piece size given. This is expected behavior, but can be improved.
About
An implementation of a Luby transform (LT) code written in OCaml
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published