Technical Report UT-CS-07-593
Department of Computer Science
University of Tennessee
The online home for this document is:
http://www.cs.utk.edu/~plank/plank/papers/CS-07-593
If you use this code, please send me email (plank@cs.utk.edu) and let me know. One of the ways in which I am evaluated is the impact of my work, and hard data on how many people use this code is important.
This web site is based upon work supported by the National Science Foundation under Grants CNS-0615221 and CNS-0437508. I would also like to thank Cheng Huang, Lihao Xu, Jin Li and Mochan Shrestha for helpful discussions.
Galois Field arithmetic is fundamental to many applications, especially Reed-Solomon coding. With a Galois Field GF(2^w), addition, subtraction, multiplication and division operations are defined over the numbers 0, 1, ..., 2^w-1 in such a way that:
- They are closed -- if a and b are elements of the field,
then so are (a+b), (a-b), (a*b) and (a/b).
- They adhere to all the well-known properties of addition/subtraction/multiplication/division.
For example, addition and multiplication are commutative and associative;
multiplication distributes over addition; etc.
- Every element has a multiplicative inverse. This is the most important property. It means that if a is an element of the field and a ≠ 0, then there exists an element b that is also an element of the field such that ab = 1.
When w = 8, the Galois Field GF(2^8) comprises the elements 0, 1, ..., 255. This is an important field because it allows you to perform arithmetic that adheres to the above properties on single bytes. Again, this is essential in Reed-Solomon coding and other kinds of coding. Similarly, w=16 and w=32 are other important fields.
There are useful applications for fields with other values of w. For example, Cauchy Reed-Solomon coding employs Galois Fields with other values of w, and converts them to strictly binary (XOR) codes.
Galois Fields are covered in standard texts on Error Correcting Codes such as Peterson & Weldon [PW72], MacWilliams and Sloane [MS77], and Van Lint [VL82]. These treatments are thorough, and take a bit of time to understand. For a briefer and more pragmatic treatment, see Plank's 1997 tutorial on Reed-Solomon coding [P97], [PD05].
This library comes as a tar file: galois.tar. The files that compose galois.tar are:
- README.html - This file.
- galois.h - The header file for the library.
- galois.c - The implementation of the library.
- gf_mult.c - A command line Galois Field multiplier.
- gf_div.c - A command line Galois Field divider.
- gf_xor.c - A command line XOR (Galois field addition/subtraction).
- gf_inverse.c - A command line Galois Field inverter. This is the same as dividing one by a number.
- gf_log.c - Discrete logarithm for a Galois Field.
- gf_ilog.c - Discrete inverse logarithm for a Galois Field.
- gf_basic_tester.c - A correctness and timing tester for Galois Field arithmetic.
- gf_xor_tester.c - A timing tester for XOR.
- makefile - The makefile.
- primitive-polynomial-table.txt - A table of primitive polynomials.
Type make to compile the library and the command line tools. Testing the command line tools is a nice way to see how things work with Galois Fields. First, addition and subtraction in a Galois Field are equivalent -- they are equal to the bitwise exclusive-or operation. The program gf_xor allows you to take the bitwise exclusive-or of two numbers:
Unix-Prompt> gf_xor 15 7
8
Unix-Prompt> gf_xor 8 7
15
Unix-Prompt> gf_xor 230498 2738947
2772833
Unix-Prompt> gf_xor 2772833 230498
2738947
Unix-Prompt>
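As a minimal sketch of why gf_xor serves for both addition and subtraction, the two operations can be written as the same one-line function. The names gf_add and gf_sub are made up for this illustration and are not part of the library:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: in GF(2^w), addition and subtraction are both
   the bitwise exclusive-or of the two operands. */
uint32_t gf_add(uint32_t a, uint32_t b) { return a ^ b; }
uint32_t gf_sub(uint32_t a, uint32_t b) { return a ^ b; }
```

Because XOR is its own inverse, adding b and then subtracting b returns the original value, which is what the paired gf_xor calls above show.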
The program gf_mult takes three arguments: a, b and w, and prints out the product of a and b in GF(2^w). The program gf_div performs division, and the program gf_inverse returns the multiplicative inverse of a number in GF(2^w):
Unix-Prompt> gf_mult 3 7 4
9
Unix-Prompt> gf_div 9 3 4
7
Unix-Prompt> gf_div 9 7 4
3
Unix-Prompt> gf_mult 1234567 2345678 32
1404360778
Unix-Prompt> gf_div 1404360778 1234567 32
2345678
Unix-Prompt> gf_div 1404360778 2345678 32
1234567
Unix-Prompt> gf_inverse 1404360778 32
106460795
Unix-Prompt> gf_mult 1404360778 106460795 32
1
Unix-Prompt>
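To see where the w = 4 answers come from, here is a small stand-alone sketch of multiplication, inversion and division in GF(2^4). It is not the library's code: the function names are invented, and the primitive polynomial x^4 + x + 1 is an assumption (the table of primitive polynomials shipped with the library is the authority).

```c
#include <assert.h>

/* Assumed primitive polynomial for w = 4: x^4 + x + 1 (binary 10011). */
#define W4_POLY 0x13

/* Multiply a and b in GF(2^4): carry-less multiply, reducing as we go. */
int gf4_mult(int a, int b)
{
    int p = 0;
    assert(a >= 0 && a < 16 && b >= 0 && b < 16);
    while (b != 0) {
        if (b & 1) p ^= a;             /* add a * x^i when bit i of b is set */
        b >>= 1;
        a <<= 1;                       /* a = a * x */
        if (a & 0x10) a ^= W4_POLY;    /* reduce modulo the polynomial */
    }
    return p;
}

/* Inverse by exhaustive search: the unique b with a * b = 1. */
int gf4_inverse(int a)
{
    int b;
    assert(a != 0);
    for (b = 1; b < 16; b++)
        if (gf4_mult(a, b) == 1) return b;
    return -1;                         /* not reached for a nonzero element */
}

/* Division is multiplication by the inverse. */
int gf4_div(int a, int b) { return gf4_mult(a, gf4_inverse(b)); }
```

With these definitions gf4_mult(3, 7) is 9, and dividing 9 by either factor gives back the other, matching the first three commands above.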
The other command line tools are discussed at the end of this document.
The files galois.h and galois.c implement a library of procedures for Galois Field Arithmetic in GF(2^w) for w between 1 and 32. The library is written in C, but will work in C++ as well. It is especially tailored for w equal to 8, 16 and 32, but it is also applicable for any other value of w. For the smaller values of w (where multiplication or logarithm tables fit into memory), these procedures should be very fast.
In the following sections, we describe the procedures implemented by the library.
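The two basic calls are galois_single_multiply() and galois_single_divide(). Each takes three integer arguments, a, b and w, and returns an integer; the prototypes are in galois.h.
galois_single_multiply() returns the product of a and b in GF(2^w), and galois_single_divide() returns the quotient of a and b in GF(2^w). w may have any value from 1 to 32.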
The decision to make this procedure use regular, signed integers instead of unsigned integers was largely for convenience. It only makes a difference when w equals 32, in which case the sign bit of a, b, or the return values may be set. If it matters, simply convert the integers to unsigned integers. The procedures in this library will work regardless -- when w equals 32, they are treated as streams of bits and not integers.
It is anticipated that most applications that need to perform single multiplications and divisions only need reasonable performance, which is what these two procedures give you. If you need faster multiplication and division, then see the procedures below, which allow you to get much faster performance.
A common use of Galois Field arithmetic is multiplying a region of bytes by a single number. This is the basic operation of Reed-Solomon encoding and decoding. This library provides three procedures for region multiplication, one each for w = 8, 16 and 32; their prototypes are in galois.h.
These multiply the region of bytes specified by region and nbytes by the
number multby in the field specified by the procedure's name.
Region should be long-word aligned; otherwise these routines will generate a bus
error. The procedures have three separate behaviors, selected by the values of r2 and add:
- If r2 is NULL, the bytes in region are replaced by their products with multby.
- If r2 is not NULL and add is zero, then the products are placed in the nbytes of memory starting with r2. The two regions should not overlap unless r2 is less than region.
- If r2 is not NULL and add is one, then the products are calculated and then XOR'd into existing bytes of r2.
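The three cases can be sketched as follows for w = 8. This is only an illustration of the r2/add semantics listed above: region_multiply_w8 and gf8_mult are hypothetical names, the primitive polynomial is an assumption, and the byte-at-a-time loop stands in for the tuned table lookups the library uses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed primitive polynomial for w = 8: x^8 + x^4 + x^3 + x^2 + 1. */
#define W8_POLY 0x11d

/* Byte multiply in GF(2^8); a table lookup would normally replace this. */
static unsigned char gf8_mult(unsigned char a, unsigned char b)
{
    int p = 0, aa = a, bb = b;
    while (bb != 0) {
        if (bb & 1) p ^= aa;
        bb >>= 1;
        aa <<= 1;
        if (aa & 0x100) aa ^= W8_POLY;
    }
    return (unsigned char) p;
}

/* Hypothetical stand-in that mirrors the three r2/add cases. */
void region_multiply_w8(unsigned char *region, int multby, int nbytes,
                        unsigned char *r2, int add)
{
    int i;
    assert(region != NULL && nbytes >= 0);
    for (i = 0; i < nbytes; i++) {
        unsigned char prod = gf8_mult(region[i], (unsigned char) multby);
        if (r2 == NULL)  region[i] = prod;   /* case 1: overwrite in place */
        else if (!add)   r2[i] = prod;       /* case 2: store into r2      */
        else             r2[i] ^= prod;      /* case 3: XOR into r2        */
    }
}
```

The third case is the one Reed-Solomon encoders lean on, since it accumulates one term of a dot product into an output buffer without a temporary region.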
The performance of these procedures has been tuned to be very fast. A multiplication table is employed when w=8. Log and inverse log tables are employed with w=16, and seven multiplication tables are employed when w=32.
The procedure galois_region_xor() performs the bitwise exclusive-or of two regions of bytes, r1 and r2, and stores the result in r3; its prototype is in galois.h.
R3 may equal either r1 or r2 if you wish to overwrite either of them. Again, all pointers should be long-word aligned.
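A word-at-a-time region XOR might look like the sketch below. The name region_xor and its signature are invented for this example. Unlike the library routine, it copies through memcpy so that it does not depend on long-word alignment, at some cost in speed on compilers that do not optimize the copies away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: r3 = r1 XOR r2 over nbytes bytes.
   r3 may be the same pointer as r1 or r2. */
void region_xor(const unsigned char *r1, const unsigned char *r2,
                unsigned char *r3, size_t nbytes)
{
    size_t i = 0;
    assert(r1 != NULL && r2 != NULL && r3 != NULL);
    /* Main loop: one unsigned long per iteration. */
    for (; i + sizeof(unsigned long) <= nbytes; i += sizeof(unsigned long)) {
        unsigned long a, b;
        memcpy(&a, r1 + i, sizeof a);
        memcpy(&b, r2 + i, sizeof b);
        a ^= b;
        memcpy(r3 + i, &a, sizeof a);
    }
    /* Tail: any bytes left over after the last full word. */
    for (; i < nbytes; i++) r3[i] = r1[i] ^ r2[i];
}
```

Both inputs for a given position are read before the output is written, which is why overwriting r1 or r2 is safe.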
While galois_single_multiply() and galois_single_divide() are nice general-purpose tools, their generality makes them slower than they need to be. For high performance, you may want to employ their underlying implementations, described below:
When w is small, the fastest way to perform multiplication and division is to employ multiplication and division tables. These tables consume 2^(2w+2) bytes each, so they are only applicable when w is reasonably small. For example, when w=8, this is 256 KB per table.
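To use multiplication and division tables directly, use one or more of the routines described below, all declared in galois.h: galois_create_mult_tables(), galois_multtable_multiply(), galois_multtable_divide(), galois_get_mult_table() and galois_get_div_table().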
Galois_create_mult_tables(w) creates multiplication and division tables for a given value of w and stores them internally. If you call it twice with the same value of w, it will not create new tables the second time. You may call it with different values of w and the tables will be stored separately.
If successful, galois_create_mult_tables() will return 0. Otherwise, it will return -1, and any allocated memory will be freed.
Galois_multtable_multiply() and galois_multtable_divide() work just like galois_single_multiply() and galois_single_divide(), except they assume that you have called galois_create_mult_tables() for the appropriate value of w and that it was successful. They do not error-check, so if you have not called galois_create_mult_tables(), they will seg-fault. This decision was made for speed -- although for small values of w (between 1 and 9), galois_single_multiply() uses galois_multtable_multiply(), it is significantly slower because of the error checking that it does.
Finally, to free yourself of procedure call overhead, the routines galois_get_mult_table() and galois_get_div_table() return the tables themselves. The product/quotient of a and b is in element a*2^w+b, which of course may be computed quickly via bit arithmetic as ( (a << w) | b). You do not need to call galois_create_mult_tables() before calling galois_get_mult_table() or galois_get_div_table().
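The table layout can be illustrated with a small stand-alone builder. This is a sketch rather than the library's implementation: build_mult_table and gf_slow_mult are hypothetical names, and the caller supplies the primitive polynomial (including its x^w term).

```c
#include <assert.h>
#include <stdlib.h>

/* Shift-and-reduce multiply in GF(2^w); poly includes the x^w term. */
static int gf_slow_mult(int a, int b, int w, int poly)
{
    int p = 0;
    while (b != 0) {
        if (b & 1) p ^= a;
        b >>= 1;
        a <<= 1;
        if (a & (1 << w)) a ^= poly;
    }
    return p;
}

/* Build a flat table holding the product of a and b at index (a << w) | b.
   Returns NULL on allocation failure; the caller frees the table. */
int *build_mult_table(int w, int poly)
{
    int n = 1 << w, a, b;
    int *table;
    assert(w >= 1 && w <= 9);
    table = malloc(sizeof(int) * (size_t) n * (size_t) n);
    if (table == NULL) return NULL;
    for (a = 0; a < n; a++)
        for (b = 0; b < n; b++)
            table[(a << w) | b] = gf_slow_mult(a, b, w, poly);
    return table;
}
```

For w = 4 with the assumed polynomial 0x13, table[(3 << 4) | 7] holds 9. With 4-byte integers the allocation is 4 * 2^w * 2^w bytes, which matches the table size quoted above.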
When multiplication tables cannot be employed, the next fastest way to perform multiplication and division is to use log and inverse log tables, as described in [P97]. The log table consumes 2^(w+2) bytes and the inverse log table consumes 3*2^(w+2) bytes, which means that middling values of w may be handled. For example, when w=16, this is 1 MB of tables.
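The table sizes quoted for the two approaches are easy to check with a few lines of arithmetic. The helper names below are made up for this sketch, and the formulas are the ones given in the text (they assume 4-byte table entries):

```c
#include <assert.h>

/* Bytes in one multiplication (or division) table: 2^(2w+2). */
unsigned long mult_table_bytes(int w)
{
    assert(w >= 1 && w <= 14);
    return 1UL << (2 * w + 2);
}

/* Bytes in the log table plus the inverse log table:
   2^(w+2) + 3 * 2^(w+2). */
unsigned long log_tables_bytes(int w)
{
    assert(w >= 1 && w <= 26);
    return (1UL << (w + 2)) + 3UL * (1UL << (w + 2));
}
```

mult_table_bytes(8) is 262144 (256 KB) and log_tables_bytes(16) is 1048576 (1 MB), whereas a multiplication table for w = 16 would need 2^34 bytes, which is why log tables take over for middling w.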
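To use the log tables, use one or more of the routines described below, all declared in galois.h: galois_create_log_tables(), galois_logtable_multiply(), galois_logtable_divide(), galois_log(), galois_ilog(), galois_get_log_table() and galois_get_ilog_table().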
Galois_create_log_tables(w) creates log and inverse log tables for the given value of w, and stores them internally. If you call it twice with the same value of w, it will not create new tables the second time. You may call it with different values of w and the tables will be stored separately.
If successful, galois_create_log_tables() will return 0. Otherwise, it will return -1, and any allocated memory will be freed.
Galois_logtable_multiply() and galois_logtable_divide() work just like galois_single_multiply() and galois_single_divide(), except they assume that you have called galois_create_log_tables() for the appropriate value of w and that it was successful. They do not error-check, so if you have not called galois_create_log_tables(), they will seg-fault. This decision was made for speed -- although for medium values of w (between 10 and 22), galois_single_multiply() uses galois_logtable_multiply(), it is significantly slower because of the error checking that it does.
Galois_log() and galois_ilog() return the log and inverse log of an element of GF(2^w). You can use them to multiply and divide using the following identities, which hold for nonzero a and b:
a * b = ilog[ (log[a] + log[b]) % ((1 << w)-1) ]
a / b = ilog[ (log[a] - log[b] + ((1 << w)-1)) % ((1 << w)-1) ]
The division identity takes into account C's weird definition of modular arithmetic with negative numbers.
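Here is a minimal sketch of the log/inverse-log approach for w = 4. The tables are built locally instead of being fetched from the library, every name is invented for the example, and the primitive polynomial is an assumption. The offset added before the % in division is 2^w - 1, a multiple of the modulus, so it changes the sign of the operand but not the result.

```c
#include <assert.h>

#define W 4
#define NW (1 << W)          /* 16 field elements */
#define POLY 0x13            /* assumed primitive polynomial x^4 + x + 1 */

static int log_tbl[NW];      /* log_tbl[0] is unused: zero has no logarithm */
static int ilog_tbl[NW - 1];

/* Fill both tables by repeatedly multiplying by the generator, 2. */
void build_log_tables(void)
{
    int i, x = 1;
    for (i = 0; i < NW - 1; i++) {
        ilog_tbl[i] = x;
        log_tbl[x] = i;
        x <<= 1;
        if (x & NW) x ^= POLY;
    }
}

int log_mult(int a, int b)
{
    if (a == 0 || b == 0) return 0;
    return ilog_tbl[(log_tbl[a] + log_tbl[b]) % (NW - 1)];
}

int log_div(int a, int b)
{
    assert(b != 0);
    if (a == 0) return 0;
    /* Adding NW - 1 keeps the left operand of % non-negative in C. */
    return ilog_tbl[(log_tbl[a] - log_tbl[b] + (NW - 1)) % (NW - 1)];
}
```

Zero has to be special-cased because it has no logarithm; the identities themselves only apply to nonzero operands.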
To perform the fastest multiplication and division with these tables, you should get access to the tables themselves using galois_get_log_table(int w) and galois_get_ilog_table(int w). Then you may calculate the product/quotient of a and b as:
a * b = ilog [ log[a] + log[b] ]
a / b = ilog [ log[a] - log[b] ]
You do not have to worry about modular arithmetic because the ilog table contains three copies of the inverse logs, and is defined for indices between -2^w+1 and 2*2^w-2. This saves a few instructions.
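The three-copy trick can be sketched as follows for w = 4. This is an illustration with locally built tables and invented names, not the library's own layout; the pointer into the middle of the storage array is what allows negative indices.

```c
#include <assert.h>

#define W 4
#define NW (1 << W)
#define NWM1 (NW - 1)        /* number of nonzero elements */
#define POLY 0x13            /* assumed primitive polynomial x^4 + x + 1 */

static int log_tbl[NW];
static int ilog_store[3 * NWM1];           /* three copies, back to back   */
static int *ilog_tbl = ilog_store + NWM1;  /* index 0 starts the middle one */

void build_tripled_tables(void)
{
    int i, x = 1;
    for (i = 0; i < NWM1; i++) {
        log_tbl[x] = i;
        ilog_tbl[i - NWM1] = x;   /* copy reached by negative differences */
        ilog_tbl[i] = x;          /* base copy                            */
        ilog_tbl[i + NWM1] = x;   /* copy reached by large sums           */
        x <<= 1;
        if (x & NW) x ^= POLY;
    }
}

int fast_mult(int a, int b)
{
    if (a == 0 || b == 0) return 0;
    return ilog_tbl[log_tbl[a] + log_tbl[b]];   /* no % needed */
}

int fast_div(int a, int b)
{
    assert(b != 0);
    if (a == 0) return 0;
    return ilog_tbl[log_tbl[a] - log_tbl[b]];   /* index may be negative */
}
```

The trade is three times the inverse-log memory in exchange for dropping an add and a modulo from every operation.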
When tables are unusable, general-purpose multiplication and division is implemented with two procedures, galois_shift_multiply() and galois_shift_divide(), both declared in galois.h.
Galois_shift_multiply() converts b into a w * w bit matrix and multiplies it by the bit vector a to create the product vector. You may see a quasi-tutorial description of this technique in the paper [P05]. It is significantly slower than the methods that use tables. However, it is general-purpose and requires no preallocation of memory.
For division, Galois_shift_divide() also converts b into a bit matrix, inverts it, and then multiplies the inverse by a. As such, it is *really* slow. If I get a clue how to implement this one faster, I will.
Galois_single_multiply() uses galois_shift_multiply() for w between 23 and 31. Galois_single_divide() uses galois_shift_divide() for w between 23 and 32.
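The bit-matrix view of multiplication can be sketched like this. It is my reading of the description above, not a copy of the library's code: the name bitmatrix_mult is invented, the polynomial is passed in by the caller (with its x^w term), and w is limited to 16 to keep the example short.

```c
#include <assert.h>

/* Hypothetical sketch: multiply a and b in GF(2^w) by treating b as a
   w-by-w bit matrix and a as a bit vector. */
int bitmatrix_mult(int a, int b, int w, int poly)
{
    int cols[16];   /* cols[j] is column j of b's matrix: the element b * x^j */
    int j, product = 0;
    assert(w >= 1 && w <= 16);
    for (j = 0; j < w; j++) {
        cols[j] = b;
        b <<= 1;                         /* next column: multiply by x */
        if (b & (1 << w)) b ^= poly;     /* reduce modulo the polynomial */
    }
    /* Matrix-vector product over GF(2): XOR the columns picked by a's bits. */
    for (j = 0; j < w; j++)
        if (a & (1 << j)) product ^= cols[j];
    return product;
}
```

Division by this route needs the inverse of the matrix, which takes a Gaussian elimination over GF(2) and explains why it is so much slower.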
Finally, for w = 32 two further procedures are defined, galois_create_split_w8_tables() and galois_split_w8_multiply(), both declared in galois.h.
Galois_create_split_w8_tables() creates seven tables that are 256 KB each (each is indexed by a pair of eight-bit values and holds 32-bit products). Galois_split_w8_multiply() employs these tables to multiply the 32-bit numbers by breaking them into four eight-bit numbers each, and then performing sixteen multiplications and exclusive-ors to calculate the product. It's a cool technique suggested to me by Cheng Huang of Microsoft, and is a good 16 times faster than using galois_shift_multiply().
"But couldn't you use this technique for other values of w?" Yes, you could, but I'm not implementing it, because I don't think it's that important. If that view changes, I'll fix it.
Galois_single_multiply() uses galois_split_w8_multiply() for w = 32.
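The idea behind the split can be shown without the tables. In the sketch below each of the sixteen partial products is computed with a slow shift multiply where a real implementation would do a table lookup. The function names are invented, and the primitive polynomial for w = 32 is an assumption, so the products are only guaranteed to agree with each other, not with the library's output.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed primitive polynomial for w = 32: x^32 + x^22 + x^2 + x + 1.
   Only the low 32 bits are stored; the x^32 term is implicit. */
#define W32_POLY_LOW 0x00400007u

/* Reference shift-and-reduce multiply in GF(2^32). */
uint32_t gf32_shift_mult(uint32_t a, uint32_t b)
{
    uint32_t p = 0;
    while (b != 0) {
        if (b & 1u) p ^= a;
        b >>= 1;
        /* a = a * x, reducing when the x^31 coefficient shifts out. */
        a = (a & 0x80000000u) ? ((a << 1) ^ W32_POLY_LOW) : (a << 1);
    }
    return p;
}

/* Split multiply: write a and b as four bytes each, left in position.
   By distributivity, a * b is the XOR of the sixteen partial products. */
uint32_t gf32_split_mult(uint32_t a, uint32_t b)
{
    uint32_t p = 0;
    int i, j;
    for (i = 0; i < 4; i++) {
        uint32_t ai = a & (0xffu << (8 * i));
        for (j = 0; j < 4; j++) {
            uint32_t bj = b & (0xffu << (8 * j));
            /* With tables, this would be a lookup in table number i + j. */
            p ^= gf32_shift_mult(ai, bj);
        }
    }
    return p;
}
```

Since i + j ranges from 0 to 6, there are seven distinct byte-shift combinations, which is consistent with the seven tables mentioned above.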
The only possible race conditions in these codes are when the various tables are created. For that reason, galois_create_mult_tables() and galois_create_log_tables() should be protected by a mutex if thread safety is a concern.
Since galois_single_multiply() and galois_single_divide() call the table creation routines whenever the tables do not exist, if you are worried about thread safety, then for each value of w that you will use, you should make sure that the first call to galois_single_multiply() or galois_single_divide() is protected. After that, no protection is required.
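One way to protect the first call is a small wrapper that takes a mutex around table creation. The sketch below uses POSIX threads. create_tables() is a hypothetical stand-in for whichever creation routine you call, and ensure_tables() and creation_count() are names made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int tables_ready[33];   /* one flag per value of w */
static int create_calls = 0;   /* counts real creations, for illustration */

/* Hypothetical stand-in for a table-creation routine; 0 means success. */
static int create_tables(int w) { (void) w; create_calls++; return 0; }

/* Create the tables for w at most once. Returns 0 on success and -1 on
   failure. Safe to call from several threads. */
int ensure_tables(int w)
{
    int rc = 0;
    assert(w >= 1 && w <= 32);
    pthread_mutex_lock(&table_lock);
    if (!tables_ready[w]) {
        rc = create_tables(w);
        if (rc == 0) tables_ready[w] = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

int creation_count(void) { return create_calls; }
```

Calling ensure_tables(w) once per thread before any arithmetic for that w is enough; the multiply and divide calls themselves then need no locking.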
The programs gf_mult, gf_div, gf_log, gf_ilog, gf_inverse and gf_xor are straightforward and allow you to test the various routines for various values of w.
Gf_basic_tester and gf_xor_tester test both correctness and speed. Call gf_xor_tester with no arguments to test the speed of galois_region_xor() on your system. Here it is on a MacBook whose CPU is a little busy doing other things:
Unix-Prompt> gf_xor_tester
XOR Tester
Seeding random number generator with 1172533188
Passed correctness test -- doing 10-second timing
1827.79986 Megabytes of XORs per second
Unix-Prompt>
Gf_basic_tester takes the following command line arguments:
- W: 1 through 32
- Method: This is one of the following words: default, multtable, logtable, shift, splitw8. It specifies how multiplication/division will be performed. Default uses galois_single_multiply() and galois_single_divide().
- Ntrials: This specifies how many random multiplies/divides to test for correctness.
After testing for correctness, gf_basic_tester tests for speed. There are three special cases: when method=default and w is 8, 16 or 32, gf_basic_tester also tests the speed of the corresponding galois_wxx_region_multiply() procedure.
For example (again, on my MacBook):
Unix-Prompt> gf_basic_tester 16 default 100000
W: 16
Method: default
Seeding random number generator with 1172533569
Doing 100000 trials for single-operation correctness.
Passed Single-Operations Correctness Tests.

Doing galois_w16_region_multiply correctness test.
Passed galois_w16_region_multiply correctness test.

Speed Test #1: 10 Seconds of Multiply operations
Speed Test #1: 42.23862 Mega Multiplies per second
Speed Test #2: 10 Seconds of Divide operations
Speed Test #2: 43.42448 Mega Divides per second

Doing 10 seconds worth of region_multiplies - Three tests:
   Test 0: Overwrite initial region
   Test 1: Products to new region
   Test 2: XOR products into second region

   Test 0: 253.45548 Megabytes of Multiplies per second
   Test 1: 238.59569 Megabytes of Multiplies per second
   Test 2: 167.99968 Megabytes of Multiplies per second
- [MS77]
F. J. MacWilliams and N. J. A. Sloane,
The Theory of Error-Correcting Codes, Part I,
North-Holland Publishing Company,
Amsterdam, New York, Oxford,
1977.
- [PW72]
W. W. Peterson and E. J. Weldon, Jr.,
Error-Correcting Codes, Second Edition,
The MIT Press,
Cambridge, Massachusetts,
1972, ISBN: 0-262-16-039-0.
- [P97]
J. S. Plank,
"A Tutorial on Reed-Solomon Coding for
Fault-Tolerance in RAID-like Systems,"
Software -- Practice & Experience,
27(9), September, 1997, pp. 995-1012.
http://www.cs.utk.edu/~plank/plank/papers/SPE-9-97.html.
- [P05]
J. S. Plank,
"Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications",
Technical Report CS-TR-05-569, University of Tennessee, November, 2005.
http://www.cs.utk.edu/~plank/plank/papers/CS-05-569.html.
- [PD05]
J. S. Plank and Y. Ding,
"Note: Correction to the 1997 Tutorial on Reed-Solomon Coding",
Software, Practice & Experience, Volume 35, Issue 2, February, 2005, pp. 189-194.
http://www.cs.utk.edu/~plank/plank/papers/SPE-04.html.
- [VL82]
J. H. van Lint,
Introduction to Coding Theory,
Springer-Verlag,
New York,
1982.