This Python script reads a CSV file, hashes a specified column using a selected hashing algorithm, and then writes the updated data to a new CSV file. Optionally, the hash can be truncated to a specified length, and the script checks for hash clashes if truncation is enabled.
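The flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: it uses the standard csv module instead of Pandas to stay dependency-free, and the function name hash_column and the output column names (email_hash, email_hash_truncated) are assumptions made for the example.

```python
import csv
import hashlib
import io


def hash_column(rows, column, algorithm="sha224", length=None):
    """Return copies of `rows` with hash columns added for `column`."""
    out = []
    for row in rows:
        digest = hashlib.new(algorithm, str(row[column]).encode("utf-8")).hexdigest()
        new_row = dict(row)
        new_row[f"{column}_hash"] = digest  # hypothetical column name
        if length is not None:
            # Truncation keeps only the first `length` hex characters.
            new_row[f"{column}_hash_truncated"] = digest[:length]
        out.append(new_row)
    return out


# In-memory stand-ins for the input and output CSV files.
source = io.StringIO("id,email\n1,a@example.com\n2,b@example.com\n")
rows = list(csv.DictReader(source))
hashed = hash_column(rows, "email", length=10)

target = io.StringIO()
writer = csv.DictWriter(target, fieldnames=list(hashed[0].keys()))
writer.writeheader()
writer.writerows(hashed)
print(target.getvalue())
```

A SHA224 digest is 56 hexadecimal characters long, so a truncation length below 56 shortens it, while a larger value leaves it unchanged in this sketch.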
- Python 3.x
- Pandas
Install the required Python packages by running pip install -r requirements.txt.
python csv-hasher.py input_path output_path col_to_hash [-a ALGORITHM] [-l LENGTH] [-s SALT]
- input_path: Path to the input CSV file.
- output_path: Path where the output CSV file will be saved.
- col_to_hash: The name of the column to hash.
- -a or --algorithm: Optional. Hash algorithm to use. Default is sha224.
- -l or --length: Optional. Length to truncate the hash.
- -s or --salt: Optional. Salt to add to the hash.
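For readers who want to see how such a command line might be declared, here is a small sketch using argparse. It assumes the script parses its arguments with argparse, which the usage line suggests but this README does not confirm; the help strings and the build_parser name are illustrative.

```python
import argparse


def build_parser():
    """Build a parser matching the usage line shown above (assumed layout)."""
    parser = argparse.ArgumentParser(
        prog="csv-hasher.py",
        description="Hash one column of a CSV file.",
    )
    parser.add_argument("input_path", help="Path to the input CSV file")
    parser.add_argument("output_path", help="Path where the output CSV file will be saved")
    parser.add_argument("col_to_hash", help="Name of the column to hash")
    parser.add_argument("-a", "--algorithm", default="sha224", help="Hash algorithm to use")
    parser.add_argument("-l", "--length", type=int, default=None, help="Length to truncate the hash")
    parser.add_argument("-s", "--salt", default=None, help="Salt to add to the hash")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["input.csv", "output.csv", "email", "-a", "sha256", "-l", "50"])
print(args.algorithm, args.length, args.salt)
```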
- Hash column 'email' using the default SHA224 algorithm.
python csv-hasher.py input.csv output.csv email
- Hash column 'email' using the SHA256 algorithm.
python csv-hasher.py input.csv output.csv email -a sha256
- Hash and truncate the column 'email' to 50 characters.
python csv-hasher.py input.csv output.csv email -l 50
- Use SHA256 and truncate to 50 characters.
python csv-hasher.py input.csv output.csv email -a sha256 -l 50
- Hash column 'email' using the SHA256 algorithm and the salt 'my_salt'.
python csv-hasher.py input.csv output.csv email -a sha256 -s my_salt
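To illustrate what the salt option changes, the sketch below hashes the same value with and without a salt. This README does not state how the script combines the salt with the value; appending the salt to the value before hashing is an assumption made here, and salted_hash is a hypothetical helper.

```python
import hashlib


def salted_hash(value, salt="", algorithm="sha256"):
    # Assumption: the salt is appended to the value before hashing.
    # The real script may combine them differently.
    return hashlib.new(algorithm, (value + salt).encode("utf-8")).hexdigest()


plain = salted_hash("user@example.com")
salted = salted_hash("user@example.com", salt="my_salt")

print(plain)
print(salted)
print(plain != salted)
```

The same value and salt always produce the same hash, so files hashed with the same salt can still be joined on the hashed column, while someone without the salt cannot reproduce the hashes from a list of known values.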
- The script saves the updated CSV file with a new column for the full hash and, if a truncation length is provided, a second column for the truncated hash.
- A log file will be generated if hash clashes are found when truncation is used.
- The script will check for hash clashes only if truncation is used. Hash clashes are possible when hashes are truncated.
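One way to picture the clash check is to group the original values by their truncated hash and report any group that contains more than one distinct value. The sketch below does this with the standard library; find_clashes is a hypothetical name, and the script's own check and log format may differ.

```python
import hashlib
from collections import defaultdict


def find_clashes(values, length, algorithm="sha224"):
    """Map each truncated hash to the distinct values sharing it (clashes only)."""
    buckets = defaultdict(set)
    for value in values:
        digest = hashlib.new(algorithm, value.encode("utf-8")).hexdigest()
        buckets[digest[:length]].add(value)
    # A clash means two or more different values share one truncated hash.
    return {
        prefix: sorted(originals)
        for prefix, originals in buckets.items()
        if len(originals) > 1
    }


emails = [f"user{i}@example.com" for i in range(200)]

# One hex character allows only 16 distinct truncated hashes,
# so 200 different values are certain to clash.
print(len(find_clashes(emails, length=1)))

# With 50 hex characters (200 bits) a clash is not expected in practice.
print(len(find_clashes(emails, length=50)))
```

Repeated occurrences of the same value are not counted as a clash here, because they legitimately map to the same hash.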
- blake2b
- blake2s
- md5
- sha1
- sha224
- sha256
- sha384
- sha512
- sha3_224
- sha3_256
- sha3_384
- sha3_512
- shake_128
- shake_256
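Most of the algorithms listed above produce a fixed-length digest, but shake_128 and shake_256 are variable-length in Python's hashlib and need an explicit output size when calling hexdigest. The sketch below shows one way to handle both cases; hex_digest and its shake_bytes default are assumptions for illustration, and this README does not say how the script treats the SHAKE algorithms.

```python
import hashlib


def hex_digest(value, algorithm="sha224", shake_bytes=32):
    """Return the hex digest of `value`, passing a length for SHAKE algorithms."""
    h = hashlib.new(algorithm, value.encode("utf-8"))
    if algorithm.startswith("shake_"):
        # SHAKE digests are variable-length: hexdigest() requires a byte count.
        return h.hexdigest(shake_bytes)
    return h.hexdigest()


for name in ("md5", "sha224", "sha256", "blake2b", "shake_128"):
    print(name, len(hex_digest("user@example.com", name)))
```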
To find all algorithms that are available in your local Python environment, use hashlib.algorithms_available.
import hashlib
# Algorithms guaranteed to exist on all platforms
print("Algorithms guaranteed:", hashlib.algorithms_guaranteed)
# Algorithms that are available on the current platform
print("Algorithms available:", hashlib.algorithms_available)