Skip to content

luyomo/mask-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CSV Data Masking Application

This is a Go application to mask CSV data based on specific masking rules. It supports masking using MD5 hashing, fixed data values, data mapping (both from internal config and external mapping files) and flexible regular expression-based masking.

Features

The application processes CSV files and applies various masking rules to certain columns. The masking rules are defined in a YAML configuration file and can be applied in the following ways:

  1. fix_data: Sets a fixed value to a column.
  2. md5: Replaces the column value with its MD5 hash.
  3. mapping with internal config: Maps source values to target values for low cardinality data.
  4. mapping with external mapping file: Maps source values to target values using an external file for high cardinality data.
  5. regexp: Replaces parts of a column value that match a regular expression with a defined string.

Example

The following original CSV data:

column_name_01,column_name_02,column_name_03,column_name_04,column_name_05,column_name_06
test column 01,test column 11,test column 31,test column 41,test column 51,This is the first number 123-4567-890 from test
test column 02,test column 12,test column 32,test column 42,test column 52,This is the second number 234-5678-901 from test
test column 03,test column 13,test column 33,test column 43,test column 53,This is the third number 345-6789-012 from test
test column 01,test column 14,test column 34,test column 44,test column 54,This is the forth number 456-7890-123 from test
test column 02,test column 15,test column 35,test column 45,test column 55,This is the fifth number 567-8901-234 from test

Will be masked to the following output data:

column_name_01,column_name_02,column_name_03,column_name_04,column_name_05,column_name_06
219a05e3c24b3ccec4240d3710ad9a62,xxxxxxxxxx,test masked column data 31,test masked column data 41,test column 51,This is the first number ###-####-### from test
593a5976a62d421fa456042a0d704062,xxxxxxxxxx,Default,Default,test column 52,This is the second number ###-####-### from test
39f5546af039b94d8a2086dfb5d7e64b,xxxxxxxxxx,Default,Default,test column 53,This is the third number ###-####-### from test
219a05e3c24b3ccec4240d3710ad9a62,xxxxxxxxxx,test masked column data 34,test masked column data 44,test column 54,This is the forth number ###-####-### from test
593a5976a62d421fa456042a0d704062,xxxxxxxxxx,Default,Default,test column 55,This is the fifth number ###-####-### from test

Configuration

The masking rules are defined in a YAML configuration file. Below is an example config.yaml:

config:
  index_method: column
  mask_rules:
    column_name_01:
      mask_type: md5
    column_name_02:
      mask_type: fix_data
      value: "xxxxxxxxxx"
    column_name_03:
      mask_type: map
      mapping:
        "test column 31": "test masked column data 31"
        "test column 34": "test masked column data 34"
    column_name_04:
      mask_type: map
      mapping_file: config/column_mapping.yaml
    column_name_06:
      mask_type: regexp
      match_str: "[0-9]"
      replace_with: "#"

In the configuration:

  • column_name_01 is masked with the MD5 hash.
  • column_name_02 is replaced with the fixed value xxxxxxxxxx.
  • column_name_03 uses internal mapping from the config file.
  • column_name_04 uses external mapping defined in a file, e.g., config/column_mapping.yaml.
  • column_name_06 uses a regular expression to mask numbers with the # character.

Mapping File

An external mapping file is used for high cardinality data. Below is an example of the column_mapping.yaml:

"test column 41": "test masked column data 41"
"test column 44": "test masked column data 44"

Usage

The application accepts the following command-line arguments:

  • --config: Path to the YAML configuration file.
  • --data: Path to the CSV file to be masked.
  • --output: Directory where the masked CSV output will be saved.

Example Execution

Run the application with the following command:

./mask_data --config config/mapping.yaml --data data/test.csv --output output/

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages