
Configuration

Sylordis edited this page Feb 10, 2023 · 6 revisions

Configuration files define the operations the reorganiser performs, i.e. which columns the target file will have, with what content, and from which content of the source file. Each row of the source file is processed separately, from top to bottom, transformed according to the configuration, and written to the target file in the same order as in the source file. Configuration files are written in YAML and accept the usual comments of that language.

The order of the columns in the source file has no bearing on how or when you process them in the configuration file.
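
As a rough illustration of the two points above (rows are handled one at a time in source order, and source columns are addressed by name rather than by position), here is a minimal Python sketch. It is not the reorganiser's actual code: the reorganise function, the lambda-based column definitions, and the second data row are assumptions made up for this example.

```python
import csv
import io


def reorganise(source_text, columns):
    """Yield one target row per source row, in source order.

    `columns` maps a target column name to a function of the source row
    (a hypothetical stand-in for a configured operation).
    """
    reader = csv.DictReader(io.StringIO(source_text))
    for row in reader:  # top to bottom, one row at a time
        yield [make_value(row) for make_value in columns.values()]


# Target columns, in the order they should appear in the target file.
columns = {
    "Name": lambda row: row["first_name"],
    "Identifier": lambda row: row["id"],
}

# The same data twice, with the source columns in a different order.
source_a = "id,first_name\n8,Ronny\n9,Ada\n"
source_b = "first_name,id\nRonny,8\nAda,9\n"

rows_a = list(reorganise(source_a, columns))
rows_b = list(reorganise(source_b, columns))
```

Both rows_a and rows_b come out identical, because each value is looked up by source column name and the target order comes only from the configuration.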

You can find an example configuration file in the source code (basic.yaml).

Configuration file structure

The root of every configuration file must be the key structure, whose value is an array of operations, one per array entry. Each operation defines one column of the target file, and the columns appear in the order in which the operations are listed (see the example below).

Basic configuration file structure:

structure:
- operation1
- operation2
[..]

Operations (each entry) are defined as follows (<..> placeholders are meant to be replaced):

column: <name>
operation:
  type: <type>
  <property1>: <value-property1>
  <property2>: <value-property2>
  <property3>: <value-property3>
  [..]

Parameter   Description
name        Name of the operation and of its column in the header of the target file. The name is output as is (case included) in the header of the target file, and each operation/column name has to be unique.
type        Name of the operation (case insensitive); see the Operations section below.
properties  Properties of the operation (case sensitive), specific to each operation; see the Operations section below.
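
The rules in the parameter description can be sketched as a small check over an already parsed configuration. This is only an illustration in Python, assuming the YAML has been loaded into plain dictionaries and lists. The check_structure function is hypothetical, and treating column names as unique by exact (case-sensitive) match is an assumption, since the page only states that names must be unique.

```python
def check_structure(config):
    """Return (column name, normalised type) pairs, or raise ValueError."""
    seen = set()
    result = []
    for entry in config["structure"]:
        name = entry["column"]
        # Assumption: uniqueness is checked on the exact name, case included.
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        seen.add(name)
        # The operation type is case insensitive, so normalise it.
        op_type = entry["operation"]["type"].lower()
        result.append((name, op_type))
    return result


config = {
    "structure": [
        {"column": "Name", "operation": {"type": "Get", "source": "first_name"}},
        {"column": "Identifier", "operation": {"type": "GET", "source": "id"}},
    ]
}

checked = check_structure(config)
```

Here "Get" and "GET" both normalise to the same operation type, while the column names are kept exactly as written.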

Some operations have "shortcuts" that can be used to shorten the configuration file. Shortcuts are defined per operation, and not all operations have one.

Operations

Please consult the Operations definitions page.

Example

Let's take the example provided in the source code, first with a fully detailed configuration:

Detailed basic.yaml:

structure:
- column: Name
  operation:
    type: get
    source: first_name
- column: Identifier
  operation:
    type: get
    source: id
- column: something
  operation:
    type: value
    value: s
- column: Login
  operation:
    type: concat
    values:
      - first_name
      - .
      - last_name
- column: Network group
  operation:
    type: regreplace
    source: ip_address
    pattern: "([0-9]{1,3}\\.[0-9]{1,3})\\..*"
    replace: $1
- column: Gender
  operation:
    type: substring
    source: gender
    start: 0
    end: 1

This configuration file defines a target CSV file with 6 columns:

  1. Name is a Get operation (get) that takes the column first_name from the source and outputs its content.
  2. Identifier follows the same pattern as Name: it is also a Get operation, taking its content from the source column id.
  3. something is a Value operation (value) that simply outputs the static content s.
  4. Login is a concatenation (concat) operation. It concatenates the first_name column, the static value . and the last_name column. The resulting format is <first_name>.<last_name>.
  5. Network group is a regreplace operation that takes its base content from the source column ip_address and outputs only the first two number groups of the address.
  6. Gender is a substring operation that takes its base content from the source column gender and outputs only the first character.

Note that this configuration file can be shortened by using shortcuts:

basic.yaml with shortcuts:

structure:
- column: Name
  source: first_name
- column: Identifier
  source: id
- column: something
  value: s
- column: Login
  concat:
    - first_name
    - .
    - last_name
- column: Network group
  operation:
    type: regreplace
    source: ip_address
    pattern: "([0-9]{1,3}\\.[0-9]{1,3})\\..*"
    replace: $1
- column: Gender
  operation:
    type: substring
    source: gender
    start: 0
    end: 1
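
The relationship between the two configurations can be sketched as a small expansion step. The Python below is an illustration only: expand_shortcut is a hypothetical helper, and the mapping it applies (source to get, value to value, concat to concat with values) is inferred from comparing the two listings above, not taken from the reorganiser's code. Other operations may define shortcuts that this sketch does not know about.

```python
def expand_shortcut(entry):
    """Return the entry in its full `operation:` form."""
    if "operation" in entry:
        return entry  # already in full form
    if "source" in entry:
        operation = {"type": "get", "source": entry["source"]}
    elif "value" in entry:
        operation = {"type": "value", "value": entry["value"]}
    elif "concat" in entry:
        operation = {"type": "concat", "values": list(entry["concat"])}
    else:
        raise ValueError(f"unknown shortcut in column {entry['column']!r}")
    return {"column": entry["column"], "operation": operation}


# Shortcut entries as they would look once the YAML is parsed.
shortcuts = [
    {"column": "Name", "source": "first_name"},
    {"column": "something", "value": "s"},
    {"column": "Login", "concat": ["first_name", ".", "last_name"]},
]

expanded = [expand_shortcut(entry) for entry in shortcuts]
```

Each expanded entry has the same shape as the corresponding entry in the detailed configuration.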

The resulting target CSV file will have the following header: Name,Identifier,something,Login,Network group,Gender.

If we take a source file with the following content:

id,first_name,last_name,email,gender,ip_address
8,Ronny,MacLaverty,rmaclaverty7@usa.gov,Male,105.14.173.88

Applying the previous configuration through CSV reorganiser will result in the following target file:

# Generated by CSV reorganiser (09/04/2021, 18:05)
Name,Identifier,something,Login,Network group,Gender
Ronny,8,s,Ronny.MacLaverty,105.14,M

Note: The comment on the first line is generated automatically when the file is created. Its timestamp is the time at which the operation started, in the format dd/mm/yyyy, hh:mm.
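
To see where each value of the data row comes from, the six operations can be emulated by hand. The Python sketch below is not the reorganiser's implementation, and apply_operation is a hypothetical helper. It assumes that a concat value naming a source column is replaced by that column's content and anything else is kept literally, that the substring end index is exclusive, and that $1 refers to the first capture group. It does not produce the header or the generated comment line.

```python
import re


def apply_operation(operation, row):
    """Compute one target value from one source row (illustrative only)."""
    kind = operation["type"].lower()
    if kind == "get":
        return row[operation["source"]]
    if kind == "value":
        return operation["value"]
    if kind == "concat":
        # Assumption: column names are resolved, other values are literal.
        return "".join(row.get(value, value) for value in operation["values"])
    if kind == "regreplace":
        # Python writes group references as \1 where the configuration uses $1.
        replacement = operation["replace"].replace("$", "\\")
        return re.sub(operation["pattern"], replacement, row[operation["source"]])
    if kind == "substring":
        # Assumption: `end` is exclusive, as in Python slicing.
        return row[operation["source"]][operation["start"]:operation["end"]]
    raise ValueError(f"unsupported operation type: {kind}")


# The source row from the example above.
row = {
    "id": "8", "first_name": "Ronny", "last_name": "MacLaverty",
    "email": "rmaclaverty7@usa.gov", "gender": "Male",
    "ip_address": "105.14.173.88",
}

# The six operations, in target column order.
operations = [
    {"type": "get", "source": "first_name"},
    {"type": "get", "source": "id"},
    {"type": "value", "value": "s"},
    {"type": "concat", "values": ["first_name", ".", "last_name"]},
    {"type": "regreplace", "source": "ip_address",
     "pattern": r"([0-9]{1,3}\.[0-9]{1,3})\..*", "replace": "$1"},
    {"type": "substring", "source": "gender", "start": 0, "end": 1},
]

target_row = [apply_operation(operation, row) for operation in operations]
print(",".join(target_row))
```

Under these assumptions the printed line matches the data row of the target file shown above.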

The examples above are only extracts of the example files provided in the repository.

Running the software on examples/basic.csv, with examples/basic.yaml configuration, should produce examples/result.csv.
