Configuration
Configuration files define the operations the reorganiser performs: which columns the target file will have, what each column contains, and which content of the source file it is built from. Each row of the source file is processed separately, from top to bottom, and written to the target file in the same order, transformed according to the configuration. Configuration files are written in YAML and accept the usual YAML comments.
The order of the columns in the source file has no bearing on how or when you process them in the configuration file.
You can find an example configuration file in the source code (basic.yaml).
The root of every configuration file has to be the keyword `structure`, whose value is an array of operations, each defined in its own array entry. Each operation defines one column of the target file, and the columns are output in the order in which the operations are listed (see examples below).
Basic configuration file structure:
structure:
- operation1
- operation2
[..]
Operations (each entry) are defined as follows (`<..>` placeholders are meant to be replaced):
column: <name>
operation:
  type: <type>
  <property1>: <value-property1>
  <property2>: <value-property2>
  <property3>: <value-property3>
  [..]
Parameter | Description |
---|---|
name | Name of the operation and of the column in the header of the target file. The name is output as is (case included) in the header of the target file, and each operation/column name has to be unique. |
type | Name of the operation (case insensitive), see defined operations below. |
properties | Properties of the operation (case sensitive), specific to each operation. See defined operations below. |
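
The two structural rules stated so far (a `structure` root holding an array, and unique column names) can be illustrated with a minimal Python sketch. It is not part of CSV reorganiser: `check_structure` is a hypothetical helper, it works on an already parsed configuration (a plain dict, as a YAML loader would return), and it assumes that uniqueness is checked case sensitively, which the text above does not state.

```python
def check_structure(config):
    """Check the root keyword and column-name uniqueness of a parsed config.

    Hypothetical helper for illustration only; returns the column names
    in the order they would appear in the target file.
    """
    if not isinstance(config, dict) or "structure" not in config:
        raise ValueError("root keyword 'structure' is missing")
    entries = config["structure"]
    if not isinstance(entries, list):
        raise ValueError("'structure' must hold an array of operations")

    names = []
    for entry in entries:
        name = entry["column"]
        # Assumption: names are compared as is, case included.
        if name in names:
            raise ValueError(f"duplicate column name: {name!r}")
        names.append(name)
    return names


config = {
    "structure": [
        {"column": "Name", "operation": {"type": "get", "source": "first_name"}},
        {"column": "Identifier", "operation": {"type": "get", "source": "id"}},
    ]
}
print(check_structure(config))  # ['Name', 'Identifier']
```
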
Some operations have shortcuts that can be used to shorten the configuration file. Shortcuts are defined per operation, and not all operations have one.
Please consult the Operations definitions page.
Let's take the example provided in the source code, starting with a fully detailed configuration:
Detailed basic.yaml:
structure:
- column: Name
  operation:
    type: get
    source: first_name
- column: Identifier
  operation:
    type: get
    source: id
- column: something
  operation:
    type: value
    value: s
- column: Login
  operation:
    type: concat
    values:
      - first_name
      - .
      - last_name
- column: Network group
  operation:
    type: regreplace
    source: ip_address
    pattern: "([0-9]{1,3}\\.[0-9]{1,3})\\..*"
    replace: $1
- column: Gender
  operation:
    type: substring
    source: gender
    start: 0
    end: 1
What we have here is a configuration file defining a target CSV file with 6 columns:
- `Name` is a `get` operation that takes the column `first_name` from the source and outputs its content.
- `Identifier` follows the same pattern as `Name`: it is also a `get` operation, taking its content from the source column `id`.
- `something` is a `value` operation that simply outputs the static content `s`.
- `Login` is a concatenation (`concat`) operation. It concatenates the `first_name` column, the static value `.` and the `last_name` column. The resulting format will be `<first_name>.<last_name>`.
- `Network group` is a `regreplace` operation that takes its base content from the source column `ip_address` and outputs only the first two number groups of the source.
- `Gender` is a `substring` operation that takes its base content from the source column `gender` and outputs only the first character.
Please note that this configuration file can be shortened by using shortcuts:
basic.yaml with shortcuts:
structure:
- column: Name
  source: first_name
- column: Identifier
  source: id
- column: something
  value: s
- column: Login
  concat:
    - first_name
    - .
    - last_name
- column: Network group
  operation:
    type: regreplace
    source: ip_address
    pattern: "([0-9]{1,3}\\.[0-9]{1,3})\\..*"
    replace: $1
- column: Gender
  operation:
    type: substring
    source: gender
    start: 0
    end: 1
The resulting target CSV file will have the following header: `Name,Identifier,something,Login,Network group,Gender`.
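The relation between the shortcut form and the detailed form can be sketched in Python as a small normalisation step. This is only an illustration on already parsed entries (plain dicts): `expand_shortcut` is a hypothetical name, it covers only the three shortcuts that appear in the example (`source`, `value` and `concat`), and it says nothing about how CSV reorganiser handles shortcuts internally.

```python
def expand_shortcut(entry):
    """Return the detailed form of one column entry.

    Hypothetical helper covering only the shortcuts shown in the example.
    """
    if "operation" in entry:
        return entry  # already in detailed form
    if "source" in entry:
        operation = {"type": "get", "source": entry["source"]}
    elif "value" in entry:
        operation = {"type": "value", "value": entry["value"]}
    elif "concat" in entry:
        operation = {"type": "concat", "values": list(entry["concat"])}
    else:
        raise ValueError(f"no known shortcut in entry: {entry!r}")
    return {"column": entry["column"], "operation": operation}


print(expand_shortcut({"column": "Name", "source": "first_name"}))
# {'column': 'Name', 'operation': {'type': 'get', 'source': 'first_name'}}
```

Entries that already carry an `operation` key, such as `Network group` and `Gender` in the example, pass through unchanged.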
If we take a source file that has the following content:
id,first_name,last_name,email,gender,ip_address
8,Ronny,MacLaverty,rmaclaverty7@usa.gov,Male,105.14.173.88
Applying the previous configuration through CSV reorganiser will result in the following target file:
# Generated by CSV reorganiser (09/04/2021, 18:05)
Name,Identifier,something,Login,Network group,Gender
Ronny,8,s,Ronny.MacLaverty,105.14,M
Note: The comment is generated automatically when the file is generated, with the timestamp of the beginning of the operation in the format `dd/mm/yyyy, hh:mm`.
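For readers who want to reproduce or parse that comment line, the timestamp format maps onto Python's `strftime` codes as shown in this short sketch. `header_comment` is a hypothetical name used for illustration, not a function of CSV reorganiser.

```python
from datetime import datetime


def header_comment(started_at):
    """Build the leading comment line from the start time of the operation."""
    # dd/mm/yyyy, hh:mm  ->  %d/%m/%Y, %H:%M
    stamp = started_at.strftime("%d/%m/%Y, %H:%M")
    return f"# Generated by CSV reorganiser ({stamp})"


print(header_comment(datetime(2021, 4, 9, 18, 5)))
# prints: # Generated by CSV reorganiser (09/04/2021, 18:05)
```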
Provided examples are just extracts of the example files provided in the repository.
Running the software on examples/basic.csv, with examples/basic.yaml configuration, should produce examples/result.csv.
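As a rough illustration of what the six operations of the example do to the sample row, here is a minimal Python sketch. It is not the reorganiser's implementation. It rests on several assumptions: a `concat` item that names a source column is replaced by that column's value and anything else is kept as static text, `$1` in `replace` refers to the first capture group, and `end` in `substring` is exclusive (which is consistent with `start: 0`, `end: 1` yielding one character).

```python
import re


def apply_operation(operation, row):
    """Apply one operation (a dict) to one source row (a dict of strings)."""
    kind = operation["type"].lower()  # the operation type is case insensitive
    if kind == "get":
        return row[operation["source"]]
    if kind == "value":
        return operation["value"]
    if kind == "concat":
        # Assumption: column names are looked up, other items are static text.
        return "".join(row.get(item, item) for item in operation["values"])
    if kind == "regreplace":
        # Assumption: "$1" means capture group 1; Python writes it as \g<1>.
        replacement = re.sub(r"\$(\d+)", r"\\g<\1>", operation["replace"])
        return re.sub(operation["pattern"], replacement, row[operation["source"]])
    if kind == "substring":
        # Assumption: "end" is exclusive.
        return row[operation["source"]][operation["start"]:operation["end"]]
    raise ValueError(f"unknown operation type: {operation['type']!r}")


row = {
    "id": "8", "first_name": "Ronny", "last_name": "MacLaverty",
    "email": "rmaclaverty7@usa.gov", "gender": "Male",
    "ip_address": "105.14.173.88",
}

# The six operations of the example, in target column order.
operations = [
    {"type": "get", "source": "first_name"},
    {"type": "get", "source": "id"},
    {"type": "value", "value": "s"},
    {"type": "concat", "values": ["first_name", ".", "last_name"]},
    {"type": "regreplace", "source": "ip_address",
     "pattern": r"([0-9]{1,3}\.[0-9]{1,3})\..*", "replace": "$1"},
    {"type": "substring", "source": "gender", "start": 0, "end": 1},
]

print(",".join(apply_operation(op, row) for op in operations))
# prints: Ronny,8,s,Ronny.MacLaverty,105.14,M
```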