Tool to select or filter data of different types and in multiple dimensions.
The Data Selector tool allows you to select a sample of the dataset by specifying a number of rows. It is possible to delete or keep columns. Another parameter allows the selection of data in a column according to its value(s).
The output result is saved in a .csv file.
The usage is given as follows:
Usage: data-selector [OPTIONS] COMMAND [ARGS]...
Data selection interactive tool.
Options:
--help Show this message and exit.
Commands:
selector Tool to select Data to Keep/Delete
version Print the application version information
To select data from a file, use the selector command:
Usage: data-selector selector [OPTIONS]
Tool to select Data to Keep/Delete
Options:
-i, --input FILE Data file to convert [required]
-out, --output TEXT name for the output files [required]
-f, --force Overwrite existing files
-s, --file_sep TEXT File separator (csv).
-row, --nb_rows INTEGER Number of rows to import from input_file.
-keep, --columns_to_keep FILE Path to file with columns to keep.
-delete, --columns_to_delete FILE
Path to file with columns to delete.
-values, --values_to_keep FILE Path to file with columns and data to keep.
--help Show this message and exit.
For the -keep and -delete options, the template is given below:
[
"<column#1>",
"<column#2>",
"<column#3>",
"<column#4>",
"<column#5>"
]
Each column#x is the name of a column you want to keep or delete. Note that you can add as many columns as needed.
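A column-list file in this format can be generated rather than written by hand. The following is a minimal sketch using only the Python standard library. The helper name `column_list_json` and the column names are illustrative assumptions and are not part of the tool.

```python
import json


def column_list_json(columns):
    """Serialize column names as a JSON array, matching the template above."""
    return json.dumps(list(columns), indent=4)


# Hypothetical column names, for illustration only.
content = column_list_json(["name", "age", "city"])
print(content)
```

The printed text can be saved to a file and passed to -keep or -delete.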
For the -values option, the template is given below:
{
"<column#1>":["<value#1>", "<value#2>"],
"<column#2>":["<value#1>", "<value#2>", "<value#3>", "<value#4>"],
"<column#3>":["<value#1>"]
}
column#x are the columns to filter on. Each list contains the values you want to keep in that column. Note that you can add as many columns and values as needed.
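To illustrate what such a file expresses, here is a minimal sketch in standard-library Python. It is not the tool's implementation: the data, the column names, and the `filter_rows` helper are hypothetical, and combining several columns with a logical AND is an assumption about the intended behaviour.

```python
import csv
import io
import json

# Hypothetical filter in the template's format: column name -> values to keep.
values_to_keep = {"city": ["Paris", "Lyon"], "status": ["active"]}

# Hypothetical input data, inlined so the sketch is self-contained.
raw = """name,city,status
Alice,Paris,active
Bob,Nice,active
Chloe,Lyon,inactive
David,Lyon,active
"""


def filter_rows(rows, filters):
    """Keep rows whose value is allowed in every filtered column (assumed AND)."""
    return [
        row for row in rows
        if all(row[col] in allowed for col, allowed in filters.items())
    ]


rows = list(csv.DictReader(io.StringIO(raw)))
kept = filter_rows(rows, values_to_keep)

print(json.dumps(values_to_keep, indent=4))  # content for the -values file
print([row["name"] for row in kept])
```

Under these assumptions only the rows for Alice and David remain, since they are the only ones whose city and status both appear in the allowed lists.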
Build a local docker image using the following command line:
docker build -t data-selector .
Once built, you can run the container locally with the following command line:
docker run -ti --rm -v <your_path>:/DATA data-selector selector -i DATA/<path_to_data> -out DATA/<out_name> -s <file_sep> -keep DATA/<path_to_select_columns> -delete DATA/<path_to_delete_columns> -values DATA/<path_to_select_data_columns>
-v mounts a volume so that your local data is available inside the Docker container.
your_path: Local directory where the data (the data to select from and the JSON parameter files) are stored.
path_to_data: Name of the file to select data from, relative to your_path.
out_name: The name you want to give to the output file.
file_sep: File separator of the input file.
path_to_select_columns: Path to the JSON file listing the columns to keep (-keep).
path_to_delete_columns: Path to the JSON file listing the columns to delete (-delete).
path_to_select_data_columns: Path to the JSON file listing the columns and values to keep (-values).
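When the container is launched from a script, the command line can be assembled programmatically. The sketch below mirrors the docker run command shown above. The function name `build_docker_command` and the example paths are hypothetical. It assumes that -s, -keep, -delete and -values may be omitted, since the help text marks only -i and -out as required.

```python
import shlex


def build_docker_command(local_dir, data_file, out_name, file_sep=None,
                         keep=None, delete=None, values=None):
    """Assemble the docker run call; optional flags are added only when set."""
    cmd = [
        "docker", "run", "-ti", "--rm",
        "-v", f"{local_dir}:/DATA",
        "data-selector", "selector",
        "-i", f"DATA/{data_file}",
        "-out", f"DATA/{out_name}",
    ]
    if file_sep is not None:
        cmd += ["-s", file_sep]
    for flag, name in (("-keep", keep), ("-delete", delete), ("-values", values)):
        if name is not None:
            cmd += [flag, f"DATA/{name}"]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_docker_command("/home/me/data", "input.csv", "selected",
                           file_sep=";", keep="keep.json")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The returned list can also be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with separators such as `;`.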
The project is built with Poetry. Install the dependencies with:
poetry install
⚠️ Be sure to write code that complies with the linters, otherwise the CI will reject your changes. Code linting is performed by flake8.
poetry run flake8 --count --show-source --statistics
Static type check is performed by mypy.
poetry run mypy .
⚠️ Be sure to write tests that succeed, otherwise the CI will reject your changes. Unit tests are run with the pytest testing framework.
poetry run pytest -v
Please check out OKP4 health files :