List of the checks performed during the import process #17

Closed
JulienCorny opened this issue Sep 30, 2019 · 5 comments
Labels
Information (miscellaneous information)

Comments

@JulienCorny
Contributor

List of the checks included in gn_module_import (work in progress):

("checks and clean/corrects": the error is detected but automatically corrected by the module)

| Name | Description | Step | Type |
| --- | --- | --- | --- |
| digit_name | File name has only digits | upload | error |
| no_data | No data in the file | upload | error |
| extension_error | File extension is not csv | upload | error |
| no_file | No file detected | upload | error |
| empty | No file sent | upload | error |
| long_name | File name length > 100 characters | upload | error |
| max_size | File size exceeds the size allowed in the configuration parameters | upload | error |
| source-error (goodtables lib) | Data reading error caused by unsupported or inconsistent contents. | csv | error |
| format-error (goodtables lib) | Data reading error caused by an incorrect format. | csv | error |
| encoding-error (goodtables lib) | Data reading error caused by an encoding problem. | csv | error |
| blank-header (goodtables lib) | There is a blank header name. All cells in the header row must have a value. | csv | error |
| duplicate-header (goodtables lib) | There are multiple columns with the same name. All column names must be unique. | csv | error |
| blank-row (goodtables lib) | Rows must have at least one non-blank cell. | csv | error |
| duplicate-row (goodtables lib) | Rows can't be duplicated. | csv | error |
| extra-value (goodtables lib) | A row has more columns than the header. | csv | error |
| missing-value (goodtables lib) | A row has fewer columns than the header. | csv | error |
| wrong id_dataset | User not allowed to import data into this id_dataset | load raw data to db | error |
| file and column names cleaning | Remove whitespace at both ends of the string; use `_` instead of whitespace; remove special characters (except `_`); decode in utf-8; convert uppercase to lowercase | load raw data to db | checks and clean |
| psycopg2.errors.BadCopyFileFormat | Error probably caused by a wrong separator provided by the user | load raw data to db | error |
| missing_value | Missing values in non-nullable columns | data cleaning | error |
| incorrect_date | Incorrect datetime value in a datetime type column | data cleaning | error |
| incorrect_uuid | Incorrect uuid value in a uuid type column | data cleaning | error |
| incorrect_length | String length exceeds the max length allowed for a character varying column | data cleaning | error |
| incorrect_integer | Incorrect (or negative) integer value in an integer type column | data cleaning | error |
| incorrect_cd_nom | cd_nom provided does not exist in taxref | data cleaning | error |
| date_max not provided | date_max not provided: set equal to date_min | data cleaning | checks and corrects |
| date_min > date_max | date_min > date_max | data cleaning | error |
| missing_uuid | Missing uuid in a uuid type column | data cleaning | warning |
| unique_id_sinp missing column | If the unique_id_sinp column is not provided: if the generating uuid option is checked, create a unique_id_sinp column with a uuid for each row; if it is not checked, no action | data cleaning | checks and corrects |
| unique_id_sinp missing values | If the unique_id_sinp column is provided but contains missing values: if the generating uuid option is checked, create unique_id_sinp values where missing; if it is not checked, no action | data cleaning | checks and corrects |
| missing count_min value | If a count_min value is missing, replace it with the default count_min value set in the gn_import module configuration parameters | data cleaning | checks and corrects |
| missing count_max column | If the count_max column is not provided, count_max = count_min | data cleaning | checks and corrects |
| missing count_min column | If the count_min column is not provided, create a count_min column with values equal to the default count_min value | data cleaning | checks and corrects |
| missing count_max values | If count_max values are missing, count_max = count_min | data cleaning | checks and corrects |
| missing altitude_min and altitude_max columns | If the altitude columns are not provided: if the generating altitude option is checked, calculate the values and create the altitude columns; if it is not checked, keep the missing values | data cleaning | checks and corrects |
| missing altitude_min or altitude_max values | If altitudes are provided but contain missing altitude_min values: if the generating altitude option is checked, calculate the missing altitudes; if it is not checked, keep the missing values | data cleaning | checks and corrects |
| count_min > count_max | count_min > count_max | data cleaning | error |
| entity_source_pk column missing | entity_source_pk column not provided: create the column and fill it with gn_pk values | data cleaning | checks and corrects |
| entity_source_pk duplicated | entity_source_pk value duplicated (must be unique) | data cleaning | error |
| incorrect_real | Incorrect real value (e.g. longitude and latitude) | data cleaning | error |
| comma decimal separator | If a comma decimal separator is detected in real type values (latitude and longitude), it is automatically replaced by a point. | data cleaning | checks and corrects |
| inconsitent_wgs84_value | Inconsistent geographic coordinates (valid range: longitude -180 to 180, latitude -90 to 90) | data cleaning | error |
| inconsitent_lambert93_value | Inconsistent geographic coordinates (valid range: x 100000 to 1300000, y 6000000 to 7200000) | data cleaning | error |

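
To make the difference between the "checks and corrects" and "error" types concrete, here is a minimal sketch of a few of the data cleaning rules listed above. It is not the module's actual code: the function name `clean_row`, the `longitude`/`latitude` column names and the `DEFAULT_COUNT_MIN` constant are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]

# Assumption: stands in for the default count_min set in the module configuration.
DEFAULT_COUNT_MIN = "1"


def clean_row(row: Row) -> Tuple[Row, List[str]]:
    """Apply a few 'checks and corrects' rules, then collect 'error' checks."""
    cleaned = dict(row)
    errors: List[str] = []

    # date_max not provided: set equal to date_min.
    if not cleaned.get("date_max"):
        cleaned["date_max"] = cleaned.get("date_min")

    # missing count_min value: use the configured default.
    if not cleaned.get("count_min"):
        cleaned["count_min"] = DEFAULT_COUNT_MIN
    # missing count_max value: count_max = count_min.
    if not cleaned.get("count_max"):
        cleaned["count_max"] = cleaned["count_min"]

    # comma decimal separator: replaced by a point.
    for key in ("longitude", "latitude"):
        value = cleaned.get(key)
        if value is not None:
            cleaned[key] = value.replace(",", ".")

    # incorrect_integer and count_min > count_max are reported, not corrected.
    try:
        count_min = int(cleaned["count_min"])
        count_max = int(cleaned["count_max"])
    except ValueError:
        errors.append("incorrect_integer")
    else:
        if count_min < 0 or count_max < 0:
            errors.append("incorrect_integer")
        elif count_min > count_max:
            errors.append("count_min > count_max")

    # incorrect_real and the WGS84 range check are reported, not corrected.
    try:
        lon = float(cleaned["longitude"])
        lat = float(cleaned["latitude"])
    except (KeyError, TypeError, ValueError):
        errors.append("incorrect_real")
    else:
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            errors.append("inconsitent_wgs84_value")

    return cleaned, errors
```

Corrections are applied first so that the error checks run on the values that would actually be imported; a row with an empty `count_max` is therefore never reported as `count_min > count_max`.
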
@DonovanMaillard
Collaborator

To add later: check that a uuid_sinp to be imported does not already exist in the Synthese, so that no duplicate is imported.
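
A possible shape for that check, as a sketch only: it assumes the UUIDs already stored in the Synthese have been loaded into a Python set beforehand (in practice this would more likely be a database query), and `find_existing_uuids` is a hypothetical helper name, not something that exists in the module.

```python
import uuid
from typing import Iterable, List, Set


def find_existing_uuids(incoming: Iterable[str], existing: Set[uuid.UUID]) -> List[str]:
    """Return the incoming uuid values that are already present in `existing`."""
    duplicates: List[str] = []
    for raw in incoming:
        try:
            parsed = uuid.UUID(raw)
        except ValueError:
            # Malformed values are left to the incorrect_uuid check.
            continue
        if parsed in existing:
            duplicates.append(raw)
    return duplicates
```

Parsing each value with `uuid.UUID` before comparing means that differences in letter case or surrounding braces do not hide a duplicate.
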

@JulienCorny
Contributor Author

OK, I'll include it soon; I hadn't thought of that check.

DonovanMaillard added the Information label Dec 17, 2019
@DonovanMaillard
Collaborator

DonovanMaillard commented Dec 17, 2019

The "duplicate row" error is no longer blocking: it now simply marks the duplicated rows as invalid (until now it blocked the import of the whole file). Thanks @JulienCorny
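
The non-blocking behaviour can be pictured roughly as follows. This is only an illustration with a hypothetical `flag_duplicate_rows` helper; in particular, keeping the first occurrence valid and invalidating only the later copies is an assumption, since the comment does not say which copies are marked.

```python
from typing import List, Sequence, Tuple


def flag_duplicate_rows(rows: Sequence[Sequence[str]]) -> List[Tuple[Sequence[str], bool]]:
    """Pair each row with a validity flag instead of rejecting the whole file."""
    seen = set()
    flagged = []
    for row in rows:
        key = tuple(row)
        # Assumption: a row is invalid only if an identical row came before it.
        is_valid = key not in seen
        seen.add(key)
        flagged.append((row, is_valid))
    return flagged
```

The import can then proceed with the rows flagged as valid, while the invalid ones are reported back to the user.
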

@camillemonchicourt
Member

The error checks were extended in version 1.1.0.
We should look into listing all of them in the documentation.

@camillemonchicourt
Member

Moved to the documentation and updated at the same time: https://github.com/PnX-SI/gn_module_import/blob/develop/docs/controls.md
