Create validate_camtrapdp() and helper functions to validate integrity of a Camtrap DP
#58
Labels
function:validate_camtrapdp
Function validate_camtrapdp()
Suggested in camtraptor July 2023 coding sprint
An important step before analysing or publishing data is to check that the dataset does not contain major integrity errors, such as missing dates or coordinates, values not meeting controlled vocabularies, or incorrect relationships between tables. Although validation is possible with the Python software Frictionless Framework, the returned error messages are hard to parse for most users.
A validate() function could make use of a number of check_ helper functions. Those helper functions could also be run by other functions, e.g. when updating data (#248).

Suggestions for functions:
- validate(package)
- check_relations(package): relationships are valid
- check_identifiers(package, "table name"): IDs are unique
- check_required(package, "table name"): required fields are populated
- check_vocabularies(package, "table name"): values meet factor levels. Note that read_resource()/readr converts these to factors and might throw problems()
- check_data_types(package, "table name"): note that read_resource()/readr will throw problems(), but will otherwise make a best attempt at converting
- check_timestamps(package, "table name"): has timezone, start <= end (specific to camtraptor, not a frictionless thing)
- check_durations(package): obs & media timestamps within deployment (specific to camtraptor, not a frictionless thing)

While it would be useful if these were functions of the frictionless R package, that might not be what we expect for camtraptor. Frictionless would run its validation on resources (i.e. CSV files + schemas), because returned data frames lose the connection with their schema: relationships and uniqueness can no longer be validated once that information is lost. Camtraptor, on the other hand, wants to validate the (already read) data frames.
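
To make the idea of checking already-read data frames more concrete, here is a minimal base R sketch of a few of the suggested helpers. Everything in it is an assumption for illustration: the `package$data` layout, the extra arguments (`id_field`, `fields`, `child`, `parent`, `key`, `start_field`, `end_field`), the field names in the toy data, and the choice to return a data frame of issues rather than throw errors. It is not the camtraptor or frictionless implementation.

```r
# Sketch only: `package` is assumed to be a list whose `data` element holds
# the already-read data frames. All signatures below are hypothetical.

# One row per problem found; zero rows means the check passed.
issue <- function(check, table, message) {
  data.frame(check = check, table = table, message = message,
             stringsAsFactors = FALSE)
}
no_issues <- function() issue(character(0), character(0), character(0))

# IDs are unique.
check_identifiers <- function(package, table, id_field) {
  ids <- package$data[[table]][[id_field]]
  duplicates <- unique(ids[duplicated(ids)])
  if (length(duplicates) == 0) return(no_issues())
  issue("identifiers", table,
        paste0(id_field, " is not unique: ", paste(duplicates, collapse = ", ")))
}

# Required fields are present and populated.
check_required <- function(package, table, fields) {
  df <- package$data[[table]]
  found <- lapply(fields, function(field) {
    if (is.null(df[[field]])) {
      return(issue("required", table, paste0(field, " is absent")))
    }
    n_missing <- sum(is.na(df[[field]]))
    if (n_missing == 0) return(no_issues())
    issue("required", table,
          paste0(field, " has ", n_missing, " missing value(s)"))
  })
  do.call(rbind, c(list(no_issues()), found))
}

# Relationships are valid: every key in `child` exists in `parent`.
check_relations <- function(package, child, parent, key) {
  orphans <- setdiff(package$data[[child]][[key]],
                     package$data[[parent]][[key]])
  if (length(orphans) == 0) return(no_issues())
  issue("relations", child,
        paste0(key, " not found in ", parent, ": ",
               paste(orphans, collapse = ", ")))
}

# Start <= end, for POSIXct columns (rows with NA are skipped).
check_timestamps <- function(package, table, start_field, end_field) {
  df <- package$data[[table]]
  bad <- which(df[[start_field]] > df[[end_field]])
  if (length(bad) == 0) return(no_issues())
  issue("timestamps", table,
        paste0(start_field, " is after ", end_field, " in row(s) ",
               paste(bad, collapse = ", ")))
}

# Runs the checks and collects all issues in one readable table.
validate <- function(package) {
  rbind(
    check_identifiers(package, "deployments", "deploymentID"),
    check_required(package, "deployments",
                   c("deploymentID", "latitude", "longitude")),
    check_relations(package, "observations", "deployments", "deploymentID"),
    check_timestamps(package, "deployments", "start", "end")
  )
}

# Toy data with one deliberate problem per check (field names are assumed).
package <- list(data = list(
  deployments = data.frame(
    deploymentID = c("d1", "d1"),
    latitude = c(51.2, NA),
    longitude = c(4.4, 4.5),
    start = as.POSIXct(c("2023-07-01 10:00:00", "2023-07-03 10:00:00"), tz = "UTC"),
    end = as.POSIXct(c("2023-07-02 10:00:00", "2023-07-02 10:00:00"), tz = "UTC")
  ),
  observations = data.frame(
    observationID = c("o1", "o2"),
    deploymentID = c("d1", "d9")
  )
))

issues <- validate(package)
print(issues)
```

Returning a table of issues, one row per problem, is one way to give users messages that are easier to read than raw validator output. The same helpers could then be reused by other functions that only need a single check.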