Updates: Adding Checks and Resolving Inconsistencies #251

Merged (65 commits, Aug 2, 2023)
Commits (65)
33de8e5
Update README.md
daniellegroenen Jan 5, 2023
63371d2
Update README.md
daniellegroenen Jan 5, 2023
01cd682
Update README.md
daniellegroenen Jan 5, 2023
88383d8
Update README.md
daniellegroenen Jan 5, 2023
0d0820a
added license_url_description_check field to rule_mapping
Apr 17, 2023
a904c5a
Added check message for license url description check
Apr 17, 2023
0b8e731
Added horizontal_resolution_presence_check to the rule_mapping and ch…
Apr 18, 2023
9b6172f
fixed field path
Apr 19, 2023
c3bab5c
Standard Product Check added
smk0033 Apr 19, 2023
8dee88d
rule_mapping file for FreeAndOpenData check
Apr 20, 2023
90b18e6
added FreeAndOpenData check to check_messages
Apr 20, 2023
1e331d6
updated ends at present flag check to account for text values of False
Apr 21, 2023
a93a595
fixed bugs for ends at presence flag check
Apr 24, 2023
b8069e7
Work around and log invalid token & cmr response
slesaad Apr 24, 2023
f3594d2
Format
slesaad Apr 24, 2023
49fba92
Update tests
slesaad Apr 24, 2023
b3c7ac4
Add `details` to downloader error
slesaad Apr 24, 2023
3923bb9
Update tests
slesaad Apr 24, 2023
57b3046
Merge pull request #235 from NASA-IMPACT/feature-add_new_checks-jw
svbagwell Apr 26, 2023
083d32e
added specific DIF10 check for standard product
May 1, 2023
bae889c
updated output messages for standard product fails
May 1, 2023
5e9d21c
Merge pull request #239 from NASA-IMPACT/feature-add_new_checks_shey
svbagwell May 1, 2023
4e5d2a1
Merge pull request #237 from NASA-IMPACT/feature-add_new_checks-sydney
svbagwell May 1, 2023
214a27e
changed description fields to licensetext fields for license informat…
May 2, 2023
75716a7
added URL fields to license URL description check
May 2, 2023
df870c9
added a check for license URL description
May 2, 2023
d317704
Update pyQuARC/schemas/rule_mapping.json
svbagwell May 4, 2023
ca0b2e4
updated echo-10 schema
May 4, 2023
46fed0a
updated umm-c schema
May 5, 2023
ac774a5
updated dif10 schema
May 5, 2023
3abb0b3
updated the license description check
May 5, 2023
8639ca2
Merge branch 'feature-add_new_checks-shelby' of https://github.com/NA…
May 5, 2023
8c991e2
fixed field path
May 5, 2023
a6fb9d9
updated check function
May 5, 2023
9ca2837
Merge pull request #238 from NASA-IMPACT/bug_fixes-ends_at_present_flag
svbagwell May 19, 2023
d5a9627
Merge branch 'feature-add_new_checks' into feature-add_new_checks-shelby
May 19, 2023
7ccd425
Merge pull request #236 from NASA-IMPACT/feature-add_new_checks-shelby
svbagwell May 19, 2023
5d6e42c
Create dependabot.yml
code-geek Jun 13, 2023
3c3239c
Update instrument long name presence check
esr0004 Jul 24, 2023
2593fd8
Update platform long name presence check
esr0004 Jul 24, 2023
55dba37
Update campaign long name presence check
esr0004 Jul 24, 2023
a105e27
Merge pull request #243 from NASA-IMPACT/feature-add_new_checks
jenny-m-wood Jul 24, 2023
89ee8de
Merge pull request #242 from NASA-IMPACT/bug_fixes
jenny-m-wood Jul 24, 2023
2f9de0c
Merge pull request #241 from NASA-IMPACT/schema_updates
jenny-m-wood Jul 24, 2023
40da310
Merge pull request #232 from NASA-IMPACT/daniellegroenen-readme-edits
jenny-m-wood Jul 24, 2023
cbf204f
Merge pull request #246 from NASA-IMPACT/feature-add_new_checks-essence
jenny-m-wood Jul 24, 2023
40004bc
Merge pull request #247 from NASA-IMPACT/feature-add_new_checks
jenny-m-wood Jul 24, 2023
0bb815b
Added Granule Campaign Name Presence Check
Jul 25, 2023
178d4c5
Removed online_access_url_description_check since the url_desc_presen…
Jul 25, 2023
de0c5dd
Removed online_resource_url_description_check since the url_desc_pres…
Jul 25, 2023
74be283
Merge pull request #248 from NASA-IMPACT/dev-jenny
jenny-m-wood Jul 25, 2023
10b8195
Removed data_center_short_name_gcmd_check since the organization_shor…
Jul 25, 2023
8ee9946
Modified rule mapping for collection_citation_presence_check
Jul 25, 2023
a2f6112
Resolving issues with the validate_beginning_datetime_against_granule…
Jul 25, 2023
ab470ce
Merge pull request #249 from NASA-IMPACT/dev-jenny
jenny-m-wood Jul 25, 2023
589a6ad
Added data_format_presence_check for collections
Jul 27, 2023
5db98b8
Removed echo-g rule mapping from online_resource_url_presence_check
Jul 28, 2023
e6341f4
Merge pull request #250 from NASA-IMPACT/dev-jenny
jenny-m-wood Aug 1, 2023
e22cb0d
Merge branch 'dev' into fix/auth-token-issue
slesaad Aug 1, 2023
c01d7f4
Add dependabot configuration for pip that runs on a weekly basis
slesaad Aug 1, 2023
ad43634
Fix auth token issue
slesaad Aug 1, 2023
7e50166
Remove else: pass from dif_standard_product_check
jenny-m-wood Aug 2, 2023
8a3e77a
Update changelog
jenny-m-wood Aug 2, 2023
cdea94e
Update version.txt
jenny-m-wood Aug 2, 2023
8b6136c
Add auth issue fix to changelog
slesaad Aug 2, 2023
Files changed
7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,7 @@
version: 2
updates:
  # Enable version updates for pip
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ build/*
dist/*
pyQuARC.egg-info/*
env/*
.venv/*
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,18 @@
# CHANGELOG

## v1.2.3
- Updated schema files
- Added Free And Open Data check
- Added Horizontal Resolution Presence check
- Added Data Format Presence check
- Added Standard Product check
- Added License URL Description check
- Added Granule Campaign Name Presence check
- Revised GCMD long name presence checks
- Revised validate_beginning_datetime_against_granules check
- Removed redundant checks
- Fixed auth issue when downloading metadata files

## v1.2.2

- Bugfixes:
6 changes: 3 additions & 3 deletions README.md
@@ -30,7 +30,7 @@ The CMR is designed around its own metadata standard called the [Unified Metadat

pyQuARC supports DIF10 (collection only), ECHO10 (collection and granule), UMM-C, and UMM-G standards. At this time, there are no plans to add ISO 19115 or UMM-S/T specific checks. **Additionally, the output messages pyQuARC currently displays should be taken with a grain of salt. There is still testing and clean-up work to be done.**

**For inquiries, please email: jeanne.leroux@nsstc.uah.edu**
**For inquiries, please email: jenny.wood@uah.edu**

## pyQuARC as a Service (QuARC)

@@ -53,7 +53,7 @@ The `checks.json` file includes a comprehensive list of rules. Each rule is spec

The `rule_mapping.json` file specifies which metadata element(s) each rule applies to. The `rule_mapping.json` also references the `messages.json` file which includes messages that can be displayed when a check passes or fails.

Furthermore, the `rule_mapping.json` file specifies the level of severity associated with a failure. If a check fails, it will be assigned a severity category of “<span style="color:red">error</span>,” “<span style="color:orange">warning</span>,” or <span style="color:blue">info</span>.” These categories correspond to priority categorizations in [ARC’s priority matrix](https://wiki.earthdata.nasa.gov/display/CMR/ARC+Priority+Matrix) and communicate the importance of the failed check, with “error” being the most critical category, “warning” indicating a failure of medium priority, and “info” indicating a minor issue or inconsistency. Default severity values are assigned based on ARC’s metadata quality assessment framework, but can be customized to meet individual needs.
Furthermore, the `rule_mapping.json` file specifies the level of severity associated with a failure. If a check fails, it will be assigned a severity category of “<span style="color:red">error</span>”, “<span style="color:orange">warning</span>”, or "<span style="color:blue">info</span>.” These categories correspond to priority categorizations in [ARC’s priority matrix](https://wiki.earthdata.nasa.gov/display/CMR/ARC+Priority+Matrix) and communicate the importance of the failed check, with “error” being the most critical category, “warning” indicating a failure of medium priority, and “info” indicating a minor issue or inconsistency. Default severity values are assigned based on ARC’s metadata quality assessment framework, but can be customized to meet individual needs.
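
To make that structure concrete, the sketch below shows roughly what a single `rule_mapping.json` entry could look like, written here as a Python dict. The key names (`check_id`, `severity`, `fields_to_apply`, `fields`) are the ones `checker.py` reads in this pull request; the rule id, metadata-format key, and field path are hypothetical placeholders, not values taken from the real schema files.

```python
# Illustrative only: the shape of one rule_mapping entry as consumed by Checker.
# Key names mirror checker.py; every value below is a made-up placeholder.
example_rule_mapping = {
    "example_presence_check": {          # hypothetical rule id
        "check_id": "example_presence_check",
        "severity": "warning",           # checker.py defaults this to "error"
        "fields_to_apply": {
            "echo-c": [                  # metadata-format key is an assumption
                {"fields": ["Collection/ExampleField"]}  # made-up field path
            ]
        },
    }
}
```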

## Customization
pyQuARC is designed to be customizable. Output messages can be modified using the `messages_override.json` file - any messages added to `messages_override.json` will display over the default messages in the `message.json` file. Similarly, there is a `rule_mapping_override.json` file which can be used to override the default settings for which rules/checks are applied to which metadata elements.
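
As a rough illustration of how an override is consulted, `Checker.message()` in this pull request resolves messages as `messages_override.get(rule_id) or messages.get(rule_id)` and then reads the `failure` and `remediation` entries, so an override entry would take approximately the following shape (the rule id and wording are hypothetical):

```python
# Sketch of a messages_override entry; the "failure" and "remediation" keys
# mirror the msg_type values used in Checker.message(). The text is made up.
example_messages_override = {
    "example_presence_check": {
        "failure": "The example field is missing from the record.",
        "remediation": "Add the example field to the metadata record.",
    }
}
```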
@@ -317,7 +317,7 @@ Then, if the check function receives input `value1=0` and `value2=1`, the output
The values 0 and 1 do not amount to a true value
```

### Use as a package
### Using as a package
*Note:* This program requires `Python 3.8` installed in your system.

**Clone the repo:** [https://github.com/NASA-IMPACT/pyQuARC/](https://github.com/NASA-IMPACT/pyQuARC/)
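
Since the install and usage steps are collapsed in this diff, the snippet below is only a rough sketch of driving pyQuARC programmatically through the `Checker` class touched by this pull request; the input file name is hypothetical, and the packaged entry point may differ from calling `Checker` directly.

```python
# Rough usage sketch, not the documented workflow. Checker and run() come
# from pyQuARC/code/checker.py in this PR; everything else is assumed.
from pyQuARC.code.checker import Checker

# hypothetical ECHO10 collection record saved locally
with open("collection_echo10.xml") as f:
    metadata_content = f.read()

checker = Checker()  # constructor defaults to the ECHO10 collection format
results, pyquarc_errors = checker.run(metadata_content)

for field, outcome in results.items():
    print(field, outcome)
```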
4 changes: 2 additions & 2 deletions pyQuARC/__init__.py
@@ -17,7 +17,7 @@
with open(f"{ABS_PATH}/version.txt") as version_file:
__version__ = version_file.read().strip()


def version():
"""Returns the current version of pyQuARC.
"""
"""Returns the current version of pyQuARC."""
return __version__
2 changes: 1 addition & 1 deletion pyQuARC/code/base_validator.py
@@ -40,7 +40,7 @@ def contains(list_of_values, value):

@staticmethod
def compare(first, second, relation):
if relation.startswith('not_'):
if relation.startswith("not_"):
return not (BaseValidator.compare(first, second, relation[4:]))
func = getattr(BaseValidator, relation)
return func(first, second)
72 changes: 33 additions & 39 deletions pyQuARC/code/checker.py
@@ -26,7 +26,7 @@ def __init__(
metadata_format=ECHO10_C,
messages_override=None,
checks_override=None,
rules_override=None
rules_override=None,
):
"""
Args:
@@ -53,13 +53,13 @@ def __init__(
self.rules_override,
self.checks,
self.checks_override,
metadata_format=metadata_format
metadata_format=metadata_format,
)
self.schema_validator = SchemaValidator(
self.messages_override or self.messages, metadata_format
)
self.schema_validator = SchemaValidator(self.messages_override or self.messages, metadata_format)
self.tracker = Tracker(
self.rule_mapping,
self.rules_override,
metadata_format=metadata_format
self.rule_mapping, self.rules_override, metadata_format=metadata_format
)

@staticmethod
@@ -76,15 +76,9 @@ def load_schemas(self):
self.checks = Checker._json_load_schema("checks")
self.rule_mapping = Checker._json_load_schema("rule_mapping")
self.messages = Checker._json_load_schema("check_messages")
self.messages_override = Checker._json_load_schema(
self.msgs_override_file
)
self.rules_override = Checker._json_load_schema(
self.rules_override_file
)
self.checks_override = Checker._json_load_schema(
self.checks_override_file
)
self.messages_override = Checker._json_load_schema(self.msgs_override_file)
self.rules_override = Checker._json_load_schema(self.rules_override_file)
self.checks_override = Checker._json_load_schema(self.checks_override_file)

@staticmethod
def map_to_function(data_type, function):
@@ -112,19 +106,19 @@ def message(self, rule_id, msg_type):
msg_type can be any one of 'failure', 'remediation'
"""
messages = self.messages_override.get(rule_id) or self.messages.get(rule_id)
return messages[msg_type] if messages else ''
return messages[msg_type] if messages else ""

def build_message(self, result, rule_id):
"""
Formats the message for `rule_id` based on the result
"""
failure_message = self.message(rule_id, "failure")
rule_mapping = self.rules_override.get(
rule_mapping = self.rules_override.get(rule_id) or self.rule_mapping.get(
rule_id
) or self.rule_mapping.get(rule_id)
)
severity = rule_mapping.get("severity", "error")
messages = []
if not(result["valid"]) and result.get("value"):
if not (result["valid"]) and result.get("value"):
for value in result["value"]:
formatted_message = failure_message
value = value if isinstance(value, tuple) else (value,)
@@ -143,7 +137,9 @@ def _check_dependency_validity(self, dependency, field_dict):
"""
Checks if the dependent check called `dependency` is valid
"""
dependency_fields = field_dict["fields"] if len(dependency) == 1 else [dependency[1]]
dependency_fields = (
field_dict["fields"] if len(dependency) == 1 else [dependency[1]]
)
for field in dependency_fields:
if not self.tracker.read_data(dependency[0], field).get("valid"):
return False
@@ -162,27 +158,26 @@ def _run_func(self, func, check, rule_id, metadata_content, result_dict):
"""
Run the check function for `rule_id` and update `result_dict`
"""
rule_mapping = self.rules_override.get(
rule_mapping = self.rules_override.get(rule_id) or self.rule_mapping.get(
rule_id
) or self.rule_mapping.get(rule_id)
)
external_data = rule_mapping.get("data", [])
relation = rule_mapping.get("relation")
list_of_fields_to_apply = \
rule_mapping.get("fields_to_apply").get(self.metadata_format, {})

list_of_fields_to_apply = rule_mapping.get("fields_to_apply").get(
self.metadata_format, {}
)

for field_dict in list_of_fields_to_apply:
dependencies = self.scheduler.get_all_dependencies(rule_mapping, check, field_dict)
dependencies = self.scheduler.get_all_dependencies(
rule_mapping, check, field_dict
)
main_field = field_dict["fields"][0]
external_data = field_dict.get("data", external_data)
result_dict.setdefault(main_field, {})
if not self._check_dependencies_validity(dependencies, field_dict):
continue
result = self.custom_checker.run(
func,
metadata_content,
field_dict,
external_data,
relation
func, metadata_content, field_dict, external_data, relation
)

self.tracker.update_data(rule_id, main_field, result["valid"])
@@ -211,14 +206,16 @@ def perform_custom_checks(self, metadata_content):
) or self.rule_mapping.get(rule_id)
check_id = rule_mapping.get("check_id", rule_id)
check = self.checks_override.get(check_id) or self.checks.get(check_id)
func = Checker.map_to_function(check["data_type"], check["check_function"])
func = Checker.map_to_function(
check["data_type"], check["check_function"]
)
if func:
self._run_func(func, check, rule_id, metadata_content, result_dict)
except Exception as e:
pyquarc_errors.append(
{
"message": f"Running check for the rule: '{rule_id}' failed.",
"details": str(e)
"details": str(e),
}
)
return result_dict, pyquarc_errors
@@ -233,6 +230,7 @@ def run(self, metadata_content):
Returns:
(dict): The results of the jsonschema check and all custom checks
"""

def _xml_postprocessor(_, key, value):
"""
Sometimes the XML values contain attributes.
@@ -259,11 +257,7 @@ def _xml_postprocessor(_, key, value):
parser = parse
kwargs = {"postprocessor": _xml_postprocessor}
json_metadata = parser(metadata_content, **kwargs)
result_schema = self.perform_schema_check(
metadata_content
)
result_schema = self.perform_schema_check(metadata_content)
result_custom, pyquarc_errors = self.perform_custom_checks(json_metadata)
result = {
**result_schema, **result_custom
}
result = {**result_schema, **result_custom}
return result, pyquarc_errors
16 changes: 8 additions & 8 deletions pyQuARC/code/constants.py
@@ -14,7 +14,7 @@

ROOT_DIR = (
# go up one directory
os.path.abspath(os.path.join(__file__, '../..'))
os.path.abspath(os.path.join(__file__, "../.."))
)

SCHEMAS_BASE_PATH = f"{ROOT_DIR}/schemas"
@@ -46,17 +46,17 @@
"rules_override",
f"{UMM_C}-json-schema",
"umm-cmn-json-schema",
f"{UMM_G}-json-schema"
f"{UMM_G}-json-schema",
],
"csv": GCMD_KEYWORDS,
"xsd": [ f"{DIF}_schema", f"{ECHO10_C}_schema", f"{ECHO10_G}_schema" ],
"xml": [ "catalog" ]
"xsd": [f"{DIF}_schema", f"{ECHO10_C}_schema", f"{ECHO10_G}_schema"],
"xml": ["catalog"],
}

SCHEMA_PATHS = {
schema: f"{SCHEMAS_BASE_PATH}/{schema}.{filetype}"
for filetype, schemas in SCHEMAS.items()
for schema in schemas
schema: f"{SCHEMAS_BASE_PATH}/{schema}.{filetype}"
for filetype, schemas in SCHEMAS.items()
for schema in schemas
}

VERSION_FILE = f"{SCHEMAS_BASE_PATH}/version.txt"
@@ -67,7 +67,7 @@
"error": Fore.RED,
"warning": Fore.YELLOW,
"reset": Style.RESET_ALL,
"bright": Style.BRIGHT
"bright": Style.BRIGHT,
}

GCMD_BASIC_URL = "https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/"
52 changes: 36 additions & 16 deletions pyQuARC/code/custom_checker.py
@@ -10,7 +10,9 @@ def __init__(self):
pass

@staticmethod
def _get_path_value_recursively(subset_of_metadata_content, path_list, container, query_params=None):
def _get_path_value_recursively(
subset_of_metadata_content, path_list, container, query_params=None
):
"""
Gets the path values recursively while handling list or dictionary in `subset_of_metadata_content`
Adds the values to `container`
@@ -37,7 +39,11 @@ def _get_path_value_recursively(subset_of_metadata_content, path_list, container
container.append(subset_of_metadata_content)
return
new_path = path_list[1:]
if isinstance(root_content, str) or isinstance(root_content, int) or isinstance(root_content, float):
if (
isinstance(root_content, str)
or isinstance(root_content, int)
or isinstance(root_content, float)
):
container.append(root_content)
return
elif isinstance(root_content, list):
@@ -46,7 +52,13 @@
return
if len(new_path) == 1 and query_params:
try:
root_content = next((x for x in root_content if x[query_params[0]] == query_params[1]))
root_content = next(
(
x
for x in root_content
if x[query_params[0]] == query_params[1]
)
)
root_content = root_content[new_path[0]]
container.append(root_content)
except:
@@ -55,13 +67,15 @@
for each in root_content:
try:
CustomChecker._get_path_value_recursively(
each, new_path, container, query_params)
each, new_path, container, query_params
)
except KeyError:
container.append(None)
continue
elif isinstance(root_content, dict):
CustomChecker._get_path_value_recursively(
root_content, new_path, container, query_params)
root_content, new_path, container, query_params
)

@staticmethod
def _get_path_value(content_to_validate, path_string):
@@ -80,15 +94,18 @@
query_params = None

parsed = urlparse(path_string)
path = parsed.path.split('/')
path = parsed.path.split("/")
if key_value := parsed.query:
query_params = key_value.split('=')
query_params = key_value.split("=")

CustomChecker._get_path_value_recursively(
content_to_validate, path, container, query_params)
content_to_validate, path, container, query_params
)
return container

def run(self, func, content_to_validate, field_dict, external_data, external_relation):
def run(
self, func, content_to_validate, field_dict, external_data, external_relation
):
"""
Runs the custom check based on `func` to the `content_to_validate`'s `field_dict` path

@@ -112,22 +129,25 @@
fields = field_dict["fields"]
field_values = []
relation = field_dict.get("relation")
result = {
"valid": None
}
result = {"valid": None}
for _field in fields:
value = CustomChecker._get_path_value(
content_to_validate, _field)
value = CustomChecker._get_path_value(content_to_validate, _field)
field_values.append(value)
args = zip(*field_values)

invalid_values = []
validity = None
for arg in args:
function_args = [*arg]
function_args.extend([extra_arg for extra_arg in [relation, *external_data, external_relation] if extra_arg])
function_args.extend(
[
extra_arg
for extra_arg in [relation, *external_data, external_relation]
if extra_arg
]
)
func_return = func(*function_args)
valid = func_return["valid"] # can be True, False or None
valid = func_return["valid"] # can be True, False or None
if valid is not None:
if valid:
validity = validity or (validity is None)