[CT-119] [Bug] run_results.json schema validation error with oneOf constraint on status #4657

Closed
1 task done
indyyyyy opened this issue Feb 1, 2022 · 4 comments
Labels: artifacts, bug (Something isn't working), wontfix (Not a bug or out of scope for dbt-core)

Comments


indyyyyy commented Feb 1, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

While trying to validate the run_results.json artifact produced by the dbt test command, I used the jsonschema Python package with the schema you publish (run-results/v3.json).
The validation fails with the error below.

Expected Behavior

As both the artifact and the JSON Schema are produced by dbt, I expect the artifact to validate against the schema.
The "oneOf" constraint seems ill-suited here: "oneOf" requires a value to match exactly one subschema, but a status value can appear in more than one enum.

Steps To Reproduce

  1. Run dbt test
  2. In a python3 console, run the following lines:
import json
import jsonschema
import requests

dbt_runresults_schema = requests.get(url='https://schemas.getdbt.com/dbt/run-results/v3.json').json()
with open('path/to/target/run_results.json', encoding="UTF-8") as f:
    runresults_rawjson = json.load(f)

jsonschema.validate(instance=runresults_rawjson, schema=dbt_runresults_schema)

Relevant log output

>>> validate(instance=runresults_rawjson, schema=dbt_runresults_schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/jsonschema/validators.py", line 934, in validate
    raise error
jsonschema.exceptions.ValidationError: 'warn' is valid under each of {'type': 'string', 'enum': ['pass', 'warn', 'error', 'runtime error']}, {'type': 'string', 'enum': ['pass', 'error', 'fail', 'warn', 'skipped']}

Failed validating 'oneOf' in schema['properties']['results']['items']['properties']['status']:
    {'oneOf': [{'enum': ['success', 'error', 'skipped'], 'type': 'string'},
               {'enum': ['pass', 'error', 'fail', 'warn', 'skipped'],
                'type': 'string'},
               {'enum': ['pass', 'warn', 'error', 'runtime error'],
                'type': 'string'}]}

On instance['results'][0]['status']:
    'warn'
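The error follows from JSON Schema's combinator semantics: `oneOf` accepts a value only if it matches exactly one subschema, while `anyOf` accepts it if it matches at least one. The sketch below (assuming the `jsonschema` package used in this report; `STATUS_BRANCHES` and `accepts` are illustrative names, not dbt code) checks each status string against the three enums copied from the traceback under both keywords.

```python
import jsonschema

# The three string enums listed under `status` in the traceback above.
STATUS_BRANCHES = [
    {"type": "string", "enum": ["success", "error", "skipped"]},
    {"type": "string", "enum": ["pass", "error", "fail", "warn", "skipped"]},
    {"type": "string", "enum": ["pass", "warn", "error", "runtime error"]},
]


def accepts(status, combinator):
    """Return True if `status` validates against STATUS_BRANCHES combined
    with `combinator` ("oneOf" or "anyOf")."""
    schema = {combinator: STATUS_BRANCHES}
    try:
        jsonschema.validate(instance=status, schema=schema)
    except jsonschema.ValidationError:
        return False
    return True


if __name__ == "__main__":
    for status in ["success", "fail", "runtime error", "pass", "warn", "error", "skipped"]:
        print(f"{status!r:16} oneOf={accepts(status, 'oneOf')} anyOf={accepts(status, 'anyOf')}")
```

Under `oneOf`, 'pass', 'warn', 'error' and 'skipped' are all rejected because each appears in more than one enum; only values unique to a single enum ('success', 'fail', 'runtime error') pass. Under `anyOf`, all seven are accepted.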


Environment

- OS: Debian GNU/Linux 10 (buster)
- Python: 3.9.6
- dbt: 0.21.1
- jsonschema: 3.1.1

What database are you using dbt with?

Redshift

Additional Context

As a workaround, I replaced the "oneOf" constraint in the schema at

"properties": {
  "status": {
    "oneOf": [
    ...

with an "anyOf" constraint, which seems more appropriate.
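The same workaround can be applied to the downloaded schema in code instead of by hand. This is a minimal sketch under stated assumptions: `relax_status_oneof` is a hypothetical helper, and it rewrites every `status` subschema that uses `oneOf` wherever it sits (including under `definitions`), rather than relying on one fixed path, since the status subschema may be reached through a `$ref` in the published file.

```python
import copy

import jsonschema


def relax_status_oneof(schema):
    """Return a deep copy of `schema` in which every "status" subschema that
    uses "oneOf" uses "anyOf" instead (hypothetical helper, not part of dbt)."""
    patched = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            status = node.get("status")
            # Rewrite only a "status" entry that is itself a subschema with "oneOf".
            if isinstance(status, dict) and "oneOf" in status:
                status["anyOf"] = status.pop("oneOf")
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(patched)
    return patched


if __name__ == "__main__":
    # Trimmed, hypothetical schema shaped like the failing part of run-results/v3.json.
    demo_schema = {
        "type": "object",
        "properties": {
            "results": {"type": "array", "items": {"$ref": "#/definitions/Result"}}
        },
        "definitions": {
            "Result": {
                "type": "object",
                "properties": {
                    "status": {
                        "oneOf": [
                            {"type": "string", "enum": ["pass", "error", "fail", "warn", "skipped"]},
                            {"type": "string", "enum": ["pass", "warn", "error", "runtime error"]},
                        ]
                    }
                },
            }
        },
    }
    artifact = {"results": [{"status": "warn"}]}
    # The original demo_schema rejects 'warn'; the patched copy accepts it.
    jsonschema.validate(instance=artifact, schema=relax_status_oneof(demo_schema))
    print("patched schema accepts status 'warn'")
```

To use it with the reproduction above, pass the downloaded schema through `relax_status_oneof` before calling `jsonschema.validate`. Note that this loosens validation: a status is accepted if it appears in any of the enums, regardless of which kind of node produced it.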

indyyyyy added the bug and triage labels on Feb 1, 2022
github-actions (bot) changed the title from "[Bug] run_results.json schema validation error with oneOf constraint on status" to "[CT-119] [Bug] run_results.json schema validation error with oneOf constraint on status" on Feb 1, 2022
iknox-fa self-assigned this on Feb 21, 2022
@iknox-fa
Contributor

👋 @indyyyyy Sorry it's taken a minute to get back to you on this. I was able to replicate the issue; it seems our JSON Schemas could use some love. I'll categorize this as a bug, and we'll get it prioritized ASAP.

iknox-fa removed the triage label on Feb 21, 2022
@iknox-fa
Contributor

For future reference, here's a slight update that fixes some typos in the reproduction code posted above:

import json
import jsonschema
import requests

dbt_runresults_schema = requests.get(url='https://schemas.getdbt.com/dbt/run-results/v3.json').json()
with open('/path/to/run_results.json', encoding="UTF-8") as f:
    runresults_rawjson = json.load(f)

jsonschema.validate(instance=runresults_rawjson, schema=dbt_runresults_schema)

jtcohen6 added this to the v1.1 milestone on Feb 21, 2022
@jtcohen6
Contributor

We identified some limitations of the library (hologram) that we're using to generate JSONSchemas from internal dbt objects. As such, while we could make this fix as a one-off, what we'd really need is a more rigorous conversion library and validation testing.

I hesitate to prioritize that larger effort over a continuing investment in structured logging, which we believe can offer a more powerful, reliable, and real-time metadata interface going forward.
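One form the validation testing mentioned above could take is a check on the generated schema itself. The sketch below is an assumption, not an existing dbt-core test: the hypothetical `overlapping_oneof_enums` walks a schema (plain Python, no dependencies) and reports every string that appears in the `enum` of more than one `oneOf` branch, which is exactly the pattern that makes 'warn' fail validation in this issue.

```python
def overlapping_oneof_enums(node, path="#"):
    """Return (path, value) pairs for every string that appears in the "enum"
    of more than one "oneOf" branch. Hypothetical check, not a dbt-core test."""
    findings = []
    if isinstance(node, dict):
        branches = node.get("oneOf")
        if isinstance(branches, list):
            counts = {}
            for branch in branches:
                if not isinstance(branch, dict):
                    continue
                # dict.fromkeys de-duplicates within one branch while keeping order.
                for value in dict.fromkeys(v for v in branch.get("enum", []) if isinstance(v, str)):
                    counts[value] = counts.get(value, 0) + 1
            findings.extend((f"{path}/oneOf", value) for value, n in counts.items() if n > 1)
        for key, child in node.items():
            findings.extend(overlapping_oneof_enums(child, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            findings.extend(overlapping_oneof_enums(child, f"{path}/{index}"))
    return findings


if __name__ == "__main__":
    # Status subschema copied from the traceback in this issue.
    schema = {
        "properties": {
            "status": {
                "oneOf": [
                    {"type": "string", "enum": ["success", "error", "skipped"]},
                    {"type": "string", "enum": ["pass", "error", "fail", "warn", "skipped"]},
                    {"type": "string", "enum": ["pass", "warn", "error", "runtime error"]},
                ]
            }
        }
    }
    for location, value in overlapping_oneof_enums(schema):
        print(f"{location}: {value!r} appears in more than one branch")
```

In a test suite, `assert not overlapping_oneof_enums(schema)` would flag the status definition shown in the traceback, because 'pass', 'warn', 'error' and 'skipped' each occur in two or more of its branches.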

@jtcohen6
Contributor

Improvement to the accuracy of our jsonschema generation isn't something we're able to prioritize currently. We should document that the JSONSchemas at schemas.getdbt.com are better understood as descriptive of the information contained in the JSON artifacts, rather than usable for rigorous validation of every artifact. In the meantime, we'll continue discussion in #4617 about how to create some healthy distance between dbt-core's internal classes and the artifacts it produces for external consumption.
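For readers who still want to use the published schemas in that descriptive sense, one option is to collect every mismatch as a report instead of stopping at the first `ValidationError`. A minimal sketch, assuming the `jsonschema` package: `describe_mismatches` is a hypothetical name, while `jsonschema.validators.validator_for`, `iter_errors`, `absolute_path` and `message` are part of jsonschema's public API.

```python
import jsonschema


def describe_mismatches(instance, schema):
    """Return one message per schema mismatch instead of raising on the first.
    Hypothetical helper built on jsonschema's public validator API."""
    # Pick the validator class matching the schema's "$schema" (or the default).
    validator = jsonschema.validators.validator_for(schema)(schema)
    messages = []
    for error in validator.iter_errors(instance):
        location = "/".join(str(part) for part in error.absolute_path) or "<root>"
        messages.append(f"{location}: {error.message}")
    return messages


if __name__ == "__main__":
    # Small illustrative schema and artifact, not taken from dbt.
    schema = {
        "type": "object",
        "properties": {
            "status": {"oneOf": [{"enum": ["pass", "warn"]}, {"enum": ["warn"]}]},
            "execution_time": {"type": "number"},
        },
    }
    artifact = {"status": "warn", "execution_time": "fast"}
    for message in describe_mismatches(artifact, schema):
        print("warning:", message)
```

Treating the output as warnings keeps the schema useful as documentation of the artifact's shape without making a pipeline fail on known schema-generation quirks like the `oneOf` issue above.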


3 participants