RecordSchemaValidator can resolve $ref schemas #19625
airbyte-json-validation/src/main/java/io/airbyte/validation/json/JsonSchemaValidator.java
this.schemaValidatorsConfig = new SchemaValidatorsConfig();
// This URI just needs to point at any path in the same directory as /app/WellKnownTypes.json
// It's required for the JsonSchema#validate method to resolve $ref correctly.
this("file:///app/nonexistent_file.json");
What is this BASE_URI for? It would be good to have some explanation here; it's not obvious why we only want to expose this for tests given its current usage.
added some comments - lmk if this is still unclear
So if I understand correctly, we pass a value so that the validation can generate local references that we ignore anyway?
the references aren't ignored - see the new unit test where the validator does read the referenced file
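A minimal sketch of why the base URI matters, using only java.net.URI rather than the validator itself. It assumes the validation library resolves a relative $ref the way standard URI resolution does; the class and method names here are hypothetical and not part of this PR.

```java
import java.net.URI;

public class RefResolutionSketch {

  // Resolve a relative $ref against a base URI and return the path of the result.
  // Standard URI resolution drops the last path segment of the base (the file name)
  // and appends the relative reference in its place.
  static String resolvedPath(final String baseUri, final String ref) {
    return URI.create(baseUri).resolve(ref).getPath();
  }

  public static void main(final String[] args) {
    final String ref = "WellKnownTypes.json";
    // The placeholder file name and the real file name give the same result,
    // because only the directory part of the base URI is used.
    System.out.println(resolvedPath("file:///app/nonexistent_file.json", ref));
    System.out.println(resolvedPath("file:///app/WellKnownTypes.json", ref));
  }
}
```

Both calls resolve to a path in /app, which is consistent with the code comment that the URI only needs to point at some path in the same directory as /app/WellKnownTypes.json.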
LGTM.
Left a couple of suggestions. I don't think the Docker change would matter much, given this file shouldn't change often. In general, I think it's better practice to keep the copy of build artifacts closer to the end for better caching.
static {
  try {
    DEFAULT_BASE_URI = new URI("file:///app/nonexistent_file.json");
Why not use file:///app/WellKnownTypes.json instead?
just to make it clear that it doesn't need to be the exact file being referenced. E.g. if we later added a second file, we wouldn't need to update this URI.
Yes, the part that always feels a bit unusual is referencing a nonexistent file.
hm. what if we appended the nonexistent_file.json in the constructor? so callers would just need new JsonSchemaValidator(new URI("file:///app/"))
bump @gosusnp
Not sure that appending this in the constructor helps understanding that much. It probably boils down to why we actually need to pass a full URI rather than just the actual base URI.
I'd say that it doesn't feel ideal; at the same time, it doesn't seem important enough to block this PR. The comments help with understanding.
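To make the "full URI vs. base directory" question concrete, here is a rough sketch using plain java.net.URI. It only illustrates standard URI resolution; whether the validation library accepts a bare directory URI is not established in this thread. The class name and the withPlaceholder helper are hypothetical, standing in for the "append in the constructor" idea.

```java
import java.net.URI;

public class BaseUriSketch {

  // Hypothetical helper mirroring the suggestion above: take a directory URI
  // and append a placeholder file name to get a "full" base URI.
  static URI withPlaceholder(final URI directory) {
    return directory.resolve("nonexistent_file.json");
  }

  // Resolve a relative $ref against a base URI and return the resulting path.
  static String resolvedPath(final URI base, final String ref) {
    return base.resolve(ref).getPath();
  }

  public static void main(final String[] args) {
    final String ref = "WellKnownTypes.json";
    // With a trailing slash, the directory is kept during resolution.
    System.out.println(resolvedPath(URI.create("file:///app/"), ref));
    // Without a trailing slash, "app" is treated as a file name and dropped.
    System.out.println(resolvedPath(URI.create("file:///app"), ref));
    // Appending a placeholder first gives an unambiguous base in /app.
    System.out.println(resolvedPath(withPlaceholder(URI.create("file:///app/")), ref));
  }
}
```

Under these assumptions, a directory-only base works only when the caller remembers the trailing slash, which is one possible argument for passing a full file URI, placeholder or not.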
…start EC2 runners (#20439)
* Revert "Revert "RecordSchemaValidator can resolve $ref schemas (#19625)" (#20113)" This reverts commit 86f61a5.
* just hardcode build?
* sshable instance
* pass arg for release oss only
* also skip octavia + create PR
* update ec2 runner
* revert CI test changes
* whoops
* whoopswhoops
Wire up the RecordSchemaValidator with the new data types protocol typedefs. Specifically:
$ref (the changes are necessary because otherwise it would require file:WellKnownTypes.json, which breaks in Python schema validation)
Existing behavior should remain unchanged.
Example jsonl file -> dev-null sync (note that it correctly detects the int field as being the wrong type):
This was the schema (I ran this locally and edited the connection record manually):
Reading order: