-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ISSUE #34755] do not propagate parameters on InlineSchemaLoader #34853
[ISSUE #34755] do not propagate parameters on InlineSchemaLoader #34853
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Small note on the PR title, since that eventually ends up as the subject line of the merge commit: je ne crois pas que «propage» soit le bon mot 😄 The change is small and straightforward and everything looks good based on reading the code changes and PR description; I'll test it out a bit locally before I smash the approve button, though. |
Your French is good enough for the title to say "propage" |
@@ -74,6 +74,8 @@ | |||
"SimpleRetriever.partition_router": "CustomPartitionRouter", | |||
} | |||
|
|||
_PROPAGATION_EXCLUSION_TYPES = {"InlineSchemaLoader"} # propagation of extra parameters leads to invalid JSON schemas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we should use InlineSchemaLoader.schema()["properties"]["type"]["enum"]
to make sure that if the enum changes, the dependency to this is explicit and this is updated as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would mean a wider range of impact though. For example, spec.connection_specification
would be affected by this. I don't see any parameters for this as well and it seems fine to say that there wouldn't ever be for the same reason we want to exclude InlineSchemaLoader.schema
though so I would be fine with this
UPDATE: I was running the tests without the
|
@@ -103,7 +105,7 @@ def propagate_types_and_parameters( | |||
propagated_component["type"] = found_type | |||
|
|||
# When there is no resolved type, we're not processing a component (likely a regular object) and don't need to propagate parameters | |||
if "type" not in propagated_component: | |||
if "type" not in propagated_component or propagated_component["type"] in _PROPAGATION_EXCLUSION_TYPES: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor thought I had here is that we actually don't want to propagate things is the schema
field of the InlineSchemaLoader
as opposed to the whole schema loader.
And that is because the schema
field is not actually a component at all. The same for ohter compoents. For a field something like JSON body field we wouldn't want to accidentally inject all the params there.
I did some snooping and I think this original check is actually not doing what we want. Because even for the InlineSchemaLoader.schema
field, it has a type
field set to object
. So this condition never triggers. I haven't thoroughly vetted this, just re-running tests, but changing the condition to: if "type" not in propagated_component or propagated_component.get("type") == "object":
.
So then the InlineSchemaLoader
can still receive parameters, but it doesn't pass them to `schema. If this is indeed accurate, then I prefer this solution because it means less one-of logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. Let me dig down and see if there are edge cases for that solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feel like it makes sense. I added a comment to clarify the scope of this check
@ambirdsall I'm not sure I'm buying that. Let's do three scenarios:
We can define a couple of diffs from that. The two most important are between master to others. So:
If the issue is still there, I would assume that the issue is on master as well. However I don't see this error in the weekly run. Can you add logging to make sure you are running with the versions you assume you are working with? If I get bullet point repro steps, I can try to reproduce or pin point the issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What
Addresses #34755
How
During component transformation, exclude type
InlineSchemaLoader
from the propagation.🚨 User Impact 🚨
Migration
TLDR: nothing to do to update sources that were currently affected by this.
Running the following command, we can see that only three connectors currently have both
$parameters
andInlineSchemaLoader
:source-coingecko-coins
I was very surprised to see that source-coingecko-coins did not have the issue while running
docker run -v $(pwd)/secrets:/data airbyte/source-coingecko-coins:0.1.0 discover --config /data/config.json
. However, when checking in the docker image usingdocker run --rm -it --entrypoint bash airbyte/source-coingecko-coins:0.1.0
, we can see that there is a mismatch between the current codecoingecko_coins.yaml
is old enough to rely on$options
instead of$parameters
. At this point in time, either propagation wasn't implemented or InlineSchema was excluded.source-gnews
The same logic from source-coingecko-coins apply to gnews as we can see with this file structure:
source-news-api
source-news-api is affected by the propagation but in a way that is non breaking as
name
,primary_key
andpath
seems to be ignored byjsonschema.Draft7Validator.check_schema(...)
Hence, even though the current schema didn't seem to cause issue, it'll be cleaned up.