Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Low-Code CDK] Handle forward references in manifest #20893

Merged
merged 4 commits into from
Jan 11, 2023

Conversation

clnoll
Copy link
Contributor

@clnoll clnoll commented Dec 27, 2022

What

Allow users to organize yaml manifests with more flexibility, by handling forward references.

Closes #20503

How

Updates the ManifestReferenceResolver class to look up references by reading the value(s) at the referenced path in the manifest (rather than looking for values in previously evaluated keys).

Recommended reading order

  1. manifest_reference_resolver.py
  2. manifest_declarative_source.py
  3. custom_exceptions.py
  4. Tests

@clnoll clnoll requested a review from a team as a code owner December 27, 2022 23:44
@octavia-squidington-iv octavia-squidington-iv added the CDK Connector Development Kit label Dec 27, 2022
@clnoll clnoll force-pushed the cdk-handle-forward-references branch 2 times, most recently from 080e60f to e849cef Compare December 27, 2022 23:52
@clnoll clnoll requested review from brianjlai and girarda December 27, 2022 23:56
Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm liking what you have so far and I think the code to parse the layers is more intuitive to understand so nice work! I had a few follow ups related to testing and how we invoke the factory a bit. Just some discussion points to flesh out rather than strict changes.

Also it might be worth it to run the validate Gradle command that Maxime wrote to verify. We should expect them to all pass (although one heads up about source-monday which has been acting a bit weird this week).

except (KeyError, IndexError):
raise UndefinedReferenceException(path, reference)

def _read_reference_value(self, ref: str, manifest_node: Mapping[str, Any]) -> Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't reference self here, we can make this a static method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done.

@@ -46,7 +46,7 @@

factory = DeclarativeComponentFactory()

resolver = ManifestReferenceResolver()
resolver = ManifestReferenceResolver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like its a little more confusing to define the class here which sort of turns resolver into a sort of alias of ManifestReferenceResolver. For simplicity, since we're going to be instantiating a new ManifestReferenceResolver per test, we can probably just get rid of this and in each test just do:

config = ManifestReferenceResolver(YamlDeclarativeSource._parse(content)).preprocess_manifest()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 makes sense to me, updated.

@@ -65,7 +65,7 @@ def test_factory():
request_body_json:
body_offset: "{{ next_page_token['offset'] }}"
"""
config = resolver.preprocess_manifest(YamlDeclarativeSource._parse(content), {}, "")
config = resolver(YamlDeclarativeSource._parse(content)).preprocess_manifest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of the existing tests have an inline schema which require a forward reference. Can you add a test to verify that your changes work for the main use case we were implementing on this for? We would probably just adapt an existing test and add the inlineschema component that then references a schema object at the bottom of the text blob

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! Added.

if not isinstance(evaluated_ref, dict):
return evaluated_ref
else:
return evaluated_ref | evaluated_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding this right, we get the references values, and then update with the values defined on this component. So the values defined on this component take precedence. If so, can we add a small comment saying that values defined on this node take precedence over the referenced values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right. Added a comment.

@@ -96,91 +96,102 @@ class ManifestReferenceResolver:

ref_tag = "$ref"

def preprocess_manifest(self, manifest: Mapping[str, Any], evaluated_mapping: Mapping[str, Any], path: Union[str, Tuple[str]]):
def __init__(self, manifest: Mapping[str, Any]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why we're going with the approach to have the set and the manifest defined on the factory itself, but it steel feels a little weird for them to be inserted into the factory during instantiation. When I think of a factory I see it as generic and the preprocess method would take input and emit output.

Granted we need to reference the manifest as we traverse the manifest so its probably easier to define it on the factory class itself.

The other thing that feels weird is that when the set() is defined at the start, we're very reliant on correctly popping off visited elements after traversal and that at the end of preprocess the visited set is cleared.

As part of the preprocess_manifest() either when starting or finishing a run, we could clear the set so we know on each attempt we start clean. And the same could be said for assigning self.manifest = in that method (Although we would probably still instantiate empty values during init?). I'm not married to this approach either, but I think its worth a discussion because it stuck out to me that we're coupling the manifest to the factory pretty tightly in the refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you grep for ManifestReferenceResolver we actually only instantiate it and do the preprocessing once - when ManifestDeclarativeSource is initialized, and before the component factory is invoked at all. I'm wondering if that changes the discomfort with initializing it with the manifest + visited set. From there on out, the fully resolved config is accessed via ManifestDeclarativeSource and no additional calls are made to preprocess_manifest. That said if it feels more natural we can certainly consider clearing out the visited set after processing, and go back to the original design where we just passed the manifest into preprocess_manifest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right so it's not a hard blocker by any means since like you said it's only used once. It's just a little bit of an awkward design. I don't think this part of flow would change, but one of the risks would be if we ever moved to using the factory or resolver multiple times, if we don't pop every element off, we might end up with some unexpected results accidentally using a set w/ stale elements.

How much lift is it to pass the manifest to the resolver like we used to be doing? The part that I noticed in your new implementation was that we use the self.manifest field in _lookup_reference_value(). If the lift isn't too big to continue the pass in the manifest and start w/ a clean execution of the function every time then I think that feels more intuitive. That being said, if trying to accommodate this makes the function too confusing, I think it is fine to leave as you already had it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no trouble at all - the code was in that state in one iteration, but I'd modified it thinking that it would be a little cleaner this way since I was having to pass manifest around as an argument in so many places. But to protect future-us from issues if we use the resolver multiple times I'm happy to change it back!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed this change.

Copy link
Contributor Author

@clnoll clnoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @brianjlai! I've addressed your comments and added some info around the instantiation of ManifestReferenceResolver.

@@ -46,7 +46,7 @@

factory = DeclarativeComponentFactory()

resolver = ManifestReferenceResolver()
resolver = ManifestReferenceResolver
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 makes sense to me, updated.

@@ -65,7 +65,7 @@ def test_factory():
request_body_json:
body_offset: "{{ next_page_token['offset'] }}"
"""
config = resolver.preprocess_manifest(YamlDeclarativeSource._parse(content), {}, "")
config = resolver(YamlDeclarativeSource._parse(content)).preprocess_manifest()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! Added.

except (KeyError, IndexError):
raise UndefinedReferenceException(path, reference)

def _read_reference_value(self, ref: str, manifest_node: Mapping[str, Any]) -> Any:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 done.

if not isinstance(evaluated_ref, dict):
return evaluated_ref
else:
return evaluated_ref | evaluated_dict
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right. Added a comment.

@@ -96,91 +96,102 @@ class ManifestReferenceResolver:

ref_tag = "$ref"

def preprocess_manifest(self, manifest: Mapping[str, Any], evaluated_mapping: Mapping[str, Any], path: Union[str, Tuple[str]]):
def __init__(self, manifest: Mapping[str, Any]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you grep for ManifestReferenceResolver we actually only instantiate it and do the preprocessing once - when ManifestDeclarativeSource is initialized, and before the component factory is invoked at all. I'm wondering if that changes the discomfort with initializing it with the manifest + visited set. From there on out, the fully resolved config is accessed via ManifestDeclarativeSource and no additional calls are made to preprocess_manifest. That said if it feels more natural we can certainly consider clearing out the visited set after processing, and go back to the original design where we just passed the manifest into preprocess_manifest.

@clnoll
Copy link
Contributor Author

clnoll commented Dec 31, 2022

@brianjlai forgot to ask - what is the gradle command you're suggesting?

@brianjlai
Copy link
Contributor

The command is here:
https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/build.gradle#L22-L25

It basically just attempts to run the spec command against all of the existing low code connectors to see if they can be parsed and validated successfully

@clnoll clnoll force-pushed the cdk-handle-forward-references branch 2 times, most recently from 8460813 to 7853b9f Compare January 10, 2023 14:51
@clnoll
Copy link
Contributor Author

clnoll commented Jan 10, 2023

Thanks @brianjlai, the gradle command passed.

@clnoll clnoll force-pushed the cdk-handle-forward-references branch from 9ddbb7d to a624c50 Compare January 10, 2023 15:38
Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see it running through the validation script. Just a couple of small suggestions and nits. Nice work!

@@ -96,91 +96,102 @@ class ManifestReferenceResolver:

ref_tag = "$ref"

def preprocess_manifest(self, manifest: Mapping[str, Any], evaluated_mapping: Mapping[str, Any], path: Union[str, Tuple[str]]):

def preprocess_manifest(self, manifest):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you add type hints and outputs to the method signature

elif isinstance(node, list):
return [self._evaluate_node(v, manifest) for v in node]
elif isinstance(node, str) and node.startswith("*ref("):
if visited is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its a bit more clear to instantiate the empty set and pass it as a parameter which when we first invoke the first _evaluate_node() rather than rely on this conditional block to set it up. Also given how its used, in the flow it feels more like a required parameter instead of defaulting to None:

return self._evaluate_node(manifest, manifest, set())

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianjlai sorry I overlooked these before merging! Good suggestions, I'll make these changes in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(#21268)

@clnoll clnoll merged commit 74dec83 into master Jan 11, 2023
@clnoll clnoll deleted the cdk-handle-forward-references branch January 11, 2023 13:50
@clnoll clnoll restored the cdk-handle-forward-references branch January 11, 2023 16:53
@clnoll clnoll deleted the cdk-handle-forward-references branch January 11, 2023 18:56
jbfbell pushed a commit that referenced this pull request Jan 13, 2023
[Low-Code CDK] Handle forward references in manifest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Low-Code CDK] Support forward references in manifest
3 participants