[Low-Code CDK] Handle forward references in manifest #20893

clnoll · 2022-12-27T23:44:25Z

What

Allow users to organize yaml manifests with more flexibility, by handling forward references.

Closes #20503

How

Updates the ManifestReferenceResolver class to look up references by reading the value(s) at the referenced path in the manifest (rather than looking for values in previously evaluated keys).

Recommended reading order

manifest_reference_resolver.py
manifest_declarative_source.py
custom_exceptions.py
Tests

brianjlai

I'm liking what you have so far and I think the code to parse the layers is more intuitive to understand so nice work! I had a few follow ups related to testing and how we invoke the factory a bit. Just some discussion points to flesh out rather than strict changes.

Also it might be worth it to run the validate Gradle command that Maxime wrote to verify. We should expect them to all pass (although one heads up about source-monday which has been acting a bit weird this week).

brianjlai · 2022-12-30T07:51:49Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

+        except (KeyError, IndexError):
+            raise UndefinedReferenceException(path, reference)
+
+    def _read_reference_value(self, ref: str, manifest_node: Mapping[str, Any]) -> Any:


since we don't reference self here, we can make this a static method

brianjlai · 2022-12-30T07:54:58Z

airbyte-cdk/python/unit_tests/sources/declarative/test_factory.py

@@ -46,7 +46,7 @@

 factory = DeclarativeComponentFactory()

-resolver = ManifestReferenceResolver()
+resolver = ManifestReferenceResolver


I feel like its a little more confusing to define the class here which sort of turns resolver into a sort of alias of ManifestReferenceResolver. For simplicity, since we're going to be instantiating a new ManifestReferenceResolver per test, we can probably just get rid of this and in each test just do:

config = ManifestReferenceResolver(YamlDeclarativeSource._parse(content)).preprocess_manifest()

👍 makes sense to me, updated.

brianjlai · 2022-12-30T07:58:36Z

airbyte-cdk/python/unit_tests/sources/declarative/test_factory.py

@@ -65,7 +65,7 @@ def test_factory():
      request_body_json:
        body_offset: "{{ next_page_token['offset'] }}"
    """
-    config = resolver.preprocess_manifest(YamlDeclarativeSource._parse(content), {}, "")
+    config = resolver(YamlDeclarativeSource._parse(content)).preprocess_manifest()


I don't think any of the existing tests have an inline schema which require a forward reference. Can you add a test to verify that your changes work for the main use case we were implementing on this for? We would probably just adapt an existing test and add the inlineschema component that then references a schema object at the bottom of the text blob

Good call! Added.

brianjlai · 2022-12-30T08:05:25Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

+                if not isinstance(evaluated_ref, dict):
+                    return evaluated_ref
+                else:
+                    return evaluated_ref | evaluated_dict


If I'm understanding this right, we get the references values, and then update with the values defined on this component. So the values defined on this component take precedence. If so, can we add a small comment saying that values defined on this node take precedence over the referenced values.

Yep that's right. Added a comment.

brianjlai · 2022-12-30T08:19:32Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

@@ -96,91 +96,102 @@ class ManifestReferenceResolver:

    ref_tag = "$ref"

-    def preprocess_manifest(self, manifest: Mapping[str, Any], evaluated_mapping: Mapping[str, Any], path: Union[str, Tuple[str]]):
+    def __init__(self, manifest: Mapping[str, Any]):


I can see why we're going with the approach to have the set and the manifest defined on the factory itself, but it steel feels a little weird for them to be inserted into the factory during instantiation. When I think of a factory I see it as generic and the preprocess method would take input and emit output.

Granted we need to reference the manifest as we traverse the manifest so its probably easier to define it on the factory class itself.

The other thing that feels weird is that when the set() is defined at the start, we're very reliant on correctly popping off visited elements after traversal and that at the end of preprocess the visited set is cleared.

As part of the preprocess_manifest() either when starting or finishing a run, we could clear the set so we know on each attempt we start clean. And the same could be said for assigning self.manifest = in that method (Although we would probably still instantiate empty values during init?). I'm not married to this approach either, but I think its worth a discussion because it stuck out to me that we're coupling the manifest to the factory pretty tightly in the refactor.

So if you grep for ManifestReferenceResolver we actually only instantiate it and do the preprocessing once - when ManifestDeclarativeSource is initialized, and before the component factory is invoked at all. I'm wondering if that changes the discomfort with initializing it with the manifest + visited set. From there on out, the fully resolved config is accessed via ManifestDeclarativeSource and no additional calls are made to preprocess_manifest. That said if it feels more natural we can certainly consider clearing out the visited set after processing, and go back to the original design where we just passed the manifest into preprocess_manifest.

Right so it's not a hard blocker by any means since like you said it's only used once. It's just a little bit of an awkward design. I don't think this part of flow would change, but one of the risks would be if we ever moved to using the factory or resolver multiple times, if we don't pop every element off, we might end up with some unexpected results accidentally using a set w/ stale elements.

How much lift is it to pass the manifest to the resolver like we used to be doing? The part that I noticed in your new implementation was that we use the self.manifest field in _lookup_reference_value(). If the lift isn't too big to continue the pass in the manifest and start w/ a clean execution of the function every time then I think that feels more intuitive. That being said, if trying to accommodate this makes the function too confusing, I think it is fine to leave as you already had it.

It's no trouble at all - the code was in that state in one iteration, but I'd modified it thinking that it would be a little cleaner this way since I was having to pass manifest around as an argument in so many places. But to protect future-us from issues if we use the resolver multiple times I'm happy to change it back!

Pushed this change.

clnoll

Thanks for the review @brianjlai! I've addressed your comments and added some info around the instantiation of ManifestReferenceResolver.

clnoll · 2022-12-30T22:47:07Z

airbyte-cdk/python/unit_tests/sources/declarative/test_factory.py

@@ -46,7 +46,7 @@

 factory = DeclarativeComponentFactory()

-resolver = ManifestReferenceResolver()
+resolver = ManifestReferenceResolver


👍 makes sense to me, updated.

clnoll · 2022-12-30T23:19:56Z

airbyte-cdk/python/unit_tests/sources/declarative/test_factory.py

@@ -65,7 +65,7 @@ def test_factory():
      request_body_json:
        body_offset: "{{ next_page_token['offset'] }}"
    """
-    config = resolver.preprocess_manifest(YamlDeclarativeSource._parse(content), {}, "")
+    config = resolver(YamlDeclarativeSource._parse(content)).preprocess_manifest()


Good call! Added.

clnoll · 2022-12-30T23:20:50Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

+        except (KeyError, IndexError):
+            raise UndefinedReferenceException(path, reference)
+
+    def _read_reference_value(self, ref: str, manifest_node: Mapping[str, Any]) -> Any:


clnoll · 2022-12-30T23:22:33Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

+                if not isinstance(evaluated_ref, dict):
+                    return evaluated_ref
+                else:
+                    return evaluated_ref | evaluated_dict


Yep that's right. Added a comment.

clnoll · 2022-12-30T23:37:52Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

@@ -96,91 +96,102 @@ class ManifestReferenceResolver:

    ref_tag = "$ref"

-    def preprocess_manifest(self, manifest: Mapping[str, Any], evaluated_mapping: Mapping[str, Any], path: Union[str, Tuple[str]]):
+    def __init__(self, manifest: Mapping[str, Any]):


So if you grep for ManifestReferenceResolver we actually only instantiate it and do the preprocessing once - when ManifestDeclarativeSource is initialized, and before the component factory is invoked at all. I'm wondering if that changes the discomfort with initializing it with the manifest + visited set. From there on out, the fully resolved config is accessed via ManifestDeclarativeSource and no additional calls are made to preprocess_manifest. That said if it feels more natural we can certainly consider clearing out the visited set after processing, and go back to the original design where we just passed the manifest into preprocess_manifest.

clnoll · 2022-12-31T00:13:50Z

@brianjlai forgot to ask - what is the gradle command you're suggesting?

brianjlai · 2023-01-03T23:15:02Z

The command is here:
https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/build.gradle#L22-L25

It basically just attempts to run the spec command against all of the existing low code connectors to see if they can be parsed and validated successfully

…estReferenceResolver`

clnoll · 2023-01-10T15:00:59Z

Thanks @brianjlai, the gradle command passed.

…ifestReferenceResolver`

brianjlai

Glad to see it running through the validation script. Just a couple of small suggestions and nits. Nice work!

brianjlai · 2023-01-10T22:36:47Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

@@ -96,91 +96,102 @@ class ManifestReferenceResolver:

    ref_tag = "$ref"

-    def preprocess_manifest(self, manifest: Mapping[str, Any], evaluated_mapping: Mapping[str, Any], path: Union[str, Tuple[str]]):
-
+    def preprocess_manifest(self, manifest):


nit: can you add type hints and outputs to the method signature

brianjlai · 2023-01-10T22:42:57Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/manifest_reference_resolver.py

+        elif isinstance(node, list):
+            return [self._evaluate_node(v, manifest) for v in node]
+        elif isinstance(node, str) and node.startswith("*ref("):
+            if visited is None:


I think its a bit more clear to instantiate the empty set and pass it as a parameter which when we first invoke the first _evaluate_node() rather than rely on this conditional block to set it up. Also given how its used, in the flow it feels more like a required parameter instead of defaulting to None:

return self._evaluate_node(manifest, manifest, set())

What do you think?

@brianjlai sorry I overlooked these before merging! Good suggestions, I'll make these changes in a separate PR.

[Low-Code CDK] Handle forward references in manifest

clnoll requested a review from a team as a code owner December 27, 2022 23:44

octavia-squidington-iv added the CDK Connector Development Kit label Dec 27, 2022

clnoll force-pushed the cdk-handle-forward-references branch 2 times, most recently from 080e60f to e849cef Compare December 27, 2022 23:52

clnoll requested review from brianjlai and girarda December 27, 2022 23:56

brianjlai reviewed Dec 30, 2022

View reviewed changes

clnoll commented Dec 30, 2022

View reviewed changes

clnoll added 3 commits January 10, 2023 14:38

[Low-Code CDK] Handle forward references in manifest

07c1cd7

Updates per code review comments

a410c43

Pass manifest into preprocess_manifest instead of storing on `Manif…

e18ddf1

…estReferenceResolver`

clnoll force-pushed the cdk-handle-forward-references branch 2 times, most recently from 8460813 to 7853b9f Compare January 10, 2023 14:51

Create visited set during preprocessing instead of defining on `Man…

a624c50

…ifestReferenceResolver`

clnoll force-pushed the cdk-handle-forward-references branch from 9ddbb7d to a624c50 Compare January 10, 2023 15:38

brianjlai approved these changes Jan 10, 2023

View reviewed changes

clnoll merged commit 74dec83 into master Jan 11, 2023

clnoll deleted the cdk-handle-forward-references branch January 11, 2023 13:50

clnoll restored the cdk-handle-forward-references branch January 11, 2023 16:53

clnoll deleted the cdk-handle-forward-references branch January 11, 2023 18:56

clnoll mentioned this pull request Jan 11, 2023

[Low Code CDK] Add type annotations and cleanup usage of visited set #21268

Merged

octavia-squidington-iii mentioned this pull request Jan 12, 2023

Bump Airbyte version from 0.40.27 to 0.40.28 #21359

Merged

jbfbell pushed a commit that referenced this pull request Jan 13, 2023

[Low-Code CDK] Handle forward references in manifest (#20893)

b911717

[Low-Code CDK] Handle forward references in manifest

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Low-Code CDK] Handle forward references in manifest #20893

[Low-Code CDK] Handle forward references in manifest #20893

clnoll commented Dec 27, 2022

brianjlai left a comment

brianjlai Dec 30, 2022

clnoll Dec 30, 2022

brianjlai Dec 30, 2022

clnoll Dec 30, 2022

brianjlai Dec 30, 2022

clnoll Dec 30, 2022

brianjlai Dec 30, 2022

clnoll Dec 30, 2022

brianjlai Dec 30, 2022

clnoll Dec 30, 2022

brianjlai Jan 3, 2023

clnoll Jan 9, 2023

clnoll Jan 10, 2023

clnoll left a comment

clnoll Dec 30, 2022

clnoll Dec 30, 2022

clnoll Dec 30, 2022

clnoll Dec 30, 2022

clnoll Dec 30, 2022

clnoll commented Dec 31, 2022

brianjlai commented Jan 3, 2023

clnoll commented Jan 10, 2023

brianjlai left a comment

brianjlai Jan 10, 2023

brianjlai Jan 10, 2023

clnoll Jan 11, 2023

clnoll Jan 11, 2023

[Low-Code CDK] Handle forward references in manifest #20893

[Low-Code CDK] Handle forward references in manifest #20893

Conversation

clnoll commented Dec 27, 2022

What

How

Recommended reading order

brianjlai left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

clnoll left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

clnoll commented Dec 31, 2022

brianjlai commented Jan 3, 2023

clnoll commented Jan 10, 2023

brianjlai left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment