Tell us about the problem you're trying to solve
To declare a spec, a connector developer often places a spec.json file in the connector module directory (example). The CDK automatically looks for a spec.json file and uses that as the spec where possible. Specs are most commonly created in JSON (sometimes in Pydantic), which is a bit more cumbersome to work with than YAML.
Describe the solution you’d like
I want the CDK to support declaring specs as YAMLs.
Implementation
Code changes
Add pyyaml to the CDK's dependencies in setup.py, and add import yaml to every file that parses YAML.
In connector.py, look for a .yaml file, prioritizing it over a .json file. If one is found, read the YAML file. You'll probably also want to log a warning or raise an exception if you find both a .json and a .yaml, letting the developer know that YAML takes precedence and that they really shouldn't use both simultaneously.
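The lookup described above could be sketched roughly as follows. This is a minimal illustration, not the CDK's actual code: the function name load_spec and the module_dir parameter are hypothetical, and the sketch picks the "log a warning" option over raising when both files exist.

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict

import yaml  # provided by the pyyaml package

logger = logging.getLogger(__name__)


def load_spec(module_dir: Path) -> Dict[str, Any]:
    """Hypothetical helper: parse spec.yaml if present, otherwise spec.json."""
    yaml_path = module_dir / "spec.yaml"
    json_path = module_dir / "spec.json"

    if yaml_path.is_file():
        if json_path.is_file():
            # Both files exist: YAML takes precedence, but tell the developer.
            logger.warning(
                "Found both spec.yaml and spec.json in %s; using spec.yaml. "
                "Please keep only one of them.",
                module_dir,
            )
        with yaml_path.open() as f:
            # safe_load builds only plain Python types (dict, list, str, ...).
            return yaml.safe_load(f)

    if json_path.is_file():
        with json_path.open() as f:
            return json.load(f)

    raise FileNotFoundError(f"No spec.yaml or spec.json found in {module_dir}")
```

Swapping the warning for an exception would be a one-line change; which one is friendlier to connector developers is still an open choice in this ticket.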
Testing
The current function (which looks for a .json file to read by default) has no unit tests, for no good reason. As part of this ticket, we should add unit tests in test_connector.py to verify all of these behaviors. It may be a good idea to do this first (a la TDD), before doing the refactor to support YAML.
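As a starting point for those tests, here is a rough sketch of the cases worth covering. It uses the standard library's unittest so that it runs on its own; the real tests in test_connector.py may well use a different framework. find_spec_file is a hypothetical stand-in for whatever lookup connector.py ends up exposing.

```python
import tempfile
import unittest
from pathlib import Path
from typing import Optional


def find_spec_file(module_dir: Path) -> Optional[Path]:
    """Hypothetical stand-in: return spec.yaml if present, else spec.json, else None."""
    for name in ("spec.yaml", "spec.json"):
        candidate = module_dir / name
        if candidate.is_file():
            return candidate
    return None


class FindSpecFileTest(unittest.TestCase):
    def setUp(self) -> None:
        # Each test gets its own empty directory playing the connector module.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.module_dir = Path(self._tmp.name)

    def test_json_only(self) -> None:
        (self.module_dir / "spec.json").write_text("{}")
        self.assertEqual(find_spec_file(self.module_dir).name, "spec.json")

    def test_yaml_only(self) -> None:
        (self.module_dir / "spec.yaml").write_text("{}")
        self.assertEqual(find_spec_file(self.module_dir).name, "spec.yaml")

    def test_yaml_wins_when_both_exist(self) -> None:
        (self.module_dir / "spec.json").write_text("{}")
        (self.module_dir / "spec.yaml").write_text("{}")
        self.assertEqual(find_spec_file(self.module_dir).name, "spec.yaml")

    def test_no_spec_file(self) -> None:
        self.assertIsNone(find_spec_file(self.module_dir))
```

Writing the JSON-only and no-file cases first pins down the existing behavior before the refactor; the other two then describe the new YAML behavior.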
Docs changes
Edit this piece of the docs to indicate that YAML files can be used. It might be better, though, to move the section regarding spec under the Source section, as that behavior (auto-reading JSON/YAML specs) also applies to the Source class.
Publish the new CDK version
Follow the instructions in the README or here. Make sure to update the changelog.md file!
Update a single downstream consumer
Make a PR to any connector (let's say source-stripe) that bumps its CDK dependency and changes its spec to YAML, then publish a new version of that connector.
Acceptance criteria:
spec.yaml instead of json #11935
Create a new convention: in addition to dropping a spec.json file in the connector's module directory, the developer can instead declare a spec.yaml file, which works exactly the same way but uses YAML. By default, the CDK should first look for a spec.yaml and, if that is not found, a spec.json.
Update the CDK docs to inform the user about this (I may have missed a couple more references to spec in the docs, so look around to see whether more docs need updating).
Create follow-up tickets to change the code generator and tutorials to use YAML instead of JSON.
Follow up work
Do we want to update all connectors to use YAML instead of JSON? Can we do this completely programmatically?
airbytehq/airbyte-internal-issues#557
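On the question of converting programmatically: a per-connector conversion could look roughly like the sketch below. The function name convert_spec_to_yaml is hypothetical, and deleting the old spec.json afterwards is an assumption made here so that a connector never ends up with both files.

```python
import json
from pathlib import Path

import yaml  # provided by the pyyaml package


def convert_spec_to_yaml(module_dir: Path) -> Path:
    """Hypothetical helper: rewrite module_dir/spec.json as spec.yaml."""
    json_path = module_dir / "spec.json"
    yaml_path = module_dir / "spec.yaml"

    spec = json.loads(json_path.read_text())
    # sort_keys=False keeps properties in the order the JSON file had them.
    yaml_path.write_text(yaml.safe_dump(spec, sort_keys=False))

    # Assumption: remove the JSON file so only one spec file remains.
    json_path.unlink()
    return yaml_path
```

Running this over every connector module directory would cover the bulk of the migration, though comments and formatting in hand-edited specs would still deserve a manual look.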
Describe the alternative you’ve considered or used
Pydantic is also an option. We could declare a spec.py file that uses Pydantic, which would also be nice.
+1 on YAML! Way easier to read and way more conventional for IaC stuff and dbt people, etc. I assumed there was some reason JSON was being used. Would this not be applicable to all connectors, not just CDK?
Don't know much about Pydantic, but it seems like it would be harder for the Java core of Airbyte to work with if there were ever a refactor where core needed to pull spec info from these files. I'm not familiar enough with all the places specs are used.