-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
working python source #533
Conversation
Working state example: config.json:
state.json:
|
making this available for review even though stripe tests need updating. |
print(log_message.serialize()) | ||
|
||
@dataclass | ||
class ConfigContainer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment to explain what these fields are.
airbyte-integrations/base-python/airbyte_protocol/airbyte_protocol/__init__.py
Outdated
Show resolved
Hide resolved
sys.exit(0) | ||
elif cmd == "read": | ||
# todo: pass in state | ||
generator = source.read(logging, rendered_config_path) | ||
generator = source.read(log_line, config_container, parsed_args.catalog, parsed_args.state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have have a similar "container" for catalog and state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do this in a separate PR
logger(line, "ERROR") | ||
|
||
airbyte_streams = [] | ||
singer_catalog = singer_transform(json.loads(completed_process.stdout)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we only try to load the line if the line starts with {
so we don't have the risk of reading some debug logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most singer taps write all logs to stderr
for the discover command and the entire stdout
is expected to be readable as the catalog. I think it's probably fine to expect that behavior for now and adapt if we run into a case that violates that expectation. I imagine for that case we should be doing custom work at that source level to handle it.
...ger/exchangeratesapi/source/source_exchangeratesapi_singer/source_exchangeratesapi_singer.py
Outdated
Show resolved
Hide resolved
...-integration/java/io/airbyte/integration_tests/sources/SingerExchangeRatesApiSourceTest.java
Outdated
Show resolved
Hide resolved
...te-integrations/singer/stripe_abprotocol/source/source_stripe_singer/source_stripe_singer.py
Outdated
Show resolved
Hide resolved
90110fb
to
9342e08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass on the python files, will do a second pass later. The usage is looking clean so far! Very exciting
airbyte-integrations/base-python/airbyte_protocol/airbyte_protocol/__init__.py
Show resolved
Hide resolved
@@ -76,10 +70,34 @@ def __init__(self): | |||
pass | |||
|
|||
# Iterator<AirbyteMessage> | |||
def read(self, config_object, rendered_config_path, state=None) -> Generator[AirbyteMessage, None, None]: | |||
def read(self, logger, config_container, catalog_path, state=None) -> Generator[AirbyteMessage, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will come in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is expected to be implemented by specific sources. There is no default implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, makes sense
@@ -1,8 +1,10 @@ | |||
from typing import Generator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are there two airbyte_protocol
directories? It's not clear to me what the difference between the two is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like standard structure for python packages:
pkg
pkg/setup.py
pkg/pkg
pkg/pkg/__init__.py
pkg/pkg/*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, thanks for bearing with my un-pythonic ways :P
schema = source.discover(logging, rendered_config_path) | ||
print(schema.schema) | ||
catalog = source.discover(log_line, config_container) | ||
print(catalog.serialize()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is print
the standard way of logging in python or is there a "slf4j" equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we want to dictate the precise format of what we're outputting, we don't want an slf4j
/logger equivalent.
It's debatable if we should use sys.stdout.write()
or print
, but print
just seemed easier (we do want newlines after everthing, and we it's easy to use the string conversions in some cases).
schema = stream.get("schema").get("properties") | ||
|
||
# todo: figure out how to serialize an object with an items key in python_jsonschema_objects | ||
if name == "subscriptions": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean we are blotting out parts of the catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated #530 to show what it needs to do to fix this. I've only seen this on Stripe so far but this is definitely something we need to fix before release. We can do it outside of this PR.
@staticmethod | ||
def discover(shell_command, transform=(lambda x: AirbyteSchema(x))) -> AirbyteSchema: | ||
def get_catalogs(logger, shell_command, singer_transform=(lambda catalog: catalog), airbyte_transform=(lambda catalog: catalog)) -> Catalogs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is singer_transform and airbyte_transform used for?
|
||
ok = True | ||
while ok: | ||
for key, val1 in sel.select(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does select
break if the underlying stream has closed? From the select
docs: If timeout is None, the call will block until a monitored file object becomes ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't break when the underlying process ends. The final iteration of this loop is when the file object p.stdout
is already closed, and it exits gracefully by reaching the ok = False
condition.
airbyte-integrations/singer/base-singer/base_singer/singer_helpers.py
Outdated
Show resolved
Hide resolved
|
||
if err_line: | ||
log_line(err_line, "ERROR") | ||
def combine_catalogs(masked_airbyte_catalog, discovered_singer_catalog) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the output here expected to be an object or a Singer or Airbytecatalog? if so can the type signature reflect that? Also, this is more of making selections on the discovered catalog rather than "combining" the two catalogs right? Can we rename the method to reflect that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, can you add a unit test for this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the rename here. I'll do the documentation piece and type handling separately.
A few of these comments can be summarized as:
I'll address these in a separate PR after this is merged. |
recommended reading order: