-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream returns AirbyteMessages #18572
Conversation
yield message | ||
if message.type == MessageType.RECORD: | ||
record = message.record | ||
stream_state = stream_instance.get_updated_state(stream_state, record.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same logic moved inside the if
block because we only want to update the state and record counter on records
@@ -99,6 +116,7 @@ def read_records( | |||
This method should be overridden by subclasses to read records based on the inputs | |||
""" | |||
|
|||
@lru_cache(maxsize=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -425,8 +425,8 @@ def test_source_config_no_transform(abstract_source, catalog): | |||
records = [r for r in abstract_source.read(logger=logger_mock, config={}, catalog=catalog, state={})] | |||
assert len(records) == 2 * 5 | |||
assert [r.record.data for r in records] == [{"value": 23}] * 2 * 5 | |||
assert http_stream.get_json_schema.call_count == 1 | |||
assert non_http_stream.get_json_schema.call_count == 1 | |||
assert http_stream.get_json_schema.call_count == 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assert verified that we used the cached transformer and schema instead of calling get_json_schema
.
We now call get_json_schema
more often, but the value is cached so the behavior is effectively the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense! but side question, could we add a test into core.py
that calls get_json_schema()
and asserts that some part of:
ResourceSchemaLoader(package_name_from_class(self.__class__)).get_schema(self.name)
only gets invoked once. like given that it's cached, we'd only expect self.__class__
or the underlying get_schema()
to only get called once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 0cbd28e
This reverts commit 5dac7c2.
# AirbyteRecordMessage: An AirbyteRecordMessage | ||
# AirbyteLogMessage: A log message | ||
# AirbyteTraceMessage: A trace message | ||
StreamData = Union[Mapping[str, Any], AirbyteRecordMessage, AirbyteLogMessage, AirbyteTraceMessage] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not attached to this typedef, but the full union definition is quite cumbersome
stream_instance = self._stream_to_instance_map[stream_name] | ||
return stream_instance.transformer, stream_instance.get_json_schema() | ||
|
||
def _as_airbyte_record(self, stream_name: str, data: Mapping[str, Any]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extracted to a function so it can be reused by the SimpleRetriever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
from airbyte_cdk.sources.utils.transform import TypeTransformer | ||
from airbyte_cdk.sources.utils.record_helper import \ | ||
stream_data_to_airbyte_message | ||
from airbyte_cdk.sources.utils.schema_helpers import InternalConfig, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did a formatter change? i don't recall seeing this before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. Ran gradlew format. I'll look into what's wrong with my setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
…nent * master: 🪟 🎉 Enable frontend validation for <1hr syncs (cloud) #19028 Stream returns AirbyteMessages (#18572) 🎉 New Source - Recruitee [low-code SDK] (#18671) 🎉 New source: Breezometer [low-code cdk] (#18650) Check disabled connections after protocol update (#18990) Simple default replication worker refactor (#19002) 🎉 New Source: Visma e-conomic (#18595) 🎉 New Source: Fastbill (#18593) Bmoric/extract state api (#18980) 🎉 New Source: Zapier Supported Storage (#18442) 🎉 New source: Klarna (#18385) `AirbyteEstimateTraceMessage` (#18875) Extract source definition api (#18977) [low-code cdk] Allow for spec file to be defined in the yaml manifest instead of an external file (#18411) 🐛 Source HubSpot: fix property scopes (#18624) Ensure that only 6-character hex values are passed to monaco editor (#18943)
What
AirbyteMessage
s.AirbyteMessage
instead of Mapping.HttpStream
so requests and responses are emitted asAirbyteLogMessage
sStream
interface to yieldAirbyteMessage
How
read_records_as_messages
to theStream
intefaceread_records_as_messages
simply callsread_records_as_messages
instead ofread_records
read_records
isn't modified so child classes can still overwrite is when neededTesting
Recommended reading order
airbyte-cdk/python/airbyte_cdk/sources/streams/core.py
airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py
airbyte-cdk/python/airbyte_cdk/sources/utils/record_helper.py
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.