-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[low-code connectors] Extract datetime parser and handle %s format directive #15429
Conversation
This reverts commit 3c76c5a.
( | ||
"test_parse_date_datetime", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datetime objects are not valid input to the parse method anymore
"test_parse_date_datetime", | ||
datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), | ||
"%Y%m%d", | ||
"test_parse_timestamp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test for parsing a timestamp to datetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am I reading this correctly that we removed a test case? if so why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sherifnada yes. datetime.datetime objects are not valid inputs anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah makes sense
from airbyte_cdk.sources.declarative.datetime.datetime_parser import DatetimeParser | ||
|
||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are pasted from airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py
return comparator(cursor_date, default_date) | ||
|
||
def parse_date(self, date: Union[str, datetime.datetime]) -> datetime.datetime: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method should not accept a datetime as argument.
@@ -170,14 +171,11 @@ def _partition_daterange(self, start, end, step: datetime.timedelta): | |||
return dates | |||
|
|||
def _get_date(self, cursor_value, default_date: datetime.datetime, comparator) -> datetime.datetime: | |||
cursor_date = self.parse_date(cursor_value or default_date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to call parse_date because the cursor_value is already a datetime object
return str(int(dt.timestamp())) | ||
else: | ||
return dt.strftime(self.datetime_format) | ||
return self._parser.format(dt, self.datetime_format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract logic to the parser so it can be reused
@@ -142,7 +144,12 @@ def stream_slices(self, sync_mode: SyncMode, stream_state: Mapping[str, Any]) -> | |||
|
|||
start_datetime = max(cursor_datetime, start_datetime) | |||
|
|||
state_date = self.parse_date(stream_state.get(self.cursor_field.eval(self.config, stream_state=stream_state))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this used to call self.parse_date
with a potential None
value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@girarda do you have an example of how the directive is unreliable?
@sherifnada you can run the tests locally from this commit 016cb69 There will be an assertion error
The expected value is 1609459200 (Friday, January 1, 2021 12:00:00 AM UTC) and the actual value is now 1609488000 (Friday, January 1, 2021 8:00:00 AM UTC (12:00:00 AM PST). The tests are fine in CI because the CI instances are running with timezone=UTC https://github.com/airbytehq/airbyte/runs/7768180690 (I reverted the commit) |
This reverts commit 016cb69.
"test_parse_date_datetime", | ||
datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), | ||
"%Y%m%d", | ||
"test_parse_timestamp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am I reading this correctly that we removed a test case? if so why?
Maye worth adding in the docs that by default |
…rective (#15429) * fix parse * Revert "fix parse" This reverts commit 3c76c5a. * fix parse timestamp * extract datetime parser * remove print * use parser * top level docstring * rename variable * do not use timestamp() * Revert "do not use timestamp()" This reverts commit 016cb69. * update comment * bump cdk version * Update template
* greenhouse minus pagination * jobs * first substream * rename field * applications_demographics_answers_stream * interviews * All streams are implemented * fix check * fix spec * disable backward compatibility tests * disable backward compatibility tests * unit tests * definitions * only use config.json * bump version * expected records * delete stream classes * Handle extracting no records from root * handle missing keys * Remove unused field from JsonSchema (#15425) * few fixes from working with sendgrid * reset to master * only update the docstring * reset * 🎉 Source File - add support for custom encoding (#15293) * added support for custom encoding * fixed unit test for utf16 * updated docs * bumped connector version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * change query frequency to 1hour (#15499) * [low-code connectors]: Assert there are no custom top-level fields (#15489) * move components to definitions field * Also update the references * validate the top level fields and add version * raise exception on unknown fields * newline * unit tests * set version to 0.1.0 * newline * 🐞 Postgres source: fix bug in intermediate state emission (#15496) * Rename record counter * Rename method * Emit intermediate state after all cursor records * Emit intermediate state only when it is ready * Merge two checks * Add a testing message * Fix unit tests * Add one more testing record and add comments * Add test case for multiple records with the same cursor value * Revert irrelevant change * Add explanation in javadoc * Format code * Rename testing methods * Fix comment * Bump version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * 🪟 🔧 Add testing and storybook component for CatalogDiffModal (#15426) * wip diff modal test setup * starting storybook add * storybook working now * cleanup * aria labels * test syncmode string * 🎉 New Source: Hubplanner (#15521) * added hubplanner source connector * feat: added more streams to read from * cleaned up some unneeded integration tests * fix hubplanner schema * changes * update dockerfile * add seed and doc * update spec * run source spec seed file Co-authored-by: Ricky Renner <renner@amendllc.com> * Make it possible to specify normalization pod resources. (#15495) Today we are running into OOM exceptions with normalization. Normalization itself also inherits the destination's resource requirements. After work to bring destination memory usage down, this is no longer ideal, since most destinations use less memory than normalization needs. This PR makes it possible to specify the general resource the normalization pod is provided via env vars. Notes: - Add env vars. Default to the various job main container resources if these are not set. - Instead of using the destination's memory, use the normalization specify env vars. * [low-code connectors] Extract datetime parser and handle %s format directive (#15429) * fix parse * Revert "fix parse" This reverts commit 3c76c5a. * fix parse timestamp * extract datetime parser * remove print * use parser * top level docstring * rename variable * do not use timestamp() * Revert "do not use timestamp()" This reverts commit 016cb69. * update comment * bump cdk version * Update template * source-file-secure bump to 0.2.16 (#15528) * update Dockerfile version * update init to accept additional args * unit test sendgrid messages stream (#15331) * unit test sendgrid messages stream * reset * Update airbyte-integrations/connectors/source-sendgrid/unit_tests/unit_test.py Co-authored-by: Augustin <augustin.lafanechere@gmail.com> Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * record extractor interface * dpath extractor * docstring * 🎉 Source File: cache binary stream to file (#15501) Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> * Docs: update posthog.md (#15541) * Source Stripe: implement slicing (#15292) * #45 oncall - source Stripe: implement slicing * #45 source stripe: upd changelog * #45 source stripe: upd changelog * #45 source stripe: make slice range configurable * #45 source stripe: move generating a single slice into a mixin * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * fix: revert extraEnv delition in values.yaml for bootloader (#15548) * fix: revert extraEnv delition in values.yaml for bootloader * add newline * SAT: backward compatibility - check that cursor fields were not changed (#15520) * Replace twttr repo to the root build.gradle (#15544) * Fixed bucket naming for S3 * replaced twttr repo to the root build.gradle * replaced twttr repo to the root build.gradle Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com> * Generate separate server endpoints per domain (#15513) * 🐛 Backward compatibility test: Don't fail on updating additionalProperties (#15532) * Source Recurly: adds `state_checkpoint_interval` to streams (#13685) * Add `state_checkpoint_interval` to Recurly stream * Bumpg Recurly source version to `0.4.1` * reset * use dpath * enable backward compatibility test * infer types * Revert "infer types" This reverts commit b4de8d6. * infer some of the types * some drying * more drying * auto-bump connector version [ci skip] Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> Co-authored-by: midavadim <midavadim@yahoo.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Xiaohan Song <xiaohan@airbyte.io> Co-authored-by: Liren Tu <tuliren.git@outlook.com> Co-authored-by: Teal Larson <LARSON.TEAL@GMAIL.COM> Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com> Co-authored-by: Ricky Renner <renner@amendllc.com> Co-authored-by: Davin Chia <davinchia@gmail.com> Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> Co-authored-by: Augustin <augustin.lafanechere@gmail.com> Co-authored-by: Serhii Chvaliuk <grubberr@gmail.com> Co-authored-by: juliatournant <39640564+juliatournant@users.noreply.github.com> Co-authored-by: Denys Davydov <davydov.den18@gmail.com> Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com> Co-authored-by: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com> Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com> Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io> Co-authored-by: Mohamed Magdy <mohamed.magdy@canary.is>
What
The "%s" formatting directive is unreliable because it does not take the timezone into consideration.
We should use
datetime.datetime.fromtimestamp
anddt.timestamp()
instead.How
DatetimeStreamSlicer.parse_date
so we can remove type checksRecommended reading order
airbyte-cdk/python/airbyte_cdk/sources/declarative/datetime/datetime_parser.py
airbyte-cdk/python/airbyte_cdk/sources/declarative/datetime/min_max_datetime.py
airbyte-cdk/python/airbyte_cdk/sources/declarative/stream_slicers/datetime_stream_slicer.py
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.