Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[low-code connectors] Extract datetime parser and handle %s format directive #15429

Merged
merged 13 commits into from
Aug 10, 2022

Conversation

girarda
Copy link
Contributor

@girarda girarda commented Aug 9, 2022

What

The "%s" formatting directive is unreliable because it does not take the timezone into consideration.
We should use datetime.datetime.fromtimestamp and dt.timestamp() instead.

How

  • Check if the formatting directive is "%s". If it is, then call the timestamp methods. Otherwise, proceed with standard formatting / parsing
  • Extract the logic into a reusable class so we don't need to duplicate it
  • cleanup the calls to DatetimeStreamSlicer.parse_date so we can remove type checks

Recommended reading order

  1. airbyte-cdk/python/airbyte_cdk/sources/declarative/datetime/datetime_parser.py
  2. airbyte-cdk/python/airbyte_cdk/sources/declarative/datetime/min_max_datetime.py
  3. airbyte-cdk/python/airbyte_cdk/sources/declarative/stream_slicers/datetime_stream_slicer.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

@github-actions github-actions bot added the CDK Connector Development Kit label Aug 9, 2022
(
"test_parse_date_datetime",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datetime objects are not valid input to the parse method anymore

"test_parse_date_datetime",
datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc),
"%Y%m%d",
"test_parse_timestamp",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for parsing a timestamp to datetime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I reading this correctly that we removed a test case? if so why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada yes. datetime.datetime objects are not valid inputs anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah makes sense

from airbyte_cdk.sources.declarative.datetime.datetime_parser import DatetimeParser


@pytest.mark.parametrize(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are pasted from airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py

return comparator(cursor_date, default_date)

def parse_date(self, date: Union[str, datetime.datetime]) -> datetime.datetime:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method should not accept a datetime as argument.

@@ -170,14 +171,11 @@ def _partition_daterange(self, start, end, step: datetime.timedelta):
return dates

def _get_date(self, cursor_value, default_date: datetime.datetime, comparator) -> datetime.datetime:
cursor_date = self.parse_date(cursor_value or default_date)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to call parse_date because the cursor_value is already a datetime object

return str(int(dt.timestamp()))
else:
return dt.strftime(self.datetime_format)
return self._parser.format(dt, self.datetime_format)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract logic to the parser so it can be reused

@girarda girarda changed the title fix parse [low-code connectors] Extract datetime parser and handle %s format directive Aug 9, 2022
@@ -142,7 +144,12 @@ def stream_slices(self, sync_mode: SyncMode, stream_state: Mapping[str, Any]) ->

start_datetime = max(cursor_datetime, start_datetime)

state_date = self.parse_date(stream_state.get(self.cursor_field.eval(self.config, stream_state=stream_state)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this used to call self.parse_date with a potential None value

@girarda girarda requested a review from brianjlai August 9, 2022 15:35
@girarda girarda marked this pull request as ready for review August 9, 2022 15:35
@girarda girarda requested a review from a team as a code owner August 9, 2022 15:35
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda do you have an example of how the directive is unreliable?

@girarda
Copy link
Contributor Author

girarda commented Aug 10, 2022

@sherifnada you can run the tests locally from this commit 016cb69

There will be an assertion error

E       AssertionError: assert '1609459200' == '1609488000'
E         - 1609488000
E         + 1609459200

The expected value is 1609459200 (Friday, January 1, 2021 12:00:00 AM UTC) and the actual value is now 1609488000 (Friday, January 1, 2021 8:00:00 AM UTC (12:00:00 AM PST).

The tests are fine in CI because the CI instances are running with timezone=UTC https://github.com/airbytehq/airbyte/runs/7768180690

(I reverted the commit)

"test_parse_date_datetime",
datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc),
"%Y%m%d",
"test_parse_timestamp",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I reading this correctly that we removed a test case? if so why?

@sherifnada
Copy link
Contributor

Maye worth adding in the docs that by default %s is always UTC

@girarda
Copy link
Contributor Author

girarda commented Aug 10, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/2836327858
https://github.com/airbytehq/airbyte/actions/runs/2836327858

@girarda
Copy link
Contributor Author

girarda commented Aug 10, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/2836366577
https://github.com/airbytehq/airbyte/actions/runs/2836366577

@girarda girarda merged commit 29fafe2 into master Aug 10, 2022
@girarda girarda deleted the alex/datetimeFormatTimestamp branch August 10, 2022 23:35
girarda added a commit that referenced this pull request Aug 11, 2022
…rective (#15429)

* fix parse

* Revert "fix parse"

This reverts commit 3c76c5a.

* fix parse timestamp

* extract datetime parser

* remove print

* use parser

* top level docstring

* rename variable

* do not use timestamp()

* Revert "do not use timestamp()"

This reverts commit 016cb69.

* update comment

* bump cdk version

* Update template
girarda added a commit that referenced this pull request Aug 12, 2022
* greenhouse minus pagination

* jobs

* first substream

* rename field

* applications_demographics_answers_stream

* interviews

* All streams are implemented

* fix check

* fix spec

* disable backward compatibility tests

* disable backward compatibility tests

* unit tests

* definitions

* only use config.json

* bump version

* expected records

* delete stream classes

* Handle extracting no records from root

* handle missing keys

* Remove unused field from JsonSchema (#15425)

* few fixes from working with sendgrid

* reset to master

* only update the docstring

* reset

* 🎉 Source File - add support for custom encoding (#15293)

* added support for custom encoding

* fixed unit test for utf16

* updated docs

* bumped connector version

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* change query frequency to 1hour (#15499)

* [low-code connectors]: Assert there are no custom top-level fields (#15489)

* move components to definitions field

* Also update the references

* validate the top level fields and add version

* raise exception on unknown fields

* newline

* unit tests

* set version to 0.1.0

* newline

* 🐞 Postgres source: fix bug in intermediate state emission (#15496)

* Rename record counter

* Rename method

* Emit intermediate state after all cursor records

* Emit intermediate state only when it is ready

* Merge two checks

* Add a testing message

* Fix unit tests

* Add one more testing record and add comments

* Add test case for multiple records with the same cursor value

* Revert irrelevant change

* Add explanation in javadoc

* Format code

* Rename testing methods

* Fix comment

* Bump version

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🪟 🔧 Add testing and storybook component for CatalogDiffModal (#15426)

* wip diff modal test setup

* starting storybook add

* storybook working now

* cleanup

* aria labels

* test syncmode string

* 🎉 New Source: Hubplanner (#15521)

* added hubplanner source connector

* feat: added more streams to read from

* cleaned up some unneeded integration tests

* fix hubplanner schema

* changes

* update dockerfile

* add seed and doc

* update spec

* run source spec seed file

Co-authored-by: Ricky Renner <renner@amendllc.com>

* Make it possible to specify normalization pod resources. (#15495)

Today we are running into OOM exceptions with normalization. Normalization itself also inherits the destination's resource requirements. After work to bring destination memory usage down, this is no longer ideal, since most destinations use less memory than normalization needs.

This PR makes it possible to specify the general resource the normalization pod is provided via env vars.

Notes:
- Add env vars. Default to the various job main container resources if these are not set.
- Instead of using the destination's memory, use the normalization specify env vars.

* [low-code connectors] Extract datetime parser and handle %s format directive (#15429)

* fix parse

* Revert "fix parse"

This reverts commit 3c76c5a.

* fix parse timestamp

* extract datetime parser

* remove print

* use parser

* top level docstring

* rename variable

* do not use timestamp()

* Revert "do not use timestamp()"

This reverts commit 016cb69.

* update comment

* bump cdk version

* Update template

* source-file-secure bump to 0.2.16 (#15528)

* update Dockerfile version

* update init to accept additional args

* unit test sendgrid messages stream (#15331)

* unit test sendgrid messages stream

* reset

* Update airbyte-integrations/connectors/source-sendgrid/unit_tests/unit_test.py

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* record extractor interface

* dpath extractor

* docstring

* 🎉 Source File: cache binary stream to file (#15501)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Docs: update posthog.md (#15541)

* Source Stripe: implement slicing (#15292)

* #45 oncall - source Stripe: implement slicing

* #45 source stripe: upd changelog

* #45 source stripe: upd changelog

* #45 source stripe: make slice range configurable

* #45 source stripe: move generating a single slice into a mixin

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* fix: revert extraEnv delition in values.yaml for bootloader (#15548)

* fix: revert extraEnv delition in values.yaml for bootloader

* add newline

* SAT: backward compatibility - check that cursor fields were not changed (#15520)

* Replace twttr repo to the root build.gradle (#15544)

* Fixed bucket naming for S3

* replaced twttr repo to the root build.gradle

* replaced twttr repo to the root build.gradle

Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com>

* Generate separate server endpoints per domain (#15513)

* 🐛 Backward compatibility test: Don't fail on updating additionalProperties (#15532)

* Source Recurly: adds `state_checkpoint_interval` to streams (#13685)

* Add `state_checkpoint_interval` to Recurly stream

* Bumpg Recurly source version to `0.4.1`

* reset

* use dpath

* enable backward compatibility test

* infer types

* Revert "infer types"

This reverts commit b4de8d6.

* infer some of the types

* some drying

* more drying

* auto-bump connector version [ci skip]

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
Co-authored-by: midavadim <midavadim@yahoo.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: Xiaohan Song <xiaohan@airbyte.io>
Co-authored-by: Liren Tu <tuliren.git@outlook.com>
Co-authored-by: Teal Larson <LARSON.TEAL@GMAIL.COM>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Ricky Renner <renner@amendllc.com>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
Co-authored-by: Serhii Chvaliuk <grubberr@gmail.com>
Co-authored-by: juliatournant <39640564+juliatournant@users.noreply.github.com>
Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com>
Co-authored-by: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com>
Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com>
Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io>
Co-authored-by: Mohamed Magdy <mohamed.magdy@canary.is>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants