Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source Stripe: fix multiple BankAccounts issues #32146

Merged
merged 12 commits into from
Nov 13, 2023

Conversation

davydov-d
Copy link
Collaborator

@davydov-d davydov-d commented Nov 3, 2023

What

This PR addresses multiple issues described in https://github.com/airbytehq/oncall/issues/3398 and #31555 that are related to lazy substreams, mostly the BankAccounts stream:

  • 500 error <lambda>() missing 1 required keyword-only argument: 'stream_slice'
  • Data not expanded when using the BankAccounts stream in the full refresh mode
  • Data not filtered when using the event-based incremental sync mode
  • Cursor values not filled when using the full refresh mode

They are all fixed in a single PR because they all depend one on another.

How

  • Reorder function params of some lambdas like path and extra_request_params
  • Instantiate the Customers stream twice - first to be an independent stream, second to be a parent for the BankAccounts stream and expand the requested data
  • Replace a FilteringRecordExtractor with a response_filter callable that is now passed into both IncrementalExtractor and DefaultExtractor so that records are filtered no matter what the sync mode is.
  • Override the record extractor of the UpdatedCursorIncrementalStripeLazySubStream so that cursor value is filled in the full refresh sync mode as well
  • Cover these changes with unit tests

Copy link

vercel bot commented Nov 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Nov 10, 2023 8:42am

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/stripe labels Nov 3, 2023
Copy link
Contributor

github-actions bot commented Nov 3, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like what I see. I would like to see a little bit more coverage to document these cases and make sure we have a safety net in case of regression

"object": "list",
"total_count": 3,
"url": "/v1/invoices/in_1KD6OVIEn5WyEQxn9xuASHsD/lines",
lazy_substream_test_suite = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot! It feels like this is filling a big testing gap we have for streams where the dataset is hard/impossible to maintain.

The PR addresses the case where the events stream records were not filtered properly. Should we test that using this kind of tests?

I would like to push the idea further: for each stream_cls, do we want to have one test for incremental and one for full_refresh? The reason I'm asking is that the code path is very different for both. For example, bank_accounts relies on the events endpoint on incremental and on the customer else. One other crazy particularity with stripe is the pagination given StripeLazySubStream (for stream invoice_line_items for example) which seems to be a very specific and interesting case to document/test

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxi297 all these cases are already covered.

Each of this cases tests:

  • Request args. The test would fail if the URL param values do not match those in the expected URL
  • Record filtering - the number of actual records should be the same as the number of expected records and it is not always the number of records in the mocked response
  • Cursor value - it is populated in the expected records, so if missing, the test would fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so we test per python classes and not per Stripe stream. I think that can be enough for now. What I'm worried about is that I don't see a safety net for a couple of changes you've done. For example:

  • If we have an event that is not filtered properly
  • If a dev removes parent=expand_items=["data.sources"] on bank_accounts

Do you think it's worth testing that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxi297 what do you mean by event that is not filtered properly? Did we have such an issue?
Regarding your second concern - I do think we do not have to cover this as we no more implement a class per stream and this is an input option for instantiating a class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are looking at the same lines of code with two different perspectives that are not mutually exclusive but do not produce the same lines of code for the tests.

I do think we do not have to cover this as we no more implement a class per stream and this is an input option for instantiating a class

This is true if we check the interface of the classes in isolation. So if we check the tests for UpdatedCursorIncrementalStripeLazySubStream, this case is covered as you mention; Unit tests are checking that just fine. Does that mean the source is working as we would like it to work? The answer is "no" as the code was failing in production and the stream BankAccounts was missing records in production. Tomorrow, I could remove this line inadvertently and I would have no safety net to tell my I did something wrong because this is not caught on the unit test level or the CATs level. Hence, it feels like we need another safety net.

what do you mean by event that is not filtered properly? Did we have such an issue?

I think we has as the PR mentions Data not filtered when using the event-based incremental sync mode. There was indeed a change made on filtering on the event stream. By looking more closely at the different cases, I see that this entry would be filtered so the case is covered. What I fear is that the purpose of the test is not clear (it tests many things like that it perform the right query for incremental, filtering works, etc...) and as it isn't explicit, I can remove lines 255 to 262 from the test and it would still pass and still be a valid test. It's very hard to be too explicit but things are often too implicit.

As a more generic comments, tests have many many goals. We are pretty good in using tests for validation explicitly written code in a specific project. This is good as it ensure that all our small units of code work as expected. However, I think we should use tests for other reasons:

  • Ensuring the integration between our component works as expected (hence the first point)
  • Documenting the code (hence my second point)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxi297 tests updated: Now we are testing not classes, but source streams. I have also split the generic test into three separate tests to verify that
a. cursor value is populated,
b. data is filtered
c. data is expanded
for both sync modes (if applicable)

The PR has also been split into two different

@octavia-squidington-iv octavia-squidington-iv requested a review from a team November 7, 2023 17:38
@davydov-d davydov-d changed the title 🐛 Source Stripe: fix multiple BankAccount issues 🐛 Source Stripe: fix BankAccounts, Refunds, CheckoutSessions and CheckoutSessionsLineItems issues Nov 7, 2023
@davydov-d
Copy link
Collaborator Author

davydov-d commented Nov 7, 2023

@maxi297 I have also merged in another patch regarding different issue: https://github.com/airbytehq/oncall/issues/3428. Changes may look huge but both updates share the same codebase so I think it's worth doing in a single PR. The PR description is updated, changes are covered with tests, please review once more when you have a chance

@davydov-d davydov-d added the breaking-change Don't merge me unless you are ready. label Nov 7, 2023
@davydov-d davydov-d changed the title 🐛 Source Stripe: fix BankAccounts, Refunds, CheckoutSessions and CheckoutSessionsLineItems issues 🚨 🚨 Source Stripe: fix BankAccounts, Refunds, CheckoutSessions and CheckoutSessionsLineItems issues Nov 7, 2023
@maxi297
Copy link
Contributor

maxi297 commented Nov 7, 2023

There are 8 changes all within one PR that modifies adds 875 lines of code and removes 433 lines. It is too tedious to find how each of the changes are tested to ensure the proper execution of the source. The fact that a test can test many things at the same time and that there are no significant names on those tests makes things even harder.

I can't review this change effectively. I've tried but it's already been an hour and I'm not done. Please split this into different PR (one for each change) so that I can see the impact of the changes more easily.

@davydov-d davydov-d force-pushed the ddavydov/3398-oncall-bugfix branch 2 times, most recently from aecc74d to e4a1eae Compare November 8, 2023 08:35
@davydov-d davydov-d changed the title 🚨 🚨 Source Stripe: fix BankAccounts, Refunds, CheckoutSessions and CheckoutSessionsLineItems issues 🐛 Source Stripe: fix multiple BankAccounts issues Nov 8, 2023
@davydov-d davydov-d removed the breaking-change Don't merge me unless you are ready. label Nov 8, 2023
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good. There's only one thing I can't find out. My understanding is that this line was changed to fix 500 error <lambda>() missing 1 required keyword-only argument: 'stream_slice'. Is this right? If so, should we have a test for subscription_items to ensure we have a safety net and this does not happen again?

@pytest.fixture()
def stream_by_name(config):
def mocker(stream_name, source_config=config):
source = SourceStripe()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that because it uses the streams as configured by the source so not only it tests the streams but also how the source instantiate them

requests_mock.get(
"https://api.stripe.com/v1/invoices",
json={
bank_accounts_full_refresh_test_case = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests Data not expanded when using the BankAccounts stream in the full refresh mode 👍

{"id": "il_2", "invoice_id": "in_1KD6OVIEn5WyEQxn9xuASHsD", "object": "line_item"},
{"id": "il_3", "invoice_id": "in_1KD6OVIEn5WyEQxn9xuASHsD", "object": "line_item"},
]
bank_accounts_incremental_test_case = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests Data not filtered when using the event-based incremental sync mode 👍

@davydov-d
Copy link
Collaborator Author

This is very good. There's only one thing I can't find out. My understanding is that this line was changed to fix 500 error <lambda>() missing 1 required keyword-only argument: 'stream_slice'. Is this right? If so, should we have a test for subscription_items to ensure we have a safety net and this does not happen again?

@maxi297 Completely forgot about it, thanks. Tests updated again

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test! :shipit:

@davydov-d davydov-d merged commit a05a293 into master Nov 13, 2023
18 checks passed
@davydov-d davydov-d deleted the ddavydov/3398-oncall-bugfix branch November 13, 2023 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation checklist-action-run connectors/source/stripe
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants