-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Stripe: fix multiple BankAccounts issues #32146
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like what I see. I would like to see a little bit more coverage to document these cases and make sure we have a safety net in case of regression
"object": "list", | ||
"total_count": 3, | ||
"url": "/v1/invoices/in_1KD6OVIEn5WyEQxn9xuASHsD/lines", | ||
lazy_substream_test_suite = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot! It feels like this is filling a big testing gap we have for streams where the dataset is hard/impossible to maintain.
The PR addresses the case where the events stream records were not filtered properly. Should we test that using this kind of tests?
I would like to push the idea further: for each stream_cls
, do we want to have one test for incremental and one for full_refresh? The reason I'm asking is that the code path is very different for both. For example, bank_accounts
relies on the events
endpoint on incremental and on the customer else. One other crazy particularity with stripe is the pagination given StripeLazySubStream
(for stream invoice_line_items
for example) which seems to be a very specific and interesting case to document/test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxi297 all these cases are already covered.
- Full refresh + pagination for Invoice Line Items (StripeLazySubStream)
- Full refresh + pagination for Subscription Items (StripeLazySubstream)
- Full refresh + pagination for Bank Accounts (UpdatedCursorIncrementalStripeLazySubstream)
- Full refresh + pagination for Application Fees Refunds (UpdatedCursorIncrementalStripeLazySubstream)
- Incremental for Bank Accounts (UpdatedCursorIncrementalStripeLazySubstream)
- Incremental for Application Fees Refunds (UpdatedCursorIncrementalStripeLazySubStream)
Each of this cases tests:
- Request args. The test would fail if the URL param values do not match those in the expected URL
- Record filtering - the number of actual records should be the same as the number of expected records and it is not always the number of records in the mocked response
- Cursor value - it is populated in the expected records, so if missing, the test would fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so we test per python classes and not per Stripe stream. I think that can be enough for now. What I'm worried about is that I don't see a safety net for a couple of changes you've done. For example:
- If we have an event that is not filtered properly
- If a dev removes
parent=expand_items=["data.sources"]
on bank_accounts
Do you think it's worth testing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxi297 what do you mean by event that is not filtered properly? Did we have such an issue?
Regarding your second concern - I do think we do not have to cover this as we no more implement a class per stream and this is an input option for instantiating a class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are looking at the same lines of code with two different perspectives that are not mutually exclusive but do not produce the same lines of code for the tests.
I do think we do not have to cover this as we no more implement a class per stream and this is an input option for instantiating a class
This is true if we check the interface of the classes in isolation. So if we check the tests for UpdatedCursorIncrementalStripeLazySubStream
, this case is covered as you mention; Unit tests are checking that just fine. Does that mean the source is working as we would like it to work? The answer is "no" as the code was failing in production and the stream BankAccounts was missing records in production. Tomorrow, I could remove this line inadvertently and I would have no safety net to tell my I did something wrong because this is not caught on the unit test level or the CATs level. Hence, it feels like we need another safety net.
what do you mean by event that is not filtered properly? Did we have such an issue?
I think we has as the PR mentions Data not filtered when using the event-based incremental sync mode
. There was indeed a change made on filtering on the event stream. By looking more closely at the different cases, I see that this entry would be filtered so the case is covered. What I fear is that the purpose of the test is not clear (it tests many things like that it perform the right query for incremental, filtering works, etc...) and as it isn't explicit, I can remove lines 255 to 262 from the test and it would still pass and still be a valid test. It's very hard to be too explicit but things are often too implicit.
As a more generic comments, tests have many many goals. We are pretty good in using tests for validation explicitly written code in a specific project. This is good as it ensure that all our small units of code work as expected. However, I think we should use tests for other reasons:
- Ensuring the integration between our component works as expected (hence the first point)
- Documenting the code (hence my second point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxi297 tests updated: Now we are testing not classes, but source streams. I have also split the generic test into three separate tests to verify that
a. cursor value is populated,
b. data is filtered
c. data is expanded
for both sync modes (if applicable)
The PR has also been split into two different
@maxi297 I have also merged in another patch regarding different issue: https://github.com/airbytehq/oncall/issues/3428. Changes may look huge but both updates share the same codebase so I think it's worth doing in a single PR. The PR description is updated, changes are covered with tests, please review once more when you have a chance |
There are 8 changes all within one PR that modifies adds 875 lines of code and removes 433 lines. It is too tedious to find how each of the changes are tested to ensure the proper execution of the source. The fact that a test can test many things at the same time and that there are no significant names on those tests makes things even harder. I can't review this change effectively. I've tried but it's already been an hour and I'm not done. Please split this into different PR (one for each change) so that I can see the impact of the changes more easily. |
aecc74d
to
e4a1eae
Compare
…rbyte into ddavydov/3398-oncall-bugfix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good. There's only one thing I can't find out. My understanding is that this line was changed to fix 500 error <lambda>() missing 1 required keyword-only argument: 'stream_slice'
. Is this right? If so, should we have a test for subscription_items
to ensure we have a safety net and this does not happen again?
@pytest.fixture() | ||
def stream_by_name(config): | ||
def mocker(stream_name, source_config=config): | ||
source = SourceStripe() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like that because it uses the streams as configured by the source so not only it tests the streams but also how the source instantiate them
requests_mock.get( | ||
"https://api.stripe.com/v1/invoices", | ||
json={ | ||
bank_accounts_full_refresh_test_case = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests Data not expanded when using the BankAccounts stream in the full refresh mode
👍
{"id": "il_2", "invoice_id": "in_1KD6OVIEn5WyEQxn9xuASHsD", "object": "line_item"}, | ||
{"id": "il_3", "invoice_id": "in_1KD6OVIEn5WyEQxn9xuASHsD", "object": "line_item"}, | ||
] | ||
bank_accounts_incremental_test_case = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests Data not filtered when using the event-based incremental sync mode
👍
…rbyte into ddavydov/3398-oncall-bugfix
@maxi297 Completely forgot about it, thanks. Tests updated again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test!
What
This PR addresses multiple issues described in https://github.com/airbytehq/oncall/issues/3398 and #31555 that are related to lazy substreams, mostly the
BankAccounts
stream:<lambda>() missing 1 required keyword-only argument: 'stream_slice'
BankAccounts
stream in the full refresh modeThey are all fixed in a single PR because they all depend one on another.
How
path
andextra_request_params
Customers
stream twice - first to be an independent stream, second to be a parent for theBankAccounts
stream and expand the requested dataFilteringRecordExtractor
with aresponse_filter
callable that is now passed into bothIncrementalExtractor
andDefaultExtractor
so that records are filtered no matter what the sync mode is.UpdatedCursorIncrementalStripeLazySubStream
so that cursor value is filled in the full refresh sync mode as well