-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Shopify: fixed store
redirection, add bulk checkpointing
#42095
Conversation
…fix-base-url-mismatch
…fix-base-url-mismatch
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
The new This release contains:
|
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Outdated
Show resolved
Hide resolved
…fix-base-url-mismatch
…fix-base-url-mismatch
…fix-base-url-mismatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things look good, nice work on all these OC issues. One recommendation in the future would be to break these up into individual PRs. It was kind of hard to sort through things to understand what was being fixed for each OC issues especially since I'm not that familiar with the shopify code. But not a problem this time around.
I did add a few comments that we can discuss and also the http_client.name
property should be in CDK 3.9.3
if you want to uptake that so we don't have to reference a private _name
attribute.
And do we have any regression tests run against this new version of shopify?
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-shopify/source_shopify/shopify_graphql/bulk/job.py
Outdated
Show resolved
Hide resolved
Yes, @aldogonzalez8 helped with this, they are identical. No changes or regressions are expected. |
Acceptance testing is passing in combination with my PR branch. Testing CAT steps to run in my local:
Now, I will run a Regression locally |
Regression tests results
product_variants example record for control version
product_variants example record for target version
|
Could this one be an issue? @bazarnov |
…fix-base-url-mismatch
@aldogonzalez8 It was a question of the new config property:
I've checked the data locally, and the
We can repeat the tests, if needed. |
Test repeated from scratch, everything is 1:1
|
/approve-regression-tests
|
It will be released on Monday, July 29th. |
…fix-base-url-mismatch
…fix-base-url-mismatch
/approve-regression-tests
|
"default": 3600, | ||
"minimum": 1 | ||
"default": 7200, | ||
"minimum": 3600, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, is there a specific reason the minimum needs to be 3600 or greater than 1? Adding this requirement means large jobs significantly increase the time to which they arrive at a small enough slice that can run without failure.
EDIT: My guess is that < 3600 is thought to be obviated with the introduction of job_checkpoint_interval
, though given the variation across shop windows in rows collected per second, if a job of 15000 (min) rows always fails with a new flavor of retryable error after 3600 seconds and would otherwise succeed in some smaller threshold, we have added a failure requirement here without a workaround until a fix is in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jblakeman Thank you for raising this; we selected the values from what we got on our side and are happy to adjust them now that we have your feedback. I'll add this to the next source-shopify update.
What
Resolving:
How
Exception
-->BulkJobRedirectToOtherShopError,
which occurs when the store is operated from the manager account and the responses are redirected to the originalshop
instead of thealiased
onesession
andstream_name
input arguments forjob_manager
for BULK streams, in favor of thehttp_client
setupBULK
streams:config
field -->job_checkpoint_interval
with the default value of200,000
(200k lines) to be collected, before the single Bulk Job is checkpointed.threshold
lines collected for the single BULK Job, it'sCanceled
and the intermediate results are processed, instead of complete cancelation and retry using smallerjob_size
, which is also there, but will be used as asafety net
forlong-running
jobs. The primary method to overcomedata-density
issues for heavy-data-loaded streams ischeckpointing
.ASC
for allBULK
streams, to guarantee the correct order of records forCHECKPOINTING
FAILED
scenario, when the BULK Job hasFAILED
status and we've collected some results already, we want them to be caught, instead of raising the exception and retry, thus the source should continue from the checkpoint, for streams that support it.unit tests
to cover the updatesThe checkpointing for the Shopify Bulk Operation looks like this:
job_should_checkpoint
(once the server has already collected some results, based on theobjectCount
field, within theresponse
for theRUNNING
job)CANCELED
(intentionally, expected)URL
provided for the already collected results, and the fact the Job is canceledstart
from thecheckpointed
cursor value, theend
is also adjusted according to thebulk_window_in_days
step.Streams that don't support BULK CHECKPOINTING currently, typically, because the API doesn't support
sortKey: UPDATED_AT
:For such streams, the
Cancel > Reduce Slice Size > Retry
strategy is applied.User Impact
No impact is expected, except that the stuck syncs that were not able to make any progress due to
Canceled / Retried
Bulk Jobs should be fetched normally, and the progress should take place now.Can this PR be safely reverted and rolled back?