Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[controller] Fix hybrid store migration failure during ongoing batch push jobs #1360

Merged

Conversation

sushantmane
Copy link
Contributor

@sushantmane sushantmane commented Dec 3, 2024

Fix hybrid store migration failure during ongoing batch push jobs

This PR fixes an issue where hybrid store migration fails if a batch push job is ongoing
and a non-truncated version topic exists in the parent region.

The problem stemmed from a safeguard intended to prevent hybrid-to-batch conversion during
ongoing push jobs (added to address targeted region push issues). The safeguard failed to
account for store migration scenarios where hybrid configurations are added during the
updateStore operation.

This misclassification of the migration attempt as a batch-to-hybrid store conversion
caused unintended operations, including a call to VPHA::getTopicForCurrentPushJob. This
method attempts to issue a killOfflinePushJob when the version config is missing in the
store’s configuration. The kill job, in turn, tries to update the version status to
KILLED, which does not exist, which leads to a version not found exception from
AbstractStore::updateVersionStatus.

Ref: #1143

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

@sushantmane sushantmane changed the title [controller] Fix safeguard handling for hybrid store migration during ongoing batch push jobs [controller] Fix hybrid store migration failure during ongoing batch push jobs Dec 3, 2024
… ongoing batch push

jobs

This PR fixes an issue where hybrid store migration fails if a batch push job is ongoing
and a non-truncated version topic exists in the parent region.

The problem stemmed from a safeguard intended to prevent hybrid-to-batch conversion during
ongoing push jobs (added to address targeted region push issues). The safeguard failed to
account for store migration scenarios where hybrid configurations are added during the
updateStore operation.

This misclassification of the migration attempt as a batch-to-hybrid store conversion
caused unintended operations, including a call to VPHA::getTopicForCurrentPushJob. This
method attempts to issue a killOfflinePushJob when the version config is missing in the
store’s configuration. The kill job, in turn, tries to update the version status to
KILLED, which does not exist, which leads to a version not found exception from
AbstractStore::updateVersionStatus.
@sushantmane sushantmane force-pushed the li-fix-store-migration-issues branch from f72497b to 280de65 Compare December 3, 2024 21:31
Copy link
Contributor

@majisourav99 majisourav99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

@sushantmane
Copy link
Contributor Author

Thank you, @smaji for the quick review!

@sushantmane sushantmane enabled auto-merge (squash) December 3, 2024 21:42
@sushantmane sushantmane merged commit f653cdd into linkedin:main Dec 3, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants