feat(registry): add remove latest partition job #44795
Conversation
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "orchestrator"
-version = "0.4.0"
+version = "0.4.1"
version = "0.4.1" | |
version = "0.5.0" |
Since we added a new job
@op(required_resource_keys={"latest_metadata_file_blobs"}) | ||
def remove_latest_metadata_partitions_op(context): | ||
""" | ||
This op is responsible for removing for latest metadata files. (Generally used to reprocess metadata files). |
Suggested change:
-    This op is responsible for removing for latest metadata files. (Generally used to reprocess metadata files).
+    This op is responsible for removing for latest metadata partitions. (Generally used to reprocess metadata files).
The difference between these 2 is what made it so hard for me to comprehend for so long, so keeping it correct :D
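As a side note for readers outside this thread, here is a minimal sketch of what an op and job like this could look like. It assumes Dagster's dynamic-partitions API, a partitions definition named "metadata_partition", partition keys equal to blob etags, and a stubbed "latest_metadata_file_blobs" resource; all of these names and details are illustrative assumptions, not the PR's actual code.

from collections import namedtuple

from dagster import OpExecutionContext, ResourceDefinition, job, op

# Stand-in for the real GCS-backed resource: a list of blob-like objects with etag and name.
Blob = namedtuple("Blob", ["etag", "name"])
stub_latest_metadata_file_blobs = ResourceDefinition.hardcoded_resource(
    [Blob(etag="abc123", name="connectors/source-faker/latest/metadata.yaml")]
)


@op(required_resource_keys={"latest_metadata_file_blobs"})
def remove_latest_metadata_partitions_op(context: OpExecutionContext):
    """Delete the dynamic partition key for each latest metadata blob so a later
    "add partition" run re-adds it and downstream processing runs again."""
    partitions_def_name = "metadata_partition"  # assumed dynamic partitions definition name
    existing_keys = set(context.instance.get_dynamic_partitions(partitions_def_name))
    for blob in context.resources.latest_metadata_file_blobs:
        if blob.etag in existing_keys:  # assumption: partition keys are blob etags
            context.instance.delete_dynamic_partition(partitions_def_name, blob.etag)
            context.log.info(f"Removed partition {blob.etag} for {blob.name}")


@job(resource_defs={"latest_metadata_file_blobs": stub_latest_metadata_file_blobs})
def remove_latest_metadata_partitions():
    remove_latest_metadata_partitions_op()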
Three questions:
1. Why do you want to reprocess latest?
My understanding of the system is that partitions are per etag, so if latest changed, a new metadata partition will get added. If latest did not change, I don't get why we'd want to recreate the partition and re-trigger all downstream operations.
2. What is scheduling the job?
I'm not seeing any code change related to scheduling remove_latest_metadata_partitions. There's probably some implicit logic I'm unaware of.
3. Would this be useful for my release candidate PR?
I want to rematerialize an asset on blob deletion. The asset is a list of available release candidates. I took an approach I'm not super satisfied with here. Your change makes me think I could take the following approach instead:
- Consider release_candidate metadata as partitions
- Make the job that removes stale metadata partitions trigger asset rematerialization on partition deletion with AssetMaterialization(asset_key=AssetKey("my_asset"), description="Rematerializing my asset") (rough sketch below)
Does that sound better?
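A rough sketch of that second bullet, assuming release candidate metadata is tracked as a dynamic partitions definition and the asset is keyed "my_asset" (the resource and definition names below are hypothetical). Note that logging an AssetMaterialization records a materialization event for the asset; it does not recompute the asset by itself.

from dagster import AssetKey, AssetMaterialization, OpExecutionContext, op


@op(required_resource_keys={"release_candidate_metadata_file_blobs"})
def remove_stale_release_candidate_partitions_op(context: OpExecutionContext):
    partitions_def_name = "release_candidate_metadata"  # assumed partitions definition name
    live_keys = {blob.etag for blob in context.resources.release_candidate_metadata_file_blobs}
    for key in context.instance.get_dynamic_partitions(partitions_def_name):
        if key in live_keys:
            continue  # partition is still backed by a blob, keep it
        context.instance.delete_dynamic_partition(partitions_def_name, key)
        # Mark the "list of release candidates" asset as updated after the deletion.
        context.log_event(
            AssetMaterialization(
                asset_key=AssetKey("my_asset"),
                description="Rematerializing my asset",
            )
        )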
@alafanechere Great questions. @erohmensing and I added this so that, in the event we update the registry entry generation logic, we can manually re-trigger registry entry processing. We do that simply by deleting the partition keys for all latest metadata files; they will be picked up on the next "add partition run" and processed again.
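For contrast, here is a rough sketch of the "add partition run" side that would re-add the deleted keys on its next run; the partition definition and resource names are the same illustrative assumptions as above, not the repository's actual code.

from dagster import OpExecutionContext, op


@op(required_resource_keys={"latest_metadata_file_blobs"})
def add_new_metadata_partitions_op(context: OpExecutionContext):
    """Re-create a dynamic partition for any latest metadata blob whose etag is not
    currently registered, so deleted keys get picked up and reprocessed."""
    partitions_def_name = "metadata_partition"  # same assumed definition as the remove sketch
    existing = set(context.instance.get_dynamic_partitions(partitions_def_name))
    new_keys = [
        blob.etag
        for blob in context.resources.latest_metadata_file_blobs
        if blob.etag not in existing
    ]
    if new_keys:
        context.instance.add_dynamic_partitions(partitions_def_name, new_keys)
        context.log.info(f"Re-added {len(new_keys)} metadata partitions")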
To add: for example, if we add a new (generated) field to the metadata, we want to be able to populate that info for the connectors without having to re-release them.
Definitely sounds legit. @erohmensing @bnchrch, will you manually run the job to delete the partitions?
@alafanechere Yep, the plan is to do that in the evening in SF so as not to create a backlog that slows connector publishing during the day!
@alafanechere As the new DAG master, I'll merge when we have your approval :)
I'm on GitHub mobile so I don't think I can 👍 from there. But I do approve! Feel free to merge.
What
Provides a job to delete all latest partitions.
Why?
If you remove the partitions, they will be re-added, triggering a reprocess of latest.
Review guide
User Impact
Can this PR be safely reverted and rolled back?