Dagster: Add a schedule and auto materialization to handle resource limits #28190
Conversation
new_gcs_blobs_partition_sensor(
    job=generate_registry_entry,
    resources_def=METADATA_RESOURCE_TREE,
    partitions_def=registry_entry.metadata_partitions_def,
    gcs_blobs_resource_key="latest_metadata_file_blobs",
    interval=60,
),
From the description it looks like we want to only move the all_metadata_file_blobs sensor for generate_registry_entry to a schedule and keep the latest_metadata_file_blobs on a 30 second (60?) sensor. Here it looks like we are replacing both?
We're removing the latest metadata sensor altogether for now, and seeing if we can go purely on all_metadata_file_blobs.
I see - looks good from the Loom. The only implication is that if things go badly we need to revert, vs. it not being a big deal to fix forward for only the all_metadata_files. Seems fine as long as we are prepped for that!
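For reference, a minimal sketch of what the schedule-based replacement could look like, assuming the job is wired up with a plain Dagster ScheduleDefinition; the cron string and variable name are illustrative, not the exact code in this PR:

from dagster import ScheduleDefinition

# Hypothetical sketch: trigger the partition-discovery job on a fixed cadence
# instead of a sensor. The cron string is an assumption for illustration.
add_new_metadata_partitions_schedule = ScheduleDefinition(
    job=add_new_metadata_partitions,
    cron_schedule="*/1 * * * *",  # every minute; tune against resource limits
)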
group_name=GROUP_NAME,
partitions_def=metadata_partitions_def,
output_required=False,
auto_materialize_policy=AutoMaterializePolicy.eager(max_materializations_per_minute=50),
nice! how did we land on the number here?
Arbitrarily! This may need to be adjusted when it hits cloud
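For illustration, a hedged sketch of how the rate-capped eager policy sits on the asset decorator; the asset name and body are placeholders, and only the arguments shown in the diff above come from the PR:

from dagster import AutoMaterializePolicy, asset

@asset(
    group_name=GROUP_NAME,
    partitions_def=metadata_partitions_def,
    output_required=False,
    # Cap eager auto-materialization at 50 partition materializations per minute;
    # the number is arbitrary for now and may need tuning once this runs in cloud.
    auto_materialize_policy=AutoMaterializePolicy.eager(max_materializations_per_minute=50),
)
def registry_entry(context):
    ...  # placeholder body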
@job
def add_new_metadata_partitions():
    add_new_metadata_partitions_op()
does this need to be added into the list of jobs in the init file, or does it work differently than other jobs?
It didn't need to!
You can see it in this Loom video: https://www.loom.com/share/22025644eea24b6d901c635b71c0c68b
But I'll go and add it!
... weird! i'd be interested to know if our list has any meaning then :D
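If the job does get added to the list, a hypothetical sketch of the registration, assuming the init file assembles a Dagster Definitions object (the surrounding entries are placeholders):

from dagster import Definitions

defs = Definitions(
    jobs=[
        add_new_metadata_partitions,  # the new partition-discovery job
        # ... existing jobs ...
    ],
    schedules=[add_new_metadata_partitions_schedule],  # from the sketch above
)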
@op(required_resource_keys={"all_metadata_file_blobs"})
def add_new_metadata_partitions_op(context):
    """
    This op is responsible for polling for new metadata files and adding their etag to the dynamic partition.
👨🏻🍳 💋
✅ as long as we monitor it when we ship it!
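For reference, a minimal sketch of what the op body could look like, assuming etags are read off GCS blob objects exposed by the all_metadata_file_blobs resource and that the dynamic partitions definition is named "metadata"; both assumptions are for illustration only, not the PR's exact code:

from dagster import DynamicPartitionsDefinition, op

metadata_partitions_def = DynamicPartitionsDefinition(name="metadata")  # assumed name

@op(required_resource_keys={"all_metadata_file_blobs"})
def add_new_metadata_partitions_op(context):
    """Poll for new metadata files and register their etags as dynamic partitions."""
    blobs = context.resources.all_metadata_file_blobs
    existing = set(context.instance.get_dynamic_partitions(metadata_partitions_def.name))
    new_etags = [blob.etag for blob in blobs if blob.etag not in existing]
    if new_etags:
        context.instance.add_dynamic_partitions(metadata_partitions_def.name, new_etags)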
Loom: https://www.loom.com/share/22025644eea24b6d901c635b71c0c68b
What