
Dagster: Add a schedule and auto materialization to handle resource limits #28190

Merged
bnchrch merged 8 commits into master from bnchrch/materialize-all-etags-job on Jul 26, 2023

Conversation

@bnchrch bnchrch (Contributor) commented Jul 12, 2023

Loom: https://www.loom.com/share/22025644eea24b6d901c635b71c0c68b

What

  1. Replaces the all_metadata sensor with a schedule + job that adds all partition keys to the dynamic partition (sketched below)
  2. Adds auto materialization to the registry entry asset so Dagster manages concurrency
  3. Updates metadata to return None when it is not parseable, to avoid missing metadata files showing as unmaterialized
  4. Updates the spec cache to have a freshness policy
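
Roughly, the schedule + job shape from point 1 looks like the following minimal sketch; the cron cadence, the stubbed op body, and the schedule name are illustrative assumptions, not taken from this PR:

from dagster import ScheduleDefinition, job, op


@op
def add_new_metadata_partitions_op():
    # Stub standing in for the real op added in this PR, which polls GCS for
    # metadata files and registers their etags as dynamic partition keys.
    ...


@job
def add_new_metadata_partitions():
    add_new_metadata_partitions_op()


# Run the partition-registration job on a fixed cadence instead of a sensor.
add_new_metadata_partitions_schedule = ScheduleDefinition(
    job=add_new_metadata_partitions,
    cron_schedule="*/5 * * * *",  # assumed cadence for illustration
)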

@bnchrch bnchrch requested review from erohmensing, alafanechere and a team July 26, 2023 20:31
@bnchrch bnchrch marked this pull request as ready for review July 26, 2023 20:31
Comment on lines -158 to -164
new_gcs_blobs_partition_sensor(
    job=generate_registry_entry,
    resources_def=METADATA_RESOURCE_TREE,
    partitions_def=registry_entry.metadata_partitions_def,
    gcs_blobs_resource_key="latest_metadata_file_blobs",
    interval=60,
),
Contributor

From the description it looks like we only want to move the all_metadata_file_blobs sensor for generate_registry_entry to a schedule and keep latest_metadata_file_blobs on a 30-second (60?) sensor. Here it looks like we are replacing both?

bnchrch (Author)

We're removing the latest metadata sensor altogether for now.

Seeing if we can go purely on all_metadata_file_blobs.

Contributor

I see, looks good from the Loom. The only implication is that if things go badly we need to revert, versus it not being a big deal to fix forward if only all_metadata_files were affected. Seems fine as long as we are prepped for that!

    group_name=GROUP_NAME,
    partitions_def=metadata_partitions_def,
    output_required=False,
    auto_materialize_policy=AutoMaterializePolicy.eager(max_materializations_per_minute=50),
Contributor

nice! how did we land on the number here?

bnchrch (Author)

Arbitrarily! This may need to be adjusted when it hits cloud
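
For reference, a sketch of how this policy sits on the asset; the asset name, body, GROUP_NAME value, and partitions definition below are illustrative stand-ins rather than the PR's actual code:

from dagster import AutoMaterializePolicy, DynamicPartitionsDefinition, Output, asset

GROUP_NAME = "registry"  # stand-in value
metadata_partitions_def = DynamicPartitionsDefinition(name="metadata")  # stand-in definition


@asset(
    group_name=GROUP_NAME,
    partitions_def=metadata_partitions_def,
    output_required=False,
    auto_materialize_policy=AutoMaterializePolicy.eager(max_materializations_per_minute=50),
)
def registry_entry(context):
    # Illustrative body: parse the metadata file for context.partition_key here.
    metadata = None  # stands in for the parsed file, or None when it is not parseable
    if metadata is None:
        # With output_required=False, yielding nothing is allowed: the run still
        # succeeds and downstream assets are skipped for this partition.
        return
    yield Output(metadata)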

Comment on lines 47 to 49
@job
def add_new_metadata_partitions():
    add_new_metadata_partitions_op()
Contributor

does this need to be added into the list of jobs in the init file, or does it work differently than other jobs?

bnchrch (Author)

It didn't need to!

You can see it in this loom video: https://www.loom.com/share/22025644eea24b6d901c635b71c0c68b

But I'll go and add it!

Contributor

... weird! I'd be interested to know if our list has any meaning then :D
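
For what it's worth, a job that is only targeted by a schedule gets pulled into the code location through that schedule, which is likely why it worked without being in the list; adding it explicitly is harmless. A rough sketch with illustrative names and a stubbed op body:

from dagster import Definitions, ScheduleDefinition, job, op


@op
def add_new_metadata_partitions_op():
    ...  # stub body for illustration


@job
def add_new_metadata_partitions():
    add_new_metadata_partitions_op()


add_new_metadata_partitions_schedule = ScheduleDefinition(
    job=add_new_metadata_partitions,
    cron_schedule="*/5 * * * *",  # assumed cadence
)

# A job targeted by a schedule does not have to be listed under jobs=, but
# listing it explicitly keeps the init file's job list complete.
defs = Definitions(
    jobs=[add_new_metadata_partitions],
    schedules=[add_new_metadata_partitions_schedule],
)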

@op(required_resource_keys={"all_metadata_file_blobs"})
def add_new_metadata_partitions_op(context):
    """
    This op is responsible for polling for new metadata files and adding their etag to the dynamic partition.
Contributor

👨🏻‍🍳 💋
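
For context, the general shape of such an op; the partitions definition name and the assumption that the all_metadata_file_blobs resource yields GCS blobs with an etag attribute are illustrative here, not taken from the PR:

from dagster import DynamicPartitionsDefinition, op

metadata_partitions_def = DynamicPartitionsDefinition(name="metadata")  # illustrative name


@op(required_resource_keys={"all_metadata_file_blobs"})
def add_new_metadata_partitions_op(context):
    """Poll for metadata file blobs and register any new etags as dynamic partition keys."""
    blobs = context.resources.all_metadata_file_blobs
    existing = set(context.instance.get_dynamic_partitions(metadata_partitions_def.name))
    new_etags = [blob.etag for blob in blobs if blob.etag not in existing]
    if new_etags:
        # Register the previously unseen etags with the Dagster instance so the
        # partitioned assets can materialize them.
        context.instance.add_dynamic_partitions(metadata_partitions_def.name, new_etags)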

@bnchrch bnchrch requested a review from erohmensing July 26, 2023 22:16
@erohmensing erohmensing (Contributor) left a comment

✅ as long as we monitor it when we ship it!

@bnchrch bnchrch merged commit 5c9f0fa into master Jul 26, 2023
@bnchrch bnchrch deleted the bnchrch/materialize-all-etags-job branch July 26, 2023 22:36