Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Allow the janitor to be run on-demand #2844

Merged
merged 2 commits into from
Jul 2, 2024
Merged

Conversation

erindru
Copy link
Collaborator

@erindru erindru commented Jul 1, 2024

This exposes the janitor task to the CLI so that users may run it on-demand and not as part of sqlmesh run.

To trigger the janitor, just run sqlmesh janitor and it calls the janitor task in the exact same way that sqlmesh run would have.

The main benefit is being able to decouple the janitor process and the missing intervals process so they dont interfere with each other.

I've noticed that they can interfere with each other when there are many large tables to drop, particularly on systems like Trino / Iceberg where a DROP TABLE statement blocks until all the files have been removed from S3. This causes the main run to be blocked until the janitor is complete, which is undesirable when you want it to refresh data ASAP.

I also added an option --ignore-ttl to ignore the snapshot TTL when identifying snapshots to remove. Iterating heavily on a SQLMesh project leaves lots of unused tables lying around. If you want to clean them up now - you're stuck. You have no way to do it without either waiting for the TTL to expire or manually modifying the state database to adjust the TTL to be in the past.

This option gives you a way to do it. I understand there is some concern around race conditions, so it throws up a warning and requires the user to manually indicate they want to proceed before continuing. The idea isn't to use it all the time, just when it makes sense to perform an early cleanup.

docs/reference/cli.md Outdated Show resolved Hide resolved
def start_cleanup(self, ignore_ttl: bool) -> bool:
if ignore_ttl:
self._print(
"Are you sure you want to delete all orphaned snapshots regardless of their environment?"
Copy link
Member

@izeigerman izeigerman Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this message. Doesn't orphaned mean they no longer belong to any environments?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted the wording to make it clearer about what will happen. Is it better?

self._print(
"Are you sure you want to delete all orphaned snapshots regardless of their environment?"
)
if not self._confirm("This may affect other users. Proceed?"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very API friendly

Copy link
Collaborator Author

@erindru erindru Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, and ironically it also makes it hard to test the Notebook magic because it requires user input.

However, I dont think this is something we should encourage the users to be running automatically anyway. It's more of an escape hatch to speed something up outside of the normal order of operations.

I could add a --no-prompts flag like some of the other operations, would that be better?

.from_(self.snapshots_table)
.where(exp.column("expiration_ts") <= current_ts)
)
expired_query = exp.select("name", "identifier", "version").from_(self.snapshots_table)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be a significant runtime penalty

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if the user specifies --ignore-ttl, which means that they do want every snapshot checked. I think if theyre going down this road then they can accept the penalty

@erindru erindru merged commit 94f3ec6 into main Jul 2, 2024
15 checks passed
@erindru erindru deleted the erin/ondemand-janitor branch July 2, 2024 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants