Feat: Allow the janitor to be run on-demand #2844
sqlmesh/core/console.py
Outdated
def start_cleanup(self, ignore_ttl: bool) -> bool:
    if ignore_ttl:
        self._print(
            "Are you sure you want to delete all orphaned snapshots regardless of their environment?"
I don't understand this message. Doesn't `orphaned` mean they no longer belong to any environments?
I've adjusted the wording to make it clearer about what will happen. Is it better?
sqlmesh/core/console.py
Outdated
        self._print(
            "Are you sure you want to delete all orphaned snapshots regardless of their environment?"
        )
        if not self._confirm("This may affect other users. Proceed?"):
This is not very API friendly
True, and ironically it also makes it hard to test the Notebook magic because it requires user input.
However, I don't think this is something we should encourage users to run automatically anyway. It's more of an escape hatch to speed something up outside of the normal order of operations.
I could add a `--no-prompts` flag like some of the other operations; would that be better?
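For illustration, here is a rough sketch of how a prompt-skipping option could interact with the confirmation. It is not the code in this PR: the `no_prompts` parameter and the injectable `confirm` callable are hypothetical, and the assumption that the normal TTL-based cleanup proceeds without asking is mine.

```python
from typing import Callable


def _ask(message: str) -> bool:
    # Default interactive confirmation; only used when no callable is injected.
    return input(f"{message} [y/N] ").strip().lower() == "y"


def start_cleanup(
    ignore_ttl: bool,
    no_prompts: bool = False,  # hypothetical flag, mirrors the proposed --no-prompts
    confirm: Callable[[str], bool] = _ask,  # hypothetical hook so tests can avoid stdin
) -> bool:
    """Return True if the cleanup should proceed."""
    if not ignore_ttl:
        # Assumption: the regular TTL-based cleanup needs no confirmation.
        return True
    if no_prompts:
        # The caller explicitly opted out of interactive prompts.
        return True
    return confirm("This may affect other users. Proceed?")
```

Passing the confirmation in as a callable would also make the Notebook magic testable without real user input, e.g. `start_cleanup(True, confirm=lambda _: True)`.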
    .from_(self.snapshots_table)
    .where(exp.column("expiration_ts") <= current_ts)
)
expired_query = exp.select("name", "identifier", "version").from_(self.snapshots_table)
There will be a significant runtime penalty
Only if the user specifies `--ignore-ttl`, which means that they do want every snapshot checked. I think if they're going down this road then they can accept the penalty.
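To make the trade-off concrete, here is a minimal standalone sketch using `sqlite3` in place of the real state database. The table layout, the `cleanup_candidates` helper, and the sample rows are all assumptions for illustration; the later step of checking candidates against environments is left out.

```python
import sqlite3


def cleanup_candidates(conn, current_ts, ignore_ttl=False):
    """Return (name, identifier, version) rows to consider for cleanup."""
    query = "SELECT name, identifier, version FROM snapshots"
    params = ()
    if not ignore_ttl:
        # Normal path: only snapshots whose TTL has already expired.
        query += " WHERE expiration_ts <= ?"
        params = (current_ts,)
    # With ignore_ttl there is no filter, so every snapshot row is returned
    # and must be checked afterwards, which is where the extra cost comes from.
    return conn.execute(query + " ORDER BY name", params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots "
    "(name TEXT, identifier TEXT, version TEXT, expiration_ts INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
    [("a", "1", "v1", 100), ("b", "2", "v1", 900)],
)
```

With `current_ts=500`, the default call only sees snapshot `a`, while `ignore_ttl=True` returns both rows.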
This exposes the janitor task to the CLI so that users can run it on demand rather than only as part of `sqlmesh run`.
To trigger the janitor, just run `sqlmesh janitor`; it calls the janitor task in exactly the same way that `sqlmesh run` would have.
The main benefit is being able to decouple the janitor process and the missing intervals process so they don't interfere with each other.
I've noticed that they can interfere with each other when there are many large tables to drop, particularly on systems like Trino / Iceberg where a `DROP TABLE` statement blocks until all the files have been removed from S3. This causes the main run to be blocked until the janitor is complete, which is undesirable when you want it to refresh data ASAP.
I also added an option, `--ignore-ttl`, to ignore the snapshot TTL when identifying snapshots to remove. Iterating heavily on a SQLMesh project leaves lots of unused tables lying around. If you want to clean them up now, you're stuck: you have no way to do it without either waiting for the TTL to expire or manually modifying the state database to set the TTL in the past.
This option gives you a way to do it. I understand there is some concern around race conditions, so it shows a warning and requires the user to manually confirm that they want to proceed before continuing. The idea isn't to use it all the time, just when it makes sense to perform an early cleanup.