Create a cleanup job based on specified retention days #268

Closed
aldy505 opened this issue Jun 3, 2023 · 6 comments

@aldy505
Contributor

aldy505 commented Jun 3, 2023

Regarding getsentry/self-hosted#2154 (comment):

BYK It's definitely an issue we didn't really think about, since we were relying on GCS retention settings to remove profiles when needed. It should be addressed separately in vroom.

The sentry and snuba images both respect the SENTRY_EVENT_RETENTION_DAYS environment variable and automatically delete data older than the configured number of retention days. Since the self-hosted version of vroom relies on the filesystem, should we create a simple goroutine that runs a cleanup job every day (instead of a separate process, which would add a new container to the self-hosted docker-compose)?

@aldy505
Contributor Author

aldy505 commented Jun 7, 2023

Any thoughts? @phacops

I can help implement it, and I'd expect to have it done before the next self-hosted release.

@phacops
Contributor

phacops commented Jun 13, 2023

Depending on where you host, this is probably better handled in different ways. I'd say we can start by making a job that handles the filesystem case. If you're on a cloud provider, you should probably rely on the retention parameters of S3/ABS/GCS instead.

If we focus on the filesystem case, we could likely start by implementing this in self-hosted directly: launch a container from this image (https://github.com/getsentry/self-hosted/blob/master/cron/Dockerfile) and run this command every 6 hours:

find <vroom volume> -mtime +<retention days> -delete

If we need something more generic (which I'm against, since it's better handled differently for each cloud provider), we'd add a new command in this repository that lists, from ClickHouse, the files that need to be deleted, and we'd use the library we already have to delete them.

@aldy505
Contributor Author

aldy505 commented Jun 13, 2023

Oh right, I was talking about filesystem. Sorry about that.

I'm okay with having a cron container, but considering we already have around 38-39 containers in self-hosted, is it okay to add another one? I know it would be a lightweight container, but someone else might oppose the idea of adding one more.

@phacops
Contributor

phacops commented Jun 15, 2023

Indeed, we have a lot of containers. We could reuse an existing container that already runs a cron and just add the cron definition to it (I'm thinking of the symbolicator cleanup job).

@aldy505
Contributor Author

aldy505 commented Jun 17, 2023

I'd prefer to have a dedicated container for vroom. I don't think it makes sense to add another definition to an existing cron container.

I'll open a PR on self-hosted soon to address this.

@aldy505
Contributor Author

aldy505 commented Jul 21, 2023

Handled in self-hosted: getsentry/self-hosted#2211

@aldy505 aldy505 closed this as completed Jul 21, 2023