
Support S3 life-cycle policies and deletion prevention #4215

Open
sandstrom opened this issue Aug 25, 2021 · 12 comments

Comments

@sandstrom

sandstrom commented Aug 25, 2021

Problem Description

Making sure logs can't be deleted (e.g. by a hacker) is a common desire. One way of doing this is to use S3 life-cycle policies and Object Lock to prevent early deletion.

Basically, you can configure something like "delete all files 6 months after they were created, and don't allow deletion of any files until at least 3 months after their creation". This assumes that all files are write-once (never modified after creation).
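For reference, the S3 side of that setup could look roughly like this with boto3 (just a sketch; the bucket name, region and retention periods are placeholders, and Object Lock has to be enabled when the bucket is created):

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Object Lock must be enabled at bucket creation time (it also forces versioning on).
s3.create_bucket(
    Bucket="my-loki-logs",  # placeholder name
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    ObjectLockEnabledForBucket=True,
)

# "Don't allow deletion of any object until 3 months after its creation."
s3.put_object_lock_configuration(
    Bucket="my-loki-logs",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 90}},
    },
)

# "Delete all objects 6 months after they were created."
# On a versioned bucket, Expiration only adds delete markers; NoncurrentVersionExpiration
# is what eventually removes the underlying data.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-loki-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-6-months",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 180},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    },
)
```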

Using this to handle automatic deletion with Loki has some benefits over using the log deletion APIs:

  • No need to configure Loki deletion logic.
  • The Loki agent itself doesn't have permission to delete, so if it's compromised the logs will still be around.

Proposed Solutions

ALT1: I know Loki is mostly an append-only service, i.e. files are generally not modified, only written once. If I've understood it correctly, there are the actual log files under the fake/ directory, and those are append-only. Then there are index files under index/loki_index_18723/ (incrementing number), and those seem to be append-only too, correct?

So this may be possible today, without further modifications to Loki. In that case all that's needed is documentation, basically mentioning that this is one way of doing log deletion. I hope this may be the case!

ALT2: If Loki has some files that are updated, they'd have to be either unimportant (so it's fine to have them deleted) or rearchitected into something that's append-only.

Describe alternatives you've considered

Don't support log deletion prevention.

Some earlier discussion #577 (comment)

@owen-d
Member

owen-d commented Aug 31, 2021

Hey @sandstrom, thanks for the detailed writeup. In general, Loki avoids all but the most simplistic forms of auth{entication,orization}. We only use a header X-Scope-OrgID to determine which "tenant" a request is scoped to. This is a very intentional choice; we haven't wanted to open ourselves to additional complexity, at least not yet. That being said, I can see it being helpful to expose a per tenant config to enable/disable this. Currently these APIs are gated globally by -boltdb.shipper.compactor.retention-enabled.
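To make that concrete, tenant scoping is literally just a header on every request; something like this (the URL and tenant name are placeholders for your deployment):

```python
import requests

# Query Loki as tenant "tenant-a"; the X-Scope-OrgID header is the whole auth model.
resp = requests.get(
    "http://loki.example.internal:3100/loki/api/v1/query_range",  # placeholder URL
    params={"query": '{job="varlogs"}', "limit": 10},
    headers={"X-Scope-OrgID": "tenant-a"},
)
resp.raise_for_status()
print(resp.json()["status"])
```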

P.S.
Slightly related, have you seen the compactor-based retention docs?

@sandstrom
Author

sandstrom commented Aug 31, 2021

@owen-d Thanks for taking the time to look at this!

I've read those docs, but what I'm looking for here is basically an explanation of:

  1. what files Loki will write to S3,
  2. whether those files are append-only or ever updated and
  3. if any of them aren't "append-only", why that's the case (and whether you might, in the future, make them append-only).

Answer to (1) and (2) could look something like this:

We write data files to {org-id}/{random:random:random}, for example
fake/60d9ca4ad01b7205:17a62064833:17a62339115:c754fd1d and those files are append only.
We also write files to index/loki_index_{increasing-integer}/{host-name}-{units-time}-{random}.gz for example index/loki_index_18723/hopeful-titmouse-1617717627420113141-1617793544.gz, and those files are also append only.

(I'm not sure this example answer is actually accurate, so feel free to correct me)

@stale

stale bot commented Oct 2, 2021

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Oct 2, 2021
@sandstrom
Author

ping

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Oct 2, 2021
@stale

stale bot commented Mar 3, 2022


@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Mar 3, 2022
@sandstrom
Author

ping

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Mar 3, 2022
@stale

stale bot commented Apr 17, 2022


@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Apr 17, 2022
@sandstrom
Author

Still relevant

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Apr 17, 2022
@elliotdobson
Contributor

elliotdobson commented Jun 1, 2022

I have a similar use case, but I want to take it a step further by replicating all objects in the S3 bucket to another S3 bucket in a different region/account, and then applying an S3 lifecycle policy on the replicated bucket to enforce retention on its objects.

From what I've seen in practice, Loki for the most part does not need permission to delete objects from S3.

The exception to this is the compactor component of Loki, which compacts the indexes. This is mentioned in the storage docs here and here, though I have not tested running Loki without DeleteObject permission.

@darox

darox commented Jun 2, 2022

I'm also interested in this use case. I might make some tests with MinIO.

Update: I made some tests and, as expected, it's not working:

2022-06-13 08:23:29 | {"log":"level=error ts=2022-06-13T06:23:28.983754136Z caller=compactor.go:370 msg=\"failed to run compaction\" err=\"InvalidRequest: Object is WORM protected and cannot be overwritten\n\tstatus code: 400, request id: 16F819F65D33C2DB, host id: \" ","stream":"stderr","time":"2022-06-13T06:23:28.983945553Z"}

@elliotdobson
Contributor

After running Loki for a while it seems that...

  • The chunk objects (the actual log lines) are WORM-compatible; these are the objects stored under the tenant directory (fake/ by default).
  • The index objects are rewritten by the compactor, which basically merges all the index objects into a single object; these are the objects stored under index/.

So it doesn't look like you can enable WORM on the S3 bucket if you run the compactor component.

I'm running Loki in simple-scalable-deployment mode plus a separate compactor. The read/write StatefulSets only have GetObject & PutObject S3 permissions. The compactor StatefulSet has GetObject, PutObject and DeleteObject. So I guess that helps to reduce the compromise risk somewhat?
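For anyone wanting to reproduce that split, the two policies look roughly like this (the bucket name, policy names and the s3:ListBucket action are my assumptions; attach them to the read/write and compactor roles however your deployment does that):

```python
import json
import boto3

BUCKET_ARN = "arn:aws:s3:::my-loki-logs"  # placeholder bucket

# Read/write path (ingesters, queriers): no delete permission.
read_write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [BUCKET_ARN]},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": [f"{BUCKET_ARN}/*"]},
    ],
}

# Compactor: additionally allowed to delete the index objects it rewrites.
compactor_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [BUCKET_ARN]},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"{BUCKET_ARN}/*"],
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(PolicyName="loki-read-write", PolicyDocument=json.dumps(read_write_policy))
iam.create_policy(PolicyName="loki-compactor", PolicyDocument=json.dumps(compactor_policy))
```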

I've been testing S3 bucket replication for DR purposes. So far I am able to read the replicated objects from the destination bucket with a test Loki instance.

I haven't yet tested S3 lifecycle policy to apply retention to the files in the bucket, but will do in the coming weeks.

It would be interesting to hear how disaster recovery (DR) is handled in Grafana Cloud.

@sandstrom
Author

@elliotdobson Those are some good points!

A few additions:

  • On S3, for durability (protecting against ransomware/hackers), there is no difference between PutObject and DeleteObject: overwriting an object with null bytes is just as destructive as deleting it.

  • The problem with replication is that it will also replicate any malicious act performed in the original bucket. So it helps against accidental data loss, but won't protect against ransomware or hackers.

  • There is also bucket versioning that could be used, but I find the Object Lock mechanism to be a cleaner approach overall. However, it does assume immutable data (WORM / write-once), which seems to be only partially supported at the moment.
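For completeness, the versioning route from the bullet above would look roughly like this (bucket name and day counts are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning: overwrites and deletes now only hide data behind new
# versions / delete markers instead of destroying it immediately.
s3.put_bucket_versioning(
    Bucket="my-loki-logs",  # placeholder name
    VersioningConfiguration={"Status": "Enabled"},
)

# Permanently remove noncurrent versions 90 days after they stop being current.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-loki-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    },
)
```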


One thing that might help would be if the compactor could leave the old index files in place somehow, so that they could still be used, albeit less efficiently. That way, one could lock them down and delete them after Y days via life-cycle rules.
