
Support S3 life-cycle policies and deletion prevention #4215

Open
sandstrom opened this issue Aug 25, 2021 · 12 comments

Comments

@sandstrom

sandstrom commented Aug 25, 2021

Problem Description

Making sure logs can't be deleted (e.g. by a hacker) is a common desire. One way of doing this is to use S3 life-cycle policies and Object Lock to prevent early deletion.

Basically, you can configure something like "delete all files 6 months after they were created, and don't allow deletion of any files until at least 3 months after their creation". This assumes that all files are write-once (never modified after creation).
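For reference, the S3 side of that setup could look roughly like this with boto3 (just a sketch; the bucket name, region and retention periods are placeholders, and Object Lock has to be enabled when the bucket is created):

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Object Lock must be enabled at bucket creation time (it also forces versioning on).
s3.create_bucket(
    Bucket="my-loki-logs",  # placeholder name
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    ObjectLockEnabledForBucket=True,
)

# "Don't allow deletion of any object until 3 months after its creation."
s3.put_object_lock_configuration(
    Bucket="my-loki-logs",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 90}},
    },
)

# "Delete all objects 6 months after they were created."
# On a versioned bucket, Expiration only adds delete markers; NoncurrentVersionExpiration
# is what eventually removes the underlying data.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-loki-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-6-months",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 180},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    },
)
```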

Using this to handle automatic deletion with Loki has some benefits over using the log deletion APIs:

  • No need to configure Loki deletion logic.
  • The Loki agent itself doesn't have permission to delete, so if it's compromised the logs will still be around.

Proposed Solutions

ALT1: I know Loki is mostly an append-only service, i.e. files are generally not modified, only written once. If I've understood it correctly, there are the actual log files under the fake/ directory, and those are append-only. Then there are index files under index/loki_index_18723/ (incrementing number), and those seem to be append-only too, correct?

So this may be possible today, without further modifications to Loki. In that case all that's needed is documentation, basically mentioning that this is one way of doing log deletion. I hope this may be the case!

ALT2: If Loki has some files that are updated, they'd have to be either unimportant (so it's fine to have them deleted) or rearchitected into something that's append-only.

Describe alternatives you've considered

Don't support log deletion prevention.

Some earlier discussion #577 (comment)

@owen-d
Member

owen-d commented Aug 31, 2021

Hey @sandstrom, thanks for the detailed writeup. In general, Loki avoids all but the most simplistic forms of auth{entication,orization}. We only use a header X-Scope-OrgID to determine which "tenant" a request is scoped to. This is a very intentional choice; we haven't wanted to open ourselves to additional complexity, at least not yet. That being said, I can see it being helpful to expose a per tenant config to enable/disable this. Currently these APIs are gated globally by -boltdb.shipper.compactor.retention-enabled.
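To make that concrete, tenant scoping is literally just a header on every request; something like this (the URL and tenant name are placeholders for your deployment):

```python
import requests

# Query Loki as tenant "tenant-a"; the X-Scope-OrgID header is the whole auth model.
resp = requests.get(
    "http://loki.example.internal:3100/loki/api/v1/query_range",  # placeholder URL
    params={"query": '{job="varlogs"}', "limit": 10},
    headers={"X-Scope-OrgID": "tenant-a"},
)
resp.raise_for_status()
print(resp.json()["status"])
```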

P.S.
Slightly related, have you seen the compactor-based retention docs?

@sandstrom
Author

sandstrom commented Aug 31, 2021

@owen-d Thanks for taking the time to look at this!

I've read those docs, but what I'm looking for here is basically an explanation of:

  1. what files Loki will write to S3,
  2. whether those files are append-only or ever updated and
  3. if any of them aren't "append-only", why that's the case (and whether you might, in the future, make them append-only).

Answer to (1) and (2) could look something like this:

We write data files to {org-id}/{random:random:random}, for example
fake/60d9ca4ad01b7205:17a62064833:17a62339115:c754fd1d and those files are append only.
We also write files to index/loki_index_{increasing-integer}/{host-name}-{units-time}-{random}.gz for example index/loki_index_18723/hopeful-titmouse-1617717627420113141-1617793544.gz, and those files are also append only.

(I'm not sure this example answer is actually accurate, so feel free to correct me)

@stale

stale bot commented Oct 2, 2021

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Oct 2, 2021
@sandstrom
Author

ping

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Oct 2, 2021
@stale

stale bot commented Mar 3, 2022


@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Mar 3, 2022
@sandstrom
Author

ping

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Mar 3, 2022
@stale

stale bot commented Apr 17, 2022


@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Apr 17, 2022
@sandstrom
Author

Still relevant

@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Apr 17, 2022
@elliotdobson
Contributor

elliotdobson commented Jun 1, 2022

I have a similar use case, but I want to take it a step further by replicating all objects in the S3 bucket to another S3 bucket in a different region/account, and then applying an S3 lifecycle policy on the replicated bucket to enforce retention on its objects.

From what I've seen in practice, Loki for the most part does not need permission to delete objects from S3.

The exception to this is the compactor component of Loki, which compacts the indexes. This is mentioned in the storage docs here and here, though I have not tested running Loki without DeleteObject permission.

@darox

darox commented Jun 2, 2022

I'm also interested in this use case. I might make some tests with MinIO.

Update: I made some tests and, as expected, it's not working:

2022-06-13 08:23:29 | {"log":"level=error ts=2022-06-13T06:23:28.983754136Z caller=compactor.go:370 msg=\"failed to run compaction\" err=\"InvalidRequest: Object is WORM protected and cannot be overwritten\n\tstatus code: 400, request id: 16F819F65D33C2DB, host id: \" ","stream":"stderr","time":"2022-06-13T06:23:28.983945553Z"}

@elliotdobson
Contributor

After running Loki for a while it seems that...

  • The chunk objects (the actual log lines) are WORM-compatible; these are the objects stored under the tenant directory (fake/ by default).
  • The index objects are rewritten by the compactor, which basically merges all the index objects into a single object; these are the objects stored under index/.

So it doesn't look like you can enable WORM on the S3 bucket if you run the compactor component.

I'm running Loki in simple-scalable-deployment mode plus a separate compactor. The read/write StatefulSets only have GetObject & PutObject S3 permissions. The compactor StatefulSet has GetObject, PutObject and DeleteObject. So I guess that helps to reduce the compromise risk somewhat?
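For anyone wanting to reproduce that split, the two policies look roughly like this (the bucket name, policy names and the s3:ListBucket action are my assumptions; attach them to the read/write and compactor roles however your deployment does that):

```python
import json
import boto3

BUCKET_ARN = "arn:aws:s3:::my-loki-logs"  # placeholder bucket

# Read/write path (ingesters, queriers): no delete permission.
read_write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [BUCKET_ARN]},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": [f"{BUCKET_ARN}/*"]},
    ],
}

# Compactor: additionally allowed to delete the index objects it rewrites.
compactor_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [BUCKET_ARN]},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"{BUCKET_ARN}/*"],
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(PolicyName="loki-read-write", PolicyDocument=json.dumps(read_write_policy))
iam.create_policy(PolicyName="loki-compactor", PolicyDocument=json.dumps(compactor_policy))
```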

I've been testing S3 bucket replication for DR purposes. So far I am able to read the replicated objects from the destination bucket with a test Loki instance.

I haven't yet tested S3 lifecycle policy to apply retention to the files in the bucket, but will do in the coming weeks.

It would be interesting to hear how disaster recovery (DR) is handled in Grafana Cloud.

@sandstrom
Author

@elliotdobson Those are some good points!

A few additions:

  • On S3, for durability (protecting against ransomware/hackers), there is no difference between PutObject and DeleteObject: overwriting an object with null bytes is just as destructive as deleting it.

  • The problem with replication is that it will also replicate any malicious act performed in the original bucket. So it helps against accidental data loss, but won't protect against ransomware or hackers.

  • There is also bucket versioning that could be used, but I find the Object Lock mechanism to be a cleaner approach overall. However, it does assume immutable data (WORM / write-once), which seems to be only partially supported at the moment.
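For completeness, the versioning route from the bullet above would look roughly like this (bucket name and day counts are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning: overwrites and deletes now only hide data behind new
# versions / delete markers instead of destroying it immediately.
s3.put_bucket_versioning(
    Bucket="my-loki-logs",  # placeholder name
    VersioningConfiguration={"Status": "Enabled"},
)

# Permanently remove noncurrent versions 90 days after they stop being current.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-loki-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    },
)
```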


One thing that might help would be if the compactor could leave the old index files in place somehow, so that they could still be used, albeit less efficiently. That way, one could lock them down and delete them after Y days via life-cycle rules.
