compact does not delete expired blocks in s3 #3436
Cleaning up used to happen only at the end of an iteration. Has it ever finished one? The current Thanos version does that concurrently, in addition to a one-time cleanup on boot. Only retention policies are applied at the end.
The log shows nothing to do for a long time. Does this mean a lot of iterations? The log does show
I run
I have the same problem. The data stored in object storage is not being deleted.
All, what is the metric
I have only one instance, and it has been running for half a month |
I noticed 2 things that might be related to this issue.
I found some errors in the log. This one seems to stop the grouping process. After this error, the log starts to repeat.
Same problem.
Have you tried the latest version (0.17.0)? I see in the changelog that #3115 was part of that release, and it should decouple block deletion from compaction iterations.
The problem is not cleaning up the expired blocks; it seems the retention policy is not working, so the blocks are never marked for deletion.
What implementation of S3 are you using? Sounds like it doesn't implement the API properly.
It's an object storage service from my company, and it implements the S3 API, but I don't think this is an error in the implementation: most of the requests are successful, and the compactor can compact, clean, and downsample blocks. It's just that the retention policy does not work; there is no
Seeing the same issues with GCP.
After I deleted some blocks manually, the compactor now works as expected. I think there are some problematic blocks that stop the compactor from working; it gets stuck at the grouping step.
I extracted the code so that I can apply the retention manually; I think it might help people with the same problem.

```go
package main

import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/pkg/relabel"
	"github.com/thanos-io/thanos/pkg/block"
	"github.com/thanos-io/thanos/pkg/compact"
	"github.com/thanos-io/thanos/pkg/logging"
	"github.com/thanos-io/thanos/pkg/objstore/client"
)

func main() {
	// Catch-all relabel config for the meta filter: keep every block.
	filter := block.NewLabelShardedMetaFilter([]*relabel.Config{
		{
			SourceLabels: []model.LabelName{"thanos"},
			Action:       relabel.Keep,
			Regex:        relabel.MustNewRegexp(".*"),
		},
	})
	filters := []block.MetadataFilter{filter}
	logger := logging.NewLogger("debug", "logfmt", "test")
	ctx := context.TODO()

	// Retention per resolution: mark anything older than 30 days for deletion.
	retentionByResolution := map[compact.ResolutionLevel]time.Duration{
		compact.ResolutionLevelRaw: 30 * 24 * time.Hour,
		compact.ResolutionLevel5m:  30 * 24 * time.Hour,
		compact.ResolutionLevel1h:  30 * 24 * time.Hour,
	}

	confContentYaml := `
type: S3
config:
  bucket: "xxxxxx"
  endpoint: "xxxxx"
  access_key: "xxxxx"
  secret_key: "xxxxxx"
  signature_version2: true
  insecure: false
`
	bkt, err := client.NewBucket(logger, []byte(confContentYaml), nil, "aaa")
	if err != nil {
		logger.Log("msg", err)
		return
	}

	metaFetcher, err := block.NewMetaFetcher(logger, 32, bkt, "", nil, filters, nil)
	if err != nil {
		logger.Log("msg", err)
		return
	}

	blocksMarkedForDeletion := promauto.With(nil).NewCounter(prometheus.CounterOpts{})
	metas, _, err := metaFetcher.Fetch(ctx)
	if err != nil {
		logger.Log("msg", err)
		return
	}

	err = compact.ApplyRetentionPolicyByResolution(ctx, logger, bkt, metas, retentionByResolution, blocksMarkedForDeletion)
	if err != nil {
		logger.Log("msg", err)
		return
	}
}
```
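One note on the snippet above: as far as I can tell, ApplyRetentionPolicyByResolution only marks expired blocks by uploading a deletion-mark.json into each of them; the data itself is removed later by the compactor's cleanup, once the delete delay has passed, so the bucket size will not shrink immediately after running it.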
The problem shows up again. I have a strong feeling it's caused by some blocks with the same time range: because this set of blocks is never grouped while other blocks are still being grouped, these blocks may cause the compactor to get stuck, and I have to add a crontab entry to restart it every 6 hours. The extra block was created during a migration of a Prometheus instance, which created 2 blocks with the same time range: one has a 2-hour duration, the other less than one hour.
I deleted the blocks with the same time range, and it works again. I think it's either because the block does not contain two hours of complete data, or because the two blocks overlap in time with the same compaction level and the same duration.
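Following up on the overlapping-blocks observation in the last two comments, here is a minimal sketch for spotting such blocks. It reuses the metas returned by metaFetcher.Fetch in the retention snippet above; the function name and the grouping key are my own choices, and the field names (MinTime, MaxTime, Compaction.Level, Thanos.Downsample.Resolution, Thanos.Labels) should be double-checked against the Thanos version you build against. Save it as a second file in the same package and call findOverlappingBlocks(metas) after the Fetch call.

```go
package main

import (
	"fmt"
	"sort"

	"github.com/oklog/ulid"
	"github.com/thanos-io/thanos/pkg/block/metadata"
)

// findOverlappingBlocks prints pairs of blocks that overlap in time within the
// same group (same external labels and resolution) and at the same compaction
// level, which is the situation described above (two blocks written for the
// same 2h window).
func findOverlappingBlocks(metas map[ulid.ULID]*metadata.Meta) {
	type groupKey struct {
		labels     string
		resolution int64
		level      int
	}
	groups := map[groupKey][]*metadata.Meta{}
	for _, m := range metas {
		// fmt prints maps with sorted keys, so this serialization is stable.
		k := groupKey{
			labels:     fmt.Sprintf("%v", m.Thanos.Labels),
			resolution: m.Thanos.Downsample.Resolution,
			level:      m.Compaction.Level,
		}
		groups[k] = append(groups[k], m)
	}
	for k, ms := range groups {
		sort.Slice(ms, func(i, j int) bool { return ms[i].MinTime < ms[j].MinTime })
		for i := 1; i < len(ms); i++ {
			prev, cur := ms[i-1], ms[i]
			if cur.MinTime < prev.MaxTime {
				fmt.Printf("overlap in group %s (level %d, resolution %d):\n  %s  [%d, %d)\n  %s  [%d, %d)\n",
					k.labels, k.level, k.resolution,
					prev.ULID, prev.MinTime, prev.MaxTime,
					cur.ULID, cur.MinTime, cur.MaxTime)
			}
		}
	}
}
```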
@diemus Do you think it is worth adding that code piece as a tool until we find the cause and fix the issue?
@kakkoyun I think it might help people who do not know how to run that code piece. As it is now, users cannot find another way to delete expired blocks when the compactor does not work as expected.
Hello 👋 Looks like there was no activity on this issue for the last two months. |
Closing for now as promised, let us know if you need this to be reopened! 🤗 |
We are encountering similar problems. |
reopen |
@yuanhuaiwang Does https://thanos.io/tip/operating/compactor-backlog.md/ work for you? |
Hello 👋 Looks like there was no activity on this issue for the last two months. |
I will close this issue as it is not a bug but rather a scaling issue. For solving this problem, https://thanos.io/tip/operating/compactor-backlog.md/ could help.
thanos version 0.16.0
The oldest block is from 2020-09-18, when I started using object storage, about 2 months ago, yet the block is still there. The size of the object storage keeps growing, and I didn't see any log about deleting blocks in compact, only "marking compacted block for deletion". Is there anything wrong?
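Since the only log seen here is about marking blocks for deletion, one thing worth checking is whether any deletion marks actually exist in the bucket. Below is a minimal sketch that counts them, assuming the same objstore client packages as in the retention snippet above; the bucket config values and the "check-marks" component name are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"path"
	"strings"

	"github.com/thanos-io/thanos/pkg/logging"
	"github.com/thanos-io/thanos/pkg/objstore/client"
)

func main() {
	logger := logging.NewLogger("info", "logfmt", "check-marks")
	ctx := context.TODO()

	// Placeholder bucket configuration, same shape as in the retention snippet above.
	confContentYaml := `
type: S3
config:
  bucket: "xxxxxx"
  endpoint: "xxxxx"
  access_key: "xxxxx"
  secret_key: "xxxxxx"
`
	bkt, err := client.NewBucket(logger, []byte(confContentYaml), nil, "check-marks")
	if err != nil {
		logger.Log("msg", err)
		return
	}

	total, marked := 0, 0
	// Iter with an empty prefix lists the top-level "directories", i.e. the block ULIDs
	// (plus any other top-level directory, which is fine for a rough check).
	err = bkt.Iter(ctx, "", func(name string) error {
		if !strings.HasSuffix(name, "/") {
			return nil // not a directory entry
		}
		total++
		// A block is scheduled for deletion once deletion-mark.json exists in it.
		ok, err := bkt.Exists(ctx, path.Join(name, "deletion-mark.json"))
		if err != nil {
			return err
		}
		if ok {
			marked++
		}
		return nil
	})
	if err != nil {
		logger.Log("msg", err)
		return
	}
	fmt.Printf("blocks: %d, marked for deletion: %d\n", total, marked)
}
```

If this prints zero marked blocks even though retention should have expired some, that would point at the retention/marking step rather than at the physical deletion step.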