Allow usage of folders within S3 #697
We wanted to postpone this as it adds mostly unnecessary complexity to the code, but if it is a blocker we will take PRs for it (: Proper tests for this are a must-have.
Also, we probably need that at the bucket level, not the provider level.
You say unnecessary complexity, but what about organisations that have conventions that must be followed for the storage of data? It then becomes quite necessary. I'm happy to look at this if you're happy to take a PR, but could you clarify what you'd expect with regard to proper tests, please?
The amount of additional complexity, maintenance, and testing in comparison to the added value is quite large. (: Can you elaborate on the conventions? Why can't you just create another bucket using whatever naming convention you need? In terms of tests, I just mean being careful not to surprise existing users and having consistent handling of this across all Thanos components, nothing more.
This just comes down to the rules that govern data in the organisation I work for. Simply put, I'm not allowed to create buckets with arbitrary names. I'm now looking to use Thanos for cross-cluster federation, but I'm still getting to grips with how exactly that works:

1. Could I have the Thanos sidecars in every cluster write to the same bucket?
2. If I then have a query deployment in every cluster, plus a monitoring cluster with a querier reading from every other querier, would that mean a Thanos store in the monitoring cluster only?
3. If I can make this work while only needing one new bucket for all clusters, there's more chance of a new bucket being permitted.

Many thanks in advance for your help.
1. Yes.
2. Yes (:
3. What do you mean?
@bwplotka, thanks. I've made some progress on this, but I don't seem to be able to make queries from the querier in my monitoring cluster to each of the other query components. Could you please clarify whether `--store` can be passed to a query component to point at another query component?
Yes, but you need to pass its gRPC endpoint (:
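To make that concrete, here is a sketch of what the monitoring-cluster querier's flags could look like for such a setup. The hostnames are placeholders, and 10901/10902 are simply the conventional Thanos gRPC/HTTP ports, not anything mandated:

```shell
# Global querier in the monitoring cluster, fanning out to the
# queriers in the other clusters via their gRPC endpoints.
thanos query \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --store=querier.cluster-a.internal:10901 \
  --store=querier.cluster-b.internal:10901
```

The key point from the reply above: each `--store` target must be a gRPC (Store API) address, not the querier's HTTP address.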
@bwplotka another reason: in AWS, S3 buckets are limited in number per account and must be uniquely named cloud-wide, so they tend to be managed as a scarce resource.
This is something we would also like to have, since we're planning to partition thanos-store components for scaling purposes. I have a strawman implementation that I've done some minimal testing on (claytono/thanos@7598121). @bwplotka It sounds like you'd prefer to have this at the bucket level, but I think the providers need to be at least aware of it: if you're storing Thanos data under a prefix in a shared bucket, the provider has to pass that prefix down to the underlying list APIs. And if the providers need to know about the configured prefix, then I'm not seeing a lot of benefit in having the block layer also be aware of it. What are your thoughts? I'm willing to put together a PR once I know what it needs to include.
Hm. I am still not sold on those arguments, to be honest (:
OK, but for the current Thanos implementation there is a super easy workaround: use a dedicated bucket for Thanos and put everything else elsewhere. But generally I am not strongly against this either. @domgreen, any thoughts?

Regarding @claytono's implementation: I wonder if we can create a Bucket wrapper (similar to the metrics wrapper) that could be applied to any provider. I think it's doable. What do you think?
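A minimal sketch of what such a wrapper could look like. The `Bucket` interface here is a cut-down stand-in, not the real Thanos `objstore.Bucket` (which has more methods); all names are illustrative. The idea is the same as the metrics wrapper: decorate any provider, prepending the prefix on writes and stripping it on reads/iteration so callers see an unprefixed view:

```go
package main

import (
	"fmt"
	"strings"
)

// Bucket is a cut-down stand-in for an object store interface.
type Bucket interface {
	Upload(name, data string) error
	Get(name string) (string, error)
	Iter(fn func(name string))
}

// memBucket is a toy in-memory Bucket for demonstration.
type memBucket struct{ objs map[string]string }

func (m *memBucket) Upload(name, data string) error { m.objs[name] = data; return nil }
func (m *memBucket) Get(name string) (string, error) {
	d, ok := m.objs[name]
	if !ok {
		return "", fmt.Errorf("not found: %s", name)
	}
	return d, nil
}
func (m *memBucket) Iter(fn func(string)) {
	for name := range m.objs {
		fn(name)
	}
}

// prefixedBucket wraps any Bucket, prepending a fixed prefix on writes
// and stripping it again on iteration, so callers see an unprefixed view.
type prefixedBucket struct {
	b      Bucket
	prefix string // e.g. "thanos/" — assumed to end with "/"
}

func (p *prefixedBucket) Upload(name, data string) error { return p.b.Upload(p.prefix+name, data) }
func (p *prefixedBucket) Get(name string) (string, error) { return p.b.Get(p.prefix + name) }
func (p *prefixedBucket) Iter(fn func(string)) {
	p.b.Iter(func(name string) {
		// Hide objects outside our prefix entirely.
		if strings.HasPrefix(name, p.prefix) {
			fn(strings.TrimPrefix(name, p.prefix))
		}
	})
}

func main() {
	raw := &memBucket{objs: map[string]string{"other-team/report.csv": "..."}}
	pb := &prefixedBucket{b: raw, prefix: "thanos/"}
	pb.Upload("01ABC/meta.json", "{}")

	// The wrapper only sees objects under its prefix.
	pb.Iter(func(name string) { fmt.Println("visible:", name) })
	// The underlying bucket stores the fully prefixed key.
	fmt.Println(raw.objs["thanos/01ABC/meta.json"])
}
```

A real wrapper would also need Exists, Delete, and ranged reads, but the prepend-on-write / strip-on-read symmetry is the core of it. Note this naive version still lists the whole bucket and filters client-side, which is exactly the efficiency concern raised about provider-level awareness below.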
The limit is 100 per account by default, although you can request that AWS raise that limit. Note that bucket names have to be globally unique across all of S3, not just in a single account. I would expect that it's not uncommon to have creation of new buckets require administrative approval of some sort, whereas using a new prefix is effortless.
Imagine that a department at a company has been given one bucket for their use. Because they host data for multiple purposes there, the bucket structure looks something like this:
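For example, a hypothetical layout of that kind (the names are made up for illustration):

```
department-bucket/
├── billing-exports/
├── web-assets/
└── thanos/
    ├── 01D0ABC.../
    └── 01D0XYZ.../
```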
In this scenario, if you ask the S3 API for a list of all objects in the bucket without supplying a prefix, you get back every object in the bucket, not just the Thanos data. Because I don't think this is an unlikely scenario, I think it makes sense to have the object store code be aware of prefixes. This allows the object store provider to pass the prefix to object store APIs that support it. And once the code for each object store has to be updated to support prefixes anyway, there doesn't seem to be much advantage to abstracting it in the bucket provider.
Cool, I think we are happy to take PRs for that (:
Thank you for this project; I'm hoping it will solve a lot of problems for us! One more use case for your consideration: permissions management. Ideally, I'd prefer to use <company_name>-<service_name>/<account_name>/..., so for us that's mycompany-prometheus-thanos/development/.... Permissions for the production Prometheus cluster's IAM role would be granted to the production prefix, and development instances would use the other one. We have a few other accounts as well to isolate potentially high-risk services, one for CI, etc. I can work around this by using extra buckets, but normally I would use prefixes.
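A sketch of the kind of IAM policy this would enable, following the naming in the comment above. The shape is illustrative, not a tested production policy: `s3:prefix` is the condition key that scopes `ListBucket` to a prefix, and the object-level actions are scoped by the resource ARN:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::mycompany-prometheus-thanos",
      "Condition": { "StringLike": { "s3:prefix": ["production/*"] } }
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::mycompany-prometheus-thanos/production/*"
    }
  ]
}
```

With a policy like this, a single bucket can be shared across environments while each IAM role only ever sees its own prefix, which is why prefix support in Thanos matters for this use case.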
A quick bump on this... One really nice thing about allowing bucket prefixes is access speed: S3 scales request throughput per prefix, so multiple Thanos clusters sharing one bucket under different prefixes can achieve better overall download speeds than they would all writing to the bucket root.
We would really appreciate this feature.
Having multiple sidecars uploading to the same bucket with one compactor per cluster would not be recommended, as the compactors would "compete" for the same data, right?
FYI there is a proposal for this -> #1318
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue should not be stale; I would very much still like the ability to use S3 prefixes.
At present I can only specify a bucket name and Thanos writes to the top level; however, I want it to use `<bucket_name>/logging-and-monitoring/prometheus`.
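In objstore-configuration terms, what this issue asks for is roughly the following. The `prefix` field here is hypothetical, since it is exactly the feature being requested rather than something the current config supports; `bucket` and `endpoint` are standard S3 config fields:

```yaml
type: S3
config:
  bucket: <bucket_name>
  endpoint: s3.<region>.amazonaws.com
  # Hypothetical field: the feature requested by this issue.
  prefix: logging-and-monitoring/prometheus
```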