Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate base_path for s3 repository #93265

Closed
MarcinGinszt opened this issue Jan 26, 2023 · 6 comments
Closed

validate base_path for s3 repository #93265

MarcinGinszt opened this issue Jan 26, 2023 · 6 comments
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement feedback_needed Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@MarcinGinszt
Copy link

MarcinGinszt commented Jan 26, 2023

Description

Documentation states, that base_path for s3_repository should not start or end with /.
However, this is not obvious and not validated anyhow. We had a repository with base_path starting from /, that had status verified and it contained completed snapshots- however, the snapshots were actually not synchronized with S3. I find that dangerously misleading.
In addition to that, I wonder if the documentation is actually right on this topic- for us (Elasticsearch v7.9.3) repository doesn't work for base path without trailing /.

@MarcinGinszt MarcinGinszt added >enhancement needs:triage Requires assignment of a team area label labels Jan 26, 2023
@inqueue inqueue added :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. and removed needs:triage Requires assignment of a team area label labels Jan 26, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure labels Jan 27, 2023
@DaveCTurner
Copy link
Contributor

I feel similarly, but this is a pretty tricky area and unfortunately not one we can address without making changes that would be breaking for at least some users. See #46869 for some related discussion of the problems.

Yet, we haven't seen this as an actual problem when using the real S3, only when using something that claims (incorrectly) to be S3-compatible. Are you using something other than the real S3 @MarcinGinszt?

@MarcinGinszt
Copy link
Author

MarcinGinszt commented Jan 27, 2023

@DaveCTurner We are using S3.
I'm sorry, I know there is a lot to unwrap in my message and maybe I should rather create a bug issue.
Let me know if I could help with any information.

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Jan 27, 2023

Interesting.

More detail would be helpful, yes. In particular, could you share a recursive listing of the bucket contents that seems not to be working for you? There shouldn't be any sensitive data in such a listing, the blob names are all random, so ideally use something like https://gist.github.com. If you'd rather not share it publically, let me know and I can arrange a private channel too.

@MarcinGinszt
Copy link
Author

Ok, I'm ashamed of myself. Looks like the Elasticsearch works correctly, and just S3 path handling is making this complicated.

The snapshot with base_path containing trailing / were not visible in the S3, because AWS can't properly display this directory if it is at bucket root level- since the directory name is interpreted as directory path. Recursive listing found those files.
If the base_path contains the / suffix, removal suffix prefix will cause errors, because meta files will be in the same directory, while indices path will differ by mentioned '/'- making Elasticsearch unable to find expected indices.

Having a warning/ documentation about '/' prefix or suffix while creating the repository could save a lot of confusion. Currently, base_path is only described as The bucket path to the repository data..

@DaveCTurner
Copy link
Contributor

😁 I did say this was a tricky area...

My understanding of your last message is that there's no problem to fix here, it's just that when looking at the bucket contents in the AWS UI you didn't see something you expected to see. I think that's really a problem with your expectations rather than because there's any action needed on the Elasticsearch side. We don't think we should push users towards a config that makes the repository contents appear in the AWS UI as you expect, so we won't be adjusting any docs or warnings as you suggest. Instead, I recommend you only interact with the repository contents via the Elasticsearch APIs. In particular:

As there's no action needed here I'm closing this. If you have any more questions, please open a thread on the Elasticsearch forum instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement feedback_needed Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
Development

No branches or pull requests

4 participants