Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage collection uses superfluous '/' at beginning of storage identifier #2732

Closed
arielshaqed opened this issue Nov 21, 2021 · 3 comments · Fixed by #3525
Closed

Garbage collection uses superfluous '/' at beginning of storage identifier #2732

arielshaqed opened this issue Nov 21, 2021 · 3 comments · Fixed by #3525
Assignees
Labels
area/block-adapter bug Something isn't working team/ecosystem Team Ecosystem

Comments

@arielshaqed
Copy link
Contributor

When I enable trace logging and run the garbage collection test with #2717 I see in logs, line 337 of step 11:

GET /5351-spark3-metaclient//_lakefs/retention/gc/rules/config.json?x-id=GetObject HTTP/1.1

Complete logs attached: logs_22472.zip

Note the 2 slashes: these occur because the AWS Go SDK v2 no longer strips double slashes when they appear at the start of a bucket.

This is a repeat of a similar bug in Graveler; however because we use the block adapter directly we are again exposed. The issue is in these lines in garbage_collection_manager.go:

const (
	configFileSuffixTemplate    = "/%s/retention/gc/rules/config.json"
	addressesFilePrefixTemplate = "/%s/retention/gc/addresses/"
	commitsFileSuffixTemplate   = "/%s/retention/gc/commits/run_id=%s/commits.csv"
)

Those leading slashes end up (mistakenly!) being trimmed by the AWS Go S3 SDK (v1). If we keep this bug we will have an issue (objects misplaced on block storage) when porting GC to Azure and/or GCS. We also cannot upgrade to AWS Go SDK v2 with this.

One place where the AWS Go S3 SDK (v1) might keep on using two slashes is if you define garbage collection on a repo whose storage namespace is nested in a bucket rather than a complete bucket.

@arielshaqed arielshaqed added area/block-adapter bug Something isn't working labels Nov 21, 2021
@arielshaqed
Copy link
Contributor Author

I believe that per-repo config and actions, both of which use similar code, do NOT have this issue.

arielshaqed added a commit that referenced this issue Nov 21, 2021
DO NOT PUSH THIS CODE, wait for #2732 instead.
@arielshaqed
Copy link
Contributor Author

Discover whether moving this directly to use repo-level settings would save work for us. Also how to migrate any existing such files.

@nopcoder nopcoder added the team/versioning-engine Team versioning engine label Mar 16, 2022
@itaiad200 itaiad200 added team/ecosystem Team Ecosystem and removed team/versioning-engine Team versioning engine labels Mar 17, 2022
@arielshaqed
Copy link
Contributor Author

Check whether there is a double slash when the storage namespace includes a prefix (and is not just a bucket).

  • If there is no double slash, just close this bug.
  • If there is a double slash, the quickest fix might be to change the path, then try reading from the new location and then the old location. That's ugly but will work safely. :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/block-adapter bug Something isn't working team/ecosystem Team Ecosystem
Projects
None yet
4 participants