-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate new repo isn't using existing storage namespace #3104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this should work. What if I type this sequence of commands?
lakectl repo create lakefs://foo s3://bucket/quux & lakectl repo create lakefs://bar s3://bucket/quux
# Both might succeed, the empty commit only occurs after `ensureStorageNamespace`.
lakectl fs upload -s file lakefs://foo/main/file
lakectl fs upload -s file lakefs://boo/main/file
# Now I have two nonempty repos!
nessie/repository_test.go
Outdated
@@ -24,10 +25,39 @@ func TestRepositoryBasicOps(t *testing.T) { | |||
require.Contains(t, listedRepos, repo, "repository missing in listing") | |||
} | |||
|
|||
// get repo and try creating another repo with the same storage namespace | |||
testDupStorageNamespace(t, repos[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be a unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted and added a UT.
nessie/repository_test.go
Outdated
respUpload, err := uploadContent(ctx, repo, respGetRepo.JSON200.DefaultBranch, "path/to/file", "bla-bla") | ||
require.NoError(t, err, "failed to upload file") | ||
require.Equal(t, http.StatusCreated, respUpload.StatusCode()) | ||
|
||
respCommit, err := client.CommitWithResponse(ctx, repo, respGetRepo.JSON200.DefaultBranch, api.CommitJSONRequestBody{ | ||
Message: "nessie:repo test", | ||
}) | ||
require.NoError(t, err, "failed to commit") | ||
require.Equal(t, http.StatusCreated, respCommit.StatusCode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these commits needed here? I'd expect this feature to make it impossible to create the same repo twice, even if it has no commits. Maybe the "dummy" file would tip us off.
pkg/api/controller.go
Outdated
return err | ||
} | ||
|
||
// if a single file exists under the lakeFS prefix in the storage namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS this allows multiple repositories to use the same storage namespace, as long as I don't use them first. (So it is also a race if I create both repos and then upload a file to each.)
Why not just check for existence of the "dummy" file? (It would still be a race, but at least it would prevent this case.)
Thinking about it, one safe way to do this without a full database would be to keep a Graveler object with all storage namespaces. Then you could verify-nonoverlap-and-add, and use a commit-like operation to prevent races. Maybe you could have a hidden "global lakeFS" repo and keep a file named for each storage namespace.
Anyway, the current code is racy -- and since the time between creating a repo and using it is unbounded, there is no protection against this race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using the dummy file.
I think we can live with the race on the creation of the dummy file- but if we really want to solve it we can implement a conditional put in the block adapter (like If-None-Match in S3).
Note that using a "global lakeFS" repository will not solve the race for multiple lakeFS installations (to me it's not really a problem but just noting).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I missed Oz remarks on the requirements in #3077 - we need to tighten it some more.
The requirements states cross-installations enforcement, so a graveler object won't do the trick here.
I avoided checking the dummy
object because I feel less comfortable treating it as a contract between repositories. It's there because of a one-time check, no one ever checks it the second time, and it may very well be deleted without us knowing (GC, lifecycle policies, etc..). But I seem to be the only one who believes it's fragile, so I'll do this check..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, now that we delete a repo we also need to delete that dummy
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. And also the configurations under _lakefs :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we decide we leave it to the user.
5bb9fad
to
4553cb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This is cool and certainly (much much) better than the existing, but still keeps the race condition. And it can happen, e.g. if automation sets up the two repos.
Obviously loads better than the old way of doing nothing, but still dangerous. Perhaps we will be able to do better with the refstore?
(Note that no matter what you do there is danger! E.g. I can set up two distinct lakeFS installations and share storage namespaces between them. There will always be danger, unless we keep our own directory listings for each lakeFS repo and avoid re-using filenames...)
pkg/api/controller.go
Outdated
if err != nil { | ||
msg := "Could not access storage namespace" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to keep this as a field rather than as a message. Otherwise it gets harder to grep for the error message when it pops up in the log, and harder to understand what it means. Maybe log "Bad storage namespace" with a field "reason" that is either "bad url" or "invalid" or "already in user" or "access failed"? Or even drop it entirely, rephrase as a constant message "Bad storage namespace", and the error already says what's wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/api/controller.go
Outdated
@@ -1238,19 +1242,28 @@ func (c *Controller) CreateRepository(w http.ResponseWriter, r *http.Request, bo | |||
writeResponse(w, http.StatusCreated, response) | |||
} | |||
|
|||
func ensureStorageNamespace(ctx context.Context, adapter block.Adapter, storageNamespace string) error { | |||
var errorStorageNamespaceInUse = errors.New("lakeFS repositories can't share storage namespace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var errorStorageNamespaceInUse = errors.New("lakeFS repositories can't share storage namespace") | |
var errStorageNamespaceInUse = errors.New("lakeFS repositories can't share storage namespace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
err := adapter.Put(ctx, obj, objLen, strings.NewReader(dummyData), block.PutOpts{}) | ||
if err != nil { | ||
if _, err := c.BlockAdapter.Get(ctx, obj, objLen); err == nil { | ||
return fmt.Errorf("found lakeFS objects in the storage namespace(%s): %w", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("found lakeFS objects in the storage namespace(%s): %w", | |
return fmt.Errorf("%s: %w", |
(and maybe rephrase the error itself if that doesn't look nice enough).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rather keep it as is. The first sentence makes it clearer IMO.
56a1f7d
to
160f0f7
Compare
160f0f7
to
fe9068d
Compare
close #3077