Remove linked addresses from KV #7725
♻️ PR Preview 63ee494 has been successfully destroyed since this PR has been closed. 🤖 By surge-preview
Thanks!
No large changes. The most worrying one is actually that we encourage confusing the URL signing key with the really secret signing keys.
Also, having the signing configuration under blockstore makes sense, so maybe the digest code should live in block/adapter.go?
pkg/api/controller.go
Outdated
@@ -631,6 +652,53 @@ func (c *Controller) StsLogin(w http.ResponseWriter, r *http.Request, body apige
writeResponse(w, r, http.StatusOK, responseToken)
}

func (c *Controller) verifyLinkAddress(repository, branch, path, physicalAddress string) error {
I think I might prefer this to be on Catalog. Controller is supposed to manage the protocol, while Catalog is supposed to manage... other things. Suppose we decide to add some fake S3 call to the gateway that links: how would we access it?
It would also make this function easier to test; pkg/api is our slowest, most complex "unit" test.
In any case, not in this PR, but consider opening an issue.
pkg/api/controller.go
Outdated
@@ -631,6 +652,53 @@ func (c *Controller) StsLogin(w http.ResponseWriter, r *http.Request, body apige
writeResponse(w, r, http.StatusOK, responseToken)
}

func (c *Controller) verifyLinkAddress(repository, branch, path, physicalAddress string) error {
address, signature, found := strings.Cut(physicalAddress, ",")
This assumes the storage namespace does not contain any commas. But it could!
How would you recommend approaching this?
Any set of characters can potentially exist in the storage namespace.
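One possible answer, sketched under stated assumptions: lakeFS generates the signature, so its encoding can be chosen so that it never contains the delimiter. The parser can then cut at the last occurrence of the delimiter instead of the first (which is what `strings.Cut` does), and any delimiters inside the storage namespace stay part of the address. The constant name, the error value, and the assumed signature alphabet below are hypothetical, not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// linkAddressDelimiter is a hypothetical separator between the physical
// address and its signature. The physical address may contain it (storage
// namespaces are arbitrary strings); the encoded signature must not.
const linkAddressDelimiter = ","

var errNoSignature = errors.New("link address has no signature")

// splitSignedAddress splits "<physical address><delimiter><signature>" at the
// LAST delimiter. This is only safe if the signature encoding cannot produce
// the delimiter (assumed here: unpadded URL-safe base64 or hex).
func splitSignedAddress(signed string) (address, signature string, err error) {
	i := strings.LastIndex(signed, linkAddressDelimiter)
	if i < 0 {
		return "", "", errNoSignature
	}
	address, signature = signed[:i], signed[i+len(linkAddressDelimiter):]
	if address == "" || signature == "" {
		return "", "", errNoSignature
	}
	return address, signature, nil
}

func main() {
	addr, sig, err := splitSignedAddress("s3://bucket/s,o,m,e/p,a,t,h/data/abc,c2lnbmF0dXJl")
	fmt.Println(addr, sig, err)
}
```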
@@ -5121,27 +5188,6 @@ func resolvePathList(objects, prefixes *[]string) []catalog.PathRecord {
return pathRecords
}

func NewController(cfg *config.Config, catalog *catalog.Catalog, authenticator auth.Authenticator, authService auth.Service, authenticationService authentication.Service, blockAdapter block.Adapter, metadataManager auth.MetadataManager, migrator Migrator, collector stats.Collector, cloudMetadataProvider cloud.MetadataProvider, actions actionsHandler, auditChecker AuditChecker, logger logging.Logger, sessionStore sessions.Store, pathProvider upload.PathProvider, usageReporter stats.UsageReporterOperations) *Controller {
Yah, looks like you moved it to a better place.
@@ -5325,7 +5371,7 @@ func (c *Controller) GetUsageReportSummary(w http.ResponseWriter, r *http.Reques
writeResponse(w, r, http.StatusOK, response)
}

-func (c *Controller) CreateUserExternalPrincipal(w http.ResponseWriter, r *http.Request, body apigen.CreateUserExternalPrincipalJSONRequestBody, userID string, params apigen.CreateUserExternalPrincipalParams) {
+func (c *Controller) CreateUserExternalPrincipal(w http.ResponseWriter, r *http.Request, _ apigen.CreateUserExternalPrincipalJSONRequestBody, userID string, params apigen.CreateUserExternalPrincipalParams) {
Why this change now?
Just my fmt
pkg/config/defaults.go
Outdated
viper.SetDefault(BlockstoreTypeKey, "local")
}

viper.SetDefault("blockstore.encrypt.secret_key", DefaultSecretSigningKey)
Please let's use a different default here. This secret key is considerably less secret and really cheap to change. By using the same default, we encourage people to give both keys the same value.
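A minimal sketch of the suggestion, assuming a separate default constant for the link-address signing key plus a startup check that only warns. The constant names, their values, and the `signingKeyWarnings` helper are hypothetical; the PR's actual defaults are only partially visible in the diff.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical defaults: the link-address signing key gets its OWN default,
// distinct from the auth secret, so nobody is nudged into reusing one value.
const (
	defaultAuthSecretKey         = "auth-secret-default-change-me"
	defaultLinkAddressSigningKey = "link-signing-default-change-me"
)

// signingKeyWarnings reports (but does not reject) risky signing-key setups.
func signingKeyWarnings(signingKey, authSecretKey string) []string {
	var warnings []string
	if signingKey == defaultLinkAddressSigningKey {
		warnings = append(warnings, "signing key is the built-in default; set a unique value in production")
	}
	if signingKey == authSecretKey {
		warnings = append(warnings, "signing key equals the auth secret key; use separate secrets")
	}
	return warnings
}

func main() {
	for _, w := range signingKeyWarnings(defaultLinkAddressSigningKey, defaultAuthSecretKey) {
		fmt.Fprintln(os.Stderr, "warning:", w)
	}
}
```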
pkg/api/controller.go
Outdated
return nil
}

func (c *Controller) encryptAddress(logicalAddress string) string {
-func (c *Controller) encryptAddress(logicalAddress string) string {
+func (c *Controller) signAddress(logicalAddress string) string {
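The rename fits because the function computes a keyed MAC (HMAC-SHA256, judging by the `hmac.New(sha256.New, ...)` calls in the diff), not a ciphertext: the data stays readable, but it cannot be altered without invalidating the MAC. A sketch of a standalone `signAddress`, assuming the MAC is encoded as unpadded URL-safe base64 (the PR's actual encoding is not shown here) and taking the key as a parameter rather than reading it from config:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signAddress returns the HMAC-SHA256 of data under key, encoded as unpadded
// URL-safe base64 (assumed encoding; its alphabet contains no ',' or ':').
func signAddress(key []byte, data string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(data)) // hash.Hash.Write never returns an error
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

func main() {
	fmt.Println(signAddress([]byte("example-key"), "repo/branch/path/address"))
}
```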
pkg/api/controller.go
Outdated
func (c *Controller) getAddressWithSignature(repository, branch, path string) string {
physicalPath := c.PathProvider.NewPath()
return physicalPath + "," + c.encryptAddress(getAddressToSign(repository, branch, path, physicalPath))
"," should be a const somewhere; that will make it clearer that verification uses the same delimiter.
pkg/api/controller.go
Outdated
h := hmac.New(sha256.New, []byte(c.Config.Blockstore.Encrypt.SecretKey))
h.Write([]byte(stringToVerify))
calculated := h.Sum(nil)
I think this can and should use encryptAddress() (or whatever its new name will be).
pkg/config/config.go
Outdated
@@ -221,6 +221,9 @@ type Config struct {
} `mapstructure:"ui_config"`
} `mapstructure:"auth"`
Blockstore struct {
Encrypt struct {
-Encrypt struct {
+Signing struct {
One more thing! Deploying this change will invalidate all in-flight presigned URLs. We need to announce that. Please add a changelog entry to that effect.
Partial review, will continue later
@@ -1,6 +1,11 @@
# Changelog

# v1.20.0
# Unreleased
Do we still do this?
Request by @arielshaqed
esti/lakectl_util.go
Outdated
@@ -31,7 +31,7 @@ var (
reShortCommitID = regexp.MustCompile(`[\d|a-f]{16}`)
reChecksum = regexp.MustCompile(`([\d|a-f]{32})|(0x[0-9A-F]{15})`)
reEndpoint = regexp.MustCompile(`https?://\w+(:\d+)?/api/v\d+/`)
-rePhysicalAddress = regexp.MustCompile(`/data/[0-9a-v]{20}/[0-9a-v]{20}`)
+rePhysicalAddress = regexp.MustCompile(`/data/[0-9a-v]{20}/[0-9a-v]{20}?(,.+)*`)
Now you have 2 problems...
And one of them is /data/aaaaaaaaaaaaaaaaaaaa/aaaaaaaaaaaaaaaaaaaa,sd,cds,sdc,sdc,sdc,sd,csdc,
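A tighter pattern, sketched under the assumption that the signature uses only URL-safe base64 characters: allow at most one delimiter plus signature, and stop at the first character outside that alphabet instead of letting `(,.+)*` swallow the rest of the line. Also note that in Go's RE2 syntax `x{20}?` still means exactly 20 (the `?` only makes the repeat non-greedy), so it does not make the second segment optional. On the example above this pattern still matches a prefix (up to `,sd`); the placeholder string in `main` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// rePhysicalAddress matches "/data/<20 chars>/<20 chars>" optionally followed
// by ONE delimiter and a signature drawn from an assumed base64url alphabet.
var rePhysicalAddress = regexp.MustCompile(`/data/[0-9a-v]{20}/[0-9a-v]{20}(?:,[0-9A-Za-z_-]+)?`)

func main() {
	line := "physical_address: /data/aaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbb,c2ln-_x size: 5"
	fmt.Println(rePhysicalAddress.ReplaceAllString(line, "<PHYSICAL_ADDRESS>"))
}
```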
t.Fatal(err)
}

t.Run("get and link physical address", func(t *testing.T) {
AFAICT, you don't have a happy-flow test now.
Yes I do
@@ -221,6 +221,9 @@ type Config struct {
} `mapstructure:"ui_config"`
} `mapstructure:"auth"`
Blockstore struct {
Signing struct {
SecretKey SecureString `mapstructure:"secret_key" validate:"required"`
Breaking change... remove the "required" part.
It's required, but it has a default, per our discussion.
It cannot work otherwise
pkg/graveler/graveler.go
Outdated
Best PR EVER!!!
pkg/catalog/catalog.go
Outdated
@@ -2768,6 +2744,53 @@ func (c *Catalog) listRepositoriesHelper(ctx context.Context) ([]*graveler.Repos
return repos, nil
}

func (c *Catalog) VerifyLinkAddress(repository, branch, path, physicalAddress string) error {
address, signature, found := strings.Cut(physicalAddress, LinkAddressSigningDelimiter)
Will the following storage namespace work?
s3://bucket/s,o,m,e/p,a,t,h/
This still fails for some storage namespaces.
It seems to me that we are putting a lot of effort into writing a single incorrect line of code, when we could write <5 correct lines like here.
@N-o-Z as Ariel mentioned, the issue still exists
pkg/catalog/catalog.go
Outdated
}

func getAddressToSign(repository, branch, path, physicalAddress string) string {
return repository + branch + path + physicalAddress
Will these calls return different addresses to sign?
getAddressToSign("aa", "bb", "cc", "dd")
getAddressToSign("a", "ab", "bc", "cdd")
Yes
Not sure. Why not just sign the actual generated physical address?
We said that the signature should be dependent on repo, branch and path (i.e. you can only use a linked address with a single logical path).
Yes
@N-o-Z how so? It's just string concatenation.
OK, to answer both questions:
- If we sign only the physical address, then users can use linked addresses generated for repo1/branch2/path3 in repo4/branch5/path6.
- Because the physical address is always unique, the prefix for the address to sign will always be unique, and therefore we will never get to a situation of:
getAddressToSign("aa", "bb", "cc", "dd")
getAddressToSign("a", "ab", "bc", "cdd")
However, regarding the second point, if it makes more sense, I can add a separator between the parts of the signed string.
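For the record, the two calls from the question do collide: both concatenate to `"aabbccdd"`. A sketch of the separator idea using length prefixes, a standard way to make a concatenation unambiguous without reserving any character. `addressToSign` and `naiveAddressToSign` are hypothetical names; the latter just reproduces the concatenation under review.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// naiveAddressToSign reproduces the plain concatenation under review.
func naiveAddressToSign(repository, branch, path, physicalAddress string) string {
	return repository + branch + path + physicalAddress
}

// addressToSign writes each field as "<len>:<value>", so distinct field
// tuples always yield distinct strings, whatever characters they contain.
func addressToSign(parts ...string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(strconv.Itoa(len(p)))
		b.WriteByte(':')
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(naiveAddressToSign("aa", "bb", "cc", "dd") == naiveAddressToSign("a", "ab", "bc", "cdd"))
	fmt.Println(addressToSign("aa", "bb", "cc", "dd") == addressToSign("a", "ab", "bc", "cdd"))
}
```

Because no character is reserved, this also sidesteps the storage-namespace delimiter problem for the signed string (though not for the format the client sends back).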
jobData := []struct {
name string
interval time.Duration
fn func(context.Context)
}{
{
name: "delete expired link addresses",
Minor: some tokens will be kept in the KV forever.
I think we shouldn't invest in the cleanup, as this probably won't make a dent in the DB.
pkg/catalog/catalog.go
Outdated
h := hmac.New(sha256.New, []byte(c.signingKey))
h.Write([]byte(stringToVerify))
calculated := h.Sum(nil)
Extract to func?
pretty please? 🥺
Sorry, missed that
Thanks!
I think we can do a bit better on signing the exact string of the physical address, and on precisely extracting the digest. They are both <10 lines of code, combined.
Some of the comments were not resolved.
Thanks! But please be sure of the regexp :-/
LGTM
Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>
Branch updated from 6edb8c4 to 17f6a4a.
* Remove linked addresses from KV
* CR Fixes
* Add changelog
* Change delimiter and encoding
* More fixes
* More fixes 2
* More More Fixes
* Update esti/lakectl_util.go

Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>

* Fix azure regex

---------

Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>
Closes #7716
Change Description
Background
Improve the performance of write requests that use Get/Link physical address.
Today, addresses are written to the KV under the repository partition for validation purposes.
This requires reading, writing, and deleting the linked-address KV entry for every write.
As a result, we are limited to 500 requests/sec.
To remove this limitation, we decided not to track physical addresses generated by lakeFS in the KV.
Instead, we chose a more lenient approach that only verifies that the linked address is used for a specific repository/branch/path and within a given timeframe.
Testing Details
Added/modified unit tests
Breaking Change?
No
Additional info
Introduced a new configuration variable holding the secret key used to sign linked addresses. It is required, but for backward compatibility it was given a default value, which must be changed in production.