Find expired and active commits #2069
api/swagger.yml
Outdated
GarbageCollectionPrepareRequest:
  type: object
  properties:
    previous_result_path:
Could you add an example? Is it an object-store path or a lakeFS one?
Changed this to run-id and added example
GarbageCollectionRules:
  type: object
  properties:
    default_retention_days:
What's the relation between a rule's retention_days and the global default_retention_days?
tags:
  - retention
operationId: getGarbageCollectionRules
responses:
Missing 404 for missing repo.
Done
    application/json:
      schema:
        $ref: "#/components/schemas/GarbageCollectionRules"
responses:
Missing 404.
Done
}

func NewRuleManager(blockAdapter block.Adapter, blockStoragePrefix string) *RuleManager {
	return &RuleManager{blockAdapter: blockAdapter, configurationFileSuffix: fmt.Sprintf("/%s/retention/rules/config.json", blockStoragePrefix)}
Set retention/rules/config.json as a const.
Done
@@ -0,0 +1,214 @@
package ref
Great tests!
pkg/catalog/catalog.go
Outdated
for _, commitRow := range previousCommits {
	previouslyExpiredCommits = append(previouslyExpiredCommits, graveler.CommitID(commitRow[1]))
}
They are not all expired, are they?
good catch, fixing
pkg/catalog/catalog.go
Outdated
	return c.Store.SetRetentionRules(ctx, graveler.RepositoryID(repositoryID), rules)
}

func (c *Catalog) PrepareExpiredCommits(ctx context.Context, repository string, previousResultPath string) (string, error) {
Maybe extract the CSV parsing to a dedicated parser?
It should also handle the schema of this file, which is currently static.
Done
pkg/catalog/catalog.go
Outdated
			return "", err
		}
	}
	csvWriter.Flush()
Should you also close it?
Nothing to close here: we are writing to a strings.Builder
pkg/catalog/catalog.go
Outdated
csvWriter.Flush()
commitsStr := b.String()
runID := uuid.New().String()
path := fmt.Sprintf("_lakefs/retention/commits/run_id=%s/commits.csv", runID)
The _lakefs prefix should be passed to you from the config.
Maybe add the /retention/commits/ part too.
Done the first one
Looks great!
The code looks great, and I liked the tests.
I have a few concerns about the API paths. The GarbageCollectionPrepareRequest contains the full path; I think it should only contain the run ID. The prepare path could be a bit confusing. The name garbage collection is used in some places and retention in others, which is OK, but it got me thinking: when we work on lifecycle we might decide to change the API (or align it somehow), and at that point it will be a breaking change. I suggest we leave the API for setting and getting configurations for later. What do you think?
api/swagger.yml
Outdated
previous_result_path:
  type: string
  description: path to the result of a previous successful GC job
Isn't run ID enough?
Done
api/swagger.yml
Outdated
properties:
  path:
    type: string
    description: path to a dataset of commits
same here
api/swagger.yml
Outdated
/repositories/{repository}/gc/prepare:
Change the path to be more specific: prepare could also mean preparing the expired addresses data.
Done
pkg/graveler/graveler.go
Outdated
GetRetentionRules(ctx context.Context, repositoryID RepositoryID) (*RetentionRules, error)

SetRetentionRules(ctx context.Context, repositoryID RepositoryID, rules *RetentionRules) error
Is this retention rules or garbage collection rules?
In the future we will also have lifecycle rules.
Not sure if it will be together but it could get a bit confusing.
Maybe we should give it another thought before adding the API routes...
Rename to garbage collection rules
pkg/graveler/graveler.go
Outdated
@@ -856,6 +879,32 @@ func (g *Graveler) GetStagingToken(ctx context.Context, repositoryID RepositoryI
	return &branch.StagingToken, nil
}

func (g *Graveler) GetRetentionRules(ctx context.Context, repositoryID RepositoryID) (*RetentionRules, error) {
	// TODO use "_lakefs" from configuration
still relevant?
Removed
pkg/graveler/graveler.go
Outdated
}

func (g *Graveler) SetRetentionRules(ctx context.Context, repositoryID RepositoryID, rules *RetentionRules) error {
	// TODO use "_lakefs" from configuration
still relevant?
Removed
pkg/catalog/catalog.go
Outdated
csvWriter.Flush()
commitsStr := b.String()
runID := uuid.New().String()
path := fmt.Sprintf("_lakefs/retention/commits/run_id=%s/commits.csv", runID)
Don't use a hardcoded path; get _lakefs from the configuration. Maybe even use a function to get the path.
Done
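A minimal sketch of what the suggestions in this thread could look like, assuming the storage prefix arrives from configuration as a plain string. The names commitsPathFormat and commitsPath are hypothetical and are not taken from the PR; only the path layout comes from the snippet under review.

```go
package main

import "fmt"

// commitsPathFormat keeps the path layout in one place instead of inlining it.
// Hypothetical name; the layout mirrors the snippet under review.
const commitsPathFormat = "%s/retention/commits/run_id=%s/commits.csv"

// commitsPath builds the commits file path for a run. The prefix (for example
// "_lakefs") is assumed to be supplied by the caller from configuration.
func commitsPath(prefix, runID string) string {
	return fmt.Sprintf(commitsPathFormat, prefix, runID)
}

func main() {
	fmt.Println(commitsPath("_lakefs", "1234"))
}
```

Because the run ID is the only variable part, an API that accepts a run ID can rebuild the full path with the same function.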
pkg/api/controller.go
Outdated
func (c *Controller) PrepareGarbageCollectionCommits(w http.ResponseWriter, r *http.Request, body PrepareGarbageCollectionCommitsJSONRequestBody, repository string) {
	if !c.authorize(w, r, []permissions.Permission{
		{
			Action: permissions.ListObjectsAction,
Shouldn't this be the same as log commits, i.e. ReadBranchAction?
Since I need this permission for all branches, and not just one, I'm adding a dedicated permission for this: retention:PrepareGarbageCollectionCommits
My concern is that the prepare operation can take a long time and cause the request to time out. Do we have numbers on how long it can take to process the request?
api/swagger.yml
Outdated
GarbageCollectionCommits:
  type: object
  properties:
    path:
      type: string
      description: path to a dataset of commits
can we add the type/format of the collection as a parameter or in the description?
pkg/permissions/actions.go
Outdated
GetGarbageCollectionRules = "retention:GetGarbageCollectionRules"
SetGarbageCollectionRules = "retention:SetGarbageCollectionRules"
Don't forget to add documentation (https://docs.lakefs.io/reference/authorization.html). It can be a different PR, as we can merge this one before the release.
pkg/api/controller.go
Outdated
func (c *Controller) SetGarbageCollectionRules(w http.ResponseWriter, r *http.Request, body SetGarbageCollectionRulesJSONRequestBody, repository string) {
	if !c.authorize(w, r, []permissions.Permission{
		{
			Action: permissions.GetGarbageCollectionRules,
SetGarbageCollectionRules
Done
pkg/catalog/catalog.go
Outdated
	return "", err
}
csvReader := csv.NewReader(previousRunReader)
previousCommits, err := csvReader.ReadAll()
Prefer iterating over the records: we don't keep the array of records, we just append them to our own structure.
Done (and moved to another place)
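For illustration, record-by-record reading with encoding/csv might look like the sketch below. It assumes each row is "commit_id,expired" (the order the writer in this PR emits), and it keeps only rows flagged as expired, which also addresses the earlier comment that not all previous commits are expired. The function names are hypothetical, not the ones the PR ended up with.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// readExpiredCommits streams "commit_id,expired" rows from r and returns only
// the commit IDs whose flag parses as true. The column layout is an assumption.
func readExpiredCommits(r io.Reader) ([]string, error) {
	var expired []string // nil slice; append allocates on first use
	reader := csv.NewReader(r)
	for {
		row, err := reader.Read()
		if errors.Is(err, io.EOF) {
			break // no more records
		}
		if err != nil {
			return nil, err
		}
		if len(row) != 2 {
			return nil, fmt.Errorf("expected 2 columns, got %d", len(row))
		}
		isExpired, err := strconv.ParseBool(row[1])
		if err != nil {
			return nil, fmt.Errorf("bad expired flag %q: %w", row[1], err)
		}
		if isExpired {
			expired = append(expired, row[0])
		}
	}
	return expired, nil
}

// expiredFromString is a convenience wrapper for in-memory CSV data.
func expiredFromString(data string) ([]string, error) {
	return readExpiredCommits(strings.NewReader(data))
}

func main() {
	ids, err := expiredFromString("c1,true\nc2,false\nc3,true\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```

Reading one record at a time avoids holding the whole [][]string that ReadAll returns when only one column per row is kept.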
pkg/catalog/catalog.go
Outdated
			return "", err
		}
	}
	csvWriter.Flush()
need to call Error() after this call to capture possible errors
Done
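As a rough sketch of the pattern requested in this thread: csv.Writer buffers its output, so Flush is followed by a call to Error to surface any error from a previous Write or Flush. The helper name writeCommitsCSV and the use of plain strings for commit IDs are assumptions made to keep the example self-contained; the literal "true"/"false" values follow the other review comments on this file.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// writeCommitsCSV writes one "commit_id,expired" row per commit into an
// in-memory buffer and returns the resulting CSV text. Hypothetical helper.
func writeCommitsCSV(expired, active []string) (string, error) {
	b := &strings.Builder{}
	w := csv.NewWriter(b)
	for _, id := range expired {
		if err := w.Write([]string{id, "true"}); err != nil {
			return "", err
		}
	}
	for _, id := range active {
		if err := w.Write([]string{id, "false"}); err != nil {
			return "", err
		}
	}
	w.Flush()
	// Flush returns nothing, so any buffered write error is reported here.
	if err := w.Error(); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := writeCommitsCSV([]string{"c1"}, []string{"c2", "c3"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With a strings.Builder as the destination there is nothing to close, as noted earlier in the review, but the Error check still matters if the destination is later swapped for a writer that can fail.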
pkg/catalog/catalog.go
Outdated
b := &strings.Builder{}
csvWriter := csv.NewWriter(b)
for _, commitID := range expiredCommits {
	err = csvWriter.Write([]string{string(commitID), strconv.FormatBool(true)})
use "true" or capture the value outside the loops
Done
pkg/catalog/catalog.go
Outdated
	}
}
for _, commitID := range activeCommits {
	err = csvWriter.Write([]string{string(commitID), strconv.FormatBool(false)})
use "false" or capture the value outside the loops
Done
pkg/catalog/catalog.go
Outdated
if err != nil {
	return "", err
}
previouslyExpiredCommits := make([]graveler.CommitID, 0)
No need for a zero-length allocated slice for this one:
- previouslyExpiredCommits := make([]graveler.CommitID, 0)
+ var previouslyExpiredCommits []graveler.CommitID
Done (and moved to another place)
Just remove the use of a var for an empty struct; you can just pass it directly.
	"github.com/treeverse/lakefs/pkg/graveler"
)

var empty struct{}
no need for this one
Closes #2067
_lakefs/retention/rules
._lakefs/retention/commits
.