-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move validation to separate package, fix misleading errors on branch creation #2859
Conversation
789697b
to
b55db55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial review - it is still a draft so I didn't know if it is final (move it to non draft if it is ready for review).
A lot of files changes as part of func rename - but I missed the part where the CLI changes - just point me to what I missed.
return len(u.Repository) > 0 && len(u.Ref) == 0 && u.Path == nil && validator.ReValidRepositoryID.MatchString(u.Repository) | ||
} | ||
|
||
func (u *URI) IsRef() bool { | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil && validator.ReValidRepositoryID.MatchString(u.Repository) && validator.ReValidBranchID.MatchString(u.Ref) | ||
} | ||
|
||
func (u *URI) IsFullyQualified() bool { | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path != nil | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path != nil && validator.ReValidRepositoryID.MatchString(u.Repository) && validator.ReValidBranchID.MatchString(u.Ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behavior of these methods - it was 'IsRepository' and not it is 'IsValidRepository' .
Did you verified that all the places we currently use this would like this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nopcoder thanks for the comment, you're right that the behaviour is due to change.
Here's a list of the commands using the Is* validation functions I modified (both directly and indirectly):
### IsRepository
lakefs import
lakectl branch
lakectl branch-protect - No validation
lakectl cat-hook-output
lakectl gc set-config - No validation
lakectl gc get-config - No validation
lakectl refs-restore
lakectl refs-dump
lakectl repo create
lakectl repo create-bare
lakectl repo delete
lakectl actions runs describe
lakectl actions runs list
lakectl show
lakectl tag list
### IsRef
lakefs loadtest entry
lakectl abuse random-read
lakectl abuse random-write
lakectl abuse create-branches
lakectl branch
lakectl commit
lakectl diff
lakectl log
lakectl merge
lakectl tag
### IsFullyQualified
lakectl fs
Most of the functions already validate the logic within the API, after we "wasted" resources (i.e file checks, scanning, querying the database, etc).
There are 3 areas that doesn't do validation, therefore, they are open to problems (i.e someone changing GC configuration for an invalid repository name).
This change will block it, at least from the CLI perspective.
Other than that, I think that the Nessie CLi automation could help a lot saving time checking it manually like it did this time.
As I see it, it doesn't break anything.
Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I was more worried about backend/pkg code that was using these function to check for a structure of URI without validation.
ErrInvalid = errors.New("validation error") | ||
ErrInvalidType = fmt.Errorf("invalid type: %w", ErrInvalid) | ||
ErrRequiredValue = fmt.Errorf("required value: %w", ErrInvalid) | ||
ErrInvalidValue = fmt.Errorf("invalid value: %w", ErrInvalid) | ||
ErrPathRequiredValue = fmt.Errorf("missing path: %w", ErrRequiredValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error make sense at this level as a general validation package, I think for specific types the caller can control the specific error and help by providing the context.
For example if catalog call a validation function from this package, the return error will be wrapped by catalog.ErrXXX error so a caller to the catalog will identify the specific package error.
Same for specific type - like Path - the catalog will get an error from the validation package - ErrInvalid and wrap it with catalog.ErrPathXXX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nopcoder I thought about it, the problem with this approach is that it might mask some meaningful errors, for instance, the ValidatePath function returns 3 different errors (ErrInvalidType, ErrPathRequiredValue and a custom one wrapping the ErrInvalidValue) and I can't really handle it properly from the caller function as it fails fast (first validation error returns).
I think we can do some work here and rethink of error handling (there are some packages that handle it better compared to the built-in one).
If you think we can still achieve custom errors without loosing visibility, I'd appreciate to hear more on your thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the package-specific validations into the packages scope, this way, the errors, types, etc would be managed by the package.
I left the common resources (regex, general functions, etc) in the validate package
b55db55
to
2378ff2
Compare
linting - package imports
revert function rename, making validator package free
20eb7b5
to
e68ddb9
Compare
per package validations update deleteobjects error location
a62491e
to
418bf0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - minor comments to consider.
func isControlCodeOrSpace(r rune) bool { | ||
const space = 0x20 | ||
return r <= space | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure its you code - but there are IsControl and IsSpace in unicode package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not mine; yes - we can use these functions from the unicode package, I wonder if that should be included in this issue/pull request or it's out of scope.
return len(u.Repository) > 0 && len(u.Ref) == 0 && u.Path == nil && validator.ReValidRepositoryID.MatchString(u.Repository) | ||
} | ||
|
||
func (u *URI) IsRef() bool { | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil && validator.ReValidRepositoryID.MatchString(u.Repository) && validator.ReValidBranchID.MatchString(u.Ref) | ||
} | ||
|
||
func (u *URI) IsFullyQualified() bool { | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path != nil | ||
return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path != nil && validator.ReValidRepositoryID.MatchString(u.Repository) && validator.ReValidBranchID.MatchString(u.Ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I was more worried about backend/pkg code that was using these function to check for a structure of URI without validation.
Closes #1634
Disclaimer: first pull request, feel free to comment and take a strict approach.
I started working on this PR in order to fix some branch-creation error messages that are misleading, due to the way it's currently modeled (validation logic resides within the catalog package, which is irrelevant for the CLI) I moved the validation logic into a separate package so both the CLI and the API would consume it.
I've also aligned the branch API calls arguments to match the same name in the validation error messages.
After I moved the logic to a separate, I've included the same validation for the CLI branch create command and aligned the error message.
Errors:
Branch creation with bad character (%) which the url.Parse returns an error for.
➜ lakeFS git:(fix/branch-creation-error-1634) lakectl branch create lakefs://test/testing-spark% --source lakefs://test/main Invalid 'branch': parsing lakefs://test/testing-spark%: malformed lakefs uri Error executing command.
Branch creation with bad character (.) which the regular expression returns an error for.
➜ lakeFS git:(fix/branch-creation-error-1634) lakectl branch create lakefs://test/testing-spark. --source lakefs://test/main Invalid branch: not a valid ref uri Error executing command.
The same, but directly against the API (bypass the url.Parse validation):
➜ lakeFS git:(fix/branch-creation-error-1634) curl -u ***:*** -XPOST -H "Content-Type: application/json" -d '{"name":"testing-spark%", "source": "main"}' http://localhost:8000/api/v1/repositories/test/branches {"message":"argument branch: invalid value: validation error"}