Move validation to separate package, fix misleading errors on branch creation #2859

ortz · 2022-01-04T15:20:09Z

Disclaimer: first pull request, feel free to comment and take a strict approach.

I started working on this PR to fix some misleading branch-creation error messages. They stem from the way validation is currently modeled: the logic resides within the catalog package, which is irrelevant for the CLI. I moved the validation logic into a separate package so that both the CLI and the API can consume it.
I've also aligned the branch API call arguments to match the names used in the validation error messages.

After moving the logic to a separate package, I added the same validation to the CLI branch create command and aligned the error message.

Errors:
Branch creation with a bad character (%), for which url.Parse returns an error:
➜ lakeFS git:(fix/branch-creation-error-1634) lakectl branch create lakefs://test/testing-spark% --source lakefs://test/main
Invalid 'branch': parsing lakefs://test/testing-spark%: malformed lakefs uri
Error executing command.

Branch creation with a bad character (.), for which the regular expression returns an error:
➜ lakeFS git:(fix/branch-creation-error-1634) lakectl branch create lakefs://test/testing-spark. --source lakefs://test/main
Invalid branch: not a valid ref uri
Error executing command.

The same, but directly against the API (bypassing the url.Parse validation):
➜ lakeFS git:(fix/branch-creation-error-1634) curl -u ***:*** -XPOST -H "Content-Type: application/json" -d '{"name":"testing-spark%", "source": "main"}' http://localhost:8000/api/v1/repositories/test/branches
{"message":"argument branch: invalid value: validation error"}
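The two CLI failures above come from different stages: a trailing % is rejected while parsing the URI, whereas a trailing . parses fine and is only rejected by the name pattern. Here is a minimal sketch of that two-stage check. The pattern `reBranchID` and the helper `checkBranchURI` are assumptions made for illustration and are not taken from this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// reBranchID is a hypothetical pattern used only in this sketch; the
// pattern shipped in the validator package may differ.
var reBranchID = regexp.MustCompile(`^\w[-\w]*$`)

// checkBranchURI reports which stage rejects raw: "parse" when
// url.Parse fails, "pattern" when the branch name does not match
// reBranchID, and "ok" otherwise.
func checkBranchURI(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "parse"
	}
	// Treat the path without its leading slash as the branch name.
	branch := strings.TrimPrefix(u.Path, "/")
	if !reBranchID.MatchString(branch) {
		return "pattern"
	}
	return "ok"
}

func main() {
	for _, raw := range []string{
		"lakefs://test/testing-spark%",
		"lakefs://test/testing-spark.",
		"lakefs://test/testing-spark",
	} {
		fmt.Printf("%-30s %s\n", raw, checkBranchURI(raw))
	}
}
```

Because the API call receives the branch name as a JSON field rather than as a URI, only the pattern stage applies there, which matches the "invalid value" message in the curl example.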

nopcoder

Initial review - this is still a draft, so I wasn't sure whether it is final (move it out of draft when it is ready for review).
A lot of files changed as part of the func rename, but I missed the part where the CLI changes - just point me to what I missed.

cmd/lakectl/cmd/abuse.go

nopcoder · 2022-01-05T18:23:33Z

pkg/uri/parser.go

+	return len(u.Repository) > 0 && len(u.Ref) == 0 && u.Path == nil && validator.ReValidRepositoryID.MatchString(u.Repository)
 }

 func (u *URI) IsRef() bool {
-	return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil
+	return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil && validator.ReValidRepositoryID.MatchString(u.Repository) && validator.ReValidBranchID.MatchString(u.Ref)
 }

 func (u *URI) IsFullyQualified() bool {
-	return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path != nil
+	return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path != nil && validator.ReValidRepositoryID.MatchString(u.Repository) && validator.ReValidBranchID.MatchString(u.Ref)


This changes the behavior of these methods - it was 'IsRepository' and now it is 'IsValidRepository'.
Did you verify that all the places where we currently use these methods want this behavior?

@nopcoder thanks for the comment, you're right that the behaviour is due to change.

Here's a list of the commands using the Is* validation functions I modified (both directly and indirectly):

### IsRepository
lakefs import
lakectl branch
lakectl branch-protect - No validation
lakectl cat-hook-output
lakectl gc set-config - No validation
lakectl gc get-config - No validation
lakectl refs-restore
lakectl refs-dump
lakectl repo create
lakectl repo create-bare
lakectl repo delete
lakectl actions runs describe
lakectl actions runs list
lakectl show
lakectl tag list

### IsRef
lakefs loadtest entry
lakectl abuse random-read
lakectl abuse random-write
lakectl abuse create-branches
lakectl branch
lakectl commit
lakectl diff
lakectl log
lakectl merge
lakectl tag

### IsFullyQualified
lakectl fs

Most of these commands are already validated by the API, but only after we have "wasted" resources (i.e. file checks, scanning, querying the database, etc.).
Three areas don't do any validation (marked "No validation" above), so they are open to problems (e.g. someone changing the GC configuration for an invalid repository name).
This change blocks that, at least from the CLI perspective.
Other than that, I think Nessie CLI automation could save a lot of time compared to checking this manually, as was done this time.

As I see it, it doesn't break anything.

Let me know your thoughts.

Great, I was more worried about backend/pkg code that was using these functions to check the structure of a URI without validation.
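The concern in this thread is the difference between checking the shape of a URI and checking that its parts are valid names. The sketch below shows one way to keep the two separate. The `URI` type is a simplified stand-in, and both patterns and both method names (`HasRefShape`, `IsValidRef`) are hypothetical; the PR itself folds the pattern checks into the existing Is* methods.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for this sketch only.
var (
	reRepositoryID = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{2,62}$`)
	reBranchID     = regexp.MustCompile(`^\w[-\w]*$`)
)

// URI is a simplified stand-in for the parsed lakeFS URI.
type URI struct {
	Repository string
	Ref        string
	Path       *string
}

// HasRefShape is the purely structural check: a repository and a ref
// are present and no path is set.
func (u *URI) HasRefShape() bool {
	return len(u.Repository) > 0 && len(u.Ref) > 0 && u.Path == nil
}

// IsValidRef adds name validation on top of the structural check.
func (u *URI) IsValidRef() bool {
	return u.HasRefShape() &&
		reRepositoryID.MatchString(u.Repository) &&
		reBranchID.MatchString(u.Ref)
}

func main() {
	u := &URI{Repository: "test", Ref: "testing-spark."}
	fmt.Println("shape:", u.HasRefShape(), "valid:", u.IsValidRef())
}
```

Callers that only need to know which kind of URI they were given can keep using the structural check, while CLI commands that are about to call the API can use the stricter one.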

pkg/validator/validate.go

nopcoder · 2022-01-05T18:31:11Z

pkg/validator/validate.go

+	ErrInvalid           = errors.New("validation error")
+	ErrInvalidType       = fmt.Errorf("invalid type: %w", ErrInvalid)
+	ErrRequiredValue     = fmt.Errorf("required value: %w", ErrInvalid)
+	ErrInvalidValue      = fmt.Errorf("invalid value: %w", ErrInvalid)
+	ErrPathRequiredValue = fmt.Errorf("missing path: %w", ErrRequiredValue)


The errors make sense at this level, as a general validation package. I think that for specific types the caller can control the specific error and help by providing the context.
For example, if catalog calls a validation function from this package, the returned error will be wrapped by a catalog.ErrXXX error so that a caller of the catalog can identify the specific package error.
The same goes for a specific type like Path: the catalog will get ErrInvalid from the validation package and wrap it with catalog.ErrPathXXX.

@nopcoder I thought about it. The problem with this approach is that it might mask some meaningful errors. For instance, the ValidatePath function returns 3 different errors (ErrInvalidType, ErrPathRequiredValue and a custom one wrapping ErrInvalidValue), and I can't really handle that properly from the calling function because it fails fast (the first validation error is returned).
I think we can do some work here and rethink error handling (some packages handle it better than the built-in one).
If you think we can still achieve custom errors without losing visibility, I'd appreciate hearing more of your thoughts.

I moved the package-specific validations into each package's scope, so the errors, types, etc. are managed by that package.
I left the common resources (regex, general functions, etc.) in the validate package.

pkg/catalog/validate.go

linting - package imports

revert function rename, making validator package free

CLAassistant · 2022-01-20T15:05:43Z

All committers have signed the CLA.

per package validations update deleteobjects error location

nopcoder

Looks good to me - minor comments to consider.

pkg/catalog/errors.go

nopcoder · 2022-01-26T06:57:26Z

pkg/graveler/validate.go

+func isControlCodeOrSpace(r rune) bool {
+	const space = 0x20
+	return r <= space
+}


Not sure it's your code, but there are IsControl and IsSpace in the unicode package.

It's not mine. Yes, we can use these functions from the unicode package; I wonder whether that should be included in this issue/pull request or is out of scope.
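One point to weigh before swapping: the two checks are not equivalent. The existing helper flags every rune up to 0x20, while unicode.IsControl and unicode.IsSpace also flag DEL (0x7F), the C1 control range and non-ASCII spaces such as U+00A0. A small sketch comparing them follows; `isControlOrSpaceUnicode` is a hypothetical name for the replacement.

```go
package main

import (
	"fmt"
	"unicode"
)

// isControlCodeOrSpace is the helper from the diff above.
func isControlCodeOrSpace(r rune) bool {
	const space = 0x20
	return r <= space
}

// isControlOrSpaceUnicode is a hypothetical replacement built on the
// unicode package.
func isControlOrSpaceUnicode(r rune) bool {
	return unicode.IsControl(r) || unicode.IsSpace(r)
}

func main() {
	for _, r := range []rune{'\t', ' ', 0x7F, 0x85, 0xA0, 'a'} {
		fmt.Printf("U+%04X range=%t unicode=%t\n",
			r, isControlCodeOrSpace(r), isControlOrSpaceUnicode(r))
	}
}
```

Both functions agree on every rune from 0x00 to 0x20, so the unicode-based version only rejects additional input; whether that stricter behavior is wanted is a separate decision.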

nopcoder · 2022-01-26T06:59:37Z


ortz added the bug label Jan 4, 2022

ortz requested review from nopcoder and talSofer January 4, 2022 15:27

ortz self-assigned this Jan 4, 2022

ortz added the include-changelog label Jan 4, 2022

ortz force-pushed the fix/branch-creation-error-1634 branch from 789697b to b55db55 Compare January 4, 2022 16:02

nopcoder suggested changes Jan 5, 2022

View reviewed changes

ortz force-pushed the fix/branch-creation-error-1634 branch from b55db55 to 2378ff2 Compare January 13, 2022 09:46

ortz marked this pull request as ready for review January 13, 2022 09:48

ortz requested a review from nopcoder January 13, 2022 10:19

ortz added 3 commits January 18, 2022 12:10

move validation to separate package, fix misleading errors

ac50377

linting - unkeyed fields

903bd23

linting - package imports

revert function rename, making validator package free

48a31a8

revert function rename, making validator package free

ortz force-pushed the fix/branch-creation-error-1634 branch from 20eb7b5 to e68ddb9 Compare January 18, 2022 10:11

linting validator variables

418bf0c

per package validations update deleteobjects error location

ortz force-pushed the fix/branch-creation-error-1634 branch from a62491e to 418bf0c Compare January 20, 2022 15:12

nopcoder approved these changes Jan 26, 2022

View reviewed changes

ortz added 2 commits February 3, 2022 10:22

Merge branch 'master' into fix/branch-creation-error-1634

77b1f05

align errors

06370af

ortz merged commit f7012dd into master Feb 6, 2022

ortz deleted the fix/branch-creation-error-1634 branch February 9, 2022 10:35
