Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate full model name in the deprecate model flow #427

Merged
merged 22 commits into from
Oct 26, 2023
Merged

Conversation

tapadipti
Copy link
Contributor

@tapadipti tapadipti commented Oct 12, 2023

Earlier, the model name was not being validated for the deprecate model action. It was getting validated for all other actions including the unassign and deregister actions (which are variations of the deprecate action).

Resolves iterative/dvc#9821 (comment)

@sweep-ai
Copy link

sweep-ai bot commented Oct 12, 2023

Apply Sweep Rules to your PR?

  • Apply: Leftover TODOs in the code should be handled.
  • Apply: All new business logic should have corresponding unit tests in the tests/ directory.
  • Apply: Any clearly inefficient or repeated code should be optimized or refactored.

gto/registry.py Outdated
@@ -421,6 +421,7 @@ def deprecate(
author: Optional[str] = None,
author_email: Optional[str] = None,
) -> Optional[Deprecation]:
self._check_args(name=name, version=None, rev=rev, deprecate_model=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just do assert_fullname_is_valid(name) if need to validate the name and nothing else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein
Copy link
Member

Thanks @tapadipti ! I left a comment + I think we would need a test for this.

@shcheklein shcheklein added the bug Something isn't working label Oct 12, 2023
@tapadipti tapadipti changed the title Check args in the deprecate model flow Validate full model name in the deprecate model flow Oct 13, 2023
@codecov-commenter
Copy link

codecov-commenter commented Oct 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
gto/constants.py 100.00% <ø> (ø)
gto/registry.py 90.61% <100.00%> (+0.03%) ⬆️

📢 Thoughts on this report? Let us know!.

@tapadipti tapadipti temporarily deployed to pypi October 13, 2023 03:55 — with GitHub Actions Inactive
@tapadipti
Copy link
Contributor Author

Thanks @shcheklein. I've addressed your comments.

For the test, I ran it locally by calling it from a __main__ block. But I'm not sure how to run it without that.

There are more validations to be added. Eg, currently it accepts // in the name, and this is not allowed in the Git ref. I'll work on fixing this as part of iterative/dvc#9821 but it's not within the scope of this PR.

@tapadipti tapadipti requested a review from shcheklein October 13, 2023 07:08
@shcheklein
Copy link
Member

thanks @tapadipti ! it looks good, but it seems we need to fix the test. Please see the checks above.

@tapadipti tapadipti temporarily deployed to pypi October 16, 2023 04:57 — with GitHub Actions Inactive
@tapadipti
Copy link
Contributor Author

it seems we need to fix the test. Please see the checks above.

The error message returned by the name validation functions is quite long. So it looks like a new line character (\n) gets added automatically - see the extra \n between Value and must below:

❌ Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value \nmust be of len >= 2 and must start and end with a letter or a number.\n

Is there a way to turn off this automatic addition?

I was not adding this \n to my test (coz I wasn't expecting it to get added when running), and locally I didn't see this issue.

To resolve this, I've now added the \n to the error message and to my test. It's passing now.

However, I'm not sure if some other products with dependencies on GTO could break due to this (if they're expecting a string without this \n). cc @iterative/studio

@shcheklein
Copy link
Member

@tapadipti I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

@tapadipti
Copy link
Contributor Author

tapadipti commented Oct 18, 2023

@tapadipti do you have sense on why is it escaped? what other symbols are escaped?

@shcheklein So this is failing for Windows only - so it must be something to do with Windows convention for line endings (not the first time Windows got me stuck for so long 😞). Still not sure how to resolve this - will try to figure out. But I think it's better to just add the line break at the end of the sentence (like I tried before).

No, I couldn't figure this out - for some reason the \ in \n is being escaped, so it becomes \\n. When I try locally, a string with \\n prints just \n. And I don't know how to replicate how the complete test suite is being run - I've run individual tests and it works fine. The only character causing me issue is the line break which is getting added in the middle of a line. Without this extra line break, there's no issue coz all the other line breaks are at the same position in both the actual and expected values.

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GTO CLI test framework is fragile (and IMO a lot of these tests are not actually useful either) since all of the tests are hard coded to check for fixed output.

The issue that shows up here is that the underlying test for CLI commands uses click.CliRunner which will implements line wrapping at a fixed maximum content width of 80. Wrapped lines will also be split using the platform specific line ending (which is CRLF on windows and not just LF). This was a non-issue for all of the other existing tests up to now (purely by coincidence) since none of them generate a line longer than 80 characters.

Adjusting the test framework to strip all newlines and replace them with spaces is probably not the right way to go since the newlines are explicitly expected output in all of the other existing tests (where the split lines are significant).

The fix here should be to just manually handle the line wrapping at 80 characters in the new test like

def test_deprecate_artifact(repo_with_commit: str):
    expected = os.linesep.join(
        [
            (
                "❌ Invalid value 'a4!'."
                " Only letters, numbers, '_', '-', '/' are allowed. Value "
            ),
            (
                "must be of len >= 2 and must start and end with a letter or"
                " a number.\n"
            ),
        ]
    )
    _check_failing_cmd(
        "deprecate",
        ["-r", repo_with_commit, "a4!"],
        expected,
    )

(this should fix the test failures and should not require modifying _check_output_exact_match at all in this PR)

@pmrowla
Copy link
Contributor

pmrowla commented Oct 18, 2023

ideally in the long term all of the tests should be updated to work off of a list of expected lines instead of a single hard coded expected string, and then the check functions would do something like

assert result.stdout.splitlines() == expected_stdout
assert result.stderr.splitlines() == expected_stderr

which will work regardless of platform/line ending, and it will account for commands where the newlines in output are significant


Should also note that this does not account for the fact that click will also automatically add indentation depending on content and context which may end up causing similar issues in the future (so exact string comparison in tests would break due to leading tabs or spaces instead of newlines)

@tapadipti
Copy link
Contributor Author

@pmrowla Yes.

Wrapped lines will also be split using the platform specific line ending (which is CRLF on windows and not just LF)

I wish this comment was here yesterday or earlier today. Figured it out the hard way today. Have fixed it by doing replace differently for Windows. Which I don't like. So I'll revert to what you've suggested - which is pretty much what I had here and here. @shcheklein Can you pls confirm you're ok with this - coz you'd earlier suggested I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

@tapadipti tapadipti temporarily deployed to pypi October 18, 2023 08:08 — with GitHub Actions Inactive
@pmrowla
Copy link
Contributor

pmrowla commented Oct 18, 2023

I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

The problem with this is that this is applied to the tests for every GTO command and it strips the line breaks in places where the test behavior is actually significant, i.e. when the expected output is explicitly

To push the changes upstream, run:
    git push origin nn2@v0.0.1

Stripping/replacing all the line breaks means the test would now pass if the CLI command outputs

To push the changes upstream, run:     git push origin nn2@v0.0.1

(which is why I suggested that the long term solution should be that CLI output tests in GTO compare lists of expected lines and not a single expected string)

@tapadipti tapadipti temporarily deployed to pypi October 18, 2023 08:19 — with GitHub Actions Inactive
@tapadipti
Copy link
Contributor Author

@pmrowla I agree with you - it's better not to strip the line breaks. I've made a new commit reverting to an earlier state where I added an explicit line break in the error message to handle the 80 character limit. The suggestion you provided in your comment works for the name a4! but if we change the name to a string of different length at any time, it won't work. So I added a line break after allowed., so that the line does not become >80 chars long in most cases and this issue does not arise.

@shcheklein pls confirm if this is ok.

@tapadipti tapadipti requested a review from pmrowla October 18, 2023 08:35
@pmrowla
Copy link
Contributor

pmrowla commented Oct 18, 2023

The suggestion you provided in #427 (review) works for the name a4! but if we change the name to a string of different length at any time, it won't work.

This is true, but we would hit the issue even with the manual line break if we changed the test to use a longer name.

I would prefer to not have the manual line breaks in the exception message. The line wrapping is handled by the UI rendering library (click for GTO). The real issue here is with the test suite itself - the tests should not be affected by line wrapping in the UI library)


IMO these kinds of fragile UI tests for CLI output are not actually useful at all. It would be better to test that the expected exception is raised by the underlying call.

Basically, GTO tests should only be testing GTO functionality. But right now, the CLI tests are causing issues because in addition to testing GTO functionality they are also testing how click line wraps a given exception message. The click formatting behavior is not actually relevant to whether or not GTO works properly (and click's line wrapping is tested in click itself).

@tapadipti
Copy link
Contributor Author

@pmrowla I tried removing the manual line breaks in the exception message, as you suggested. But it's not working - it looks like the space between Value and must is getting stripped in Windows. See the assertion errors in the test reports for the latest commits.

Summary of error:

When I use your suggested code for the test (in this commit), I get the following mismatch between the actual and expected values.

"\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value\\nmust be of len >= 2 and must start and end with a letter or a number.\\n"
"\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value \\r\\nmust be of len >= 2 and must start and end with a letter or a number.\\n"

If I use a \n in the test (see this commit), I get the following mismatch:

"\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value\\nmust be of len >= 2 and must start and end with a letter or a number.\\n"
"\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value \\nmust be of len >= 2 and must start and end with a letter or a number.\\n"

From what I saw yesterday, this (stripping) does not happen in ubuntu and macos, so I can't just remove the space from the expected value (else the test will fail in ubuntu and macos). I can treat Windows differently (similar to how I did here), but imo that's a terrible idea. So my only option now is to revert to adding the manual line break in the exception message (or to improve how the tests are written, which I don't want to make part/dependency of this PR).

@dberenbaum
Copy link
Contributor

dberenbaum commented Oct 19, 2023

Hope this helps and isn't further derailing here, but can you strip only the \r clrf windows-style line endings from the actual output? I don't think we are ever going to intentionally add those into output.

Edit: I see there are also issues related to spacing, so I don't think this alone will help much. Disregard.

@pmrowla
Copy link
Contributor

pmrowla commented Oct 19, 2023

We should just remove this CLI test entirely and test the underlying behavior in test_api instead like

with pytest.raises(ValidationError):
    api.deregister(...)  # call deregister with an invalid name

@tapadipti tapadipti temporarily deployed to pypi October 20, 2023 05:49 — with GitHub Actions Inactive
@tapadipti tapadipti mentioned this pull request Oct 20, 2023
@tapadipti
Copy link
Contributor Author

For now, I'm removing the test. I've created an issue to improve tests, and we can add a test for the deprecate model action with invalid model name there. cc @pmrowla @shcheklein @dberenbaum

@tapadipti
Copy link
Contributor Author

@pmrowla Can you ptal at this comment and re-review this PR. I can't merge the PR without you accepting the change.

@tapadipti tapadipti merged commit 96c6d9d into main Oct 26, 2023
17 checks passed
@tapadipti tapadipti deleted the check-args branch October 26, 2023 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

artifacts: Review naming restrictions
6 participants