Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/update-report: improve formula file validation #18167

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

apainintheneck
Copy link
Contributor

@apainintheneck apainintheneck commented Aug 26, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

Currently, ruby files that are not in the Cask directory are considered to be formulae if a Formula or HomebrewFormula directory doesn't exist which doesn't make sense. We know that these should only be in a few directories so we can check for that explicitly.

Beyond that the Tap#cask_file? and Tap.formula_file? methods were only used inside update-report so it doesn't make sense to turn them into pathnames and expand things when we know that each string will be a relative path from a tap that we can just check with a regex.

This change will stop other tap changes like new commands or changes to other directories like lib/ from showing up as new formulae.

I tried opening a PR for this a long time ago but I got busy with other things and it got closed by the stale bot.

Currently, ruby files that are not in the Cask directory are
considered to be formulae if a Formula or HomebrewFormula directory
doesn't exist  which doesn't make sense. We know that these should
only be in a few directories so we can check for that explicitly.

Beyond that the `Tap#cask_file?` and `Tap.formula_file?` methods
were only used inside update-report so it doesn't make sense to
turn them into pathnames and expand things when we know that
each string will be a relative path from a tap that we can just
check with a regex.

This change will stop other tap changes like new commands or
changes to other directories like lib/ from showing up as new formulae.

I tried opening a PR for this a long time ago but I got busy
with other things and it got closed by the stale bot.

- #15489
@apainintheneck apainintheneck marked this pull request as draft August 26, 2024 01:34
@apainintheneck
Copy link
Contributor Author

Here's an example of the unexpected behavior.

/u/l/Homebrew (master|✔) $ brew update
==> Updating Homebrew...
Updated 1 tap (apainintheneck/dev-utils).
==> New Formulae
apainintheneck/dev-utils/startup-stats
==> Outdated Formulae
gh                                   luajit                               python-setuptools                    shellcheck

You have 4 outdated formulae installed.
You can upgrade them with brew upgrade
or list them with brew outdated.

In this case, startup-stats is a new command not a new formula.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I do think the regex-based approach is a little harder to read so may e.g. warrant some interpolatated variables and comments.

Library/Homebrew/tap.rb Outdated Show resolved Hide resolved
@apainintheneck
Copy link
Contributor Author

I could potentially try to use File.fnmatch? instead of regexes if that would be clearer.

Comment on lines 759 to 764
when path/"Formula"
%r{^Formula(/[^/]+)+\.rb$}
when path/"HomebrewFormula"
%r{^HomebrewFormula(/[^/]+)+\.rb$}
when path
%r{^[^/]+\.rb$}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
when path/"Formula"
%r{^Formula(/[^/]+)+\.rb$}
when path/"HomebrewFormula"
%r{^HomebrewFormula(/[^/]+)+\.rb$}
when path
%r{^[^/]+\.rb$}
formula_file_regex = "[^/]+\.rb
file_file_and_dir_regex = "(/[^/])*#{formula_file_regex}"
when path/"Formula"
%r{^Formula#{file_file_and_dir_regex}$}
when path/"HomebrewFormula"
%r{^HomebrewFormula#{file_file_and_dir_regex}$}
when path
%r{^#{formula_file_regex}$}

which possibly doesn't work at all but something like this that DRYs up the regex a bit and explains what's going on by breaking into components

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of interpolating regexes based on smaller regexes especially when these regexes are already quite small because it ends up making things harder to read. Maybe adding a comment explaining each regex would be more helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the regexes as is pretty hard to follow. I don't think everything should always be DRY but, in this case, it is important that these three regexes have the same trailing end and only really differ by prefix. If you can find another way of achieving the same thing with better performance, cool, but, if not: I'd really like to split this into variables so that these don't run the risk of getting out of sync over time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have already been added to prevent this from happening so I don't think we need to worry about things getting out of sync here. IMO the biggest issue here is readability. Maybe something like this would be easier to read. It splits out the directory part of the regex into a separate section from the file name part and adds some inline comments. I haven't tested it yet but it should be close enough.

%r{
  ^Formula/
  (?:[^/]+/)* # zero or more subdirectories
  [^/]+\.rb   # ruby file name
}x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have already been added to prevent this from happening

Tests don't prevent these getting out of sync, they just prevent them from getting out of sync in a way that breaks those specific tests.

IMO the biggest issue here is readability.

I somewhat agree. I think the biggest issue is that you have three non-variable-assigned regexes and ⅔ of them differ only by prefix and they all have the same suffix. This harms both readability (it's hard to mentally parse where they are the same/not) but also: this is literally shared (and crucial!) logic that must remain consistent (or things will go 💥).

This is an almost textbook case of when you want to make things DRY. There's no world in which you want them to go out of sync.

Maybe something like this would be easier to read.

Do you propose to do these same comments for each of the regexes? This really seems like what variables are for...

It splits out the directory part of the regex into a separate section from the file name part

Or: you could do this with variables? Please?

Copy link
Contributor Author

@apainintheneck apainintheneck Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you feel very strongly about this point so I'll update it. In the future, if you feel very strongly about something to the point that it's a blocker, could you use Changes Requested to better communicate that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 60b8878

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you feel very strongly about this point so I'll update it.

Thanks.

In the future, if you feel very strongly about something to the point that it's a blocker, could you use Changes Requested to better communicate that?

You were one of the folks who asked that the use of PR approvals be documented so they were: https://github.com/Homebrew/brew/blob/master/docs/Maintainer-Guidelines.md#reviewing-prs

What you're asking for here contradicts the documentation here so I'd rather stick to what the documentation says until someone (could be you) submits a PR to update the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just like more clarity when it comes to which changes are nits/ignorable, suggestions/optional and blocking/required. It doesn't necessarily have to be expressed as Changes Requested but that's the most straightforward way to express that something is blocking/required.

The goal here is to make things easier to read by better
describing how different regexes are similar to each other.
@apainintheneck apainintheneck force-pushed the fix-cask-formula-file-validation-v2 branch from effe34c to 60b8878 Compare August 29, 2024 02:12
@apainintheneck apainintheneck marked this pull request as ready for review August 29, 2024 03:04
@apainintheneck
Copy link
Contributor Author

This is now ready for final review.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the regex changes! Looks good to merge now.

@MikeMcQuaid MikeMcQuaid merged commit 638a3dc into master Aug 29, 2024
27 checks passed
@MikeMcQuaid MikeMcQuaid deleted the fix-cask-formula-file-validation-v2 branch August 29, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants