Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support type selectors in Resolve #2997

Merged
merged 20 commits into from
Feb 4, 2024
Merged

Support type selectors in Resolve #2997

merged 20 commits into from
Feb 4, 2024

Conversation

lefou
Copy link
Member

@lefou lefou commented Feb 1, 2024

This PR add support for new selector pattern _:Type.

In addition to _ and __, which select arbitrary segments, the _:MyType and __:MyType patterns can select modules of the specified type.

The type is matched by it's name and optionally by it's enclosing types and packages, separated by a . sign. Since this is also used to separate target path segments, a type selector segment containing a . needs to be enclosed in parenthesis. A full qualified type can be enforced with the _root_ package.

Example: Find all test jars

> mill resolve __:TestModule.jar
> mill resolve "(__:scalalib.TestModule).jar"
> mill resolve "(__:mill.scalalib.TestModule).jar"
> mill resolve "(__:_root_.mill.scalalib.TestModule).jar"

If a ^ or ! is preceding the type pattern, it only matches segments not an instance of that specified type. Please note that in some shells like bash, you need to mask the ! character.

Example: Find all jars not in test modules

> mill resolve __:^TestModule.jar

You can also provide more than one type pattern, separated with :.

Example: Find all JavaModules which are not ScalaModules or TestModules:

> mill resolve "__:JavaModule:^ScalaModule:^TestModule.jar"

Remarks:

  • Kudos to @lihaoyi who refactored the resolver in More refactoring for Resolve logic #2511 and made this PR possible. I tried to implement it multiple times before, and ever got bitten by the old gnarly resolver code.
  • It's currently not possible to match task/target types. It might be possible, but due to Task being a parametrized type, it might not be as easy to implement and use.

Fix #1550

Pull request: #2997

@lefou lefou changed the title exp resolve type Support a new _:Type pattern in Resolve Feb 1, 2024
@lefou lefou changed the title Support a new _:Type pattern in Resolve Support _:Type and __.Type patterns in Resolve Feb 1, 2024
@lefou lefou changed the title Support _:Type and __.Type patterns in Resolve Support type selectors in Resolve Feb 1, 2024
@lefou lefou marked this pull request as ready for review February 1, 2024 19:14
@lefou lefou requested review from lihaoyi and lolgab February 1, 2024 19:14
@lihaoyi
Copy link
Member

lihaoyi commented Feb 2, 2024

I think this looks great.

The syntax I think is worth discussing. $ would result in accidental variable interpolation if not properly quoted (single quotes only, double quotes won't work), which feels a bit error-prone.

How about a syntax like this?

> mill resolve __:TestModule.jar
> mill resolve __:scalalib.TestModule#jar
> mill resolve __:mill.scalalib.TestModule#jar
> mill resolve __:_root_.mill.scalalib.TestModule#jar

Or in general

> mill resolve my.initial.selector:pkg.and.name.of.ModuleTrait#path.to.target

From my experiments, this seems to not need escaping on sh/bash/zsh, both in scripts and in the CLI. (# seems to require a preceding whitespace to be parsed as a comment)

We could also just say that we require quoting for using this syntax, like we do for the cross-build [] syntax. But even then, using $ is a bit error prone because if you use the wrong kind of quote it can still mess up

@lefou
Copy link
Member Author

lefou commented Feb 2, 2024

I agree $ is a bit unfortunate, due to variable interpolation. I choose it initially, as it is what developer most likely associate with a name separate for class names, e.g. inner classes are encoded that way. Using the dot (.) wasn't an initial option to me, as we already use it as segment separator.

I'd also avoid a dedicated end-marker, especially, as I think this feature is used with simple class names most of the time. And making it optional has a higher change for misuse than the $-case. (Consider __:TestModule.jar works, but introducing a package afterwards no longer works, __:scalalib.Testmodule.jar. If used with a wildcard instead of a concrete target, you might even not notice the misbehavior at all.)

So, I'd either like to find a better separator or require the use of parenthesis in case a separator is needed, in which case we could simple use the dot (.).

  1. Use parenthesis.

This might come with it's own issue in some shells, when used unquoted.

> mill resolve __:TestModule.jar
> mill resolve __:(scalalib.TestModule).jar
  1. Use another separator
> mill resolve __:TestModule.jar             # no separator needed
> mill resolve __:scalalib$TestModule.jar    # current approach, clashes with variable substitution
> mill resolve __:scalalib#TestModule.jar    # might be interpreted as comment
> mill resolve __:scalalib%TestModule.jar    # might look like a vairable on Windows
> mill resolve __:scalalib§TestModule.jar    # pretty uncommon, but as near on the keyboard as the others
> mill resolve __:scalalib+TestModule.jar    # semantically off, but should have no side-effects

@lefou
Copy link
Member Author

lefou commented Feb 2, 2024

I know, that providing multiple options might increase the complexity, but how about supporting both, $ as separator or . as separator when inside parenthesis?

The implementation of the type matcher is already supporting $ and .. And the parser should also be straight forward. It's up to the users preference to deal with the quoting to avoid variable substitution or use the parenthesis. One option might be better than the other, depending on the environment.

> mill resolve __:TestModule.jar
> mill resolve __:scalalib$TestModule.jar
bash> mill resolve __:scalalib\$TestModule.jar
> mill resolve __:(scalalib.TestModule).jar
bash> mill resolve __:\(scalalib.TestModule\).jar
bash> mill resolve "__:(scalalib.TestModule).jar"

@lefou
Copy link
Member Author

lefou commented Feb 2, 2024

The nice thing is, since each module is an valid Scala object, it has a unique type on it's own, so we can easily exclude specific modules from wildcards.

bash> mill __:\(\!integration.slow\).testCached
bash> mill __:\!integration\$slow.testCached

Runs all testCached targets, but not the tests of module integration.slow.

@lihaoyi
Copy link
Member

lihaoyi commented Feb 2, 2024

(1) I think we definitely should use . for the package selector for the type. $, #, %, +, etc. are not what people use to select nested packages in any common language. It's easy to parse and technically unambiguous, but to anyone from Java/JS/Python/C/Scala/etc. backgrounds it looks totally strange. Sure, JVM-encoded names often have $s in them, but most people don't read/write JVM-encoded names on a regular basis.

(2) Looking at your examples, I think we 100% need to suggest people use quotes. Given constraint (1) above, there simply aren't any intuitive symbols we can use to separate the type selector from the task selector, and using backslashes everywhere to escape things quickly turns to symbol soup that is impossible to skim

(3) Given (1) and (2), If we're going to be quoting things, then we might as well make it look like "normal" syntax. So something like this might make the most sense:

"(__: scalalib.TestModule).jar"

This would at least be familiar(ish) to people who are used to Scala, where (_: Foo).bar is valid (albeit uncommon) syntax, and even has the correct meaning "refer to member .bar on things of type Foo". It's a bit more verbose than the other proposed syntaxes above, but I think we're already at the limit of how complex things we can represent in unquoted shell tokens, and trying to push it further would not be a worthwhile endeavor.

@lefou
Copy link
Member Author

lefou commented Feb 2, 2024

I agree with the . as package separator. I already implemented the parenthesis approach, and it works nicely.

In fact, I borrowed the _: Type syntax from Scala.

There is one ugly fact with quoting the !, at least in bash which I use. It also the reason I kept using the backslash-quoting in my examples: Although the ! can be quoted with a backslash, it can't when inside of quotes. I couldn't use the ! once I used quotes around it.

bash> mill resolve "__:(_root_.mill.scalalib.TestModule):(!my.test).test"
bash: !my.test: event not found

bash> mill resolve "__:(_root_.mill.scalalib.TestModule):(\!my.test).test"
[1/1] resolve 
1 targets failed
resolve Parsing exception Position 1:37, found ":(\\!my.te"

bash> mill resolve "__:(_root_.mill.scalalib.TestModule):(\\!my.test).test"
[1/1] resolve 
1 targets failed
resolve Parsing exception Position 1:37, found ":(\\!my.te"

We could support some alternative though, like the ^ known from regex. I bet, it has a special meaning in bash, too.

So, my takeaway is, you would rather like the wildcard inside the parenthesis like (_: my.Type) instead of _: (my.Type). Let me just change that.

@lefou
Copy link
Member Author

lefou commented Feb 2, 2024

@lihaoyi I changed the parenthesis handling and added the ^ as alternative negation operator. I update the PR description. Let me know what you think!

@lihaoyi
Copy link
Member

lihaoyi commented Feb 3, 2024

I think it looks great.

I think we might not want to support ! at all, for similar reasons as not supporting $: ! not only needs to be quoted, but it needs to be quoted using the correct kind of quotes!

bash-3.2$ echo "hello!world"
bash: !world": event not found

bash-3.2$ echo 'hello!world'
hello!world

I think as a general rule for the CLI, we probably want to use syntax that works the same when quoted in either single or double quotes. Otherwise you start having to worry about using the right set of ' quotes rather than the just as common ", and then when you want to interpolate things you need to figure out you need to do 'hello '"$interpolated"' world' to do the interpolation rather than "hello $interpolated world", and in general it becomes very messy.

@lefou
Copy link
Member Author

lefou commented Feb 4, 2024

Let's promote the ^ character then. I would still keep support for !. I find it more natural and easier to remember. There are also users of other shells not affected by the histexpand issue and those who really understand their tools, that might welcome the alternative.

@lefou lefou merged commit 8d315d7 into main Feb 4, 2024
38 checks passed
@lefou lefou deleted the exp-resolve-type branch February 4, 2024 13:17
@lefou lefou added this to the 0.11.7 milestone Feb 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for excluding targets from wildcard selection
2 participants