Skip to content

Conversation

@notriddle
Copy link
Contributor

@notriddle notriddle commented Jul 10, 2024

Heads up!: This PR is a follow-up that depends on #124544. It adds 12dc24f, a change to the filtering behavior, and 9900ea4, a minor ranking tweak.

Part of rust-lang/rust-project-goals#112

This PR overturns #109802

Preview

image

Description

This commit is a response to feedback on the displayed type signatures results, by making generics act stricter.

  • Order within generics is significant. This means Vec<Allocator> now matches only with a true vector of allocators, instead of matching the second type param. It also makes unboxing within generics stricter, so Result<A, B> only matches if B is in the error type and A is in the success type. The top level of the function search is unaffected.
  • Generics are only "unboxed" if a type is explicitly opted into it. References and tuples are hardcoded to allow unboxing, and Box, Rc, Arc, Option, Result, and Future are opted in with an unstable attribute. Search result unboxing is the process that allows you to search for i32 -> str and get back a function with the type signature &Future<i32> -> Box<str>.
  • Instead of ranking by set overlap, it ranks by the number of items in the type signature. This makes it easier to find single type signatures like transmute.

Find the discussion on

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jul 10, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jul 10, 2024

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha

Some changes occurred in GUI tests.

cc @GuillaumeGomez

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Jul 27, 2024

☔ The latest upstream changes (presumably #128253) made this pull request unmergeable. Please resolve the merge conflicts.

@notriddle notriddle force-pushed the notriddle/search-sem-3 branch from 759df53 to 29d228b Compare July 27, 2024 04:34
@camelid
Copy link
Member

camelid commented Jul 31, 2024

Generics are tightened by making order significant. This means Vec<Allocator> now matches only with a true vector of allocators, instead of matching the second type param. It also makes unboxing within generics stricter, so Result<A, B> only matches if B is in the error type and A is in the success type. The top level of the function search is unaffected.

Excellent, this is great—thanks! Although do you mind clarifying what you mean by "The top level of the function search is unaffected"? Could you give an example?

Type params are loosened by allowing a single function type param to match multiple query params. In other words, the relationship is loosened to N:1 instead of the older 1:1 rule. This change also allows a type param to be unboxed in one spot and matched in another.

Hmm, this seems like the opposite of what we discussed in the meeting. The goal is to have a 1:1 matching, not N:1, so (using your example) Box<[A]> -> Vec<B> would return nothing because the query asks for something more powerful (transforming any A into a potentially different B). However, Box<[A]> -> Vec<A> would work because the function it describes does not transform the inner element. Functions like transmute would be the only ones matched by a query like A -> B. This is what @fmease described in his Zulip post that you linked to:

I would've expected Box<[T]> -> Vec (etc.) to return no results at all because there is no such signature. Only Box<[T]> → Vec would match.

The older rule seems to be N:1 too based on checking the nightly std docs, so I'm not sure what change this PR makes. Please correct me if I misunderstood.

One more thing: What does it mean for a generic parameter to be "unboxed" as compared to being "matched"?

@notriddle
Copy link
Contributor Author

notriddle commented Jul 31, 2024

Excellent, this is great—thanks! Although do you mind clarifying what you mean by "The top level of the function search is unaffected"? Could you give an example?

The actual function parameters can be given in any order. vec<t>, usize -> t gives you remove, swap_remove, split_off, and drain. usize, vec<t> -> t gives the exact same set of results.

The stuff within the <> is required to be in a the same order, but the function parameters themselves are order-agnostic.

Hmm, this seems like the opposite of what we discussed in the meeting. The goal is to have a 1:1 matching, not N:1, so (using your example) Box<[A]> -> Vec<B> would return nothing because the query asks for something more powerful (transforming any A into a potentially different B).

That's how it currently works in the master branch, but #124544 (comment) by @jsha asked for this change. "One last suggestion: What if we removed the feature of requiring T and Z to match distinct types?"

The older rule seems to be N:1 too based on checking the nightly std docs, so I'm not sure what change this PR makes. Please correct me if I misunderstood.

The older rule was 1:1. I suspect that almost all of the confusion about rustdoc's type-driven search comes from reordering, which is why I'm happy to remove it.

One more thing: What does it mean for a generic parameter to be "unboxed" as compared to being "matched"?

Given a function with this signature: fn run<I: Iterator<Item=u32>>(i: I) -> bool

I can search for it with this query: iterator<item=u32> -> bool

That is, the generic parameter I has been "unboxed" into its where clause.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

The actual function parameters can be given in any order. vec<t>, usize -> t gives you remove, swap_remove, split_off, and drain. usize, vec<t> -> t gives the exact same set of results.

The stuff within the <> is required to be in a the same order, but the function parameters themselves are order-agnostic.

Sounds good 👍

That's how it currently works in the master branch, but #124544 (comment) by @jsha asked for this change. "One last suggestion: What if we removed the feature of requiring T and Z to match distinct types?"

Hmm, seems like we'll need to have more team discussion on this point then. It sounds like fmease and I are in favor of the stricter "T and Z are parametrically polymorphic" approach (like in Rust generics, where we must consider each to be instantiated any way the user likes), whereas jsha prefers the looser "T and Z are just placeholders" approach. I see advantages both ways, but I prefer the former because it matches the language itself better, and it also hews more closely to prior art (like Hoogle).

However, it seems that implementing the former approach properly might require rethinking some aspects of the search design and implementation. Namely, it seems that currently A is considered a placeholder for a type and may match a concrete type rather than just a parameter. I expected type-based search to behave more like Hoogle (note: the following links are to equivalent Hoogle searches), where Vec<T> -> A matches nothing (except transmute, or in Hoogle's case also a version of len that is generic over the number), Vec<T> -> T matches things like get but not len, and Vec<T> -> usize matches len.

This may all be too difficult to implement especially given the constraints of using browser JS. Or we may decide that we want a looser behavior in fact. But I wanted to lay this out to make sure we're explicit about the decision.

The older rule was 1:1. I suspect that almost all of the confusion about rustdoc's type-driven search comes from reordering, which is why I'm happy to remove it.

That is probably true, and I agree that removing reordering is the most important change. Figuring out how exactly generics are interpreted is a bigger discussion that may be best handled in a future PR.

Given a function with this signature: fn run<I: Iterator<Item=u32>>(i: I) -> bool

I can search for it with this query: iterator<item=u32> -> bool

That is, the generic parameter I has been "unboxed" into its where clause.

Thanks 👍

@camelid
Copy link
Member

camelid commented Jul 31, 2024

Btw there does seem to be a bug in the implementation of this PR since A is actually matching VecDeque<T, A>.

image

We may also consider hiding parameters that have defaults (like Vec's A) that were not matched in the query.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

That's how it currently works in the master branch, but #124544 (comment) by @jsha asked for this change. "One last suggestion: What if we removed the feature of requiring T and Z to match distinct types?"

After posting my reply, I went to read @jsha's original comment, and it made me wonder if what actually bothered him was reordering. I suspect that after reordering is removed, it behaves fine in general and with defaulted type parameters. IMO, requiring them to match different types is also not unusual since it more closely matches (though definitely not completely) the normal semantics of generics in Rust. Allowing T and Z to match the same type makes them behave more like existentially-quantified types (i.e., dyn Trait) than universally-quantified types (i.e., fn foo<A>(x: A)).

One last suggestion: What if we removed the feature of requiring T and Z to match distinct types? I get that it's more expressive, but it's unusual, and it interacts poorly with other features like reordering and defaulted type parameters (like Vec's Allocator parameter, which people don't often think about).

@notriddle
Copy link
Contributor Author

Namely, it seems that currently A is considered a placeholder for a type and may match a concrete type rather than just a parameter.

It only matches parameters. For example, Arc<File> -> bool has two results, while Arc<File> -> T has none.

Btw there does seem to be a bug in the implementation of this PR since A is actually matching VecDeque<T, A>.

As it says in the where clause, "A matches T", the VecDequeue's first type parameter, not VecDequeue itself. That's also why the word "VecDequeue" is not bold.

@notriddle
Copy link
Contributor Author

Figuring out how exactly generics are interpreted is a bigger discussion that may be best handled in a future PR.

We're going to need to decide this at some point. This PR is as good a point as any, provided the other PR, adding the user-visible type signatures to the search results, isn't blocked.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

It only matches parameters. For example, Arc<File> -> bool has two results, while Arc<File> -> T has none.

Ah, I think I was confused because A -> B has a lot of results that are not just transmute. This is probably because we encode Self as a generic rather than its own thing. I could open a PR to fix that if you think that'd make sense?

As it says in the where clause, "A matches T", the VecDequeue's first type parameter, not VecDequeue itself. That's also why the word "VecDequeue" is not bold.

So when we search, it can match something inside another type? That seems a bit counterintuitive to me and like it would generate excess results. Was this added so that T would show results with Option<T>? If so perhaps it might make more sense to just special-case a few types like Option and Result.

@notriddle
Copy link
Contributor Author

So when we search, it can match something inside another type? That seems a bit counterintuitive to me and like it would generate excess results. Was this added so that T would show results with Option<T>? If so perhaps it might make more sense to just special-case a few types like Option and Result.

You're basically correct about the reasoning, yes.

Hoogle mentioned it in their design docs. https://ndmitchell.com/downloads/slides-hoogle_finding_functions_from_types-16_may_2011.pdf "A note on 'boxing'". For example, Num i => [a] -> i includes a result with a Maybe return value.

However, it looks like they only do it with special-cased types? https://github.com/ndmitchell/hoogle/blob/8149c93c40a542bf8f098047e1acbc347fc9f4e6/docs/TypeSearch.md#rewrites describes a special rule, and when I try to test it, I can only see boxing with Maybe and IO.

I didn't special-case anything because it's not clear which things should be special-cased for Rust.

Ah, I think I was confused because A -> B has a lot of results that are not just transmute. This is probably because we encode Self as a generic rather than its own thing. I could open a PR to fix that if you think that'd make sense?

Yup, that's bad. The blanket impl is being translated wrong.

@jsha
Copy link
Contributor

jsha commented Aug 12, 2024

To clarify my previous comment:

  • If A shows up in a search, it should match the same thing each time. For instance, Vec<A> -> Option<A> should find Vec::first but not (e.g.) Vec::as_ascii.
  • If A and B show up in a search, they should match for both A == B and A != B. For instance, Vec<A> -> Option<B> should find Vec::first and Vec::as_ascii. My understanding is that the the current logic requires A != B.

After posting my reply, I went to read @jsha's original comment, and it made me wonder if what actually bothered him was reordering. I suspect that after reordering is removed, it behaves fine in general and with defaulted type parameters.

Maybe after removing reordering, the A != B requirement doesn't wind up being that important either way. My main motivation here is to look for ways to keep the search rules simple enough for people to understand and learn easily. And also simple enough to implement correctly and keep correct even as we add features).

IMO, requiring them to match different types is also not unusual since it more closely matches (though definitely not completely) the normal semantics of generics in Rust.

I don't understand this. It seems to me the opposite is true. I wrote previously:

if I wrote struct Point<T, U>, it would be valid for T and U to be the same type.

So for instance let x: Point<u32, u32> = ... would be valid.

@notriddle notriddle force-pushed the notriddle/search-sem-3 branch from 4946c07 to bbdd6e8 Compare August 13, 2024 23:22
@rust-log-analyzer

This comment has been minimized.

@notriddle notriddle force-pushed the notriddle/search-sem-3 branch from bbdd6e8 to eb5487e Compare August 13, 2024 23:34
@notriddle
Copy link
Contributor Author

I've added the unboxing flag feature in another commit (and also rebased onto head). It now does unboxing on Box, &, Future, Option, Result, and (Tuple), but not anything else.

As for the A != B constraint, the reason I originally wrote it that way was the broad idea that you aren't required to specify generics at all, so if you are specifying them, you probably have some specific meaning in mind.

  • If A shows up in a search, it should match the same thing each time. For instance, Vec<A> -> Option<A> should find Vec::first but not (e.g.) Vec::as_ascii.
  • If A and B show up in a search, they should match for both A == B and A != B. For instance, Vec<A> -> Option<B> should find Vec::first and Vec::as_ascii. My understanding is that the the current logic requires A != B.

It sounds like you also want type params in the search query to match with concrete types in the function, and not just with the function's own type params.

I'm not sure that's very useful. First of all, it's not how Hoogle works. If I run a search ParsecT [Tok] s m Tok, I get back pEscaped. If I turn one of the Tok into t (which turns it into a type param), making ParsecT [Tok] s m t, I get no results.

Second of all, the use case I have in mind is that someone's searching for a function because they plan to call it, so the stuff before the arrow is what they have, while the stuff after the arrow is what they want to get. If they're writing a type param, it's probably because they are themselves authoring a generic function.

@bors
Copy link
Collaborator

bors commented Aug 31, 2024

☔ The latest upstream changes (presumably #129809) made this pull request unmergeable. Please resolve the merge conflicts.

@rfcbot rfcbot added to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Nov 11, 2024
@rfcbot
Copy link

rfcbot commented Nov 11, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@GuillaumeGomez
Copy link
Member

Thanks everyone!

@bors r+

@bors
Copy link
Collaborator

bors commented Nov 11, 2024

📌 Commit 9900ea4 has been approved by GuillaumeGomez

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 11, 2024
@bors
Copy link
Collaborator

bors commented Nov 11, 2024

⌛ Testing commit 9900ea4 with merge d4822c2...

@bors
Copy link
Collaborator

bors commented Nov 11, 2024

☀️ Test successful - checks-actions
Approved by: GuillaumeGomez
Pushing d4822c2 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 11, 2024
@bors bors merged commit d4822c2 into rust-lang:master Nov 11, 2024
7 checks passed
@rustbot rustbot added this to the 1.84.0 milestone Nov 11, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d4822c2): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.5%] 15
Regressions ❌
(secondary)
0.4% [0.1%, 0.5%] 19
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.4% [-2.4%, -2.4%] 1
All ❌✅ (primary) 0.2% [0.1%, 0.5%] 15

Max RSS (memory usage)

Results (secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.8% [0.8%, 4.7%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.7% [-3.7%, -0.5%] 6
All ❌✅ (primary) - - 0

Cycles

Results (secondary 1.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.7% [2.6%, 2.9%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.6% [-1.6%, -1.6%] 1
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 787.302s -> 785.736s (-0.20%)
Artifact size: 335.27 MiB -> 335.37 MiB (0.03%)

@rustbot rustbot added the perf-regression Performance regression. label Nov 11, 2024
@notriddle notriddle deleted the notriddle/search-sem-3 branch November 12, 2024 16:41
@notriddle
Copy link
Contributor Author

This PR adds the name of the generic type parameter to the search index. That information takes time to gather and serialize.

@rustbot label: +perf-regression-triaged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.