-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pattern matching API to OsStr #109350
base: master
Are you sure you want to change the base?
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
@rustbot label +T-libs-api |
This comment has been minimized.
This comment has been minimized.
@QuineDot I infer your intent in this comment is to signal that this PR depends on that PR and the T-libs ACP, right? I'll try to signal this with a: @rustbot label s-waiting-on-acp -s-waiting-on-review |
Unfortunately I don’t have Windows machine to run the tests so bare with me on those ones. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
No idea what the deal with this one is. |
This comment has been minimized.
This comment has been minimized.
I'm also interested in this functionality (per the ACP linked above) so I tried to look into the build failure to help, but the size of this PR is much larger than I expected. It might be easier to debug the build failure if some of the deep refactoring could be separated out. I've put the proposed implementation for my ACP into a PR at #111059, and with helper functions + docs + tests it's about 550 lines (~10% of this PR's net new LoC). It might be a good intermediate point, unblocking cross-platform flag parsing while the full details of how to reverse-split a |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Firstly, combine functions and results lists into a single list with ‘function => result’ pairs. This makes it easier to match function with its result. Secondly, eliminate InRange step so that it’s easier to notice series of matches or rejects.
Introduce a new core::str_bytes module with types and functions which handle string-like bytes slices. String-like means that they code treats UTF-8 byte sequences as characters within such slices but doesn’t assume that the slices are well-formed. A `str` is trivially a bytes sequence that the module can handle but so is OsStr (which is WTF-8 on Windows and unstructured bytes on Unix). Move bunch of code (most notably implementation of the two-way string-matching algorithm) from core::str to core::str_bytes. Note that this likely introduces regression in some of the str function performance (since the new code cannot assume well-formed UTF-8). This is going to be rectified by following commit which will make it again possible for the code to assume bytes format. This is not done in this commit to keep it smaller.
Since core::str_bytes module cannot assume byte slices it deals with are well-formed UTF-8 (or even WTF-8), the code must be defensive and accept invalid sequences. This eliminates optimisations which would be otherwise possible. Introduce a `Flavour` trait which tags `Bytes` type with information about the byte sequence. For example, if a `Bytes` object is created from `&str` it’s tagged with `Utf8` flavour which gives the code freedom to assume data is well-formed UTF-8. This brings back all the optimisations removed in previous commit.
I’m honestly not entirely sure why this is needed, but if I try to edit the file in subsequent commit without this change I’m getting ‘missing stability attribute’ errors: error: struct has missing stability attribute --> library/std/src/sys/unix/os_str.rs:22:1 | 22 | / pub struct Buf { 23 | | pub inner: Vec<u8>, 24 | | } | |_^
Implement Haystack for &OsStr and Pattern<&OsStr> for &str, char and Predicate. Furthermore, add prefix/suffix matching/stripping and splitting methods to OsStr type to make use of those patterns. Using OsStr as a pattern is *not* implemented. Neither is indexing into OsStr. All matching and indexing has to be done via provided functions.
To work around orphan rules, introduce a wrapper type for predicate functions to be used as pattern. Specefically, if we want to add predicat pattern implementation for OsStr type, doing it with a naked `FnMut` results in compile-time errors: error[E0210]: type parameter `F` must be covered by another type when it appears before the first local type (`OsStr`) impl<'hs, F: FnMut(char) -> bool> core::pattern::Pattern<&'hs OsStr> for F { ^ type parameter `F` must be covered by another type when it appears before the first local type (`OsStr`)
Due to technical limitations adding support for predicate as patterns on OsStr slices must be done via core::pattern::Predicate wrapper type. This isn’t ideal but for the time being it’s the best option I’ve came up with. The core of the issue (as I understand it) is that FnMut is a foreign type in std crate where OsStr is defined. Using predicate as a pattern on OsStr is the final piece which now allows parsing command line arguments.
The job Click to see the possible cause of the failure (guessed by this bot)
|
@mina86 any updates on the CI failure? |
Sorry, no, didn’t really have much time to look into it closely. Having said that, this is ready for high-level review even with failing tests. I would rather not spend time fixing tests if the end decision is that this is a completely wrong approach. |
☔ The latest upstream changes (presumably #113491) made this pull request unmergeable. Please resolve the merge conflicts. |
I have nothing useful to add here, but need to say I love the kernel-style per-commit-reviewable patchset with actually useful commit messages. This repo needs more of that! |
@mina86 any updates on this? |
I’m waiting for a reviewer to take a look at this before I commit more time to fixing all the CI failures. |
Thank you for your work on this! I was wondering why we never did this and then I found your PR! In hopes that this will help move this along, I've created an ACP for this (rust-lang/libs-team#311). |
@mina86 libs-api has gotten back about what is needed for moving this forward, see rust-lang/libs-team#311 |
r? libs-api |
cc @pitaj, @kennytm
This is a sizeable patchset so when reviewing looking at individual commits (rather than the whole changeset) is advisable.
The motivation for this PR is parsing command line arguments. It adds {starts,ends}_with, strip_{prefix,suffix}, {,r}split_once and split methods to OsStr supporting char, &str and FnMut(char) -> bool patterns. (Other methods can be easily added once general consensus for this PR is reached).
Note that this PR doesn’t implement RFC 2295 and doesn’t allow OsStr to be a pattern. This is done because:
This PR also sort of implements the new Pattern API. As I understand it’s no longer a thing, but I’ve decided to keep the change in because it does allow common interface and code sharing. (Though I have some doubts about the actual interface; for example I question existence of Searcher::next method). Keep in mind this is just a means to an end so if messing about with core::str::pattern would be a blocker I can undo those changes.
The core idea with this PR is introduction of core::str_bytes::Bytes type which handles byte slices which are possibly invalid UTF-8. str and OsStr are kind of Bytes. With that, pattern matching has to be implemented only once for Bytes type so that the same matching code doesn’t have to be duplicated for str and OsStr. Bytes can have Flavours (UTF-8, WTF-8 or unstructured) which allow implementing optimisations based on str being valid UTF-8 or OsStr on Windows being valid WTF-8.
Commits present in this PR:
PS. FYI, I have commits which converts core::slice::SlicePattern to the Pattern API implemented in this PR. Those aren’t included here since they would just distract from the actual changes but if someone is interested they can be found in pattern branch on my fork of the repository.
ACP: rust-lang/libs-team#311