ACP: Pattern methods for OsStr without OsStr patterns #311
Comments
This seems like a good idea to me. Putting these methods on `OsStr` will let code do simple parsing, splitting, etc. in safe code. And because this does not make `OsStr` a `Pattern`, it remains a simple addition to the API surface without any representation changes.
What implements …
I've updated the proposal to call that out:

```rust
impl Pattern<&OsStr> for &str {}
impl Pattern<&OsStr> for char {}
impl Pattern<&OsStr> for &[char] {}
impl<F: FnMut(char) -> bool> Pattern<&OsStr> for F {}
impl Pattern<&OsStr> for &&str {}
impl Pattern<&OsStr> for &String {}
```

Basically, this is a direct mirror of …
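To make the mirror concrete for one method, here is a minimal sketch of what the `&str` and `char` impls would amount to for `strip_prefix`. It is written as hypothetical free functions (`os_strip_prefix`, `os_strip_prefix_char`) over today's stable `OsStr::as_encoded_bytes` and `OsStr::from_encoded_bytes_unchecked`, not as the proposed `Pattern` impls. It assumes the split-point rule from the `OsStr` encoded-bytes documentation: the bytes may be split immediately before or after any valid non-empty UTF-8 substring.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for `OsStr::strip_prefix` with a `&str` pattern.
fn os_strip_prefix<'a>(haystack: &'a OsStr, prefix: &str) -> Option<&'a OsStr> {
    let rest = haystack.as_encoded_bytes().strip_prefix(prefix.as_bytes())?;
    // SAFETY: `rest` comes from `as_encoded_bytes` on this platform and starts
    // either at offset 0 (empty prefix) or immediately after `prefix`, a valid
    // non-empty UTF-8 substring, which is an allowed split point.
    Some(unsafe { OsStr::from_encoded_bytes_unchecked(rest) })
}

/// Hypothetical: a `char` pattern reduces to the `&str` case by UTF-8-encoding it.
fn os_strip_prefix_char(haystack: &OsStr, prefix: char) -> Option<&OsStr> {
    let mut buf = [0u8; 4];
    os_strip_prefix(haystack, prefix.encode_utf8(&mut buf))
}
```

The `unsafe` block is the part that the proposed safe methods would hide from callers.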
IIRC, this is currently forbidden due to coherence rules.
@pitaj Presumably we'd do this the same way we already did for Pattern.
AFAIK all of today's …
@pitaj Ah, I see. I think we have a mechanism that allows us to have impls of core traits in std without regard for the orphan rule, which would address that.
Last I checked, there's a mechanism for inherent impls, but not for traits. That said, it's probably something that could be added.
We could put just the …
We briefly discussed this in last week's libs-api meeting. While we agree it'd be good for `OsStr` to have a more complete API, we're worried about the number of string types: should `CStr` and … It'd be good to first explore solutions that could benefit all string types before continuing with this proposal to extend `OsStr` itself. For example, could a trait or another mechanism be used to make this API available to all string types?
So if I understand correctly, the desire is to explore the design of a Pattern Extension trait with methods like … Including …
I think you meant …
Good point, and figuring out how we should handle … For example, …
Maybe …
Should function arguments be full-blown … But mainly: Should … Should …
Wouldn't …
I like the trait idea, especially since it would allow writing combinators that work for any string type and reusing them on both …

Maybe a trait could look something like this (ignoring lifetimes):

```rust
trait SliceLikePattern: ToOwned {
    // Yes, we don't have associated type defaults...
    /// Result of splitting items
    type Partial = Self;
    /// Rightmost result of split items if different than `Partial`, e.g. for `CStr`
    type PartialRight = Self::Partial;
    /// Pattern type used when a single element will be extracted. `u8` for `&[u8]`,
    /// `str::pattern::Pattern` for str, maybe `u8` or `u16` for `OsStr`
    /// Or maybe `FnMut(&u8) -> bool` for slices, as in `split_once`
    type ItemPattern;
    /// `ItemPattern` for right-first searches, used by `rsplit_once`
    type ItemPatternReverse = Self::ItemPattern;
    /// Pattern type used when a partial (slice) is expected, `&[u8]` for `&[u8]`
    /// still `str::pattern::Pattern` for `str`
    type PartialPattern;
    /// PartialPattern but if there is a specific right-first search
    /// e.g. str's `<P as Pattern<'a>>::Searcher: ReverseSearcher<'a>`
    type PartialPatternReverse = Self::PartialPattern;

    fn split_at(&self, mid: usize) -> (&Self::Partial, &Self::PartialRight);
    fn split_at_mut(&mut self, mid: usize) -> (&mut Self::Partial, &mut Self::PartialRight);
    fn contains<P: Self::ItemPattern>(&self, pat: P) -> bool;
    fn starts_with<P: Self::PartialPattern>(&self, pat: P) -> bool;
    fn ends_with<P: Self::PartialPatternReverse>(&self, pat: P) -> bool;
    fn find<P: Self::PartialPattern>(&self, pat: P) -> Option<usize>;
    fn rfind<P: Self::PartialPatternReverse>(&self, pat: P) -> Option<usize>;
    fn split<P: Self::PartialPattern>(&self, pat: P) -> Split<P>;
    // ... similar variants of iterating splits and matches
    fn split_once<P: Self::ItemPattern>(&self, pat: P) -> Option<(&Self::Partial, &Self::PartialRight)>;
    fn rsplit_once<P: Self::ItemPatternReverse>(&self, pat: P) -> Option<(&Self::Partial, &Self::PartialRight)>;
    // I don't think we can do simple `trim_{start, end}` here or anything else that
    // relies on whitespace knowledge
    fn trim_start_matches<P: Self::PartialPattern>(&self, pat: P) -> &Self::PartialRight;
    fn trim_end_matches<P: Self::PartialPatternReverse>(&self, pat: P) -> &Self::Partial;
    fn strip_prefix<P: Self::PartialPattern>(&self, pat: P) -> Option<&Self::PartialRight>;
    fn strip_suffix<P: Self::PartialPatternReverse>(&self, pat: P) -> Option<&Self::Partial>;
    fn replace<P: Self::PartialPattern>(&self, from: P, to: &Self::PartialRight) -> <Self as ToOwned>::Owned;
    fn repeat<P: Self::PartialPattern>(&self, from: P, repeat: usize) -> <Self as ToOwned>::Owned;
}
```

There probably isn't anything that restricts this to string-like types; I could see a lot of this being beneficial if it applied to anything.
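A much smaller, runnable sketch of the trait idea, under its own assumptions: a hypothetical `SplitOnceStr` trait with one method whose needle is always `&str`, implemented for `str` and `OsStr`, plus a combinator `key_value` written once and reused for both. It does not attempt the associated-type design above; it only shows the "write once, reuse on every string type" benefit.

```rust
use std::ffi::OsStr;

/// Hypothetical trait: one pattern method shared by several string types.
/// The needle is always UTF-8, so every match boundary is a UTF-8 boundary.
trait SplitOnceStr {
    fn split_once_str(&self, needle: &str) -> Option<(&Self, &Self)>;
}

impl SplitOnceStr for str {
    fn split_once_str(&self, needle: &str) -> Option<(&str, &str)> {
        self.split_once(needle)
    }
}

impl SplitOnceStr for OsStr {
    fn split_once_str(&self, needle: &str) -> Option<(&OsStr, &OsStr)> {
        let bytes = self.as_encoded_bytes();
        let needle = needle.as_bytes();
        // Mirror `str::split_once("")`, which splits at offset 0.
        let start = if needle.is_empty() {
            0
        } else {
            bytes.windows(needle.len()).position(|w| w == needle)?
        };
        let (head, tail) = (&bytes[..start], &bytes[start + needle.len()..]);
        // SAFETY: both halves come from `as_encoded_bytes`, and each split point
        // is offset 0 or sits directly next to the needle, a valid non-empty
        // UTF-8 substring.
        unsafe {
            Some((
                OsStr::from_encoded_bytes_unchecked(head),
                OsStr::from_encoded_bytes_unchecked(tail),
            ))
        }
    }
}

/// A combinator written once and usable on both `str` and `OsStr`.
fn key_value<S: SplitOnceStr + ?Sized>(arg: &S) -> Option<(&S, &S)> {
    arg.split_once_str("=")
}
```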
We discussed this one again in today's @rust-lang/libs-api meeting, in light of #499. We'd like to accept this, using the same API proposal as #499: use the same … Sorry that this has been such a long and storied road getting to this point.
@joshtriplett From my understanding of #499, we'd have functionality like:

```rust
trait ByteSlice {
    fn split_bytes<P: BytePattern>(&self, pat: P) -> BytesSplit<'_, P>;
}
trait BytePattern { ... }
impl BytePattern for &[u8] { ... }
// `BytesSplit<'_, P>` would be an `Iterator<Item = &[u8]>`
```

If that is correct, that misses the key goal of this ACP: use of … Use of a … If we limit …
No, what is being proposed is that we have inherent methods on …
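I don't think it's sound to accept arbitrary byte patterns for Windows/WTF-8 OS strings. Consider:

```rust
let crab = std::ffi::OsStr::new("🦀");
assert_eq!(crab.as_encoded_bytes(), b"\xF0\x9F\xA6\x80");
// A byte-pattern `split_once` (the API under discussion) would split inside
// the crab's code point, leaving halves that are not well-formed WTF-8.
let (head, tail) = crab.split_once(b"\x9F").unwrap();
```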
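The boundary problem can be reproduced in safe code with today's APIs. The sketch below uses a hypothetical helper `split_once_byte` on the encoded bytes and checks whether either half is still valid UTF-8; handing such halves to `OsStr::from_encoded_bytes_unchecked` would violate that function's documented safety contract.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `bytes` around the first occurrence of `needle`.
fn split_once_byte(bytes: &[u8], needle: u8) -> Option<(&[u8], &[u8])> {
    let i = bytes.iter().position(|&b| b == needle)?;
    Some((&bytes[..i], &bytes[i + 1..]))
}

/// Returns true if splitting `s`'s encoded bytes at `needle` leaves a half that
/// is not valid UTF-8, i.e. the split landed inside a code point.
fn byte_split_breaks_utf8(s: &OsStr, needle: u8) -> bool {
    match split_once_byte(s.as_encoded_bytes(), needle) {
        Some((head, tail)) => {
            std::str::from_utf8(head).is_err() || std::str::from_utf8(tail).is_err()
        }
        None => false,
    }
}
```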
In that case, I assume we'd have slicing logic like #306.
@epage Thanks for drawing attention to this; I think we missed that. Clarifying the issue: the set of methods isn't the problem; it's just a question of what types they accept and return, right? It makes sense that `BytePattern` may be too general here.
Agreed. Some options for …
We could also go back to the original proposal and use …
I definitely don't think we'll want to mark the functions … Of the various solutions, which path would you recommend, balancing simplicity, usability, and consistency with #499?
Could #311 (comment) (or a simplified version of it) work? This would provide methods similar to those from …
I would propose making … If you consider … I think … In both cases, we could always add … EDIT: We could design a new interface for …
@epage Got it! So, use …
Yes, to restrict the patterns/needles to UTF-8 while the haystack can be anything. Non-UTF-8 content won't match. There might be some details to work out there, but I figure that's what the unstable period is for.
We discussed this in today's @rust-lang/libs-api meeting. We realized that using … We considered three alternatives:

- Option A: use `Pattern` and call the function with a Unicode replacement character for non-UTF-8 (using the logic from https://doc.rust-lang.org/std/str/struct.Utf8Error.html for how many bytes to treat as the invalid character)
- Option B: …
- Option C: A sealed …

We decided on option C for the initial experiment, and we can evaluate whether this was the right choice at stabilization time.
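Option A (not the option chosen) can be sketched on raw bytes: decode the haystack into `(byte offset, char)` pairs, substituting U+FFFD for each invalid sequence, whose length is taken from `Utf8Error::error_len` (or the rest of the input when that returns `None`). The names `chars_lossy` and `find_lossy` are hypothetical, and working on `&[u8]` instead of `&OsStr` is an assumption made so the example runs on any platform.

```rust
/// Decodes `bytes` into `(byte_offset, char)` pairs, mapping each invalid
/// sequence to U+FFFD. `Utf8Error::error_len` gives the invalid sequence's
/// length; `None` means the input ended mid-sequence, so the remaining bytes
/// become a single replacement character.
fn chars_lossy(bytes: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        match std::str::from_utf8(&bytes[pos..]) {
            Ok(valid) => {
                out.extend(valid.char_indices().map(|(i, c)| (pos + i, c)));
                break;
            }
            Err(err) => {
                let valid_len = err.valid_up_to();
                // The prefix up to `valid_up_to` is guaranteed to be valid UTF-8.
                let valid = std::str::from_utf8(&bytes[pos..pos + valid_len]).unwrap();
                out.extend(valid.char_indices().map(|(i, c)| (pos + i, c)));
                let bad_start = pos + valid_len;
                out.push((bad_start, char::REPLACEMENT_CHARACTER));
                pos = bad_start + err.error_len().unwrap_or(bytes.len() - bad_start);
            }
        }
    }
    out
}

/// A `FnMut(char) -> bool` pattern under Option A: invalid bytes show up as U+FFFD,
/// and the returned offset indexes into the original bytes.
fn find_lossy<F: FnMut(char) -> bool>(bytes: &[u8], mut pred: F) -> Option<usize> {
    chars_lossy(bytes).into_iter().find(|&(_, c)| pred(c)).map(|(i, _)| i)
}
```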
For the record, the …
Proposal
Problem statement
With rust-lang/rust#115443, developers, like those writing CLI parsers, can now perform (limited) operations on `OsStr`, but it requires `unsafe` to get an `OsStr` back, requiring the developer to understand and follow some very specific safety notes that cannot be checked by Miri.

RFC #2295 exists for improving this, but it has stalled. The assumption here is that part of the problem with that RFC is how wide its scope is, and that by shrinking the scope, we can get some benefits now.
Motivating examples or use cases
Mostly copied from #306
Argument parsers need to extract substrings from command line arguments. For example, `--option=somefilename` needs to be split into `option` and `somefilename`, and the original filename must be preserved without sanitizing it. `clap` currently implements `strip_prefix` and `split_once` using `transmute` (equivalent to the stable `encoded_bytes` APIs).

The `os_str_bytes` and `osstrtools` crates provide high-level string operations for OS strings. In the wild, `os_str_bytes` is mainly used to convert between raw bytes and OS strings (e.g. 1, 2, 3). `osstrtools` enables reasonable uses of `split()` to parse `$PATH` and `replace()` to fill in command line templates.

Solution sketch
Provide `str`'s `Pattern`-accepting methods on `&OsStr`.

Defer `OsStr` being used as a `Pattern`, and `OsStr` indexing support, both of which are specified in RFC #2295.

Example of methods to be added:
- … `str`, and if there are any changes between the writing of this ACP and implementation, the focus should be on what `str` has at the time of implementation (e.g. not adding a deprecated variant but the new one)
- `trim`, `trim_start`, and `trim_end` to be consistent with `trim_start_matches`/`trim_end_matches`
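As an illustration of one method from this list, here is a minimal sketch of `split` with a non-empty `&str` pattern on `&OsStr`. It is written as a hypothetical free function `os_split` over the stable encoded-bytes APIs and applied to a `$PATH`-style value, as in the `osstrtools` use case above. Rejecting an empty separator is a simplification of this sketch, not part of the proposal.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for `OsStr::split` with a non-empty `&str` pattern.
fn os_split<'a>(haystack: &'a OsStr, sep: &'a str) -> impl Iterator<Item = &'a OsStr> + 'a {
    assert!(!sep.is_empty(), "this sketch does not handle an empty separator");
    let sep = sep.as_bytes();
    let mut rest = Some(haystack.as_encoded_bytes());
    std::iter::from_fn(move || {
        let current = rest?;
        let piece = match current.windows(sep.len()).position(|w| w == sep) {
            Some(i) => {
                rest = Some(&current[i + sep.len()..]);
                &current[..i]
            }
            None => {
                rest = None;
                current
            }
        };
        // SAFETY: each end of `piece` is an end of the original encoded bytes or
        // sits directly next to an occurrence of `sep`, non-empty valid UTF-8.
        Some(unsafe { OsStr::from_encoded_bytes_unchecked(piece) })
    })
}
```

For real `$PATH` handling, `std::env::split_paths` already exists; the point here is only how a UTF-8 pattern keeps every split point sound.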
This should work because:

- `OsStr` bytes rust#109698 already established that operations on UTF-8 / 7-bit ASCII boundaries are safe
- … `Pattern` and, for now, `Pattern` is nightly only, allowing a lot of flexibility for how we implement `OsStr` support in the future (e.g. we could go as far as creating an `OsPattern` trait and switching to it without breaking anyone)
- From an API design perspective, there is strong precedent for it: … `str` …
- … `OsStr` as a pattern, we bypass the main dividing point between proposals (split APIs, panic on unpaired surrogates, switching away from WTF-8)

Alternatives
#306 proposes an `OsStr::slice_encoded_bytes` … `unsafe` …
Links and related work
… OsStr … #306
… (str, OsStr) … #114
… Pattern … private … (str, OsStr) … #114

What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: