Make Option<TinyAsciiStr> be the same size as TinyAsciiStr #2083
Comments
Did I? I don't recall making such suggestions; I don't think it's currently possible for the generic type.
I think the naive solution is to make the first byte a NonZeroU8. That means we can't represent empty strings, but it creates a niche for Option to use. A fancier solution is to recognize that 0xFF is never a valid ASCII byte (or UTF-8 byte) and use that as the niche somehow.
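A minimal sketch of the naive lead-byte layout, using a hypothetical NonEmptyAscii type (not tinystr's API). Stable Rust cannot compute N - 1 inside a const generic, so this sketch is parameterized by the tail length M and the total size is M + 1. The size checks reflect what current rustc does with the zero niche of NonZeroU8, not a documented layout guarantee for this struct.

```rust
use core::mem::size_of;
use core::num::NonZeroU8;

/// Hypothetical sketch (not tinystr's API): a non-empty ASCII string with
/// capacity `M + 1`. The lead byte is a `NonZeroU8`, so a zero lead byte is a
/// niche. The parameter is the tail length `M` because `N - 1` can't be written.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NonEmptyAscii<const M: usize> {
    first: NonZeroU8,
    rest: [u8; M], // zero-padded after the last character
}

impl<const M: usize> NonEmptyAscii<M> {
    /// Returns `None` for empty input, input longer than `M + 1` bytes,
    /// non-ASCII bytes, or NUL bytes (NUL is reserved for padding).
    pub fn try_from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() > M + 1 || !bytes.is_ascii() || bytes.contains(&0) {
            return None;
        }
        let (&first, tail) = bytes.split_first()?; // `None` on empty input
        let mut rest = [0u8; M];
        rest[..tail.len()].copy_from_slice(tail);
        Some(Self { first: NonZeroU8::new(first)?, rest })
    }

    /// Length in bytes: the lead byte plus the non-padding part of the tail.
    pub fn len(&self) -> usize {
        1 + self.rest.iter().position(|&b| b == 0).unwrap_or(M)
    }
}

// Observed on current rustc (not guaranteed): `None` is stored as a zero lead byte.
const _: () = assert!(size_of::<NonEmptyAscii<3>>() == 4);
const _: () = assert!(size_of::<Option<NonEmptyAscii<3>>>() == 4);
```

The trade-off is visible in the signature: a four-byte string is spelled NonEmptyAscii<3>, which is the "offset by one" confusion raised in the thread.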
Here's how it works with the NonZeroU8 lead byte (it might be necessary to reduce the …
For the fancier solution, I think the safe way is to make an enum with a variant for each valid ASCII char; the internal representation would then be an array of those enums instead of an array of raw …
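A sketch of the enum approach, with hypothetical names AsciiByte and AsciiStr. The enum is #[repr(u8)] with one variant per value 0..=127, so the bytes 0x80..=0xFF (including 0xFF) have no variant and rustc can use them as a niche. On current rustc that makes Option<AsciiStr<4>> four bytes, but this is observed behavior rather than a layout guarantee. Since transmute rejects generically sized arrays, the sketch validates and transmutes one byte at a time, and reads the array back as bytes through a pointer cast.

```rust
use core::mem::size_of;

/// Hypothetical: one variant per ASCII value; 0x80..=0xFF are left undefined.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[allow(dead_code)] // variants are produced by transmute, not by name
pub enum AsciiByte {
    B0 = 0, B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11, B12, B13, B14, B15,
    B16, B17, B18, B19, B20, B21, B22, B23, B24, B25, B26, B27, B28, B29, B30, B31,
    B32, B33, B34, B35, B36, B37, B38, B39, B40, B41, B42, B43, B44, B45, B46, B47,
    B48, B49, B50, B51, B52, B53, B54, B55, B56, B57, B58, B59, B60, B61, B62, B63,
    B64, B65, B66, B67, B68, B69, B70, B71, B72, B73, B74, B75, B76, B77, B78, B79,
    B80, B81, B82, B83, B84, B85, B86, B87, B88, B89, B90, B91, B92, B93, B94, B95,
    B96, B97, B98, B99, B100, B101, B102, B103, B104, B105, B106, B107, B108, B109, B110, B111,
    B112, B113, B114, B115, B116, B117, B118, B119, B120, B121, B122, B123, B124, B125, B126, B127,
}

impl AsciiByte {
    /// Checks the range first, so only 0x00..=0x7F is ever transmuted.
    pub fn from_u8(b: u8) -> Option<Self> {
        if b < 0x80 {
            // SAFETY: `repr(u8)` with a variant for every value in 0..=127.
            Some(unsafe { core::mem::transmute::<u8, AsciiByte>(b) })
        } else {
            None
        }
    }
}

/// Hypothetical fixed-capacity string stored as an array of `AsciiByte`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiStr<const N: usize>([AsciiByte; N]);

impl<const N: usize> AsciiStr<N> {
    /// Zero-pads shorter input; rejects overlong or non-ASCII input.
    pub fn try_from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() > N {
            return None;
        }
        let mut out = [AsciiByte::B0; N];
        for (slot, &b) in out.iter_mut().zip(bytes) {
            *slot = AsciiByte::from_u8(b)?;
        }
        Some(Self(out))
    }

    /// All N bytes, including zero padding.
    pub fn as_bytes(&self) -> &[u8] {
        // SAFETY: `AsciiByte` is `repr(u8)`: size 1, align 1, always a valid `u8`.
        unsafe { core::slice::from_raw_parts(self.0.as_ptr().cast::<u8>(), N) }
    }
}

// Observed on current rustc (not guaranteed): `None` uses a byte >= 0x80.
const _: () = assert!(size_of::<Option<AsciiStr<4>>>() == size_of::<AsciiStr<4>>());
```

Note that the enum variants are never named in normal use; every access goes through a checked transmute or a pointer cast, which matches the transmute-based access suggested in the thread.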
Oh, huh. The N-1 thing was something I was trying earlier, but there were const problems; I guess that works now. Honestly, I'd prefer the enum solution since it keeps things clean. We might need to be careful about the repr (C can't use niches, I'm unsure whether u8 can, and with Rust it's UB to use it this way).
The const problem is still there: you cannot evaluate expressions on const generics, so you cannot write the type …
I tried to find some documentation on how to define a niche, but it seems the niche optimizations are entirely compiler internals and are thus not behaviour guaranteed by the Reference. For the enum option, what's the plan with respect to getting a …
You cannot define niches other than by making your own enums. Yes, transmute: if we can guarantee the repr, we can transmute.
Right, the NonZeroU8 option involves a new type with the size offset by one.
But even that doesn't seem to be guaranteed, iiuc.
I think it will be very confusing to use …
Well, nothing in repr(Rust) is. Also, yes, I agree about N-1.
Also, uh, the other problem with N-1 is that …
The playground link I posted earlier demonstrates that it works (with packed).
If it works with packed, that should be okay; packed is defined.
OK, should we vote?
Note: This is intended as a future feature; I just want to record the conclusion so that it's ready for someone to get started.
Question: is there a code size risk with either of these choices? Will we not know until it is implemented? Would the enum compile down to a zero-cost abstraction?
Well, with the enum I suspect we won't ever access it as the enum; we'll always be transmuting. I have a preference for Option 2.
2 >> 1 because the …
Also, uh, sorry, my preference is for Option 2; I mistyped 😄 I'm also 2 >> 1.
@zbraniecki found https://github.com/ParkMyCar/compact_str, which uses …
Looking at the implementation, it's storing values as …
Oh, I was hoping it was doing something cleverer.
@Manishearth Do we need to define every value in the enum in order to prevent it from being used as the niche, or is setting a start and end range sufficient?
👋 I pushed the niche optimization for … I don't see a … The important part of …

```rust
#[cfg(target_pointer_width = "64")]
const MAX_SIZE: usize = 24;
#[cfg(target_pointer_width = "32")]
const MAX_SIZE: usize = 12;

const LENGTH_MASK: u8 = 0b11000000;

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct InlineString {
    buffer: [u8; MAX_SIZE],
}

impl InlineString {
    #[inline]
    pub fn new(text: &str) -> Self {
        // the caller must guarantee text.len() <= MAX_SIZE; this is only checked in debug builds
        debug_assert!(text.len() <= MAX_SIZE);
        let len = text.len();
        let mut buffer = [0u8; MAX_SIZE];

        // set the length
        buffer[MAX_SIZE - 1] = len as u8 | LENGTH_MASK;

        // copy the string
        //
        // note: in the case where len == MAX_SIZE, we'll overwrite the len, but that's okay because
        // when reading the length we can detect that the last byte is part of UTF-8 and return a
        // length of MAX_SIZE
        unsafe { std::ptr::copy_nonoverlapping(text.as_ptr(), buffer.as_mut_ptr(), len) };

        InlineString { buffer }
    }
}
```

This representation should be 100% compatible with true-zero-copy usage. Oh, our …
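The comment in the quoted new() says the length can be recovered from the last byte. A minimal standalone sketch of that decoding, assuming the 64-bit layout (MAX_SIZE = 24) and using hypothetical free functions encode, decode_len, and as_str rather than compact_str's actual methods:

```rust
// Hypothetical sketch mirroring the quoted layout on a 64-bit target.
const MAX_SIZE: usize = 24;
const LENGTH_MASK: u8 = 0b1100_0000;

/// Encodes `text` the same way as the quoted `new`, but checks the length
/// in all builds.
fn encode(text: &str) -> [u8; MAX_SIZE] {
    assert!(text.len() <= MAX_SIZE);
    let mut buffer = [0u8; MAX_SIZE];
    buffer[MAX_SIZE - 1] = text.len() as u8 | LENGTH_MASK;
    buffer[..text.len()].copy_from_slice(text.as_bytes());
    buffer
}

/// Shorter strings store `len | 0b1100_0000` in the last byte (0xC0..=0xD7).
/// A full-length string ends with its own last byte, which in valid UTF-8 is
/// ASCII (< 0x80) or a continuation byte (0x80..=0xBF), never >= 0xC0.
fn decode_len(buffer: &[u8; MAX_SIZE]) -> usize {
    let last = buffer[MAX_SIZE - 1];
    if last >= LENGTH_MASK {
        (last & !LENGTH_MASK) as usize
    } else {
        MAX_SIZE
    }
}

fn as_str(buffer: &[u8; MAX_SIZE]) -> &str {
    core::str::from_utf8(&buffer[..decode_len(buffer)]).expect("encoded from a &str")
}
```

The trick relies on the same property mentioned at the top of the thread: some byte values can never appear in a given position of valid text, so they can carry metadata instead.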
@sffc Every undefined value is a niche, regardless of whether rustc currently uses it.
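To illustrate, a small sketch with a hypothetical two-variant Bit enum: no invalid values are declared anywhere, yet every byte without a variant (2..=255) is available as a niche. On current rustc even nested Options fit in one byte, each layer taking another unused value. The sizes are observed behavior, not a guarantee.

```rust
use core::mem::size_of;

/// Hypothetical enum: only 0 and 1 are valid, so 2..=255 are all niches.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[allow(dead_code)]
enum Bit {
    Zero = 0,
    One = 1,
}

// Observed on current rustc: each extra `Option` layer uses one more invalid byte value.
const _: () = assert!(size_of::<Option<Bit>>() == 1);
const _: () = assert!(size_of::<Option<Option<Option<Bit>>>>() == 1);
```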
Previously, the tinystr crate used NonZeroU32 and friends to optimize Option<TinyStr4>. We have not yet replicated this optimization after the rewrite to TinyAsciiStr. @Manishearth had some suggestions on how to go about this.
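For context, a minimal sketch of the general NonZeroU32 packing idea for four-byte strings. The Str4 name is hypothetical and this is not the original crate's code. Option<NonZeroU32> is documented to be the same size as u32; that the newtype inherits the niche is observed rustc behavior.

```rust
use core::mem::size_of;
use core::num::NonZeroU32;

/// Hypothetical 4-byte string: ASCII bytes packed little-endian into a
/// `NonZeroU32`, zero-padded. A non-empty string has a nonzero first byte,
/// so the packed integer is never zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Str4(NonZeroU32);

impl Str4 {
    /// Rejects empty, overlong, non-ASCII, or NUL-containing input.
    pub fn try_from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.is_empty() || bytes.len() > 4 || !bytes.is_ascii() || bytes.contains(&0) {
            return None;
        }
        let mut buf = [0u8; 4];
        buf[..bytes.len()].copy_from_slice(bytes);
        NonZeroU32::new(u32::from_le_bytes(buf)).map(Str4)
    }

    /// The packed bytes, including zero padding.
    pub fn to_bytes(self) -> [u8; 4] {
        self.0.get().to_le_bytes()
    }
}

// Observed on current rustc: the newtype keeps `NonZeroU32`'s zero niche.
const _: () = assert!(size_of::<Option<Str4>>() == 4);
```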