from_utf8 should take BytesContainer #17375

Closed
mahkoh opened this issue Sep 18, 2014 · 7 comments

@mahkoh
Contributor

mahkoh commented Sep 18, 2014

std::str::from_utf8 takes &[u8], which means that, in general, you can't create a string from a &[c_char].

Instead, from_utf8 should look like this:

pub fn from_utf8<T: BytesContainer>(v: &T) -> Option<&str>

Then implement BytesContainer for &[i8].

@huonw
Member

huonw commented Sep 18, 2014

The lifetimes here need more finesse: taking a & means it will be hard to use the return value in more than a few statements.

@mahkoh
Contributor Author

mahkoh commented Sep 18, 2014

Can you explain how this is different from the current from_str?

@huonw
Member

huonw commented Sep 18, 2014

I assume you mean "the current from_utf8"?

The current function has signature:

fn from_utf8<'a>(x: &'a [u8]) -> Option<&'a str>

while this effectively has signature

fn from_utf8<'a, 'b: 'a>(x: &'a &'b [u8]) -> Option<&'a str>

(for T = &[u8].)

The extra reference and its shorter lifetime mean the code would have to be written like the following, which doesn't work:

struct Foo {
    name: Vec<u8>
}

impl Foo {
    fn name_as_str<'a>(&'a self) -> Option<&'a str> {
        str::from_utf8(&(self.as_slice()))
    }
}

This fails because the temporary created by & lives only inside the name_as_str function, and thus doesn't satisfy 'a.

(Written without lifetime elision, for precision.)

DST may make this possible. (I.e. I'm just saying that a naive implementation wouldn't work, not that there isn't any implementation that works.)
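
For contrast, here is a minimal sketch of the same accessor written against the existing &[u8] signature, where the returned &str borrows directly from the struct and so satisfies 'a. It is written for present-day Rust, which is an assumption on my part: from_utf8 now returns a Result rather than the Option shown above, hence the .ok().

```rust
use std::str;

struct Foo {
    name: Vec<u8>,
}

impl Foo {
    // Only one reference is involved: &self.name coerces to &'a [u8],
    // so the resulting &str can live as long as 'a.
    fn name_as_str<'a>(&'a self) -> Option<&'a str> {
        str::from_utf8(&self.name).ok()
    }
}

fn main() {
    let foo = Foo { name: b"abc".to_vec() };
    println!("{:?}", foo.name_as_str()); // prints Some("abc")
}
```

The failing variant above differs only in that it borrows a temporary slice reference, and that temporary cannot outlive the function body.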

@mahkoh
Contributor Author

mahkoh commented Sep 18, 2014

I think it would be more useful to add methods that allow you to freely cast between &[uN] and &[iN].

Maybe like this:

#![feature(macro_rules)]

trait ImmutablePrimitiveSlice<'a, U, S> {
    fn as_unsigned(self) -> &'a [U];
    fn as_signed(self) -> &'a [S];
}

trait MutablePrimitiveSlice<'a, U, S>: ImmutablePrimitiveSlice<'a, U, S> {
    fn as_unsigned_mut(self) -> &'a mut [U];
    fn as_signed_mut(self) -> &'a mut [S];
}

macro_rules! prim_immut {
    ($u:ty, $s:ty, $t:ty) => {
        impl<'a> ImmutablePrimitiveSlice<'a, $u, $s> for $t {
            #[inline] fn as_unsigned(self) -> &'a [$u] { unsafe { std::mem::transmute(self) } }
            #[inline] fn as_signed(self) -> &'a [$s] { unsafe { std::mem::transmute(self) } }
        }
    }
}
macro_rules! prim_mut {
    ($u:ty, $s:ty, $t:ty) => {
        impl<'a> MutablePrimitiveSlice<'a, $u, $s> for $t {
            #[inline]
            fn as_unsigned_mut(self) -> &'a mut [$u] { unsafe { std::mem::transmute(self) } }
            #[inline]
            fn as_signed_mut(self) -> &'a mut [$s] { unsafe { std::mem::transmute(self) } }
        }
    }
}

macro_rules! prim {
    ($u:ty, $s:ty) => {
        prim_immut!($u, $s, &'a [$u])
        prim_immut!($u, $s, &'a [$s])
        prim_immut!($u, $s, &'a mut [$u])
        prim_immut!($u, $s, &'a mut [$s])
        prim_mut!($u, $s, &'a mut [$u])
        prim_mut!($u, $s, &'a mut [$s])
    }
}

prim!(u8,   i8)
prim!(u16,  i16)
prim!(u32,  i32)
prim!(u64,  i64)
prim!(uint, int)

fn main() {
    let x: &[i8] = [97, 98, 99];
    println!("{}", std::str::from_utf8(x.as_unsigned()));
}
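
The snippet above uses 2014-era syntax (macro_rules feature gate, uint/int, Option-returning from_utf8). As a rough sketch of the same idea in present-day Rust, limited to the i8 to u8 case, a free function can rebuild the slice from its raw parts instead of transmuting the reference. The name as_unsigned is borrowed from the trait above; the free-function form is only an illustration, not an existing std API.

```rust
use std::slice;
use std::str;

/// Reinterprets a slice of i8 as a slice of u8 without copying.
/// (Illustrative helper; not part of the standard library.)
fn as_unsigned(s: &[i8]) -> &[u8] {
    // SAFETY: i8 and u8 have the same size and alignment, every bit
    // pattern is valid for both, and the result keeps the input's lifetime.
    unsafe { slice::from_raw_parts(s.as_ptr() as *const u8, s.len()) }
}

fn main() {
    let x: &[i8] = &[97, 98, 99];
    println!("{:?}", str::from_utf8(as_unsigned(x))); // prints Ok("abc")
}
```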

@tbu-
Contributor

tbu- commented Sep 18, 2014

This could benefit from the Coercible RFC.

@aturon
Member

aturon commented Oct 8, 2014

@huonw

> DST may make this possible. (I.e. I'm just saying that a naive implementation wouldn't work, not that there isn't any implementation that works.)

I believe the proposed design will eventually work fine with DST, and in fact I plan to change current uses of BytesContainer and ToCStr on that basis. However, this work is currently blocked on some final DST work.
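
To illustrate why DST helps, here is a hedged sketch in present-day Rust. The trait ByteSlice and the function from_utf8_generic are hypothetical stand-ins for BytesContainer and the proposed from_utf8; they are not real std items. Because the trait is implemented for the unsized slice types themselves, T can be [u8] or [i8], and only a single reference (and a single lifetime) is involved.

```rust
use std::slice;
use std::str;

// Hypothetical stand-in for the BytesContainer trait discussed here.
trait ByteSlice {
    fn bytes(&self) -> &[u8];
}

impl ByteSlice for [u8] {
    fn bytes(&self) -> &[u8] {
        self
    }
}

impl ByteSlice for [i8] {
    fn bytes(&self) -> &[u8] {
        // SAFETY: i8 and u8 share size, alignment, and valid bit patterns.
        unsafe { slice::from_raw_parts(self.as_ptr() as *const u8, self.len()) }
    }
}

// `?Sized` lets T be the slice type itself, so v is &'a [u8] or &'a [i8]
// rather than a reference to a shorter-lived reference.
fn from_utf8_generic<'a, T: ByteSlice + ?Sized>(v: &'a T) -> Option<&'a str> {
    str::from_utf8(v.bytes()).ok()
}

struct Foo {
    name: Vec<i8>,
}

impl Foo {
    // Unlike the naive version earlier in the thread, this compiles:
    // the &str borrows from self.name for the whole of 'a.
    fn name_as_str<'a>(&'a self) -> Option<&'a str> {
        from_utf8_generic(&self.name[..])
    }
}

fn main() {
    let foo = Foo { name: vec![104, 105] };
    println!("{:?}", foo.name_as_str()); // prints Some("hi")
}
```

Note the explicit `[..]`: generic parameters do not trigger the Vec-to-slice coercion, so the caller has to hand over a slice.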

@steveklabnik
Member

http://doc.rust-lang.org/nightly/std/str/fn.from_utf8.html still takes a &[u8] and is marked as stable, so changing it would need an RFC.

lnicola pushed a commit to lnicola/rust that referenced this issue Jun 23, 2024
Don't intern attribute inputs as their spans make them unique