-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove str trailing nulls #8296
Conversation
When strings lose their trailing null, this pattern will become dangerous: let foo = "bar"; let foo_ptr: *u8 = &foo[0]; Instead we should use c_strs to handle this correctly.
I agree 100% with removing the legacy trailing It doesn't have to happen as part of this pull request, but we should really switch I think we'll find most places using I'll hold off on an r+ because this deserves more attention than most pull requests. |
/// # Failure | ||
/// | ||
/// Fails if the CString is null. | ||
pub fn as_bytes(&self) -> &'self [u8] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be like this?
fn as_bytes<'r>(&'r self) -> &'r [u8]
Really awesome effort. With the CString type, when it doesn't own it's buffer is it true that any operation on it is effectively unsafe? So CString has a sort of safe/unsafe modality? If so that's kind of frightening. |
@brson: Maybe I didn't document my intentions of CString owning or not owning the returned pointer well enough. One of the more obnoxious things to deal with in C is who owns the pointer. That said, this api may be a little half baked at the moment and I'd be okay with removing this feature for now. Perhaps instead I should follow |
@erickt this is great progress, so even if the design needs to continue to evolve I'm pretty happy merging it. I think we probably need to bring it up in the meeting tomorrow though since this is a major language change. |
This will probably need to be rebased on top of #8289 and |
@erickt can you merge again? |
Merging because this is a big annoying bitrotty change and feels like the main API surface of it won't change much in the future. I too am a little concerned about ownership being controlled by a flag, and by the dtor calling free. I think this difference may be significant enough to represent at the type level? I'm unsure. I can think of a number of other ways of representing it, few of which seem like they are particularly safe / footgun-proof. I'd love to be able to leverage lifetimes for this, such that (say) a Also somewhat concerned about Also ergonomically a bit concerned with |
mini-regression, the recent function signature change of |
@blake2-ppc: thanks for catching that. I folded a fix for it into #8430 to cut down on the bors queue. |
fn drop(&self) { | ||
if self.owns_buffer_ { | ||
unsafe { | ||
libc::free(self.buf as *libc::c_void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Windows there is no guarantee that "free" is the correct function to use to deallocate an OS-supplied buffer. Typically each API or set of APIs will have its own free-equivalent (but you have to call the correct one, or everything may break).
Raymond Chen (at the Old New Thing blog) has lots more details: http://blogs.msdn.com/b/oldnewthing/archive/2006/09/15/755966.aspx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Erik-S I'm afraid that nobody will see a comment on a closed pull request, can you file an issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I opened issue #8465
One of the downsides with `c_str` is that it always forces an allocation, and so this could add unnecessary overhead to various calls. This PR implements one of the suggestions @graydon made in #8296 for `vec.with_c_str` in that for a short string can use a small stack array instead of a malloced array for our temporary c string. This ends up being twice as fast for small strings. There are two things to consider before landing this commit though. First off, I arbitrarily picked the stack array as 32 bytes, and I'm not sure if this a reasonable amount or not. Second, there is a risk that someone can keep a reference to the interior stack pointer, which could cause mayhem if someone were to dereference the pointer. Since we also can easily grab and return interior pointers to vecs though, I don't think this is that much of an issue.
"Oh, you can just malloc a new string with the null termination", said the developers who have never seen a microcontroller. "Screw anybody using Rust in an embedded environment and needing to interoperate with C." Seriously, sometimes RAM is literally that tight. This does nothing but go out of your way to hassle people using Rust in low memory environments. Thanks, now I have to go write a special allocator to copy small strings and terminate them without fragmenting my handful of kB. |
new lint: `missing_asserts_for_indexing` Fixes rust-lang#8296 This lint looks for repeated slice indexing and suggests adding an `assert!` beforehand that helps LLVM elide bounds checks. The lint documentation has an example. I'm not really sure what category this should be in. It seems like a nice lint for the `perf` category but I suspect this has a pretty high FP rate, so it might have to be a pedantic lint or something. I'm also not sure about the name. If someone knows a better name for this lint, I'd be fine with changing it. changelog: new lint [`missing_asserts_for_indexing`]
This PR fixes #7235 and #3371, which removes trailing nulls from
str
types. Instead, it replaces the creation of c strings with a new type,std::c_str::CString
, which wraps a malloced byte array, and respects: