Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat args/env as lossy UTF-8 #12283

Closed
wants to merge 3 commits into from
Closed

Conversation

lilyball
Copy link
Contributor

Change os::args() and os::env() to use str::from_utf8_lossy().
Add new functions os::args_as_bytes() and os::env_as_bytes() to retrieve the args/env as byte vectors instead.

The existing methods were left returning strings because I expect that the common use-case is to want string handling.

Fixes #7188.

pub fn as_bytes_no_nul<'a>(&'a self) -> &'a [u8] {
if self.buf.is_null() { fail!("CString is null!"); }
unsafe {
cast::transmute((self.buf, self.len()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transmute a raw::Slice to avoid accidentally getting the order mixed up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I was copying as_bytes() but this is a good excuse to fix that.

os::args() was using str::raw::from_c_str(), which would assert if the
C-string wasn't valid UTF-8. Switch to using from_utf8_lossy() instead,
and add a separate function os::args_as_bytes() that returns the ~[u8]
byte-vectors instead.
Parse the environment by default with from_utf8_lossy. Also provide
byte-vector equivalents (e.g. os::env_as_bytes()).

Unfortunately, setenv() can't have a byte-vector equivalent because of
Windows support, unless we want to define a setenv_bytes() that fails
under Windows for non-UTF8 (or non-UTF16).
@lilyball
Copy link
Contributor Author

New version pushed

@erickt
Copy link
Contributor

erickt commented Feb 15, 2014

Generally I like this approach. Why .as_bytes_no_nul instead of .as_bytes_no_null or .as_bytes_without_null? We use null far more often than nul. The only other nul I could find was some internal functions in path, which should probably also be renamed too.

@lilyball
Copy link
Contributor Author

The character '\0' is correctly termed NUL. "null" is a generic word that I've seen some people use in Rust to mean '\0' but more commonly is used to me the zero pointer. And .as_bytes_wihout_nul just seems unnecessarily long.

bors added a commit that referenced this pull request Feb 15, 2014
Change `os::args()` and `os::env()` to use `str::from_utf8_lossy()`.
Add new functions `os::args_as_bytes()` and `os::env_as_bytes()` to retrieve the args/env as byte vectors instead.

The existing methods were left returning strings because I expect that the common use-case is to want string handling.

Fixes #7188.
@bors bors closed this Feb 15, 2014
@lilyball lilyball deleted the env-args-bytes branch February 15, 2014 20:24
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 26, 2024
Minor refactor format-args

* Move all linting logic into a single format implementations struct

This should help with the future format-args improvements.

**NOTE TO REVIEWERS**:  use "hide whitespace" in the github diff -- most of the code has shifted, but relatively low number of lines actually modified.

Followig up from rust-lang#12274

r? `@xFrednet`

---

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Current ~str UTF-8 behavior allows for denial-of-service attack with args, environ
4 participants