Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current ~str UTF-8 behavior allows for denial-of-service attack with args, environ #7188

Closed
lilyball opened this issue Jun 16, 2013 · 12 comments
Labels
A-Unicode Area: Unicode

Comments

@lilyball
Copy link
Contributor

The current behavior of ~str is that it unilaterally rejects any invalid UTF-8 sequence (modulo #3787). Unfortunately, this opens up rust programs to denial-of-service attacks where maliciously crafted user input can cause unexpected task failure. Two cases that exist right now are invalid UTF-8 in the args list and in the environment. The mere presence of the invalid UTF-8 will cause os::args() and os::env() to immediately raise the str::not_utf8 condition, which is unlikely to be handled by callers of these functions.

I've suggested this before on the IRC channel, but I think it's worth suggesting again, that when parsing UTF-8 we should consider simply translating the first byte of any invalid sequence into the Replacement Character (U+FFFD) instead of failing outright.

@graydon
Copy link
Contributor

graydon commented Jun 16, 2013

This is a plausible default handler. We don't presently support default handlers, but I might be ok with it if/when we do.

@lilyball
Copy link
Contributor Author

AFAICT, using a condition handler lets you replace the string wholesale, not replace the offending character. What sort of default handler would we use to let clients figure out the difference between a replaced string, and one that was, say, empty to begin with?

@graydon
Copy link
Contributor

graydon commented Jun 16, 2013

You can easily change the signature of the condition to use an enum that expresses these differences, if they matter.

@lilyball
Copy link
Contributor Author

That doesn't help if you rely on the default condition handler though, because in the end it just replaces the bad string with another one.

@huonw
Copy link
Member

huonw commented Jun 17, 2013

Isn't it entirely incorrect to assume that the arguments and the environment are UTF-8 encoded? I.e. these two functions should return ~[u8] (possibly with helpers that return ~str).

@thestinger
Copy link
Contributor

I don't think this is really an issue in the str type since it's intended to represent valid UTF-8 strings. Clobbering non-utf8 arguments or environment variables would also be a bug since that's totally valid.

@graydon
Copy link
Contributor

graydon commented Jun 17, 2013

In general we have no or only poorly sketched interfaces for accepting other encodings presently. it would be good to grow some.

@bblum
Copy link
Contributor

bblum commented Jul 3, 2013

nominating well-covered milestone

@catamorphism
Copy link
Contributor

Just a bug, declining

@lilyball
Copy link
Contributor Author

lilyball commented Sep 5, 2013

Fixing this is presumably going to require changing API (to provide byte vectors as the primary value type and perhaps a secondary method that gives Option<~str>).

@ben0x539
Copy link
Contributor

ben0x539 commented Nov 3, 2013

Apparently even an empty program (eg., fn main() {}) fails with an assert here if there's an argument with non-utf-8 byte sequences. You don't need to call os::args() in user code to trigger this and there's obviously no place to catch it with a condition. :(

At least os::getenv only fails when the requested environment variable is invalid utf-8, and not any unrelated one. :)

@SimonSapin
Copy link
Contributor

IMO args() and getenv should return bytes instead of strings.

@bors bors closed this as completed in fba32ea Feb 15, 2014
flip1995 pushed a commit to flip1995/rust that referenced this issue May 20, 2021
… r=xFrednet,flip1995

`needless_collect` enhancements

fixes rust-lang#7164
changelog: `needless_collect`: For `BTreeMap` and `HashMap` lint only `is_empty`, as `len` might produce different results than iter's `count`
changelog: `needless_collect`: Lint `LinkedList` and `BinaryHeap` in direct usage case as well
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Unicode Area: Unicode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants