Skip to content

(0-1) as *const T triggers horribly stupid behavior #43291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
retep998 opened this issue Jul 17, 2017 · 22 comments
Closed

(0-1) as *const T triggers horribly stupid behavior #43291

retep998 opened this issue Jul 17, 2017 · 22 comments
Labels
C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@retep998
Copy link
Member

retep998 commented Jul 17, 2017

During this issue all examples will be done using x86_64.

When you do 0 as *const T the type of the integer literal is inferred to be usize. This is fine.
When you do !0 as *const T you can observe that the resulting address is 0xffffffffffffffff which is fine.

When doing FFI sometimes a handle type will be represented as a raw pointer type and sometimes there will be constants of that handle type initialized to negative values, which is entirely normal.
The user's first choice is to try -1 as *const T which will of course fail because the type of the integer literal is inferred to be usize which cannot be negated because Rust hates me.
Ideally Rust would be able to infer the type as isize instead of usize in this case, but I know better than to expect Rust to be actually helpful.

The issue is if the user tries to be clever and do (0-1) as *const T which somehow seems to work but in reality they are triggering a combination of two horrible design decisions (or maybe bugs, it can be hard to tell sometimes) resulting in a giant footgun.

  1. The type of those integer literals is being inferred as i32 instead of usize. Rust earlier refused to infer the type as isize instead of usize, so why is it now inferring the type as i32?
  2. The conversion from i32 to *const T does zero extension instead of sign extension which is different than what C++ does. Add lint for u8 as *mut cast #42915

As a result, if you check the resulting address of (0-1) as *const T you'll see that it gives 0xffffffff which is wrong.
The correct answer is achieved by doing -1isize as *const T which results in the correct address of 0xffffffffffffffff.

Ideally both of those two points above would be fixed, but at the very least there should be a very loud warning that you are doing something very horribly wrong.

And yes, I found examples of this in winapi so some care will be needed when fixing this in order to not break the ecosystem.

Possible solutions

  1. Add a lint to forbid casts between pointer types and integer types other than usize/isize.
  2. Change casts from signed integer types to pointer types to perform signed conversion.
  3. Fix the inference issue causing the literals to be inferred as i32 instead of usize.
@retep998 retep998 added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jul 17, 2017
@oli-obk
Copy link
Contributor

oli-obk commented Jul 18, 2017

I have a suggestion: an allow by default lint that triggers whenever inference defaults to i32. It seems often surprising, especially with weird down the road errors like "expected x, got i32".

@hanna-kruppe
Copy link
Contributor

I have no data but I get the impression the i32 fallback fires (legitimately) a bit too frequently for such a lint to be really useful. Besides, if it's allow by default, it doesn't really help migitate this issue, unless we can get unsafe code authors to generally set the lint to warn by default (seems unlikely). A warn-by-default pointing out integer-pointer cases that widen/truncate seems more useful.

@ranma42
Copy link
Contributor

ranma42 commented Jul 18, 2017

I agree that the lint suggested by @rkruppe would be very useful, but I am afraid it could also "mistakenly" warn on 0 as *const T and other (non-negative) guard values. Maybe the lint could avoid warning on non-negative constant values?

Also, would it be possible to make the integer to pointer conversion perform sign extension?

@retep998
Copy link
Member Author

@ranma42 In 0 as *const T the integer literal has an inferred type of usize so a lint that only fires on casts from integer types other than usize/isize to pointer types wouldn't complain about that example.

@arielb1
Copy link
Contributor

arielb1 commented Jul 18, 2017

At the time when me and @nrc discussed casts, it turned out that libc does a lot of casting of pointer-sized i32/i64 (not isize) to pointers, and we decided to not have an architecture-specific rule in the language. I think a target-specific lint will be less bad.

@arielb1
Copy link
Contributor

arielb1 commented Jul 18, 2017

The behavior of casts to pointers looks like a bug - I certainly intended casts to have the signedness of the source type (like how it works for ints), but I missed it while implementing. Not sure whether we want to fix it. I doubt it will break anyone's code, but it is somewhat scary.

Maybe have a lint against it for enough releases and then change the behavior?

@scottmcm
Copy link
Member

Would a lint for "cast to pointer from type of different size" be something the portability lint RFC can do?

@eddyb
Copy link
Member

eddyb commented Jul 20, 2017

This is really strange: it appears casting from a signed integer to a pointer doesn't do the same as going through usize in between, which is what it should be doing.
In other words, (-1i32) as *const u8 should be equal to (-1i32) as usize as *const u8.
What I'm seeing is 0xffffffff for the former and 0xffffffffffffffff for the latter, in the LLVM backend, both the runtime codegen, and the late-compile-time evaluator.
I'm not sure I could possibly check the early evaluator because casting a raw pointer to an integer is disallowed, e.g. I can't use one of those expressions in [(); expr as usize].len() to get the value.

That leaves the fourth Rust evaluator, miri, which @RalfJung says it produces 0xff for both.
So we have between 2 and 4 distinct codegen/const-eval bugs waiting to be fixed here 😆.

EDIT: ah I see @arielb1 has already mentioned something along these lines in passing, but I can't find the i32 to pointer cast being addressed explicitly in the thread. IMO it's clearly wrong.

@aturon aturon added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 20, 2017
@RalfJung
Copy link
Member

With rust-lang/miri#267, miri's behavior is now consistent with rustc for casts that go through usize (well, for the ones I tested^^) and yields 0xffffffffffffffff for (-1i32) as *const u8 as usize. This disagrees with rustc, but I think it is the one we want.

@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 28, 2017
@aturon
Copy link
Member

aturon commented Aug 3, 2017

The lang team discussed this issue today, and believes it's possible to write a lint to detect this issue. We'd like to see that implemented, and do a crater run, then consider fixing the bug.

Thus: punting nomination to the compiler team.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 4, 2017

I wrote a lint, but it triggers on if cond { 1 } else { 0 } as *mut T, because 1 and 0 are inferred to be i32.

@eddyb
Copy link
Member

eddyb commented Aug 4, 2017

@oli-obk That doesn't strike me as correct.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 4, 2017

It is correct insofar as the lint should trigger on let x = 1i32; let y = x as *const T; as I understand it.

The question is why doesn't inference pass the suggestion into the if branches?

@eddyb
Copy link
Member

eddyb commented Aug 4, 2017

I mean casting 1 to a pointer seems incorrect nowadays.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 4, 2017

@eddyb
Copy link
Member

eddyb commented Aug 4, 2017

Ah, custom pointer values. Just write 1usize explicitly or something and let's do a cargobomb run.

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2017

I mean casting 1 to a pointer seems incorrect nowadays.

Really? The bad sign extension only makes a difference for negative values, doesn't it?

@eddyb
Copy link
Member

eddyb commented Aug 4, 2017

I meant in terms of alignment but it seems the 0x1 pointer is used as a guard value.

bors added a commit that referenced this issue Aug 10, 2017
…<try>

Lint casting signed integers smaller than `isize` to raw pointers

cc #43291 (keeping open as tracking issue)
@arielb1 arielb1 removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 10, 2017
@oli-obk
Copy link
Contributor

oli-obk commented Aug 21, 2017

We'd like to see that implemented, and do a crater run, then consider fixing the bug.

Crater came through fine in #43641, so we can fix the bug!

To be exact, crater showed some cases of casting from a signed integer to a raw pointer, but they all were positive values or zero.

@retep998
Copy link
Member Author

@oli-obk Crater doesn't test Windows though, so it didn't catch all the cases in winapi where that bug is actually hit. Fortunately in this case the fix would actually fix those old winapi versions.

@retep998
Copy link
Member Author

retep998 commented Oct 9, 2017

Is there any progress towards fixing this? Both PRs related to lints for this were closed.

@aturon
Copy link
Member

aturon commented Oct 19, 2017

@eddyb and @oli-obk to follow up. Leaving nominated until closed.

bors added a commit that referenced this issue Jan 4, 2018
Force appropriate extension when converting from int to ptr #43291

Fixes #43291.

Looking for feedback if I've missed something and/or need to add more tests.

@eddyb @retep998 @nagisa @oli-obk
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants