Skip to content

Bug on non-power-of-2 SIMD vectors #20460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
simnalamburt opened this issue Jan 3, 2015 · 19 comments · Fixed by #20611
Closed

Bug on non-power-of-2 SIMD vectors #20460

simnalamburt opened this issue Jan 3, 2015 · 19 comments · Fixed by #20611

Comments

@simnalamburt
Copy link
Contributor

simnalamburt commented Jan 3, 2015

I think I found a bug with non-power-of-2 SIMD vectors. They're randomly turned into zero values when I store them on Vec

http://is.gd/LfOHbj

#[simd]
#[deriving(Show)]
struct f32x3(f32, f32, f32);

fn main() {
    let mut points = Vec::new();

    for _ in 0i..10 {
        points.push(f32x3(1.0, 1.0, 1.0));
    }

    for point in points.iter() {
        println!("{}", point);
    }
}

Expected result:

f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)

What we get:

f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(0, 0, 0)
f32x3(1, 1, 1)
f32x3(1, 1, 1)
f32x3(0, 0, 0)
f32x3(0, 0, 0)
f32x3(1, 1, 1)
f32x3(1, 1, 1)

Tested with:

rustc 0.13.0-nightly (39d740266 2015-01-01 15:51:08 +0000)
host: x86_64-apple-darwin
rustc 0.13.0-nightly (39d740266 2015-01-01 15:51:08 +0000)
host: x86_64-unknown-linux-gnu (Ubuntu 14.04.1 LTS)
@lifthrasiir
Copy link
Contributor

http://is.gd/wRJF3h

As you can see, this is a fault of Vec assuming size_of::<T>() >= min_align_of::<T>() (while for T = f32x3 this is 12 and 16 respectively). More accurately, for T = f32x3, every growth allocates cap * 12 bytes while it actually needs cap * 16 bytes; any element after len >= cap * 3 / 4 will overflow the memory, and they will be omitted at the next reallocation.

@Gankra
Copy link
Contributor

Gankra commented Jan 4, 2015

Ack!

@Gankra
Copy link
Contributor

Gankra commented Jan 4, 2015

CC @thestinger

@simnalamburt
Copy link
Contributor Author

simnalamburt commented Jan 4, 2015

I've got through some little experiment and I guess there's something wrong with size_of::<T>() for f32x3 and other non-power-of-2 SIMD types.

                     | size_of | min_align_of | align_of | size_of >= min_align_of
---------------------+---------+--------------+----------+-------------------------
 f32x3               |      12 |           16 |       16 | false
 f32x3_no_c          |      12 |           16 |       16 | false
 f32x3_no_simd       |      12 |            4 |        8 | true

                     | size_of | min_align_of | align_of | size_of >= min_align_of
---------------------+---------+--------------+----------+-------------------------
 u8                  |       1 |            1 |        1 | true
 OneByte             |       1 |            1 |        8 | true
 u16                 |       2 |            2 |        2 | true
 TwoByte             |       2 |            2 |        8 | true
 (u8, u16)           |       4 |            2 |        8 | true
 (u16, u8)           |       4 |            2 |        8 | true
 ThreeByte           |       4 |            2 |        8 | true
 (u32, u8)           |       8 |            4 |        8 | true
 FiveByte            |       8 |            4 |        8 | true
 (f32x3, u8)         |      32 |           16 |       16 | true
 (f32x3_no_simd, u8) |      16 |            4 |        8 | true

I think now we've got 2 available options.

  1. Make size_of T >= min_align_of T always true even for non-power-of-2 SIMD types like f32x3.
  2. Make Vec<T> function properly even for size_of T < min_align_of T situation.

I'm pretty new with Rust. Someone pleaes make me an advise!

@Gankra Gankra added the I-wrong label Jan 4, 2015
@Aatch
Copy link
Contributor

Aatch commented Jan 4, 2015

I'm not sure what the exact behaviour should be. Part of the problem is that non-power-of-two SIMD types don't exist on most platforms, so it gets widened to whatever it needs to be. However, an f32x3 is still only 12 bytes of actual data.

I think that Vec should handle the case anyway. Whether the case should come up... I don't really know.

@Diggsey
Copy link
Contributor

Diggsey commented Jan 4, 2015

Given that sizeof (f32x3, u32) == 32, it seems like rust already adds padding for SIMD types, so it's simply size_of that's broken, and should include the padding.

At the same time, Vec should probably assert that size_of T >= min_align_of T to catch any errors if in the future there are any types where their size is less than their alignment.

@Gankra
Copy link
Contributor

Gankra commented Jan 4, 2015

Nominating because the required change may be a back-compat change to some primitive semantics.

@Aatch
Copy link
Contributor

Aatch commented Jan 4, 2015

Actually, thinking about it some more. I believe that the current behaviour is correct. size_of doesn't account for padding, because padding is situation-dependent: (i8, u64) has 7 bytes of padding after the i8, but that doesn't stop i8 from being 1 byte in size. Just because the type gets padding, doesn't mean it needs to change size.

The whole point of having a different alignment to size is to cover these kinds of cases, where the number of usable bytes is not a nice integer multiple of the alignment.

@Diggsey
Copy link
Contributor

Diggsey commented Jan 4, 2015

@Aatch In that situation, the extra padding is added to meet the alignment requirements of "u64", not the "i8". In the SIMD example, the extra padding is part of the SIMD type.
ie. for (f32x3, u32) there are 4 bytes of padding between the fields, even though u32 only requires alignment to 4 bytes.

Either the compiler should not add any padding and sizeof (f32x3, u32) == 16, and any accesses take care to preserve the u32 or as is more usual, the SIMD type should include it's padding.

edit:
To clarify, types do include their padding in size_of, which is why size_of (u64, i8) == 16 as opposed to 9

@Diggsey
Copy link
Contributor

Diggsey commented Jan 5, 2015

Amusingly, struct Test(f32x3); has a size of 16, whereas f32x3 on it's own has a size of 12, which really shouldn't happen.

The issue is caused by size_of being implemented as a call to this function:

pub fn llsize_of_real(cx: &CrateContext, ty: Type) -> llsize {

The comment above the function is wrong: LLVM does include padding when calculating the size of types, but only for structs (not for vectorised types), which makes that function fairly useless and definitely not the preferred option as the comment suggests. llsize_of should be preferred instead which always includes padding, so is at least consistent.

@huonw
Copy link
Member

huonw commented Jan 5, 2015

Nominating because the required change may be a back-compat change to some primitive semantics.

NB. #[simd] and std::simd are experimental and are highly unlikely to be stable at 1.0.

@simnalamburt
Copy link
Contributor Author

Current librustc works like below:

size_of::<T>() = ceil(LLVMSizeOfTypeInBits() / 8)
min_align_of::<T>() = LLVMABIAlignmentOfType()

and I found that current LLVM doesn't have any guarantee between LLVMSizeOfTypeInBits and LLVMABIAlignmentOfType. Instead, for SIMD vector types, LLVMABIAlignmentOfType is guaranteed to be greater or equal than LLVMSizeOfTypeInBits / 8. Look at this.

For example, LLVMSizeOfTypeInBits and LLVMABIAlignmentOfType of 80-bit floating point format is 80 and 16. I know that Rust doesn't have f80 but if it does, size_of::<f80>() will be 10 and min_align_of::<f80>() will be 16. Which means Vec<f80> would have same problem.

Now I see 2 options here

  1. Make size_of::<T>() call llsize_of, not llsize_of_real (as @Diggsey said)
  2. Make Vec<T> function properly even for size_of T < min_align_of T situation.
References

@Gankra
Copy link
Contributor

Gankra commented Jan 5, 2015

@huonw I meant more-so the size_of semantics, but simd semantics might also be affected.

@thestinger
Copy link
Contributor

I think this is a compiler bug. There are many low-level APIs assuming that size is a multiple of the alignment...

@thestinger
Copy link
Contributor

For example, C11's aligned_alloc function requires that size is a multiple of the alignment parameter. It's expected that the compiler will always output sane values for both (unlike rustc :P) so it is a sensible limitation.

@gereeter
Copy link
Contributor

gereeter commented Jan 6, 2015

Unfortunately, forcing size_of to return a multiple of min_align_of would make #17027 impossible, which seems like a big missed opportunity. Perhaps there should be a padded_size_of function that just rounds size_of to the next multiple of min_align_of?

@Diggsey
Copy link
Contributor

Diggsey commented Jan 6, 2015

@greeter
You could have both size_of and padded_size_of, but for the moment, they'd both have to include padding because for struct types LLVM provides no way to get the size without padding, and for vectorised types, even size_of should include padding as #17027 can't be applied to vectorised types without masking every load/store to avoid overwriting the u32 in (f32x3, u32).

@simnalamburt
Copy link
Contributor Author

Currently I'm testing e4c20db, and it seems to cause no trouble. Please make me an advise if I'm doing something wrong.

@simnalamburt
Copy link
Contributor Author

OK Let's do this → #20611

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jan 8, 2015
This PR fixes the issue rust-lang#20460, and it doesn't touch any existing behavior except the bug of the SIMD types.

Closes rust-lang#20460.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants