-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust generates unaligned loads when unpacking an aligned struct #54028
Comments
In the debug version of the LLVM IR, only the latter shows an explicit ; playground::load_unaligned
; Function Attrs: uwtable
define double @_ZN10playground14load_unaligned17hb6493993a7b635a6E(%Compound* noalias nocapture dereferenceable(16) %cmp) unnamed_addr #0 !dbg !4 {
start:
%imag = alloca double, align 8, !dbg !17
%real = alloca double, align 8, !dbg !17
call void @llvm.dbg.declare(metadata %Compound* %cmp, metadata !18, metadata !DIExpression()), !dbg !17
call void @llvm.dbg.declare(metadata double* %real, metadata !19, metadata !DIExpression()), !dbg !21
call void @llvm.dbg.declare(metadata double* %imag, metadata !22, metadata !DIExpression()), !dbg !21
%0 = bitcast %Compound* %cmp to %Complex*, !dbg !23
%1 = bitcast %Complex* %0 to double*, !dbg !23
%2 = load double, double* %1, align 8, !dbg !23
store double %2, double* %real, align 8, !dbg !23
%3 = bitcast %Compound* %cmp to %Complex*, !dbg !23
%4 = getelementptr inbounds %Complex, %Complex* %3, i32 0, i32 3, !dbg !23
%5 = load double, double* %4, align 8, !dbg !23
store double %5, double* %imag, align 8, !dbg !23
%6 = load double, double* %real, align 8, !dbg !24
%7 = load double, double* %real, align 8, !dbg !24
%8 = fmul double %6, %7, !dbg !24
%9 = load double, double* %imag, align 8, !dbg !24
%10 = load double, double* %imag, align 8, !dbg !24
%11 = fmul double %9, %10, !dbg !24
%12 = fadd double %8, %11, !dbg !24
ret double %12, !dbg !25
}
; playground::load_aligned
; Function Attrs: uwtable
define double @_ZN10playground12load_aligned17h2a5b875154d5e906E(%Compound* noalias nocapture dereferenceable(16) %cmp) unnamed_addr #0 !dbg !26 {
start:
%imag = alloca double, align 8, !dbg !27
%real = alloca double, align 8, !dbg !27
%_3 = alloca %Compound, align 16, !dbg !27
%cmp1 = alloca %Compound, align 16, !dbg !27
call void @llvm.dbg.declare(metadata %Compound* %cmp, metadata !28, metadata !DIExpression()), !dbg !27
call void @llvm.dbg.declare(metadata %Compound* %cmp1, metadata !29, metadata !DIExpression()), !dbg !31
call void @llvm.dbg.declare(metadata double* %real, metadata !32, metadata !DIExpression()), !dbg !34
call void @llvm.dbg.declare(metadata double* %imag, metadata !35, metadata !DIExpression()), !dbg !34
%0 = bitcast %Compound* %cmp to i8*, !dbg !36
%1 = bitcast %Compound* %_3 to i8*, !dbg !36
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 16, i32 16, i1 false), !dbg !36
%2 = bitcast %Compound* %_3 to i8*, !dbg !36
%3 = bitcast %Compound* %cmp1 to i8*, !dbg !36
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 16, i1 false), !dbg !36
%4 = bitcast %Compound* %cmp1 to %Complex*, !dbg !37
%5 = bitcast %Complex* %4 to double*, !dbg !37
%6 = load double, double* %5, align 8, !dbg !37
store double %6, double* %real, align 8, !dbg !37
%7 = bitcast %Compound* %cmp1 to %Complex*, !dbg !37
%8 = getelementptr inbounds %Complex, %Complex* %7, i32 0, i32 3, !dbg !37
%9 = load double, double* %8, align 8, !dbg !37
store double %9, double* %imag, align 8, !dbg !37
%10 = load double, double* %real, align 8, !dbg !38
%11 = load double, double* %real, align 8, !dbg !38
%12 = fmul double %10, %11, !dbg !38
%13 = load double, double* %imag, align 8, !dbg !38
%14 = load double, double* %imag, align 8, !dbg !38
%15 = fmul double %13, %14, !dbg !38
%16 = fadd double %12, %15, !dbg !38
ret double %16, !dbg !39
} |
I've been able to get an ever more minimal repro: #[repr(align(16))]
pub struct Complex {
real: f64,
imag: f64,
}
pub fn load_unaligned(z: Complex) -> f64 {
let Complex { real, imag } = z;
(real * real) + (imag * imag)
} |
Does it work as expected if instead of using #[cfg(target_arch = "x86")]
use std::arch::x86::__m128d as f64x2;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::__m128d as f64x2;
pub struct Complex {
_dummy: [f64x2; 0],
real: f64,
imag: f64,
} If so, then we could use vectors instead of integers for hinting higher alignments to LLVM, for both intra-struct/enum/union padding, and pointers without a LLVM pointee type. |
@eddyb No, unfortunately, still generates unaligned |
I left a comment on #53998, about a potential change that would help here, I think. That is, we can be less conservative, and preserve knowledge of a higher alignment for struct field accesses. cc @rust-lang/compiler Does anyone have opinions whether we should generate instructions with higher alignment, e.g. |
We should strive to provide as much information to LLVM as possible. Telling LLVM about greatest alignment alignment of fields gives it extra information it can work with. The interesting, and useful, scenario would be something like this:
which would allow LLVM to use aligned SSE instructions for loading |
While playing around with some complex numbers (structures with two
f64
s), I noticed Rust was generating unaligned loads, so I added the#[repr(align(16))]
parameter to them. But Rust was still generating unaligned operations.I redeclared the parameter as a local variable, and it got fixed. It only seems to happen when using a pattern to unpack the fields of an aligned structure.
Minimal repro (also on play.rust-lang.org):
While the functions look nearly the same, the disassembly differs:
This bug occurs on stable 1.28, beta and nightly.
The text was updated successfully, but these errors were encountered: