Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64-bit Array Pointer Misalignment on riscv32i #86693

Closed
bunnie opened this issue Jun 28, 2021 · 8 comments
Closed

64-bit Array Pointer Misalignment on riscv32i #86693

bunnie opened this issue Jun 28, 2021 · 8 comments
Labels
C-bug Category: This is a bug. O-riscv Target: RISC-V architecture

Comments

@bunnie
Copy link
Contributor

bunnie commented Jun 28, 2021

I think this is a bug, which is exhibited in a riscv32imac, but not an x86 target.

I tried this code (see it on goldbolt / playground ); the version below has the log statements in there so you can line up the code against log data I provide later in this bug:

#[derive(Copy, Clone, Hash)]
pub struct Scalar {
    pub bytes: [u8; 32],
}
impl Scalar {
    pub fn non_adjacent_form(&self, w: usize) -> [i8; 256] {

        let mut naf = [0i8; 256];

        let mut x_u64 = [0u64; 5];
        LittleEndian::read_u64_into(&self.bytes, &mut x_u64[0..4]);

        log::info!("x_u64: {:x?}", x_u64);

        let width = 1 << w;
        let window_mask = width - 1;

        let mut pos = 0;
        let mut carry = 0;
        while pos < 256 {
            // Construct a buffer of bits of the scalar, starting at bit `pos`
            let u64_idx = pos / 64;
            let bit_idx = pos % 64;
            let bit_buf: u64;
            if bit_idx < 64 - w {
                // This window's bits are contained in a single u64
                bit_buf = x_u64[u64_idx] >> bit_idx;   ///////////////////////////////////// <------- this does not work right
                log::info!("bit_buf: {:x}", bit_buf);
            } else {
                // Combine the current u64's bits with the bits from the next u64
                bit_buf = (x_u64[u64_idx] >> bit_idx) | (x_u64[1+u64_idx] << (64 - bit_idx));
            }

            let window = carry + (bit_buf & window_mask);

            if window & 1 == 0 {
                pos += 1;
                continue;
            }

            if window < width/2 {
                log::info!("carry 0 width {} naf[{}] = {}; c.{} bb.{:x} wm.{} idx64.{} idxbit.{} xu64[0].{:x}", width, pos, window,
                    carry, bit_buf, window_mask, u64_idx, bit_idx, x_u64[0],
                );
                carry = 0;
                naf[pos] = window as i8;
            } else {
                log::info!("carry 1 width {} naf[{}] = {}/{}; c.{} bb.{:x} wm.{} idx64.{} idxbit.{} xu64[0].{:x}", width, pos, window, (window as i8).wrapping_sub(width as i8),
                    carry, bit_buf, window_mask, u64_idx, bit_idx, x_u64[0]
                );
                carry = 1;
                naf[pos] = (window as i8).wrapping_sub(width as i8);
            }

            pos += w;
        }

        naf
    }
}

I expected to see this happen:

This is a snippet from the curve25519-dalek crate's scalar.rs module, which is used in ed25519 signatures. Fortunately, there are well-known test vectors for curve25519, so I'm able to give you expected inputs and outputs.

    let scalar = Scalar {
        bytes: [189, 59, 214, 8, 77, 86, 240, 50, 111, 170, 86, 37, 124, 154, 209, 79, 102, 72, 93, 53, 130, 157, 102, 200, 60, 240, 215, 104, 246, 58, 214, 13],
    };
    let a_naf = scalar.non_adjacent_form(5);

    let expected_result = [-3, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, -11, 0, 0, 0, 0, -5, 0, 0, 0, 0, 11, 0, 0, 0, 0, 5, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0];

    assert!(expected_result ==  a_naf, "something went wrong!");

Instead, this happened:

  • On x86, this code produces the expected result.
  • On a RISCV32-IMAC target (seen both in renode emulation and on live hardware), we get this (with emphasis on the first salient evidence of an error):
                          vvvvvvvvvvvvvv
INFO:test_stub: x_u64: [32f0564d08d63bbd, 4fd19a7c2556aa6f, c8669d82355d4866, dd63af668d7f03c, 0] (services\test-stub\src\main.rs:55)
INFO:ticktimer_server: Server started with SID SID([1801677172, 1701669236, 1702047090, 1919252082]) (services\ticktimer-server\src\main.rs:503)
INFO:test_stub: bit_buf: 8d63bbd08d63bbd (services\test-stub\src\main.rs:78)
                         ^^^^^^^^^^^^^^^
INFO:test_stub: carry 1 width 32 naf[0] = 29/-3; c.0 bb.8d63bbd08d63bbd wm.31 idx64.0 idxbit.0 xu64[0].32f0564d08d63bbd (services\test-stub\src\main.rs:103)
INFO:test_stub: bit_buf: 46b1dde846b1dd (services\test-stub\src\main.rs:78)
INFO:test_stub: bit_buf: 2358eef42358ee (services\test-stub\src\main.rs:78)
INFO:test_stub: carry 0 width 32 naf[6] = 15; c.1 bb.2358eef42358ee wm.31 idx64.0 idxbit.6 xu64[0].32f0564d08d63bbd (services\test-stub\src\main.rs:97)
INFO:test_stub: bit_buf: 11ac777a11ac7 (services\test-stub\src\main.rs:78)
INFO:test_stub: carry 0 width 32 naf[11] = 7; c.0 bb.11ac777a11ac7 wm.31 idx64.0 idxbit.11 xu64[0].32f0564d08d63bbd (services\test-stub\src\main.rs:97)

.... much spew ....

INFO:test_stub: a_naf: [-3, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, ***-3***, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 15, 0, 0, 0, 0, -13, 0, 0, 0, 0, 11, 0, 0, 0, 0, 13, 0, 0, 0, 0, -11, 0, 0, 0, 0, -13, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, -11, 0, 0, 0, 0, -5, 0, 0, 0, 0, 11, 0, 0, 0, 0, 5, 0, 0, 0, 0, -15, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, -3, 0, 0, 0, 0, 11, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, 3, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, 3, 0, 0] (services\test-stub\src\main.rs:131)
INFO:test_stub: expected_result: [-3, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, ***13***, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, -11, 0, 0, 0, 0, -5, 0, 0, 0, 0, 11, 0, 0, 0, 0, 5, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0] (services\test-stub\src\main.rs:132)

(emphasis mine on the first mismatching index from the result)

Basically, on the first iteration of the loop, bit_buf should be loaded with the value of x_u64[0] >> bit_idx, and because bit_idx is 0, bit_buf should be exactly x_u64[0].

Instead, I find that the value of x_u64[0] is 32f0564d_08d63bbd, and the value of bit_buf is 8d63bbd_08d63bbd. The key symptom is that instead of representing a whole 64-bit value, bit_buf seems to be just the lower 32 bits of the 64-bit value, replicated up to the upper 32 bits.

This problem continues in later iterations of the loop, with the upper 32 bits taking a copy of the lower 32 bits.

Meta

rustc --version --verbose:

rustc --version --verbose
rustc 1.52.1 (9bc8c42bb 2021-05-09)
binary: rustc
commit-hash: 9bc8c42bb2f19e745a63f3445f1ac248fb015e53
commit-date: 2021-05-09
host: x86_64-pc-windows-msvc
release: 1.52.1
LLVM version: 12.0.0

I feel like there should be some mention of the RISCV target toolchain version as well, but the bug template does not include an instruction on how to extract that. To wit, I was able to reproduce the bug using a Linux-based toolchain running Renode, as well as using a Windows-based toolchain running on live hardware (using the VexRISCV implementation on an FPGA).

Unfortunately, because our embedded target does not have a functional BACKTRACE facility, I'm unable to include a backtrace, but hopefully the print-logs above are helpful enough.

I'll continue to poke at this some and if I can find a simpler test case, I'll add a follow-up note here.

@bunnie bunnie added the C-bug Category: This is a bug. label Jun 28, 2021
@jonas-schievink jonas-schievink added the O-riscv Target: RISC-V architecture label Jun 28, 2021
@xobs
Copy link
Contributor

xobs commented Jun 29, 2021

To add more information,

I've simplified the program to this:

#![cfg_attr(target_os = "none", no_std)]
#![cfg_attr(target_os = "none", no_main)]

#[derive(Copy, Clone)]
pub struct Scalar {
    pub bytes: [u8; 32],
}
impl Scalar {
    pub fn non_adjacent_form(&self, w: usize) -> [i8; 256] {
        let mut naf = [0i8; 256];

        let mut x_u64 = [0u64; 5];

        let x_u64_ptr_first = x_u64.as_ptr() as usize;
        println!("x_u64: {:x?}", x_u64);
        println!("&x_u64: 0x{:08x}", x_u64_ptr_first);
        println!(
            "Address is aligned? {} (remainder: {})",
            x_u64_ptr_first & 7 == 0,
            x_u64_ptr_first & 7
        );
        let x_u64_ptr_second = x_u64.as_ptr() as usize;
        assert_eq!(0, (x_u64.as_ptr() as usize) & 7);
        println!(
            "Passed assertion that 0 == {} {:08x} (or {} {:08x})",
            x_u64_ptr_first & 7,
            x_u64_ptr_first,
            x_u64_ptr_second & 7,
            x_u64_ptr_second
        );

        naf
    }
}

fn main() {
    let scalar = Scalar {
        bytes: [
            189, 59, 214, 8, 77, 86, 240, 50, 111, 170, 86, 37, 124, 154, 209, 79, 102, 72, 93, 53,
            130, 157, 102, 200, 60, 240, 215, 104, 246, 58, 214, 13,
        ],
    };
    let a_naf = scalar.non_adjacent_form(5);
}

When I run this on a desktop Rust, it works just fine. However, building for an embedded RISC-V target, running this program produces the following output:

x_u64: [0, 0, 0, 0, 0]
&x_u64: 0x8001ff04
Address is aligned? false (remainder: 4)
Passed assertion that 0 == 4 8001ff04 (or 0 8001ff04)

There is no unsafe code here, yet the pointer appears to be changing.

@xobs
Copy link
Contributor

xobs commented Jun 29, 2021

What should the alignment be of a u64 on RV32I? Should it be 32-bits or 64-bits? The LLVM IR seems to indicate it's 64 bits, but the actual pointer is only 32-bit aligned.

I think the strangest thing here is that x_u64_ptr_second == x_u64_ptr_first yet x_u64_ptr_second & 7 != x_u64_ptr_first & 7

@bunnie bunnie changed the title Problem with u64 right-shift (>>) on riscv32-imac (only) 64-bit Array Pointer Misalignment on riscv32i Jun 29, 2021
@bunnie
Copy link
Contributor Author

bunnie commented Jun 29, 2021

Updated the bug title to reflect the further root cause found. In summary:

  • our data layout is defined as "data-layout": "e-m:e-p:32:32-i64:64-n32-S128",
  • we can see from IR inside Godbolt that a 64-bit-aligned data structure is being requested:
  %6 = getelementptr inbounds [5 x i64], [5 x i64]* %x_u64, i32 0, i32 %u64_idx, !dbg !40
  %_30 = load i64, i64* %6, align 8, !dbg !40
  • however, test cases show that we are getting 64-bit data structures that are aligned on odd-word boundaries (last 3 bits of pointer == 4, not 0)
  • some further digging in assembly dumps show that attempts to access such 64-bit numbers on a 32-bit machines do things like this to access the upper 32 bits of a 64-bit number
        ori     a1, a3, 4
        lw      a4, 0(a1)
  • these optimizations break when 64-bit data is not aligned to 64 bits

Whether this is a bug in rustc or in llvm, I do not know, but hopefully this is a compact summary that can help those more knowledgeable than us to the correct maintainer to request a fix.

It also occurred to me that there's a possibility that maybe it's the responsibility, somehow, of the OS to intervene with stack alignments, but I can't think of a mechanism for guaranteeing that (should be at the linker/compiler level). But fwiw this is all running on our own home-rolled OS, but it's basically Rust on raw iron, and I can't think of a mechanism we could invoke to help guarantee such 64-bit alignments. Everything starts aligned to 4k-pages, anyways...

@xobs
Copy link
Contributor

xobs commented Jun 29, 2021

As mentioned by @bunnie , the root issue appears to be that u64 is not getting aligned like it should.

Here's a minimum reproduction:

pub fn alignment_check() {
    println!("Alignment of u64: {}", core::mem::align_of::<u64>());
    let single_u64 = 443u64;
    println!("Address of single_u64: {:08x}", &single_u64 as *const u64 as usize);
}

On riscv32imac-unknown-none-elf, this prints the following:

Alignment of u64: 8
Address of single_u64: 8001ff5c

So it should be aligned, and some optimizations are turned on assuming that it is aligned, however it isn't actually aligned.

@bunnie
Copy link
Contributor Author

bunnie commented Jun 29, 2021

Summarized as a haiku:

alignment is 8
u64 address: twelve
maintainers, please help!

@xobs
Copy link
Contributor

xobs commented Jun 29, 2021

Note that a workaround is to use a newtype. For example:

#[repr(align(32))]
struct AlignedU64Slice<const N: usize>([u64; N]);

Then you can refer to it using something like:

let my_slice = AlignedU64Slice([0u64; 5]);

I'm not clear why the align needs to be so large, but anything else causes it to not be properly aligned. It could be a misunderstanding of what align(S) means, and S is in units of bits and not bytes. At any rate, I've observed that #[repr(align(32))] remains aligned for any size array N, whereas any alignment smaller than 32 is not properly aligned to an 8-byte boundary on RISC-V.

bunnie added a commit to betrusted-io/curve25519-dalek that referenced this issue Jun 29, 2021
bunnie added a commit to betrusted-io/curve25519-dalek that referenced this issue Jun 29, 2021
For 32-bit targets on RISC-V, we have encountered a problem in the
non_adjacent_form() routine inside Scalar. To the best we can tell,
there is a compiler bug which is causing 64-bit types on 32-bit platforms
to not be properly aligned (see issue rust-lang/rust#86693).
This bug results in the lower 32 bits of a 64-bit number being replicated
into the top 32 bits, as a result of pointer arithmetic optimizations that
rely on correct alignment of the data types.

So far, we have only seen this problem inside one function that affects
ed25519 verifications, but it could be in other libraries we haven't
used yet.

Thanks to @xobs we have a work-around, which is to wrap the `[u64; 5]`
type inside an `AlignedU64Slice` `newtype` which is annotated with
a `#[repr(align(32))]`. In our tests this causes the ed25519 signatures
tests to pass on our platform.

Unfortunately, we are unable to get the bug to express on x86 using a u32
backend; the bug seems to be specific to RISC-V. However, I think this
patch is light-fingered enough that it might be worth considering
absorbing into the crate. Based on the age of similar bugs filed against
the Rust project, it may take some months or years even before this
gets properly addressed...
@xobs
Copy link
Contributor

xobs commented Jul 7, 2021

Update: the issue was due to $sp only being 4-byte aligned.

The current ABI assumes the stack is aligned to 16-bytes. Unfortunately on the target system, the stack was only 4-byte aligned. As a result, some optimizations were triggered that were causing this unfortunate miscompilation.

Aligning $sp solves the issue.

bunnie added a commit to betrusted-io/curve25519-dalek that referenced this issue Jul 11, 2021
turns out we just needed to align $sp to 16-byte boundaries

This reverts commit 279bbc5.
@bunnie
Copy link
Contributor Author

bunnie commented Jul 11, 2021

Yep, looks like the issue is due to a mis-alignment of $sp. Even for 32 bit systems, the stack alignment must be 64 bits:

https://github.com/riscv/riscv-eabi-spec/blob/master/EABI.adoc#4-eabi-stack-alignment

Thanks for tracing this down @xobs! I'll close this for now, seems not to be an issue with Rust/llvm and actually an issue with our runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-riscv Target: RISC-V architecture
Projects
None yet
Development

No branches or pull requests

3 participants