64-bit Array Pointer Misalignment on riscv32i

I think this is a bug: it shows up on a riscv32imac target, but not on an x86 target.

I tried this code (see it on [godbolt](https://godbolt.org/z/K8MK1v6f9) / [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=66f078fdf69a3cf2a578261740326f09)); the version below keeps the `log` statements so you can line the code up against the log data I provide later in this report:

```rust

use byteorder::{ByteOrder, LittleEndian};

#[derive(Copy, Clone, Hash)]
pub struct Scalar {
    pub bytes: [u8; 32],
}
impl Scalar {
    pub fn non_adjacent_form(&self, w: usize) -> [i8; 256] {

        let mut naf = [0i8; 256];

        let mut x_u64 = [0u64; 5];
        LittleEndian::read_u64_into(&self.bytes, &mut x_u64[0..4]);

        log::info!("x_u64: {:x?}", x_u64);

        let width = 1 << w;
        let window_mask = width - 1;

        let mut pos = 0;
        let mut carry = 0;
        while pos < 256 {
            // Construct a buffer of bits of the scalar, starting at bit `pos`
            let u64_idx = pos / 64;
            let bit_idx = pos % 64;
            let bit_buf: u64;
            if bit_idx < 64 - w {
                // This window's bits are contained in a single u64
                bit_buf = x_u64[u64_idx] >> bit_idx; // <-- this does not work right on riscv32imac
                log::info!("bit_buf: {:x}", bit_buf);
            } else {
                // Combine the current u64's bits with the bits from the next u64
                bit_buf = (x_u64[u64_idx] >> bit_idx) | (x_u64[1+u64_idx] << (64 - bit_idx));
            }

            let window = carry + (bit_buf & window_mask);

            if window & 1 == 0 {
                pos += 1;
                continue;
            }

            if window < width/2 {
                log::info!("carry 0 width {} naf[{}] = {}; c.{} bb.{:x} wm.{} idx64.{} idxbit.{} xu64[0].{:x}", width, pos, window,
                    carry, bit_buf, window_mask, u64_idx, bit_idx, x_u64[0],
                );
                carry = 0;
                naf[pos] = window as i8;
            } else {
                log::info!("carry 1 width {} naf[{}] = {}/{}; c.{} bb.{:x} wm.{} idx64.{} idxbit.{} xu64[0].{:x}", width, pos, window, (window as i8).wrapping_sub(width as i8),
                    carry, bit_buf, window_mask, u64_idx, bit_idx, x_u64[0]
                );
                carry = 1;
                naf[pos] = (window as i8).wrapping_sub(width as i8);
            }

            pos += w;
        }

        naf
    }
}
```

I expected to see this happen: 

This is a snippet from the `curve25519-dalek` crate's `scalar.rs` module, which is used in `ed25519` signatures. Fortunately, there are well-known test vectors for curve25519, so I'm able to give you expected inputs and outputs.

```rust
    let scalar = Scalar {
        bytes: [189, 59, 214, 8, 77, 86, 240, 50, 111, 170, 86, 37, 124, 154, 209, 79, 102, 72, 93, 53, 130, 157, 102, 200, 60, 240, 215, 104, 246, 58, 214, 13],
    };
    let a_naf = scalar.non_adjacent_form(5);

    let expected_result = [-3, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, -11, 0, 0, 0, 0, -5, 0, 0, 0, 0, 11, 0, 0, 0, 0, 5, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0];

    assert!(expected_result ==  a_naf, "something went wrong!");
```
Instead, this happened: 
- On x86, this code produces the expected result.
- On a riscv32imac target (seen both in Renode emulation and on live hardware), we get the following (the `v`/`^` markers point at the first salient evidence of an error):

```
                          vvvvvvvvvvvvvv
INFO:test_stub: x_u64: [32f0564d08d63bbd, 4fd19a7c2556aa6f, c8669d82355d4866, dd63af668d7f03c, 0] (services\test-stub\src\main.rs:55)
INFO:ticktimer_server: Server started with SID SID([1801677172, 1701669236, 1702047090, 1919252082]) (services\ticktimer-server\src\main.rs:503)
INFO:test_stub: bit_buf: 8d63bbd08d63bbd (services\test-stub\src\main.rs:78)
                         ^^^^^^^^^^^^^^^
INFO:test_stub: carry 1 width 32 naf[0] = 29/-3; c.0 bb.8d63bbd08d63bbd wm.31 idx64.0 idxbit.0 xu64[0].32f0564d08d63bbd (services\test-stub\src\main.rs:103)
INFO:test_stub: bit_buf: 46b1dde846b1dd (services\test-stub\src\main.rs:78)
INFO:test_stub: bit_buf: 2358eef42358ee (services\test-stub\src\main.rs:78)
INFO:test_stub: carry 0 width 32 naf[6] = 15; c.1 bb.2358eef42358ee wm.31 idx64.0 idxbit.6 xu64[0].32f0564d08d63bbd (services\test-stub\src\main.rs:97)
INFO:test_stub: bit_buf: 11ac777a11ac7 (services\test-stub\src\main.rs:78)
INFO:test_stub: carry 0 width 32 naf[11] = 7; c.0 bb.11ac777a11ac7 wm.31 idx64.0 idxbit.11 xu64[0].32f0564d08d63bbd (services\test-stub\src\main.rs:97)

.... much spew ....

INFO:test_stub: a_naf: [-3, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, ***-3***, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 15, 0, 0, 0, 0, -13, 0, 0, 0, 0, 11, 0, 0, 0, 0, 13, 0, 0, 0, 0, -11, 0, 0, 0, 0, -13, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, -11, 0, 0, 0, 0, -5, 0, 0, 0, 0, 11, 0, 0, 0, 0, 5, 0, 0, 0, 0, -15, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, -3, 0, 0, 0, 0, 11, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, 3, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, 3, 0, 0] (services\test-stub\src\main.rs:131)
INFO:test_stub: expected_result: [-3, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, ***13***, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, -11, 0, 0, 0, 0, -5, 0, 0, 0, 0, 11, 0, 0, 0, 0, 5, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 15, 0, 0, 0, 0, -11, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -11, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, -5, 0, 0, 0, 0, 9, 0, 0, 0, 0, -13, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, -7, 0, 0, 0, 0, -5, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0] (services\test-stub\src\main.rs:132)

(emphasis mine on the first mismatching index from the result)
```
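
To cross-check the `x_u64` line in the log on a host machine, here is a minimal std-only sketch. It is not the crate's code: `limbs_from_le_bytes` is a hypothetical stand-in for `LittleEndian::read_u64_into`, and `SCALAR_BYTES` is just the test scalar from above. The four limbs it prints should match the first four values in the log, which suggests the array contents themselves are fine and the problem is in how `x_u64[u64_idx]` is read back.

```rust
/// The test scalar from the report, as raw little-endian bytes.
const SCALAR_BYTES: [u8; 32] = [
    189, 59, 214, 8, 77, 86, 240, 50, 111, 170, 86, 37, 124, 154, 209, 79,
    102, 72, 93, 53, 130, 157, 102, 200, 60, 240, 215, 104, 246, 58, 214, 13,
];

/// Hypothetical helper: decode 32 little-endian bytes into four u64 limbs
/// using only the standard library.
fn limbs_from_le_bytes(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (limb, chunk) in limbs.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *limb = u64::from_le_bytes(buf);
    }
    limbs
}

fn main() {
    // Same formatting as the `x_u64` log line (minus the trailing zero limb).
    println!("x_u64: {:x?}", limbs_from_le_bytes(&SCALAR_BYTES));
}
```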

Basically, on the first iteration of the loop, `bit_buf` should be loaded with the value of `x_u64[0] >> bit_idx`, and because `bit_idx` is 0, `bit_buf` should be exactly `x_u64[0]`.

Instead, I find that the value of `x_u64[0]` is `32f0564d_08d63bbd`, and the value of `bit_buf` is `8d63bbd_08d63bbd`. The key symptom is that instead of holding the whole 64-bit value, `bit_buf` seems to be just the lower 32 bits of the 64-bit value, replicated into the upper 32 bits.

This problem continues in later iterations of the loop, with the upper 32 bits taking a copy of the lower 32 bits. 
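
To make the symptom concrete, here is a small host-side model of it. This is only a sketch of what the logs suggest, not a reproduction of the bug: `replicate_low_word` is a hypothetical helper that builds the "low word copied into the high word" value, and the `bit_idx` values 0, 5, 6 and 11 are the ones behind the first four `bit_buf` log lines. Shifting the modeled value by each `bit_idx` gives the `bit_buf` values seen in the log, which would be consistent with the high word being read from the wrong place while the 64-bit shift itself behaves. It would also fit the observed `a_naf` repeating its first 32 entries starting at index 32.

```rust
/// Hypothetical model of the symptom: the low 32-bit word shows up in both
/// halves of the 64-bit value.
fn replicate_low_word(x: u64) -> u64 {
    let lo = x & 0xffff_ffff;
    (lo << 32) | lo
}

fn main() {
    // x_u64[0] as printed in the log.
    let x0: u64 = 0x32f0_564d_08d6_3bbd;
    let modeled = replicate_low_word(x0);

    // bit_idx values of the first four `bit_buf` log lines.
    for &bit_idx in &[0u32, 5, 6, 11] {
        println!(
            "bit_idx {:2}: correct {:x}, modeled {:x}",
            bit_idx,
            x0 >> bit_idx,
            modeled >> bit_idx
        );
    }
}
```

The "modeled" column can be compared directly against the `bit_buf:` lines above; the "correct" column is what a host build computes for the same shifts.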


### Meta
`rustc --version --verbose`:
```
rustc --version --verbose
rustc 1.52.1 (9bc8c42bb 2021-05-09)
binary: rustc
commit-hash: 9bc8c42bb2f19e745a63f3445f1ac248fb015e53
commit-date: 2021-05-09
host: x86_64-pc-windows-msvc
release: 1.52.1
LLVM version: 12.0.0
```
I feel like there should be some mention of the RISC-V target toolchain version as well, but the bug template does not say how to extract that. For what it's worth, I was able to reproduce the bug using a Linux-based toolchain with the target running in Renode, as well as using a Windows-based toolchain with the target running on live hardware (a VexRiscv implementation on an FPGA).

Unfortunately, because our embedded target does not have a functional BACKTRACE facility, I'm unable to include a backtrace, but hopefully the print-logs above are helpful enough.

I'll continue to poke at this some and if I can find a simpler test case, I'll add a follow-up note here.

