Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv: P extension intrinsics for packed SIMD (part 1) #1332

Merged
merged 2 commits into from
Sep 13, 2022

Conversation

luojia65
Copy link
Contributor

@luojia65 luojia65 commented Sep 9, 2022

This pull reqeust includes intrinsics for RISC-V Packed SIMD extension. It includes:

  • Section 3.1, SIMD data processing instructions
  • Section 3.2, Partial-SIMD data processing instructions
  • Section 3.4, Misc non-SIMD instructions

Following instructions require work in further pull requests:

  • Section 3.3, 64-bit data computation instructions
  • Section 3.5, RV64 only and RV32 only instructions

I implemented all intrinsic functions from P extension specification, this extension is useful as some RISC-V chips have already taped out with P extension. All intrinsics are implemented as inline assembly, for LLVM does not have these intrinsic functions by now; we can change to LLVM functions if it appears in future. This would be the first pull request as further works would be introduced in other pull requests.

Questions remain before further pull requests are:

  • Where should I put RV32 only intrinsics? There are modules named riscv_shared and riscv64 by now.
  • How to get returned 64-bit value from inline assembly? Some of the RV32 instructions would return 64-bit calculation result.

Please review and give suggestions if modifications are needed, thanks!

r? @Amanieu

@Amanieu
Copy link
Member

Amanieu commented Sep 9, 2022

Some notes about the asm!:

  • Use lateout to allow re-using the same register as an input.
  • Use the pure option for instructions with no side effects (in addition to nomem).

Where should I put RV32 only intrinsics? There are modules named riscv_shared and riscv64 by now.

Put them in a riscv32 module in core_arch. At the moment the top-level riscv32 and riscv64 module re-export the modules in core_arch, see https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/mod.rs#L62.

How to get returned 64-bit value from inline assembly? Some of the RV32 instructions would return 64-bit calculation result.

Are these values produced in 2 registers? Define both registers as 32-bit outputs and the re-construct the 64-bit value in Rust code.

@thomcc
Copy link
Member

thomcc commented Sep 9, 2022

Use lateout to allow re-using the same register as an input.

We should probably be careful about this until llvm/llvm-project#57550 is fixed. I think it should be fine to use, so long as its not used twice in the same asm block though.

@Amanieu
Copy link
Member

Amanieu commented Sep 9, 2022

I had a look at the spec and it seems some RV32 instructions work with register pairs. This is not currently supported by asm! and will require changes to the compiler. Alternatively you can work around this for now by using fixed registers for inputs & outputs.

@luojia65
Copy link
Contributor Author

luojia65 commented Sep 9, 2022

Thanks! I'll apply your suggestions about asm! macro. For RV32 instruction work with register pairs, we can firstly use workarounds, mark it as todo and use better solutions when there is a compiler support in the future.

@luojia65
Copy link
Contributor Author

luojia65 commented Sep 9, 2022

Hello! I applied following suggestions:

  • use options(pure, nomem, nostack) for all arithmetic packed-simd instructions
  • use inlateout when it requires the same register as both input and output (if this doesn't work, we can merge other parts of this pull request first)

I squashed and force pushed one commit for this pull request.

r? @Amanieu

@luojia65 luojia65 force-pushed the riscv-packed-simd branch 2 times, most recently from a7cc2c2 to e6bc261 Compare September 9, 2022 08:03
pub fn add16(a: usize, b: usize) -> usize {
let value: usize;
unsafe {
asm!(".insn r 0x77, 0x0, 0x20, {}, {}, {}", out(reg) value, in(reg) a, in(reg) b, options(pure, nomem, nostack))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
asm!(".insn r 0x77, 0x0, 0x20, {}, {}, {}", out(reg) value, in(reg) a, in(reg) b, options(pure, nomem, nostack))
asm!(".insn r 0x77, 0x0, 0x20, {}, {}, {}", lateout(reg) value, in(reg) a, in(reg) b, options(pure, nomem, nostack))

This needs to be lateout so the register allocator can reuse the register used for a or b for the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Should all the instructions here use 'lateout' as well? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unless there is a restriction in the ISA that prevents reusing an input register for the output. I don't think that is the case here (I know a few ARM instructions have this restriction).

Implement by inline assembly for now, uses `pure, nomem, nostack` for
all packed simd arithmetic instructions. Uses `inlateout` when it
requires using the same register for input and output, use `lateout`
for all output registers.

This commit also includes a rearrangement of shared risc-v architecture
module to improve documents. It also includes a doc test fix, gate sm3/4
and use explict sm3/4 instruction under rustc target feature.
@Amanieu Amanieu merged commit b9cf2d7 into rust-lang:master Sep 13, 2022
@luojia65 luojia65 deleted the riscv-packed-simd branch September 13, 2022 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants