Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Relax the canonical order checking of arch string for -march option #11

Open
kito-cheng opened this issue May 27, 2021 · 17 comments

Comments

@kito-cheng
Copy link
Collaborator

kito-cheng commented May 27, 2021

Changes

  • 2021/06/03
    • Add an new option d.

The ISA spec has specify the ISA extension could be describe the arch string, and toolchain are using this way to control the code gen, and currently toolchain are following the order describe by the ISA spec, and emit error if order is wrong.

However we have bunch of sub-extension from several ISA extension now, zv*, zb*, zk*, zp*, zf* and zi*, and there is x*, s* and zxm* extensions, the order rule is also little bit different:

  • Order between z*, s*, h*, zxm* and x* are: z* -> s* -> h* -> zxm* -> x*
  • Order rule for z*: they should be ordered first by category (second letter), then alphabetically within a category
  • Order rule for h*, s*, zxm* and x*: ordered alphabetically.

Okay, so how to describe a arch has rv64gc, sfoo, zba, zfh, zicsr, zxmbar, xsf and xabc?
The canonical order is there: rv64gc_zicsr_zfh_zba_sfoo_zxmbar_xabc_xsf.
I think it's hard to write it correct without check with ISA spec.

So my proposal is relax the canonical order checking for the -march option,

And I list few possible options:

a. Relax order check for multi-letter extension, e.g. order between z*, s*, h*, zxm* and x*. [*1]
b. Relax order check for whole arch string, including single letter extensions. [*1]

  • e.g. RV32MAFDI is OK

c. Don't relax order check, but emit an error to tell user the right order.
d. Relax order check for whole arch string, including single letter extensions. [*1] except I, G, E.

  • e.g.
    • RV32IMAFD == RV32IAFDM
    • RV32MAFDI is not OK.
    • RV32I_ZAM # OK
    • RV32IM_ZA #OK
    • RV32I_ZA_M #OK
    • RV32IZAM # Not OK multi-letter must start with _

*1: The ELF attribute emission still must in canonical order to simplify the parser logic, that's also keep the possibility of cache the checking result if we check that in OS or runtime in future.

@kito-cheng kito-cheng changed the title Relax the canonical order of arch string [RFC] Relax the canonical order of arch string May 27, 2021
@kito-cheng kito-cheng changed the title [RFC] Relax the canonical order of arch string [RFC] Relax the canonical order checking of arch string for -march option May 27, 2021
@luismarques
Copy link

Thanks for writing down this proposal Kito. I like the idea that tools that consume human-generated input would be flexible in the order they would accept, but that they would always canonicalize the ISA string. So when a tool prints something it's always in canonical order and you can easily match that string. But why should the human have to remember that order? One possible exception that was mentioned in the LLVM RISC-V sync-up call was requiring that the i would always come first, as that's very integral to the rv32i vs rv64i distinction, but maybe that's overthinking it. (The same logic would presumably apply to g).

@jim-wilson
Copy link
Collaborator

I think the main problem is with the z and x extensions, which have unobvious orders. The single letter extensions are pretty clear about the right ordering. I'm Ok with relaxing the order check for user input for the multicharacter extensions.

@pz9115
Copy link

pz9115 commented Jun 3, 2021

Nice idea. I often met this problem in develop and test. Since we usually use 'imafdc' in general, it's not easily always type other extensions in right sequence. Relaxing the order that not frequently is real helpful for both user and developer.

@ebahapo
Copy link

ebahapo commented Jun 3, 2021

Option D. As it was mentioned in the GCC call, the interpretation of the string should always be after the canonical order. If anything, to future proof it. When displaying the string, wherever in the toolchain, it should always be in the canonical order, whatever the user input. Also, the shorthand extension G should preferably be displayed, instead of spelling out its components.

@kenta2
Copy link

kenta2 commented Jun 3, 2021

I support option c (Don't relax order check, but emit an error to tell user the right order). It's not just the GNU toolchain (and LLVM) that is emitting and parsing ISA strings. I've written helper scripts that help users find the right ISA string for the features they have or want, and having a specification that says There Is Only One Correct Answer makes things easier to do such tasks.

@tclin914
Copy link
Contributor

tclin914 commented Jun 4, 2021

Option a. Does it have another option that just relaxes the order in the same multi-letter extension?

@rjiejie
Copy link

rjiejie commented Jun 7, 2021

Option b or d both are better, it let user easy to use this 'march' option without any complex detail of SPEC
and tools like GCC or LLVM is responsible for hiding the details inside.

@Geng-Qi-alibaba
Copy link

I prefer option d. It's is flexible and reads clearly.

@jrtc27
Copy link

jrtc27 commented Jun 7, 2021

For: Dynamically building up arch strings is awkward (e.g. you can't take rv64gc and then append q, because the order is rv64gqc) if you're taking in user input and appending

Against: Comparing arch strings visually becomes much harder if they're not in canonical order

@kito-cheng
Copy link
Collaborator Author

kito-cheng commented Jun 15, 2021

Thanks all your feedback, sound like most people are happy to relax the canonical order checking, statistics so far:

Option Voting
a @jim-wilson, @tclin914 @kito-cheng
b @rjiejie @pz9115
c @kenta2
d @luismarques, @ebahapo @rjiejie @Geng-Qi @pz9115

@kito-cheng
Copy link
Collaborator Author

@palmer-dabbelt has told me he is fine to relax the order, but we should take care about the ambiguous of interpreting the arch string, especially for the multi-letter and single-letter extension.

e.g.

rv32i_a_zm # i + zm +a, no ambiguous.
rv32i_zma # i + zm +a or i + zma?

So I think we should write down few more explicit rule, like multi-letter extension must using underline between all adjacent extension.

@nick-knight
Copy link

@kito-cheng proposes "option (d)":

RV32I_ZAM # OK
RV32IZAM # Not OK multi-letter must start with _

I am against adding constraints that aren't in the (official) ISA string specification (Vol I Ch. 25). In this case, RV32IZam and RV32I_Zam are both legal, so mandating the latter is problematic in my opinion. So I'm strongly opposed to option (d).

I am (mildly) in favor of option (c): do not relax the specification. As a toolchain user I have never found the law to be particularly onerous. My vote is motivated by concern of a (hypothetical) situation where a user becomes accustomed to writing illegal ISA strings, switches to a tool which enforces the law, and then files a (spurious) bug report, wasting some law-abiding developer's time. (Spare the rod, spoil the child?) This is a subjective argument, and certainly not a hill I will die on.

@kito-cheng
Copy link
Collaborator Author

@nick-knight I think the details could be discussed, since the intention is preventing paring ambiguous.

My vote is motivated by concern of a (hypothetical) situation where a user becomes accustomed to writing illegal ISA strings, switches to a tool which enforces the law, and then files a (spurious) bug report, wasting some law-abiding developer's time. (Spare the rod, spoil the child?) This is a subjective argument, and certainly not a hill I will die on.

That sounds a valid concern, when we relax that, then people are used to using relaxed order, but I still concern about the order for multi-letter extension - it's really hard to sort, so I change my mind (back) to option a.

@kito-cheng
Copy link
Collaborator Author

Hey guys, thank all your feedback, now I created a PR #14 for this RFC, welcome any further comment on that PR :)

@cmuellner
Copy link
Collaborator

In today's GCC RISC-V call we discovered the concern that a future ISA string format might cause a conflict. Addressing that in the specification (e.g. by relaxing the order there, or by getting hard guarantees there which can be relied upon) seemed unlikely.

One way that would still be possible is to do something similar like the push_arch proposal in the RISC-V Assembly Programmer's Manual E.g. -march=rv64gc -march-append=zicsr.

@Nelson1225
Copy link

Nelson1225 commented Oct 25, 2021

Hey guys, thank all your feedback, now I created a PR #14 for this RFC, welcome any further comment on that PR :)

I agree with @palmer-dabbelt and @nick-knight that the option d will cause more ambiguous problems. It doesn't worth that spending times to resolve these ambiguous problems caused by option d. The PR #14 seems like the option b, but add a restriction that the multi-letter extension must come after single-letter extension. I like this idea, and it should be convenient enough to users, and also easy to implement without changing the current ISA spec. If there is no other concern, I would like to update the binutils according to the PR #14 recently.

@fanghuaqi
Copy link
Contributor

In today's GCC RISC-V call we discovered the concern that a future ISA string format might cause a conflict. Addressing that in the specification (e.g. by relaxing the order there, or by getting hard guarantees there which can be relied upon) seemed unlikely.

One way that would still be possible is to do something similar like the push_arch proposal in the RISC-V Assembly Programmer's Manual E.g. -march=rv64gc -march-append=zicsr.

I think this is a good way to specify march string, if we keep adding new risc-v extension, this march string will be too long, and a little ugly to me, but considering library built for different arch, it will also be a disaster.

cmuellner added a commit to cmuellner/riscv-toolchain-conventions that referenced this issue Nov 8, 2022
This is essentially a collection of all recent
change requests for the -march string:

* Relax the ISA string order ([1])
* Add custom extensions ([2])
* Add profiles support ([3])

Most of this patch is based on proposals from Kito Cheng <kito.cheng@gmail.com>
(see linked resources below).

[1] riscv-non-isa#14
[2] riscv-non-isa#1
[3] https://lists.riscv.org/g/sig-toolchains/message/379
[4] riscv-non-isa#11

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
cmuellner added a commit to cmuellner/riscv-toolchain-conventions that referenced this issue Nov 8, 2022
This is essentially a collection of all recent
change requests for the -march string:

* Relax the ISA string order ([1])
* Add custom extensions ([2])
* Add profiles support ([3])

Most of this patch is based on proposals from Kito Cheng <kito.cheng@gmail.com>
(see linked resources below).

[1] riscv-non-isa#14
[2] riscv-non-isa#1
[3] https://lists.riscv.org/g/sig-toolchains/message/379
[4] riscv-non-isa#11

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
cmuellner added a commit to cmuellner/riscv-toolchain-conventions that referenced this issue Dec 1, 2022
This is essentially a collection of all recent
change requests for the -march string:

* Relax the ISA string order ([1])
* Add custom extensions ([2])
* Add profiles support ([3])

Most of this patch is based on proposals from Kito Cheng <kito.cheng@gmail.com>
(see linked resources below).

[1] riscv-non-isa#14
[2] riscv-non-isa#1
[3] https://lists.riscv.org/g/sig-toolchains/message/379
[4] riscv-non-isa#11

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
cmuellner added a commit to cmuellner/riscv-toolchain-conventions that referenced this issue Dec 1, 2022
This is essentially a collection of all recent
change requests for the -march string:

* Relax the ISA string order ([1])
* Add custom extensions ([2])
* Add profiles support ([3])

Most of this patch is based on proposals from Kito Cheng <kito.cheng@gmail.com>
(see linked resources below).

[1] riscv-non-isa#14
[2] riscv-non-isa#1
[3] https://lists.riscv.org/g/sig-toolchains/message/379
[4] riscv-non-isa#11

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
cmuellner added a commit to cmuellner/riscv-toolchain-conventions that referenced this issue Dec 1, 2022
This is essentially a collection of all recent
change requests for the -march string:

* Relax the ISA string order ([1])
* Add custom extensions ([2])
* Add profiles support ([3])

Most of this patch is based on proposals from Kito Cheng <kito.cheng@gmail.com>
(see linked resources below).

[1] riscv-non-isa#14
[2] riscv-non-isa#1
[3] https://lists.riscv.org/g/sig-toolchains/message/379
[4] riscv-non-isa#11

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
wangliu-iscas pushed a commit to plctlab/patchwork-binutils-gdb that referenced this issue Dec 21, 2022
* riscv-toolchain-conventions,
PR, riscv-non-isa/riscv-toolchain-conventions#14
Issue, riscv-non-isa/riscv-toolchain-conventions#11

* Refer to the commit afc41ff,
RISC-V: Reorder the prefixed extensions which are out of order.

In the past we only allow to reorder the prefixed extensions.  But according
to the PR 14 in the riscv-toolchain-convention, we can also relax the order
checking to allow the whole extensions be written out of orders, including
the single standard extensions and the prefixed multi-letter extensions.
Just that we still need to follow the following rules as usual,

1. prefixed extensions need to be seperated with `_'.
2. prefixed extensions need complete <major>.<minor> version if set.

Please see the details in the march-ok-reorder gas testcase.

Passed the riscv-gnu-toolchain regressions.

bfd/
    * elfxx-riscv.c (enum riscv_prefix_ext_class): Changed RV_ISA_CLASS_UNKNOWN
    to RV_ISA_CLASS_SINGLE, since everything that does not belong to the
    multi-keyword will possible be a single extension for the current parser.
    (parse_config): Likewise.
    (riscv_get_prefix_class): Likewise.
    (riscv_compare_subsets): Likewise.
    (riscv_parse_std_ext): Removed, and merged with riscv_parse_prefixed_ext
    into riscv_parse_extensions.
    (riscv_parse_prefixed_ext): Likewise.
    (riscv_parse_subset): Only need to call riscv_parse_extensions to parse
    both single standard and prefixed extensions.
gas/
    * testsuite/gas/riscv/march-fail-order-std.d: Removed since the relaxed
    order checking.
    * testsuite/gas/riscv/march-fail-order-std.l: Likewise.
    * testsuite/gas/riscv/march-fail-order-x-std.d: Likewise.
    * testsuite/gas/riscv/march-fail-order-z-std.d: Likewise.
    * testsuite/gas/riscv/march-fail-order-zx-std.l: Likewise.
    * testsuite/gas/riscv/march-fail-unknown-std.l: Updated.
    * testsuite/gas/riscv/march-ok-reorder.d: New testcase.
saagarjha pushed a commit to ahjragaas/binutils-gdb that referenced this issue Dec 23, 2022
* riscv-toolchain-conventions,
PR, riscv-non-isa/riscv-toolchain-conventions#14
Issue, riscv-non-isa/riscv-toolchain-conventions#11

* Refer to the commit afc41ff,
RISC-V: Reorder the prefixed extensions which are out of order.

In the past we only allow to reorder the prefixed extensions.  But according
to the PR 14 in the riscv-toolchain-convention, we can also relax the order
checking to allow the whole extensions be written out of orders, including
the single standard extensions and the prefixed multi-letter extensions.
Just that we still need to follow the following rules as usual,

1. prefixed extensions need to be seperated with `_'.
2. prefixed extensions need complete <major>.<minor> version if set.

Please see the details in the march-ok-reorder gas testcase.

Passed the riscv-gnu-toolchain regressions.

bfd/
    * elfxx-riscv.c (enum riscv_prefix_ext_class): Changed RV_ISA_CLASS_UNKNOWN
    to RV_ISA_CLASS_SINGLE, since everything that does not belong to the
    multi-keyword will possible be a single extension for the current parser.
    (parse_config): Likewise.
    (riscv_get_prefix_class): Likewise.
    (riscv_compare_subsets): Likewise.
    (riscv_parse_std_ext): Removed, and merged with riscv_parse_prefixed_ext
    into riscv_parse_extensions.
    (riscv_parse_prefixed_ext): Likewise.
    (riscv_parse_subset): Only need to call riscv_parse_extensions to parse
    both single standard and prefixed extensions.
gas/
    * testsuite/gas/riscv/march-fail-order-std.d: Removed since the relaxed
    order checking.
    * testsuite/gas/riscv/march-fail-order-std.l: Likewise.
    * testsuite/gas/riscv/march-fail-order-x-std.d: Likewise.
    * testsuite/gas/riscv/march-fail-order-z-std.d: Likewise.
    * testsuite/gas/riscv/march-fail-order-zx-std.l: Likewise.
    * testsuite/gas/riscv/march-fail-unknown-std.l: Updated.
    * testsuite/gas/riscv/march-ok-reorder.d: New testcase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests