Proposal: Extend .option directive for control enabled extensions on specific code region #67

Merged (1 commit) on Mar 14, 2023

Collaborator

@kito-cheng kito-cheng commented Jul 9, 2021

Changes:

  • 2021/8/12:
    • Add , after arch.
    • Only allow enabling/disabling one extension at a time.
    • Add = operator to set arch to specific configuration.
  • 2021/7/29: Extend .option rather than .push_arch/.pop_arch.

This commit extends the .option directive with a new keyword:

  • .option arch

Enable and/or disable specific ISA extensions for the following code region
without changing the arch attribute. This means it does not raise the minimal
execution environment requirement, so the user must take care to guard
execution of the code regions enclosed by .option push/.option arch/.option pop.
A typical use case is ifunc: for example, libc is built with rv64gc,
but a few functions such as memcpy provide two versions, one built with rv64gc
and one built with rv64gcv, and the ifunc mechanism selects between them
at run time. We do not want to raise the minimal execution environment
requirement to rv64gcv, because the rv64gcv version is invoked only if
the execution environment supports the vector extension; the minimal
execution environment requirement therefore remains rv64gc.

Example:

.attribute arch, "rv64imafdc"
# You can only use instructions from the i, m, a, f, d and c extensions.
memcpy_general:
    add     a5,a1,a2
    beq     a1,a5,.L2
    add     a2,a0,a2
    mv      a5,a0
.L3:
    addi    a1,a1,1
    addi    a5,a5,1
    lbu     a4,-1(a1)
    sb      a4,-1(a5)
    bne     a5,a2,.L3
.L2:
    ret 

.option push     # Push the current options onto the stack.
.option arch, +v # Enable the vector extension; any instruction from the imafdcv extensions can be used.
memcpy_vec:
    mv a3, a0
.Lloop:
    vsetvli t0, a2, e8, m8, ta, ma
    vle8.v v0, (a1)
    add a1, a1, t0
    sub a2, a2, t0
    vse8.v v0, (a3)
    add a3, a3, t0
    bnez a2, .Lloop
    ret
.option pop   # Pop current option from stack, restore the enabled ISA extension status to imafdc.
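
The scoping rule in this example can be sketched as a small Python model. This is only an illustration of the semantics described above, not assembler code: the class name `OptionState` and its methods are hypothetical, and extensions are simplified to single letters.

```python
class OptionState:
    """Toy model of the assembler state described above (not real gas/LLVM code)."""

    def __init__(self, attribute_exts):
        # What `.attribute arch` recorded; `.option arch` never touches it.
        self.arch_attribute = frozenset(attribute_exts)
        # Extensions the assembler currently accepts instructions from.
        self.enabled = set(attribute_exts)
        self._stack = []

    def option_push(self):
        self._stack.append(set(self.enabled))  # save a copy of the current set

    def option_pop(self):
        self.enabled = self._stack.pop()       # restore the saved set

    def option_arch(self, op, ext):
        if op == "+":
            self.enabled.add(ext)
        elif op == "-":
            self.enabled.discard(ext)
        else:
            raise ValueError(f"unknown operator: {op!r}")


state = OptionState("imafdc")      # .attribute arch, "rv64imafdc"
state.option_push()                # .option push
state.option_arch("+", "v")        # enable the vector extension
inside_region = "v" in state.enabled
state.option_pop()                 # .option pop
after_region = "v" in state.enabled
```

In this model `v` is usable only between the push and the pop, while `arch_attribute` stays at `imafdc` throughout, which mirrors the statement that the minimal execution environment requirement is not raised.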

Notes:

Collaborator

@jim-wilson jim-wilson left a comment

Looks OK with some English improvements. Only useful if both clang and GNU as support it at roughly the same time, and will require configure checks in GNU tools before using, and whatever the clang equivalent is.

@jrtc27
Contributor

jrtc27 commented Jul 22, 2021

Why is this a new directive rather than just a .option? We already have .option push/.option pop, all you need is a .option to change the arch string like now we already have rvc/norvc. Perhaps that should just be generalised to (no)rv$arch?

@jrtc27
Contributor

jrtc27 commented Jul 22, 2021

i.e.:

.attribute arch, "rv64imafdc"
memcpy_general:
    add     a5,a1,a2
    beq     a1,a5,.L2
    add     a2,a0,a2
    mv      a5,a0
.L3:
    addi    a1,a1,1
    addi    a5,a5,1
    lbu     a4,-1(a1)
    sb      a4,-1(a5)
    bne     a5,a2,.L3
.L2:
    ret

.option push
.option rvv
memcpy_vec:
    mv a3, a0
.Lloop:
    vsetvli t0, a2, e8, m8, ta, ma
    vle8.v v0, (a1)
    add a1, a1, t0
    sub a2, a2, t0
    vse8.v v0, (a3)
    add a3, a3, t0
    bnez a2, .Lloop
    ret
.option pop    # Restore the enabled ISA extension status to imafdc.

This is cleaner, more consistent and avoids adding more than one way to enable/disable RVC.

@jrtc27
Contributor

jrtc27 commented Jul 22, 2021

Looks OK with some English improvements. Only useful if both clang and GNU as support it at roughly the same time, and will require configure checks in GNU tools before using, and whatever the clang equivalent is.

The assembler is integrated (well, it doesn't even bother to serialise and re-parse textual assembly) so normally no such concern exists. When there is enough interest in supporting old GNU assemblers with the integrated assembler turned off there's -fbinutils-version that has to be supplied by the user. But configure-time does not make sense for Clang as there's very little distinction between native and cross-compilation beyond the fact it picks a default triple, and every build contains all backends by default, and you're not going to have an assembler for every possible supported target present when you build Clang.

@kito-cheng
Collaborator Author

Why is this a new directive rather than just a .option? We already have .option push/.option pop, all you need is a .option to change the arch string like now we already have rvc/norvc. Perhaps that should just be generalised to (no)rv$arch?

Reasons for defining a new directive rather than extending the rvc/norvc scheme:

  • Able to include version info in the option.
  • Able to enable more than one extension at a time.
  • Implementation is simpler; it won't be mixed with the existing .option implementation.

There is another open question: should .push_arch/.pop_arch support negative options?
e.g. .push_arch -c, .push_arch -v

@jrtc27
Contributor

jrtc27 commented Jul 23, 2021

Why is this a new directive rather than just a .option? We already have .option push/.option pop, all you need is a .option to change the arch string like now we already have rvc/norvc. Perhaps that should just be generalised to (no)rv$arch?

Reasons for defining a new directive rather than extending the rvc/norvc scheme:

  • Able to include version info in the option.

.option rvv1p0

  • Able to enable more than one extension at a time.

.option rvv1p0, rvb0p93, or I guess you can .option rvb0p93_v1p0 etc and allow an arch "substring", both work. But even without those options, having to have multiple lines is hardly a big deal... how often do you need to enable more than one extension? And how often is that being done by hand rather than just auto-generated by a loop in the compiler? It's a non-issue.

  • Implementation is simpler; it won't be mixed with the existing .option implementation.

Disagree. Uniformity is simpler. Parsing this is trivial and generalises the code that's already there. Having two ways to do the same thing confuses users, looks ugly and means you have multiple copies of similar code in your assembler.

There is another open question: should .push_arch/.pop_arch support negative options?
e.g. .push_arch -c, .push_arch -v

Yes. My solution supports that.

@jrtc27
Contributor

jrtc27 commented Jul 23, 2021

Other than aesthetics and arguments about simplicity/consistency, there is a very real concern I have about having two different push/pop mechanisms. What does:

.attribute arch, rv32i2p0
.push_arch m
.option rvc
.pop_arch

mean? Is RVC enabled or disabled at the end of that? Or:

.attribute arch, rv32i2p0
.option push
.push_arch m
.option pop

Does that have RVM enabled or disabled at the end? If you make the push and pop parts of .push_arch and .pop_arch be the same as .option then they are completely redundant and should not exist beyond the "set the arch string" part (i.e. pop_arch goes away and push_arch only modifies the current arch string), but then push_arch is a bad name for that part. If you don't make them equivalent and try and maintain two separate stacks then that becomes a complete tangled nightmare both in the implementation and in the conceptual model. Thus, whatever you name the thing, it needs to ultimately be equivalent to .option (no)rv$arch, and so might as well be called that for consistency with (no)rvc rather than some kind of .modify_arch.
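
The ambiguity in the two snippets above can be made concrete with a toy interpreter. This is a sketch under stated assumptions, not a description of any real assembler: the function `simulate` is hypothetical, `unified=True` models a single stack that snapshots the whole extension set, and `unified=False` models two independent stacks where .option push/pop save only the RVC flag and .push_arch/.pop_arch save only the other extensions.

```python
def simulate(directives, unified):
    """Return the set of enabled extensions after running the directives."""
    arch = {"i"}          # extensions other than C; models .attribute arch, rv32i2p0
    rvc = False           # state toggled by .option rvc
    option_stack, arch_stack = [], []
    for directive in directives:
        name, _, arg = directive.partition(" ")
        if directive == ".option push":
            option_stack.append((set(arch), rvc) if unified else rvc)
        elif directive == ".option pop":
            saved = option_stack.pop()
            if unified:
                arch, rvc = saved
            else:
                rvc = saved
        elif directive == ".option rvc":
            rvc = True
        elif name == ".push_arch":
            if unified:
                option_stack.append((set(arch), rvc))
            else:
                arch_stack.append(set(arch))
            arch.add(arg)
        elif directive == ".pop_arch":
            if unified:
                arch, rvc = option_stack.pop()
            else:
                arch = arch_stack.pop()
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return arch | ({"c"} if rvc else set())


first = [".push_arch m", ".option rvc", ".pop_arch"]
second = [".option push", ".push_arch m", ".option pop"]
results = {
    "first/unified": simulate(first, unified=True),
    "first/separate": simulate(first, unified=False),
    "second/unified": simulate(second, unified=True),
    "second/separate": simulate(second, unified=False),
}
```

Under these assumptions the two designs disagree on both snippets: the separate-stack model leaves RVC enabled after the first and M enabled after the second, while the unified model restores plain `i` in both cases.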

@kito-cheng
Collaborator Author

@jrtc27 thanks for your example, that's a good point; let me write up a second version of this proposal.

@Nelson1225

Nelson1225 commented Jul 26, 2021

I already have the first version of .push_arch and .pop_arch in GNU binutils, but haven't written the test cases yet.
Nelson1225/riscv-binutils-gdb@8e2483f

The implementation and assembly syntax of .option rvc / .option norvc and of .push_arch c2p0 / .pop_arch are different, though both need their state stored on a stack. I agree with @jrtc27 that we should use the same format and mechanism for both, which raises the following issues:

  1. Extend .option rvc so that it can carry versions. For example:
.attribute arch, rv32i2p0
.option push  /* backup rv32i2p0.  */
.option rvc2p0  /* added c2p0, rv32i2p0_c2p0.  */
.option rvc3p0  /* updated c3p0, rv32i2p0_c3p0.  */
.option norvc  /* removed c, rv32i2p0.  */
.option pop  /* restored rv32i2p0.  */

So the format .option rv<extension><version> is good, but .option norv<extension><version> would complicate usage, so I suggest not allowing versions in .option norv<extension>.

  2. Not only standard extensions can be pushed and popped, so following the formats above:
.option rv<extension><version> and .option norv<extension>

.attribute arch, rv32i2p0_m2p0
.option push  /* backup rv32i2p0_m2p0.  */
.option rvc2p0  /* added c2p0, rv32i2p0_m2p0_c2p0.  */
.option rvzfh0p1  /* added zfh0p1, rv32i2p0_m2p0_c2p0_zfh0p1.  */
.option rvxvendor1p0  /* added xvendor1p0, rv32i2p0_m2p0_c2p0_zfh0p1_xvendor1p0.  */
.option pop  /* restored rv32i2p0_m2p0.  */

The rv prefix looks redundant, but we need to stay compatible with rvc and norvc.

  3. Should we still support the following format?
.option rv<extension1><versionN>, rv<extension2><versionM>, ...
.option norv<extension1>, norv<extension2>, ...
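
As a rough illustration of the .option rv<extension><version> / .option norv<extension> format proposed in this comment, the following Python sketch parses one operand and applies it to an arch description. The helper names `parse_rv_option` and `apply_rv_option` are hypothetical, the `<major>p<minor>` version spelling is taken from the examples above, and extension names containing digits are not handled.

```python
import re

# (no)rv, then a lowercase extension name, then an optional <major>p<minor> version.
_RV_OPTION = re.compile(r"^(no)?rv([a-z]+?)(?:(\d+)p(\d+))?$")


def parse_rv_option(operand):
    """Split e.g. 'rvc2p0' into ('enable', 'c', (2, 0))."""
    match = _RV_OPTION.match(operand)
    if match is None:
        raise ValueError(f"not an rv/norv option: {operand!r}")
    disable, ext, major, minor = match.groups()
    if disable and major is not None:
        # Follows the suggestion above: no versions on norv<extension>.
        raise ValueError("norv<extension> does not take a version")
    version = (int(major), int(minor)) if major is not None else None
    return ("disable" if disable else "enable", ext, version)


def apply_rv_option(arch, operand):
    """Return a copy of `arch` (extension -> version) with the option applied."""
    action, ext, version = parse_rv_option(operand)
    updated = dict(arch)
    if action == "enable":
        updated[ext] = version      # adds the extension or updates its version
    else:
        updated.pop(ext, None)
    return updated


arch = {"i": (2, 0)}                        # .attribute arch, rv32i2p0
arch = apply_rv_option(arch, "rvc2p0")      # adds c2p0
arch = apply_rv_option(arch, "rvc3p0")      # updates to c3p0
with_c = dict(arch)
arch = apply_rv_option(arch, "norvc")       # removes c
```

The sequence at the end replays the first example in this comment: `c` is added, its version is updated, and `norvc` removes it again.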

@kito-cheng kito-cheng changed the title Proposal: New directive for control enabled extensions on specific code region Proposal: Extend .option directive for control enabled extensions on specific code region Jul 29, 2021
@kito-cheng
Collaborator Author

Changes:

@jrtc27
Contributor

jrtc27 commented Jul 29, 2021

Why is this still introducing a new way of doing things, .option arch <foo>, rather than reusing and extending the existing syntax? This still results in two different ways to enable/disable RVC. We should not be adding new syntax unless absolutely necessary.

@palmer-dabbelt
Contributor

IMO this is the right way to go: having .option arch +c mean the same thing as .option rvc is fine, it's just a more general way of doing things. It's way cleaner than trying to have a million ad-hoc '.option noZba` for every flavor of extension we want to flip on/off. I'd go a bit farther and suggest a handful of additions:

  • Add an = op (or just let it be blank). This would let users write something like .option arch rv64gc when they want to control exactly the arch they're based on.
  • Drop the extension list, and just make users insert multiple .option arch lines. This prevents us from getting screwed when our separator (IIUC a ,?) becomes a valid character in extensions.
  • Add this as command-line syntax. Something like -march=rv64g -march+c. We can deal with that later, though, as I'm sure it'll require a lot of sorting out.
  • Add this as a function attribute. So something like int func() __attribute__(("arch", "-c")) or whatever the syntax usually looks like over there. We can also argue about this one later.

riscv-asm.md Outdated
at run-time. However we don't want to change the minimal execution environment
requirement to `rv64gcv`, since the `rv64gcv` version will be invoked only if
the execution environment supports the vector extension, so the minimal
execution environment requirement still is `rv64gc`.
Contributor

This should be commentary.

Contributor

If this is changed to commentary, it's important that the part about "won't raise the minimal execution environment requirement" be left as normative. We don't want to set EF_RISCV_RVC in the ELF flags when using .option arch, +c, even though binutils currently does that for .option rvc. See https://reviews.llvm.org/D122490#3419774 for a recent discussion about that.

@jrtc27
Contributor

jrtc27 commented Jul 29, 2021

IMO this is the right way to go: having .option arch +c mean the same thing as .option rvc is fine, it's just a more general way of doing things. It's way cleaner than trying to have a million ad-hoc '.option noZba` for every flavor of extension we want to flip on/off. I'd go a bit farther and suggest a handful of additions:

Just as you wouldn't implement this by having a million ad-hoc -Zba options for .option arch, you wouldn't implement my proposal that way either, you'd parse (no)rv and then parse an extension name.

  • Add an = op (or just let it be blank). This would let users write something like .option arch rv64gc when they want to control exactly the arch they're based on.
  • Drop the extension list, and just make users insert multiple .option arch lines. This prevents us from getting screwed when our separator (IIUC a ,?) becomes a valid character in extensions.
  • Add this as command-line syntax. Something like -march=rv64g -march+c. We can deal with that later, though, as I'm sure it'll require a lot of sorting out.
  • Add this as a function attribute. So something like int func() __attribute__(("arch", "-c")) or whatever the syntax usually looks like over there. We can also argue about this one later.

That's __attribute__((target(...))) as an architecture-independent attribute (with architecture-dependent arguments), and __attribute__((target("-c"))) etc already work today for LLVM (in as much as it can without assembly syntax to toggle things on and off; for C in particular it won't work well, or at least not if you round-trip through assembly, might work fine if you don't, but for something like M or F it'll just (not) emit the instructions).

@kito-cheng
Collaborator Author

IMO this is the right way to go: having .option arch +c mean the same thing as .option rvc is fine, it's just a more general way of doing things. It's way cleaner than trying to have a million ad-hoc '.option noZba` for every flavor of extension we want to flip on/off. I'd go a bit farther and suggest a handful of additions:

Just as you wouldn't implement this by having a million ad-hoc -Zba options for .option arch, you wouldn't implement my proposal that way either, you'd parse (no)rv and then parse an extension name.

To me, both approaches extend the syntax, and now we have a chance to implement a cleaner scheme, so I would prefer the new one (although I know that's subjective).

@kito-cheng
Collaborator Author

Add an = op (or just let it be blank). This would let users write something like .option arch rv64gc when they want to control exactly the arch they're based on.

Sounds like a good idea.

Drop the extension list, and just make users insert multiple .option arch lines. This prevents us from getting screwed when our separator (IIUC a ,?) becomes a valid character in extensions.

Hmm, makes sense to me...

Add this as command-line syntax. Something like -march=rv64g -march+c. We can deal with that later, though, as I'm sure it'll require a lot of sorting out.

Yeah, I guess we need that to deal with the RISC-V profile... although I would prefer to use -march as the only option for controlling the ISA.

Add this as a function attribute. So something like int func() __attribute__(("arch", "-c")) or whatever the syntax usually looks like over there. We can also argue about this one later.

As @jrtc27 said, LLVM has some level of support for that. Writing it down in https://github.com/riscv/riscv-c-api-doc is my next step, but on the GNU toolchain side we need this proposal in order to implement it.

@palmer-dabbelt
Contributor

IMO this is the right way to go: having .option arch +c mean the same thing as .option rvc is fine, it's just a more general way of doing things. It's way cleaner than trying to have a million ad-hoc '.option noZba` for every flavor of extension we want to flip on/off. I'd go a bit farther and suggest a handful of additions:

Just as you wouldn't implement this by having a million ad-hoc -Zba options for .option arch, you wouldn't implement my proposal that way either, you'd parse (no)rv and then parse an extension name.

IMO that's pretty ad-hoc: by removing the explicit name-spacing provided by Kito's proposal we end up with these options all at the top level, which makes them harder to describe. That said, I don't really care that much about syntax.

  • Add an = op (or just let it be blank). This would let users write something like .option arch rv64gc when they want to control exactly the arch they're based on.
  • Drop the extension list, and just make users insert multiple .option arch lines. This prevents us from getting screwed when our separator (IIUC a ,?) becomes a valid character in extensions.
  • Add this as command-line syntax. Something like -march=rv64g -march+c. We can deal with that later, though, as I'm sure it'll require a lot of sorting out.
  • Add this as a function attribute. So something like int func() __attribute__(("arch", "-c")) or whatever the syntax usually looks like over there. We can also argue about this one later.

That's __attribute__((target(...))) as an architecture-independent attribute (with architecture-dependent arguments), and __attribute__((target("-c"))) etc already work today for LLVM (in as much as it can without assembly syntax to toggle things on and off; for C in particular it won't work well, or at least not if you round-trip through assembly, might work fine if you don't, but for something like M or F it'll just (not) emit the instructions).

Sounds like that should be in some spec somewhere, as I don't remember having seen it before. I also very much question the value of providing options that generate broken code.

@jrtc27
Contributor

jrtc27 commented Jul 29, 2021

IMO this is the right way to go: having .option arch +c mean the same thing as .option rvc is fine, it's just a more general way of doing things. It's way cleaner than trying to have a million ad-hoc '.option noZba` for every flavor of extension we want to flip on/off. I'd go a bit farther and suggest a handful of additions:

Just as you wouldn't implement this by having a million ad-hoc -Zba options for .option arch, you wouldn't implement my proposal that way either, you'd parse (no)rv and then parse an extension name.

IMO that's pretty ad-hoc: by removing the explicit name-spacing provided by Kito's proposal we end up with these options all at the top level, which makes them harder to describe. That said, I don't really care that much about syntax.

  • Add an = op (or just let it be blank). This would let users write something like .option arch rv64gc when they want to control exactly the arch they're based on.
  • Drop the extension list, and just make users insert multiple .option arch lines. This prevents us from getting screwed when our separator (IIUC a ,?) becomes a valid character in extensions.
  • Add this as command-line syntax. Something like -march=rv64g -march+c. We can deal with that later, though, as I'm sure it'll require a lot of sorting out.
  • Add this as a function attribute. So something like int func() __attribute__(("arch", "-c")) or whatever the syntax usually looks like over there. We can also argue about this one later.

That's __attribute__((target(...))) as an architecture-independent attribute (with architecture-dependent arguments), and __attribute__((target("-c"))) etc already work today for LLVM (in as much as it can without assembly syntax to toggle things on and off; for C in particular it won't work well, or at least not if you round-trip through assembly, might work fine if you don't, but for something like M or F it'll just (not) emit the instructions).

Sounds like that should be in some spec somewhere, as I don't remember having seen it before. I also very much question the value of providing options that generate broken code.

https://clang.llvm.org/docs/AttributeReference.html#target
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html (scroll way down)

It works just fine for everything but C. For C it'll still mess with the subtarget internally (which can affect codegen for relative costs of instructions as it looks at whether they're compressible) but we fail to emit the necessary .option (no)rvc to stop post-isel compression, and it seems that does affect using the integrated assembler too (not sure how though, bit strange). There's nothing wrong with the option, just a bug in its implementation for one specific case (that's a rather weird one).

@jrtc27
Contributor

jrtc27 commented Jul 29, 2021

IMO this is the right way to go: having .option arch +c mean the same thing as .option rvc is fine, it's just a more general way of doing things. It's way cleaner than trying to have a million ad-hoc '.option noZba` for every flavor of extension we want to flip on/off. I'd go a bit farther and suggest a handful of additions:

Just as you wouldn't implement this by having a million ad-hoc -Zba options for .option arch, you wouldn't implement my proposal that way either, you'd parse (no)rv and then parse an extension name.

IMO that's pretty ad-hoc: by removing the explicit name-spacing provided by Kito's proposal we end up with these options all at the top level, which makes them harder to describe. That said, I don't really care that much about syntax.

I agree that, if I could go back in time, I would stop the (no)rvc syntax from being adopted by GNU as and pick something more like this proposal. However, my personal opinion is that there should be a very good reason for introducing new syntax for something that can be done by generalising existing syntax, but .option arch is certainly better than .push_arch. I think the way forward is to bring it up with both GNU and LLVM developers to see if there's a consensus for which of the two current proposals they feel is best.

@jrtc27
Contributor

jrtc27 commented Jul 29, 2021

NB: I would personally put a comma after .option arch, as is done for .attribute arch, if that is the proposal that ends up being adopted.

@kito-cheng
Collaborator Author

Changes:

  • Add , after arch.
  • Only allow enabling/disabling one extension at a time.
  • Add = operator to set arch to specific configuration.
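
Taken together, the three changes listed above suggest an operand grammar that a parser could handle roughly as follows. This is a hedged sketch, not the assembler's implementation: `parse_option_arch` is a hypothetical helper, the input is assumed to be a single directive with comments already stripped, and the exact spelling of the `=` form is an assumption (an earlier comment in this thread also floated leaving the operator blank).

```python
def parse_option_arch(line):
    """Parse '.option arch, <operand>' into (operator, value).

    '+' enables one extension, '-' disables one extension, and '='
    replaces the whole configuration with the given arch string.
    """
    head, comma, operand = line.partition(",")
    if head.split() != [".option", "arch"] or not comma:
        raise ValueError("expected '.option arch, <operand>'")
    operand = operand.strip()
    if "," in operand:
        raise ValueError("only one extension may be changed per directive")
    if len(operand) < 2 or operand[0] not in "+-=":
        raise ValueError(f"expected +<ext>, -<ext> or =<arch>, got {operand!r}")
    return operand[0], operand[1:]


enable_v = parse_option_arch(".option arch, +v")
disable_c = parse_option_arch(".option arch, -c")
set_arch = parse_option_arch(".option arch, =rv64gc")
```

Rejecting a second comma-separated item keeps the separator from ever being confused with part of an extension name, which was the concern raised earlier in the thread.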

@asb
Contributor

asb commented Aug 19, 2021

I agree that .option arch is just a more general way to specify things, and I prefer keeping things namespaced there.

We discussed somewhat on the LLVM sync-up call and there were diverging opinions on whether it was desirable to later deprecate .option [no]rvc (either just documenting it's no longer the preferred way, or even moving towards emitting warnings). But I don't think that's a discussion that needs to be had here and now.

@Nelson1225

Nelson1225 commented Oct 25, 2021

Hi guys, are there any further concerns about this issue? If not, I think it may be time to merge this PR, and then we can proceed to the next issues, including adding mapping symbols with ISA strings for these .option arch directives. Thanks.

@jrtc27
Contributor

jrtc27 commented Oct 25, 2021

Hi guys, are there any further concerns about this issue? If not, I think it may be time to merge this PR, and then we can proceed to the next issues, including adding mapping symbols with ISA strings for these .option arch directives. Thanks.

Mapping symbols are not dependent on this; .option (no)rvc already exists and could make use of them, even if not really necessary. As could files compiled with different -march= strings.

a4lg added a commit to a4lg/binutils-gdb that referenced this pull request Aug 11, 2022
Mapping symbols with an ISA string are proposed to deal with the so-called
"ifunc issue".  They enable disassembling a certain range of the code with
a different architecture than the rest, even if conflicting.  This is useful
when an "optimized" implementation is available but dynamically
switched to only if a certain extension is available.

This commit implements the disassembler support to parse mapping symbols
with ISA string.

[1] Proposal: Extend .option directive for control enabled extensions on
specific code region,
riscv-non-isa/riscv-asm-manual#67

[2] Proposal: Add mapping symbol,
riscv-non-isa/riscv-elf-psabi-doc#196

This commit is based on Nelson Chu's proposal "RISC-V: Output mapping
symbols with ISA string once .option arch is used." but heavily modified to
reflect the intent of Kito's original proposal.  It is also made smarter so
that it no longer requires MAP_INSN_ARCH.

gas/ChangeLog:

	* testsuite/gas/riscv/option-arch-01a.d: Reflect the disassembler
	support of mapping symbols with ISA string.

opcodes/ChangeLog:

	* riscv-dis.c (initial_default_arch): Default architecture string if
	no ELF attributes are available.
	(default_arch): A copy of the default architecture string.
	(is_arch_mapping): New variable to keep track of whether the current
	architecture is derived from a mapping symbol.
	(riscv_disassemble_insn): Update FPR names when a mapping symbol
	with ISA string is encountered.
	(riscv_get_map_state): Support mapping symbols with ISA string.
	Use `is_arch_mapping' to stop repeatedly parsing the default
	architecture.
	(riscv_get_disassembler): Safer architecture string handling.
	Copy the string to switch to the default while disassembling.
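
To illustrate the idea behind this commit, here is a minimal Python sketch of how a disassembler might choose the architecture for an address from mapping symbols. It is not the binutils code: the function `arch_for_address` is hypothetical, and the symbol spellings (`$x` for code, `$x<ISA>` for code with an ISA string, `$d` for data) are an assumption based on the psABI mapping-symbol proposal referenced as [2], as is the rule that a bare `$x` switches back to the default architecture.

```python
import bisect


def arch_for_address(address, mapping_symbols, default_arch):
    """Return the ISA string in effect at `address`, or None inside data.

    `mapping_symbols` is a list of (address, name) pairs sorted by address.
    """
    addresses = [addr for addr, _ in mapping_symbols]
    # Index of the last mapping symbol at or before `address`.
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return default_arch              # no mapping symbol seen yet
    name = mapping_symbols[index][1]
    if name.startswith("$d"):
        return None                      # data, nothing to disassemble
    if name.startswith("$x"):
        return name[2:] or default_arch  # bare "$x" falls back to the default
    raise ValueError(f"not a mapping symbol: {name}")


symbols = [(0x00, "$x"), (0x20, "$xrv64gcv"), (0x40, "$x"), (0x60, "$d")]
general = arch_for_address(0x10, symbols, "rv64gc")
vector = arch_for_address(0x24, symbols, "rv64gc")
```

With this layout, addresses in the range starting at 0x20 are decoded with the vector-enabled ISA string while the rest of the section keeps the default.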
a4lg added a commit to a4lg/binutils-gdb that referenced this pull request Aug 11, 2022
Mapping symbols with an ISA string are proposed to deal with the so-called
"ifunc issue".  They enable disassembling a certain range of the code with
a different architecture than the rest, even if conflicting.  This is useful
when an "optimized" implementation is available but dynamically
switched to only if a certain extension is available.

This commit implements the assembler support to emit mapping symbols with
ISA string (and partial disassembler support only to pass tests).

[1] Proposal: Extend .option directive for control enabled extensions on
specific code region,
riscv-non-isa/riscv-asm-manual#67

[2] Proposal: Add mapping symbol,
riscv-non-isa/riscv-elf-psabi-doc#196

This commit is based on Nelson Chu's proposal "RISC-V: Output mapping
symbols with ISA string once .option arch is used." but heavily modified to
reflect the intent of Kito's original proposal.  It is also made smarter so
that it no longer requires MAP_INSN_ARCH.

gas/ChangeLog:

	* config/tc-riscv.c (struct riscv_set_options): Add new field
	`arch_is_default' to keep track of whether the architecture is
	holding the default value.
	(updated_riscv_subsets): New variable to keep track of whether the
	architecture is possibly changed and inspected to emit proper
	mapping symbols.
	(make_mapping_symbol): Make mapping symbols with ISA string if
	necessary.  Don't emit the mapping symbol if the previous one in the
	same section has the same name.
	(riscv_elf_section_change_hook): New.  Try to emit a new mapping
	symbol if the section is changed.
	(riscv_mapping_state): Don't skip if the architecture is possibly
	changed and the new state is "code".
	(s_riscv_option): Keep track of `updated_riscv_subsets' and
	`riscv_opts.arch_is_default'.
	* config/tc-riscv.h (md_elf_section_change_hook): Define as
	`riscv_elf_section_change_hook'.
	(riscv_elf_section_change_hook): Declare.
	* testsuite/gas/riscv/mapping-01a.d: Reflect mapping symbols with
	ISA string.
	* testsuite/gas/riscv/mapping-02a.d: Likewise.
	* testsuite/gas/riscv/mapping-03a.d: Likewise.
	* testsuite/gas/riscv/mapping-04a.d: Likewise.
	* testsuite/gas/riscv/mapping-norelax-03a.d: Likewise.
	* testsuite/gas/riscv/mapping-norelax-04a.d: Likewise.

opcodes/ChangeLog:

	* riscv-dis.c (riscv_get_map_state): Minimum support of mapping
	symbols with ISA string without actually parsing the ISA string.
	The only purpose of this change is to pass the tests.
a4lg added a commit to a4lg/binutils-gdb that referenced this pull request Aug 11, 2022
Mapping symbols with ISA string are proposed to deal with the so-called
"ifunc issue".  They enable disassembling a certain range of the code with
a different architecture than the rest, even if conflicting.  This is useful
when an "optimized" implementation is available but dynamically
switched to only if a certain extension is available.

This commit implements the disassembler support to parse mapping symbols
with ISA string.

[1] Proposal: Extend .option directive for control enabled extensions on
specific code region,
riscv-non-isa/riscv-asm-manual#67

[2] Proposal: Add mapping symbol,
riscv-non-isa/riscv-elf-psabi-doc#196

This commit is based on Nelson Chu's proposal "RISC-V: Output mapping
symbols with ISA string once .option arch is used." but heavily modified to
reflect the intent of Kito's original proposal.  It is also made smarter so
that it no longer requires MAP_INSN_ARCH.

gas/ChangeLog:

	* testsuite/gas/riscv/option-arch-01a.d: Reflect the disassembler
	support of mapping symbols with ISA string.

opcodes/ChangeLog:

	* riscv-dis.c (initial_default_arch): Default architecture string if
	no ELF attributes are available.
	(default_arch): A copy of the default architecture string.
	(is_arch_mapping): New variable to keep track of whether the current
	architecture is derived from a mapping symbol.
	(riscv_disassemble_insn): Update FPR names when a mapping symbol
	with ISA string is encountered.
	(riscv_get_map_state): Support mapping symbols with ISA string.
	Use `is_arch_mapping' to stop repeatedly parsing the default
	architecture.
	(riscv_get_disassembler): Safer architecture string handling.
	Copy the string to switch to the default while disassembling.
@kito-cheng
Collaborator Author

We have binutils support upstream and a patch for LLVM (https://reviews.llvm.org/D123515), so I think it's time to merge this. Before merging, I would like to wait one more week to make sure there are no objections or further comments from the LLVM community: @asb @luismarques @jrtc27 @MaskRay.

VERSION := [0-9]+ 'p' [0-9]+
| [1-9][0-9]*
Contributor

I'm not sure this is correct. Can you please walk me through it?

Collaborator Author

I guess the current rule might be confusing because of the "[" and "]", so I tried rewriting it to allow VERSION to be the empty string, and I also expanded the first alternative into two in order to forbid 0p0.

EXTENSION              := <OP> <EXTENSION-NAME> [<VERSION>]

VERSION                := [0-9]+ 'p' [0-9]+
                        | [1-9][0-9]*

to

EXTENSION              := <OP> <EXTENSION-NAME> <VERSION>

VERSION                := [1-9][0-9]* 'p' [0-9]+
                        | [0-9]+ 'p' [1-9][0-9]*
                        | [1-9][0-9]*
                        |
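
As a quick sanity check of the rewritten rule, here is a minimal Python sketch that transcribes the four VERSION alternatives into one regular expression. The names `VERSION_RE` and `is_valid_version` are made up for illustration and do not come from any assembler; the regex is only one reading of the grammar above.

```python
import re

# Transcription of the revised VERSION rule (hypothetical helper, not taken
# from binutils or LLVM):
#   [1-9][0-9]* 'p' [0-9]+   non-zero major, any minor
#   [0-9]+ 'p' [1-9][0-9]*   any major, non-zero minor
#   [1-9][0-9]*              major only
#   (empty)                  no version given
VERSION_RE = re.compile(
    r"(?:[1-9][0-9]*p[0-9]+|[0-9]+p[1-9][0-9]*|[1-9][0-9]*|)"
)


def is_valid_version(text: str) -> bool:
    """Return True if the whole string matches the VERSION rule."""
    return VERSION_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["", "2", "2p0", "0p1", "0p0", "0"]:
        print(repr(candidate), is_valid_version(candidate))
```

Under this reading, an empty version, `2`, `2p0` and `0p1` are accepted, while `0p0` and a bare `0` are rejected, which is the case the extra alternative was meant to forbid.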

Collaborator Author

@luismarques let me know if it's still unclear to you, or if there is any case you think is still ambiguous.

Contributor

LGTM now.

Comment on lines +347 to +349
NOTE: `.option arch, +` will also enable all required extensions, for example,
`rv32i` + `.option arch, +v` will also enable `f`, `d`, `zve32x`, `zve32f`,
`zve64x`, `zve64f`, `zve64d`, `zvl32b`, `zvl64b` and `zvl128b` extensions.
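
The expansion described in the NOTE can be pictured as a transitive closure over an "implies" relation. The following Python sketch is only an illustration: the `IMPLIED` table and the function name `enable_with_implied` are assumptions, and the table is deliberately reduced so that `v` expands to exactly the list quoted above (a real toolchain's table is larger and, for example, also tracks `zicsr`).

```python
# Illustrative implication table (an assumption for this sketch, not the
# table used by binutils or LLVM).
IMPLIED = {
    "v": ["zve64d", "zvl128b"],
    "zve64d": ["zve64f", "d"],
    "zve64f": ["zve32f", "zve64x"],
    "zve64x": ["zve32x", "zvl64b"],
    "zve32f": ["zve32x", "f"],
    "zve32x": ["zvl32b"],
    "zvl128b": ["zvl64b"],
    "zvl64b": ["zvl32b"],
    "d": ["f"],
}


def enable_with_implied(enabled, ext):
    """Return a new set containing `enabled`, `ext`, and everything `ext`
    transitively implies according to IMPLIED."""
    result = set(enabled)
    worklist = [ext]
    while worklist:
        current = worklist.pop()
        if current in result:
            continue  # already enabled, nothing new to expand
        result.add(current)
        worklist.extend(IMPLIED.get(current, []))
    return result


if __name__ == "__main__":
    # Models rv32i followed by `.option arch, +v`.
    print(sorted(enable_with_implied({"i"}, "v")))
```

The sketch says nothing about what `-v` should do to the implied extensions afterwards; that behavior is left to the specification.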
Contributor

It would be good to explain somewhere that e.g. +c, -c and +v, -v do the expected things. I guess the expected thing is that the named extension is disabled but its dependencies are enabled.

Contributor

(alternatively it could be an error)

Contributor

@luismarques left a comment

LGTM. Thanks for all this effort Kito!

…ode region

This commit extends the .option directive with a new keyword:

- .option arch, <ext-list>

Enable and/or disable specific ISA extensions for the following code regions
without changing the arch attribute.  This means it won't raise the minimal
execution environment requirement, so the user must take care to guard
execution of the code regions between `.option push`/`.option arch`/`.option pop`.
A typical use case is `ifunc`: e.g. the libc is built with `rv64gc`,
but a few functions like memcpy provide two versions, one built with `rv64gc`
and one built with `rv64gcv`, and the ifunc mechanism selects between them
at run-time.  We don't want to change the minimal execution environment
requirement to `rv64gcv`, since the `rv64gcv` version is invoked only if
the execution environment supports the vector extension, so the minimal
execution environment requirement remains `rv64gc`.

Example:
```assembly
.attribute arch, rv64imafdc
 # Only instructions from the i, m, a, f, d and c extensions can be used here.
memcpy_general:
    add     a5,a1,a2
    beq     a1,a5,.L2
    add     a2,a0,a2
    mv      a5,a0
.L3:
    addi    a1,a1,1
    addi    a5,a5,1
    lbu     a4,-1(a1)
    sb      a4,-1(a5)
    bne     a5,a2,.L3
.L2:
    ret

.option push    # Push current option to stack.
.option arch, +v # Enable the vector extension; any instruction from the imafdcv extensions can be used.
memcpy_vec:
    mv a3, a0
.Lloop:
    vsetvli t0, a2, e8, m8, ta, ma
    vle8.v v0, (a1)
    add a1, a1, t0
    sub a2, a2, t0
    vse8.v v0, (a3)
    add a3, a3, t0
    bnez a2, .Lloop
    ret
.option pop   # Pop current option from stack, restore the enabled ISA extension status to imafdc.
```
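
To make the push/arch/pop scoping in the example above concrete, here is a small Python model of how an assembler might track the enabled extensions. It is a sketch under stated assumptions: the class `OptionState` and its methods are hypothetical, it handles only a single `+ext` or `-ext` item, and it ignores implied extensions, version numbers and the `=` form.

```python
class OptionState:
    """Toy model of the enabled-extension set and the .option stack."""

    def __init__(self, extensions):
        self.enabled = set(extensions)
        self._stack = []

    def push(self):
        # .option push: save a copy of the current extension set.
        self._stack.append(set(self.enabled))

    def pop(self):
        # .option pop: restore the most recently saved extension set.
        if not self._stack:
            raise ValueError(".option pop without a matching .option push")
        self.enabled = self._stack.pop()

    def arch(self, item):
        # .option arch, +ext / -ext: enable or disable one extension.
        op, name = item[0], item[1:]
        if op == "+":
            self.enabled.add(name)
        elif op == "-":
            self.enabled.discard(name)
        else:
            raise ValueError("expected '+ext' or '-ext', got %r" % item)


if __name__ == "__main__":
    state = OptionState(list("imafdc"))  # .attribute arch, rv64imafdc
    state.push()                         # .option push
    state.arch("+v")                     # .option arch, +v
    print("inside region:", sorted(state.enabled))
    state.pop()                          # .option pop
    print("after pop:    ", sorted(state.enabled))
```

The model only covers what the assembler accepts in each region. Whether `memcpy_vec` is safe to call at run-time is still decided elsewhere, for instance by the ifunc resolver mentioned above.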
@kito-cheng
Copy link
Collaborator Author

Thanks everyone! We have the last necessary element of the function multi-versioning now!

@kito-cheng kito-cheng merged commit 18b3a3c into riscv-non-isa:master Mar 14, 2023
@kito-cheng kito-cheng deleted the push-pop-arch branch March 14, 2023 15:11
luxufan pushed a commit to llvm/llvm-project that referenced this pull request May 26, 2023
The proposal of '.option arch' directive is riscv-non-isa/riscv-asm-manual#67

Note: For '.option arch, +/-' directive, version number is not yet supported.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D123515
cuviper pushed a commit to rust-lang/llvm-project that referenced this pull request Jun 8, 2023
mgehre-amd added a commit to Xilinx/llvm-project that referenced this pull request Jun 14, 2023
* [ELF][test] Add -NEXT and -NOT after D150644 (--print-memory-usage)

* [Sema] cast to CXXRecordDecl correctly when diag a default comparison method

Fixed: https://github.com/llvm/llvm-project/issues/62791
Fixed: https://github.com/llvm/llvm-project/issues/62102
In C++20, defaulted comparisons are supported. `getLexicalDeclContext` may not
return the `CXXRecord` if the defaulted comparison is defined outside the `CXXRecord`.
This patch gets this information from the first function argument instead.

Reviewed By: #clang-language-wg, erichkeane

Differential Revision: https://reviews.llvm.org/D151365

* [mlir][gpu] Add a pattern for transforming gpu.global_id to thread + blockId * blockDim

This patch implements a rewrite pattern for transforming gpu.global_id x
to gpu.thread_id + gpu.block_id * gpu.block_dim.

Reviewed By: makslevental

Differential Revision: https://reviews.llvm.org/D148978

* Don't disable loop unroll for vectorized loops on AMDGPU target

We've got a performance regression after https://reviews.llvm.org/D115261.
Even though the loop is vectorized, unrolling is still required.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D149281

* Re-revert "[lldb] Move PassthroughScriptedProcess to `lldb.scripted_process` module"

This reverts commit 429e74839506ea8ba962d24647264ed81f680bbf since it
didn't address the test failures on GreenDragon.

This patch will mark the tests as expected to fail until I can reproduce
the issue and find a solution.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

* [libc] Fix compilation issues in memory_check_utils.h

Strict warnings require explicit static_cast to counteract
default widening of types narrower than int.

Functions in header files should have vague linkage (inline
keyword), not internal linkage (static) or external linkage
(no inline keyword) even for template functions.  Note these
don't use the LIBC_INLINE macro since this is only for test code.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D151494

* [libc] Support LIBC_COPT_USE_C_ASSERT build flag

In this mode, LIBC_ASSERT is just standard C assert.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D151498

* [libc++] Add support for generated tests in the libc++ test format

A recurring problem recently has been that libc++ has several generated
tests which all need to be re-generated before committing a change. This
creates noise during code reviews and friction for contributors.

Furthermore, the way we generated most of these tests resulted in
extremely bad compilation times when using modules, because we defined
a macro before compiling each file.

This commit introduces a new kind of test called a '.gen' test. These
tests are normal shell tests, however the Lit test format will run the
test to discover the actual Lit tests it should run. This basically
allows generating a Lit test suite on the fly using arbitrary code,
which can be used in the future to generate tests like our __verbose_abort
tests and several others.

Differential Revision: https://reviews.llvm.org/D151258

* [libcxxabi] link abort_message into unittest_demangle

unittest_demangle.pass.cpp uses the preprocessor to #include
cxa_demangle.cpp. D148566 will make more use of std::string_view in
libcxxabi rather than the home-grown StringView, but as a result of
D149092, a definition of abort_message needs to be provided.

Otherwise builds of check-cxxabi with -DLLVM_ENABLE_ASSERTIONS=ON will
fail to link with the errors:
/usr/bin/ld: /tmp/lit-tmp-0akcq37p/cc6DLdvw.o: in function `(anonymous namespace)::itanium_demangle::starts_with(std::__1::basic_string_view<char, std::__1::char_traits<char> >, char)':
unittest_demangle.pass.cpp:(.text+0x81): undefined reference to `abort_message'
/usr/bin/ld: /tmp/lit-tmp-0akcq37p/cc6DLdvw.o: in function `(anonymous namespace)::itanium_demangle::starts_with(std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::basic_string_view<char, std::__1::char_traits<char> >)':
unittest_demangle.pass.cpp:(.text+0x2aa): undefined reference to `abort_message'
/usr/bin/ld: unittest_demangle.pass.cpp:(.text+0x312): undefined reference to `abort_message'
/usr/bin/ld: /tmp/lit-tmp-0akcq37p/cc6DLdvw.o: in function `(anonymous namespace)::itanium_demangle::OutputBuffer::writeUnsigned(unsigned long, bool)':
unittest_demangle.pass.cpp:(.text+0x54f): undefined reference to `abort_message'
/usr/bin/ld: unittest_demangle.pass.cpp:(.text+0x5b7): undefined reference to `abort_message'
/usr/bin/ld: /tmp/lit-tmp-0akcq37p/cc6DLdvw.o:unittest_demangle.pass.cpp:(.text+0xe6e): more undefined references to `abort_message' follow
/usr/bin/ld: /home/libcxx-builder/.buildkite-agent/builds/google-libcxx-builder-f0560ea595b1-1/llvm-project/libcxx-ci/build/generic-gcc/test/Output/unittest_demangle.pass.cpp.dir/t.tmp.exe: hidden symbol `abort_message' isn't defined

Use the preprocessor further to provide the definition of abort_message
for this unittest.

Reviewed By: #libc_abi, phosek

Differential Revision: https://reviews.llvm.org/D151160

* [MLIR] Fixup Bazel build for Add a pattern for transforming gpu.global_id to thread + blockId * blockDim

This patch updates the Bazel build to catch up with changes in https://reviews.llvm.org/D148978.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151496

* [Demangle] avoid more std::string_view::substr

In D148959, I removed usage of std::string_view::substr because it may
throw, and libcxxabi cannot use such code.  I missed one instance in
llvm::starts_with.  That is blocking copying the code back upstream in
D148566.

Mark these helpers noexcept (as they are in C++20) as well, to remind
future travelers.

Make these changes upstream, and copy them back downstream using
libcxxabi/src/demangle/cp-to-llvm.sh.

Reviewed By: #libc_abi, MaskRay, ldionne

Differential Revision: https://reviews.llvm.org/D151260

* [mlir][gpu] Add i64 & f64 support to gpu.shuffle

This patch adds support for i64, f64 values in `gpu.shuffle`, rewriting 64bit shuffles into two 32bit shuffles.
The reason behind this change is that both CUDA & HIP support this kind of shuffling.
The implementation provided by this patch is based on the LLVM IR emitted by clang for 64bit shuffles when using `-O3`.

Reviewed By: makslevental

Differential Revision: https://reviews.llvm.org/D148974

* [lldb] Disable variable watchpoints when going out of scope

If we use a variable watchpoint with a condition using a scope variable,
and we go out of scope, the watchpoint remains active, which can cause the
expression evaluator to fail to parse the watchpoint condition (because
of the missing variable bindings).

This was discovered after `watchpoint_callback.test` started failing on
the green dragon bot.

This patch should address that issue by setting an internal breakpoint
on the return address of the current frame when creating a variable
watchpoint. The breakpoint has a callback that will disable the watchpoint
if the breakpoint execution context matches the watchpoint execution
context.

This is only enabled for local variables.

This patch also re-enables the failing test following e1086384e584.

rdar://109574319

Differential Revision: https://reviews.llvm.org/D151366

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

* Add a `-verify-roundtrip` option to `mlir-opt` intended to validate custom printer/parser completeness

Running:

  MLIR_OPT_CHECK_IR_ROUNDTRIP=1 ninja check-mlir

will now exercise all of our tests with a round-trip to bytecode and a comparison for equality.

Reviewed By: rriddle, ftynse, jpienaar

Differential Revision: https://reviews.llvm.org/D90088

* [MLIR] Add native Bytecode support for properties

This adds a new interface (`BytecodeOpInterface`) that lets operations opt in to
skipping the conversion to attributes and serializing properties to native
bytecode instead.

The scheme relies on a new section where properties are stored in sequence

  { size, serialize_properties }, ...

Operations store the index of their properties; a table of offsets is
built when the properties section is first loaded.

Back-deployment to versions prior to 4 relies on getAttrDictionnary(), which
we intend to deprecate and remove: that puts a de facto end-of-support
horizon on deployments to versions older than 4.

Differential Revision: https://reviews.llvm.org/D151065

* [mlir][sparse][gpu] end to end test for matmul

(1) minor bug fix in copy back [always nice to run stuff ;-)]
(2) run with and without lib (even though some fall back to CPU)

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D151507

* [libc][doc] Update math function status page to show more targets.

Show availability of math functions on each target.

Reviewed By: jeffbailey

Differential Revision: https://reviews.llvm.org/D151489

* [libc][darwin] Add OSUtil for darwin arm64 target so that unit tests can be run.

Currently unit tests cannot be run on macOS due to missing OSUtil.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151377

* [tosa] Improve inferred shapes of TOSA operations

The TosaInferShapes pass avoids updating the shapes of tensor operators
when the consumers are not TOSA operations, limiting the efficacy of
TosaInferShapes when the IR is a mix of TOSA and other operations.
This change attempts to update the result shapes when the consumers
themselves have reasonable type/shape inference methods.

Reviewed By: eric-k256

Differential Revision: https://reviews.llvm.org/D151228

* [fuzzer] Don't hard-code page size in FuzzerUtil.h

Don't hard-code the page size in FuzzerUtil.h; this breaks on
e.g. LoongArch, which defaults to a 16KiB page size.

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D140607

* [Clang][Attribute] Improve the AST/diagnoses fidelity of alignas and _Alignas

- Fix diagnoses when the argument to `alignas` or `_Alignas` is an incomplete type.

Before:
```
./alignas.cpp:1:15: error: invalid application of 'alignof' to an incomplete type 'void'
class alignas(void) Foo {};
             ~^~~~~
1 error generated.
```
Now:
```
./alignas.cpp:1:15: error: invalid application of 'alignas' to an incomplete type 'void'
class alignas(void) Foo {};
             ~^~~~~
1 error generated.
```

- Improve the AST fidelity of `alignas` and `_Alignas` attribute.

Before:
```
AlignedAttr 0x13f07f278 <col:7> alignas
    `-ConstantExpr 0x13f07f258 <col:15, col:21> 'unsigned long'
      |-value: Int 8
      `-UnaryExprOrTypeTraitExpr 0x13f07f118 <col:15, col:21> 'unsigned long' alignof 'void *'
```

Now:
```
AlignedAttr 0x14288c608 <col:7> alignas 'void *'
```

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D150528

* [mlir][tosa] Add type checking traits to the appropriate ops

Add the trait `SameOperandsAndResultElementType` and
`SameOperandsElementType` to verify ops that are known
to have the same input and output type rather than generate
an invalid tosa IR with mixed data types like:

  "tosa.add"(%0, %1) : (tensor<nxbf16>, tensor<nxf32>) -> tensor<nxf32>

Thus apply tosa.cast prior if needed.

Change-Id: Ie866b84e371e3b571ec04f7abb090c216dd39c33

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D150472

* [libc] Enable hermetic floating point tests

This patch enables us to run the floating point tests as hermetic.
Importantly we now use the internal versions of the `fesetround` and
`fegetround` functions.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151123

* Revert "[libc] Enable hermetic floating point tests"

This passed locally but unfortunately it seems some tests are not ready
to be made hermetic. Revert for now until we can investigate
specifically which tests are failing and mark those as `UNIT_TEST_ONLY`.

This reverts commit 417ea79e792a87d53f5ac4f5388af4b25aa04d7d.

* [Clang][OpenMP] Fix the issue that list items in `has_device_addr` are still mapped to the target device

This patch fixes the issue that list items in `has_device_addr` are still mapped
to the target device because front end emits map type `OMP_MAP_TO`.

Fix #59160.

Reviewed By: jyu2

Differential Revision: https://reviews.llvm.org/D141627

* [sanitizer] Implement __sanitizer_get_allocated_size_fast

The primary motivation for this change is to allow FreeHooks to obtain
the allocated size of the pointer being freed in a fast, efficient manner.

Differential Revision: https://reviews.llvm.org/D151360

* [Clang] Fix test case issue introduced by D141627

* llvm-symbolizer: access the base address from the skeleton CU, not the split unit

In Split DWARF, if the unit had a non-trivial base address (a real
low_pc, rather than one with fixed value 0) then computing addresses
needs to access that base address to add to any base address-relative
values. But the code was trying to access the base address in the split
unit, when it's actually in the skeleton unit. So delegate to the
skeleton if it's available.

Fixes #62941

* Revert "[fuzzer] Don't hard-code page size in FuzzerUtil.h"

This reverts commit a2b677e8153758997a9043360cf51333eecc3c44.

reverting

Differential Revision: https://reviews.llvm.org/D140607

because <sys/auxv.h> and getauxval() are not available on macOS;
this change is breaking the mac CI bots.

* [mlir] Fix non-const lvalue reference to type 'uint64_t' cannot bind to type 'size_t' error (NFC)

/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1007:39: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
    if (failed(propReader.parseVarInt(count)))
                                      ^~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:191:39: note: passing argument to parameter 'result' here
  LogicalResult parseVarInt(uint64_t &result) {
                                      ^
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1033:41: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
    if (failed(dialectReader.readVarInt(propertiesIdx)))
                                        ^~~~~~~~~~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:926:38: note: passing argument to parameter 'result' here
  LogicalResult readVarInt(uint64_t &result) override {
                                     ^
2 errors generated.

/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1033:41: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
    if (failed(dialectReader.readVarInt(propertiesIdx)))
                                        ^~~~~~~~~~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:926:38: note: passing argument to parameter 'result' here
  LogicalResult readVarInt(uint64_t &result) override {
                                     ^
1 error generated.

* Fix test by marking it x86 specific

* [Clang] Simplify test `clang/test/OpenMP/bug59160.c`

* TestStackCoreScriptedProcess.py is timing out, skip it

The x86_64 macOS CI bot is failing because this test
times out.  It was marked as expectedFail earlier today,
but that's not considered a fail so the CI runs are
red.  Skipping it on Darwin for now until Ismail can
look into it.

* [NFC][CLANG] Fix static code analyzer concerns

Reported by Static Code Analyzer Tool:

Inside "CGExprConstant.cpp" file, VisitObjCEncodeExpr() returns null value which is dereferenced without checking.

This patch adds an assert.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D151280

* [fuzzer] Don't hard-code page size in FuzzerUtil.h

Don't hard-code the page size in FuzzerUtil.h; this breaks on
e.g. LoongArch, which defaults to a 16KiB page size.

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D140607

* Fix MLIR back-deployment to version < 5 ; properties section should not be emitted.

This was an oversight in the development of bytecode version 5, which was
caught by downstream StableHLO compatibility tests.

Differential revision: https://reviews.llvm.org/D151531

* Fix MLIR Bytecode backward deployment

The condition for guarding the properties section was reversed.

* Bump the MLIR bytecode current revision (version 5) to match the implementation

* [fuzzer] Platfom specific version of PageSize

* Revert "[MLIR] Add native Bytecode support for properties"

This reverts commit ca5a12fd69d4acf70c08f797cbffd714dd548348
and follow-up fixes:

df34c288c428eb4b867c8075def48b3d1727d60b
07dc906883af660780cf6d0cc1044f7e74dab83e
ab80ad0095083fda062c23ac90df84c40b4332c8
837d1ce0dc8eec5b17255291b3462e6296cb369b

The first commit was incomplete and broken, I'll prepare a new version
later, in the meantime pull this work out of tree.

* [ELF] findAllByVersion: optimize a find('@') with hasVersionSuffix. NFC

* Fix link to the TOSA spec in the dialect doc (NFC)

* [X86] Add test for select folding.

When avx512 is available the lhs operand of select instruction can be
folded with mask instruction, while the rhs operand can't.

* [libc++] Fix C++26 transitive includes list

Reviewed By: vitalybuka

Spies: vitalybuka, libcxx-commits

Differential Revision: https://reviews.llvm.org/D151508

* [flang] Fix an unused variable warning

This patch fixes:

  flang/lib/Optimizer/HLFIR/Transforms/LowerHLFIROrderedAssignments.cpp:911:10:
  error: unused variable 'inserted' [-Werror,-Wunused-variable]

* [mlir] Add CastInfo for mlir classes subclassing from PointerUnion

This is required to use the function variants of cast/isa/dyn_cast/etc
on them.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

* [mlir] Move tblgen code generation to use functional forms of cast/isa

Summary:
The method forms are deprecated. This updates the rest of the tblgen
uses of methods where a function call is available.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Reviewers: rriddle

* [mlir] Update cast/isa method calls to function calls

This updates the rest (at implementation) of MLIR's use of cast/isa
method calls where function calls are possible and automatic refactoring
is not. These changes occurred in .td files or in macros.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

* [clang-tidy] Fix typos in documentation

* [clang] Modernize SourceLocation (NFC)

* [NFC][Py Reformat] Reformat python files in mlir subdir

This is an ongoing series of commits that are reformatting our
Python code.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Differential Revision: https://reviews.llvm.org/D150782

* [NFC] Add mlir python reformat SHA to .git-blame-ignore-revs

* [NFC][Py Reformat] Reformat version-check.py in .github dir

* [AMDGPUCodegenPrepare] Add NewPM Support

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D151241

* [flang][hlfir] Only canonicalize forall_index if it can be erased

It seems the canonicalization was not correct: it cannot return that
it failed if it did modify the IR.
This was exposed by a new MLIR sanity check added in
https://reviews.llvm.org/D144552.
I am not sure it is legit to return success if the operation being
canonicalized is not modified either. So only remove the loads if
they are the only uses of the forall_index.

Should fix (intermittent?) bot failures like
https://lab.llvm.org/buildbot/#/builders/179/builds/6251
since the new MLIR check was added.

Differential Revision: https://reviews.llvm.org/D151487

* [libc] Make ErrnoSetterMatcher handle logging floating point values.

Along the way, couple of additional things have been done:

1. Move `ErrnoSetterMatcher.h` to `test/UnitTest` as all other matchers live
   there now.
2. `ErrnoSetterMatcher` ignores matching `errno` on GPUs.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D151129

* [clang] Remove unused declaration IgnoreLinkageSpecDecls

The declaration and its corresponding function definition were removed
once by:

  commit 02093906fa0fd5bacc61b2189ea643c78cd02509
  Author: Nathan Sidwell <nathan@acm.org>
  Date:   Mon Feb 14 10:19:04 2022 -0800

However, the declaration was added back without a corresponding
function definition a few days later by:

  commit 18ead23385a4e0e6421d658591b1ee6a1c592b53
  Author: Peter Collingbourne <peter@pcc.me.uk>
  Date:   Thu Feb 17 11:23:33 2022 -0800

This is most likely a rebasing error.

* [lit][NFC] Fix a couple of typos

* [lit][NFC] Remove double space after full stop/period

* [lit][NFC] Remove docs for nonexistent parameter

* [RISCV] Regenerate missing test checks

Codegen was different between RV32 and RV64 so the single unified CHECK
was skipping these functions.

* [CodeGen] Remove unused member variable NextBlockInfo

The last use was removed by:

  commit c9a52de0026093327daedda7ea2eead8b64657b4
  Author: Akira Hatanaka <ahatanaka@apple.com>
  Date:   Wed Jun 3 16:41:50 2020 -0700

* [RISCV] Custom lower vector llvm.is.fpclass to vfclass.v

After D149063.
This patch adds support for both scalable and fixed-length vector.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151176

* [CodeGen] Remove unused declaration EmitMoveFromReturnSlot

The corresponding function definition was removed by:

  commit 56e5a2e13e3048fc2ff39029cde406d9f4eb55f3
  Author: George Burgess IV <george.burgess.iv@gmail.com>
  Date:   Sat Mar 10 01:11:17 2018 +0000

* [CSKY] Add missing relocation type for FK_Data_4 Fixup

* [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.

The main purpose of this is to simplify register pressure tracking as after the pass there is no need
to track subreg liveness anymore.

On the other hand, this pass creates more possibilities for subreg-unaware code, as many of the subregs
become ordinary registers.

Interesting side effect: spill-vgpr.ll has lost a lot of spills.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D139732

* [AMDGPU] 4-align SGPR triples

Previously SGPR triples like s[3:5] were aligned on a 3-SGPR boundary
which has no basis in hardware.

Aligning them on a 4-SGPR boundary is at least justified by the
architecture reference guide which says: "Quad-alignment of SGPRs is
required for operation on more than 64-bits".

Currently there are no instructions that take SGPR triples as operands
so the issue is latent.

Differential Revision: https://reviews.llvm.org/D151463

* [Clang][RISCV] Add description for test case . NFC

* [clangd] Implement configs to stop clangd produce a certain semantic tokens

This patch introduces the following configurations to .clangd:

```
SemanticTokens:
    DisabledKinds: [ ... ]
    DisabledModifiers: [ ... ]
```

Based on the config, clangd would stop producing a certain type of semantic tokens from the source file.

Fixes https://github.com/clangd/clangd/discussions/1598

Reviewed By: nridge

Differential Revision: https://reviews.llvm.org/D148489

* [ASAN] Support memory checks on vp.gather/scatter.

The patch supports vp.gather/scatter by allowing addresses to be pointer vectors.
We then just need to check each active pointer element of those pointer vectors.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D149245
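
As a rough illustration of "check each active pointer element", the selection of lanes can be sketched in Python. This is not the ASan instrumentation itself; `active_pointers` is a hypothetical helper, and it assumes a lane is active when its mask bit is set and its index is below the explicit vector length (evl).

```python
def active_pointers(pointers, mask, evl):
    """Return the pointers that would need a memory check.

    Assumption: a lane is active when its mask bit is set and its index
    is below the explicit vector length (evl).
    """
    return [p for i, p in enumerate(pointers) if i < evl and mask[i]]


ptrs = [0x1000, 0x2000, 0x3000, 0x4000]
# Lane 1 is masked off and lane 3 lies beyond evl, so only lanes 0 and 2 remain.
print([hex(p) for p in active_pointers(ptrs, [True, False, True, True], evl=3)])
```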

* [AMDGPU] Silence gcc warning [NFC]

Without the fix gcc complains with
 ../lib/Target/AMDGPU/SIWholeQuadMode.cpp:1543: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
  1542 |     unsigned CopyOp = MI->getOperand(1).isReg()
       |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  1543 |                           ? AMDGPU::COPY
       |                           ~~~~~~~~~~~~~~
  1544 |                           : TII->getMovOpcode(TRI->getRegClassForOperandReg(
       |                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  1545 |                                 *MRI, MI->getOperand(0)));
       |

* [Clang][RISCV] Add test coverage for typedef of RVV intrinsic data types under riscv_vector.h. NFC

Signed-off by: eop Chen <eop.chen@sifive.com>

* [mlir] Move casting calls from methods to function calls

The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```

Differential Revision: https://reviews.llvm.org/D151542

* [RISCV] Don't scalarize vector stores if volatile

As noted by @reames in https://reviews.llvm.org/D151211#4373404, we shouldn't
scalarize vector stores of constants if the store is volatile, or vector copies
if either the store or load are volatile.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D151500

* [clangd] Remove inline Specifier for DefineOutline Tweak

`inline` specifiers should be removed from the function declaration and
the newly-created implementation.

For example, take the following (working) code:
```cpp
// foo.hpp
struct A {
  inline void foo() { std::cout << "hello world\n" << std::flush; }
};

// foo.cpp
#include "foo.hpp"

// main.cpp
#include "foo.hpp"

int main() {
  A a;
  a.foo();
  return 0;
}

// compile: clang++ -std=c++20 main.cpp foo.cpp -o main
```

After applying the tweak:
```
// foo.hpp
struct A {
  inline void foo();
};

// foo.cpp
#include "foo.hpp"

inline void A::foo() { std::cout << "hello world\n" << std::flush; }

// main.cpp
#include "foo.hpp"

int main() {
  A a;
  a.foo();
  return 0;
}

// compile: clang++ -std=c++20 main.cpp foo.cpp -o main
```

We get a link error, as expected:
```
/usr/bin/ld: /tmp/main-4c5d99.o: in function `main':
main.cpp:(.text+0x14): undefined reference to `A::foo()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```

This revision removes these specifiers from both the header and the source file. This was identified in Github issue llvm/llvm-project#61295.

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D151294

* [clang-format][doc] Fix contradiction in SortIncludes description

Fixes #62033.

Differential Revision: https://reviews.llvm.org/D147894

* [clang-format][doc] Fix a typo introduced in 9aab0db13fb6d

* [mlir][tensor] Fix one-shot bufferization of tensor.reshape.

I believe that the previous implementation did not work on any input. It
called getMemRefType with `layout = {}`, presumably with the intention
to create a MemrefType with identity layout. However, the implementation
of that function returns a MemrefType with *unknown* layout if it is
provided with a default-constructed layout attribute. This patch uses
getMemRefTypeWithStaticIdentityLayout instead, which has identical
behavior except for the case of a default-constructed layout, which it
passes on as-is to the MemrefType.

This problem did not surface in the test because tensor.reshape was not
tested with -one-shot-bufferize. This patch introduces a test copied
from the tests for -tensor-bufferize, adapted as follows: since the
test is run with "bufferize-function-boundaries", a tensor that is
passed into the function is bufferized into a memref with unknown
layout, which wouldn't be a valid input for memref.reshape, so the
test now uses a tensor constructed with arith.constant inside of the
function.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D151544

* [Remarks] Retain all remarks by default, add option to drop without DL.

At the moment, dsymutil drops all remarks without debug location.

There are many cases where debug location may be missing for remarks,
mostly due to LLVM not preserving debug locations. When using bitstream
remarks for statistical analysis, those missed remarks mean we get an
incomplete picture.

The patch flips the default to keeping all remarks and leaving it to
tools that display remarks to filter out remarks without debug locations
as needed.

The new --remarks-drop-without-debug flag can be used to drop remarks
without debug locations, i.e. restore the previous behavior.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D151089

* [bazel] Run buildifier on libc BUILD. NFC.

* [bazel][libc] Adjust for 4f1fe19df385445fabde47998affca50c7f1bc1e

This also required a build rule for error_to_string, so add that too.

* [bazel][libc] Add file missing for 25174976e19b2ef916bb94f4613662646c95cd46

* [Utils] Added the ability to print the pass number and IR after it is triggered

As part of this patch, 2 options have been added:
print-pass-numbers and print-after-pass-number.

1) The print-pass-numbers option prints the pass names and their ordinals.
   The output of the option looks like this:
                Running pass ORDINAL PASS_NAME

2) The print-after-pass-number option prints the IR after the pass with the given
   number, as reported by print-pass-numbers.

Reviewed By: apilipenko, aeubanks

Differential Revision: https://reviews.llvm.org/D149345
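
The interplay of the two options can be pictured with a toy pipeline in Python. This is only a sketch, not the LLVM implementation: `run_pipeline`, the lambda passes, and the choice to start ordinals at 1 are all assumptions made for illustration.

```python
def run_pipeline(ir, passes, print_pass_numbers=False, print_after_pass_number=None):
    """Run (name, function) passes in order over a toy IR value.

    Optionally log each pass with its ordinal, and optionally dump the IR
    after the pass that has the requested ordinal.
    """
    log = []
    for ordinal, (name, fn) in enumerate(passes, start=1):
        if print_pass_numbers:
            log.append(f"Running pass {ordinal} {name}")
        ir = fn(ir)
        if ordinal == print_after_pass_number:
            log.append(f"IR after pass {ordinal}: {ir}")
    return ir, log


# Toy "passes" that transform an integer standing in for the IR.
passes = [("double", lambda x: x * 2), ("increment", lambda x: x + 1)]
result, log = run_pipeline(5, passes, print_pass_numbers=True, print_after_pass_number=1)
print("\n".join(log))
```

A first run with only the numbering enabled gives the ordinals; a second run can then dump the IR after the pass of interest.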

* [RISCV] Support '.option arch' directive

The proposal of '.option arch' directive is https://github.com/riscv-non-isa/riscv-asm-manual/pull/67

Note: For '.option arch, +/-' directive, version number is not yet supported.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D123515
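
A minimal Python model of the scoping behavior described in the proposal may help: extensions enabled between `.option push` and `.option pop` apply only to that code region. `ArchState` is a hypothetical class written for this illustration; it does not model version numbers, the `=` form, or the actual assembler parser.

```python
class ArchState:
    """Toy model of the assembler's set of enabled extensions."""

    def __init__(self, extensions):
        self.enabled = set(extensions)
        self._stack = []

    def push(self):
        # Models `.option push`: save a copy of the current extension set.
        self._stack.append(set(self.enabled))

    def pop(self):
        # Models `.option pop`: restore the most recently saved set.
        self.enabled = self._stack.pop()

    def arch(self, change):
        # Models `.option arch, +ext` and `.option arch, -ext`.
        op, ext = change[0], change[1:]
        if op == "+":
            self.enabled.add(ext)
        elif op == "-":
            self.enabled.discard(ext)
        else:
            raise ValueError(f"unsupported change: {change!r}")


state = ArchState(["i", "m", "a", "f", "d", "c"])
state.push()
state.arch("+v")
inside_region = "v" in state.enabled   # vector instructions allowed here
state.pop()
after_region = "v" in state.enabled    # back to the original extension set
print(inside_region, after_region)
```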

* [IRTranslator] Implement translation of entry_value dbg.value intrinsics

For dbg.value intrinsics targeting an llvm::Argument address whose expression
starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein
physical register corresponding to that Argument.

Depends on D151328

Differential Revision: https://reviews.llvm.org/D151329

* [SelectionDAGBuilder] Handle entry_value dbg.value intrinsics

Summary:
DbgValue intrinsics whose expression is an entry_value and whose address is
described by an llvm::Argument must be lowered to the corresponding livein physical
register for that Argument.

Depends on D151329

Reviewers: aprantl

Subscribers:

* [FastISel][NFC] Refactor if/else chain into early returns

This will make it easier to add more cases in a subsequent commit and also
better conforms to the coding guidelines.

Depends on D151330

Differential Revision: https://reviews.llvm.org/D151331

* [FastISel][NFC] Remove repeated calls to get{Variable,Expr}

This will make it easy to reuse these values in subsequent commits.

Depends on D151331

Differential Revision: https://reviews.llvm.org/D151332

* [bazel][libc] Add another missing dependency

* [KnownBits] Add fast-path for shl with unknown shift amount (NFC)

We currently don't call into KnownBits::shl() from ValueTracking
if the shift amount is unknown. If we do try to do so, we get
significant compile-time regressions, because evaluating all 64
shift amounts is quite expensive, and mostly pointless in this case.
Add a fast-path for the case where the shift amount is the full
[0, BitWidth-1] range. This primarily requires a more accurate
estimate of the max shift amount, to avoid taking the fast-path in
too many cases.

Differential Revision: https://reviews.llvm.org/D151540
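
To see why a fast path is possible when the shift amount covers the full [0, BitWidth-1] range, here is a small Python sketch. It is not the LLVM code: known bits are modeled as a pair of integer masks (`zero`, `one`), and both function names are hypothetical. The exhaustive version intersects the result over every shift amount; the fast version keeps only the known trailing zeros (plus the sign bit when the input is known to be all ones).

```python
def shl_exhaustive(zero, one, width):
    """Intersect the known bits of (x << s) over every s in [0, width - 1]."""
    mask = (1 << width) - 1
    res_zero, res_one = mask, mask
    for s in range(width):
        # Shifting left by s moves known bits up and makes the low s bits zero.
        res_zero &= ((zero << s) | ((1 << s) - 1)) & mask
        res_one &= (one << s) & mask
    return res_zero, res_one


def shl_fast(zero, one, width):
    """Compute the same result without looping over shift amounts."""
    mask = (1 << width) - 1
    trailing_zeros = 0
    while trailing_zeros < width and (zero >> trailing_zeros) & 1:
        trailing_zeros += 1
    # Only the known trailing zeros of the input survive every shift amount.
    res_zero = (1 << trailing_zeros) - 1
    # The sign bit is known one only when the input is known to be all ones.
    res_one = (1 << (width - 1)) if one == mask else 0
    return res_zero, res_one


# 4-bit value with bits 0-1 known zero, bit 2 known one, bit 3 unknown.
print(shl_fast(0b0011, 0b0100, 4) == shl_exhaustive(0b0011, 0b0100, 4))
```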

* [mlir] Make sure mlir-opt is in the list of substituted tools

otherwise it gets picked up from $PATH, which is not always working
properly.

* [KnownBits] Partially synchronize shift implementations (NFC)

And remove some bits of effectively dead code.

* [InstCombine] Use KnownBits::shl() in SimplifyDemandedBits()

It is more precise than the custom logic we had. This came up when
trying to enforce a consistency assertion with computeKnownBits().

* [InstCombine] Add test for missing assume handling multi-use demanded bits (NFC)

Works if the assume is on a root value or an operation that does
not support multi-use demanded bits.

* [NFC] refactor code

Split the NFC patch from D151535. Refactor canCombineAsMaskOperation to
take 1 input operand.

* [LLDB] Explicitly declare constructor in `PersistentExpressionState`

It seems that when trying to link the lldb library explicitly, the inlined default constructor cannot find the vtable for the class. This patch fixes this by explicitly declaring a default constructor in `PersistentExpressionState`, and providing the definition in the source file.

Differential Revision: https://reviews.llvm.org/D151501

* [InstCombine] Add additional tests for unreachable code (NFC)

* [X86] fold select to mask instructions.

When avx512 is available the lhs operand of select instruction can be
folded with mask instruction, while the rhs operand can't. This patch is
to commute the lhs and rhs of the select instruction to create the
opportunity of folding.

Differential Revision: https://reviews.llvm.org/D151535

* [ValueTracking] Avoid UB in test (NFC)

Don't use br undef, as it is UB.

* [InstCombine] Handle undef when pruning unreachable code

If the branch condition is undef, then behavior is undefined and
neither of the successors are live.

This is to ensure that optimization quality does not decrease
when a constant gets replaced with undef/poison in this context.

* [InstCombine] Add tests for icmp of select fold (NFC)

For D150360.

* [Clang] Correctly handle generic lambda used as default template argument.

Adjust the template parameter depth when parsing default
template arguments as they may introduce a generic lambda whose parameters
are not substituted at the same depth.

Fixes #62611

Reviewed By: erichkeane, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D151342

* [InstCombine] Optimize compares with multiple selects as operands

In case of a comparison with two select instructions having the same
condition, check whether one of the resulting branches can be simplified.
If so, just compare the other branch and select the appropriate result.
For example:

    %tmp1 = select i1 %cmp, i32 %y, i32 %x
    %tmp2 = select i1 %cmp, i32 %z, i32 %x
    %cmp2 = icmp slt i32 %tmp2, %tmp1

The icmp will result false for the false value of selects and the result
will depend upon the comparison of true values of selects if %cmp is
true. Thus, transform this into:

    %cmp3 = icmp slt i32 %z, %y
    %sel = select i1 %cmp, i1 %cmp3, i1 false

Differential Revision: https://reviews.llvm.org/D150360
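
The equivalence behind this fold can be checked by brute force over a few small values. The Python below is only a sanity sketch of the reasoning, with hypothetical function names: when the shared condition is false both selects yield `x`, so the signed less-than is false; when it is true the result is the comparison of the two true values (`z < y`, since `tmp2` is the left operand).

```python
from itertools import product


def original(cmp, x, y, z):
    tmp1 = y if cmp else x   # select i1 %cmp, i32 %y, i32 %x
    tmp2 = z if cmp else x   # select i1 %cmp, i32 %z, i32 %x
    return tmp2 < tmp1       # icmp slt i32 %tmp2, %tmp1


def folded(cmp, x, y, z):
    # Compare the true values, then select false when the condition is false.
    return cmp and (z < y)


values = range(-2, 3)
print(all(
    original(c, x, y, z) == folded(c, x, y, z)
    for c in (False, True)
    for x, y, z in product(values, repeat=3)
))
```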

* [AArch64] merge scaled and unscaled zero narrow stores.

This patch fixes a crash when scaled and unscaled zero stores are merged.

Differential Revision: https://reviews.llvm.org/D150963

* [libc] Adapt includes after 25174976e19b2ef916bb94f4613662646c95cd46

* [RISCV] Fix typo VLUpperBound to VLEN in SiFive7. NFC.

The scheduler models said VLUpperBound which was a typo and should have
said VLEN. This is a purely cosmetic fix.

Differential Revision: https://reviews.llvm.org/D151506

* [ValueTracking] Avoid optimizing away condition in test (NFC)

This is not what we're interested in testing, and it allows essentially
the entire function to be optimized away by more powerful
optimization.

* [InstCombine] Remove instructions in dead blocks during combining

We already do this during initial worklist population. Doing this
as part of primary combining allows us to remove instructions in
blocks that were rendered dead by condition folding within the
same instcombine iteration.

* [gn] attempt to port fe2f0ab37c33

* [RISCV] Revise test coverage for shuffle/insert idiom which become v(f)slide1ups

This fixes a couple mistakes in 0f64d4f877.  In particular, I'd not included a negative test where the slideup didn't write the entire VL, and had gotten all of my 4 element vector shuffle masks incorrect so they didn't match.  Also, add a test with swapped operands for completeness.

The transform is in D151468.

* [gn build] Port 8d0412ce9d48

* [FastISel] Implement translation of entry_value dbg.value intrinsics

For dbg.value intrinsics targeting an llvm::Argument address whose expression
starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein
physical register corresponding to that Argument.

Depends on D151332

Differential Revision: https://reviews.llvm.org/D151333

* [Clang] Convert some tests to opaque pointers (NFC)

* [lldb] Improve error message when evaluating expression when not stopped

When trying to run an expression after a process has exited, you
currently are shown the following error message:

  (lldb) p strlen("")
  error: Can't make a function caller while the process is running

This error is wrong and pretty uninformative. After this patch, the
following error message is shown:

  (lldb) p strlen("")
  error: unable to evaluate expression while the process is exited: the
  process must be stopped because the expression might require
  allocating memory.

rdar://109731325

Differential revision: https://reviews.llvm.org/D151497

* [mlir] teach expensive-checks transform mode about empty handle

The transform dialect interpreter features the expensive-checks mode
that acts as an embedded sanitizer to track use-after-consume of
transform handles. Its logic is based on the relations between payload
operations, which made it silently ignore empty handles that are
consumed. Also catch and report this case because the remaining code may
hit an assertion on attempting to access a consumed handle (that is
removed from the mapping).

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D151560

* [mlir] make `fuse_into_containing_op` preserve the containing op handle

This partially undoes the intent of https://reviews.llvm.org/D151418 by
cheating its way to keep the "containing op" (aka loop) handle read-only
in fusion. It is crucial to do so for composability of tiling and
fusion. Specifically, after the "containing op" handle started being
consumed, it became impossible to perform additional tiling after fusion
except tiling the last-fused op:

  %tiled1, %loop1 = tile %op
  %producer1, %loop2 = fuse %producer into %loop1
  // invalid, because %tiled1 is invalidated by consuming %loop1
  // that points to its parent
  tile %tiled1

or

  %tiled1, %loop1 = tile %op
  %tiled2, %loop2 = tile %tiled1
  %p2 = fuse %producer into %loop1
  // invalid, because %loop2 is invalidated by consuming %loop1
  // that points to its parent
  fuse %p2 into %loop2

The approach here makes creative use of the state extension mechanism to
update the payload operation associated with the operand handle. Further
investigation is necessary to understand if this is consistent with the
overall execution model of the transform dialect, but it is crucial to
restore composability ASAP.

Reviewed By: springerm, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D151555

* [MLIR][python bindings] Add TypeCaster for returning refined types from python APIs

depends on D150839

This diff uses `MlirTypeID` to register `TypeCaster`s (i.e., `[](PyType pyType) -> DerivedTy { return pyType; }`) for all concrete types (i.e., `PyConcrete<...>`) that are then queried for (by `MlirTypeID`) and called in `struct type_caster<MlirType>::cast`. The result is that anywhere an `MlirType mlirType` is returned from a python binding, that `mlirType` is automatically cast to the correct concrete type. For example:

```
      c0 = arith.ConstantOp(f32, 0.0)
      # CHECK: F32Type(f32)
      print(repr(c0.result.type))

      unranked_tensor_type = UnrankedTensorType.get(f32)
      unranked_tensor = tensor.FromElementsOp(unranked_tensor_type, [c0]).result

      # CHECK: UnrankedTensorType
      print(type(unranked_tensor.type).__name__)
      # CHECK: UnrankedTensorType(tensor<*xf32>)
      print(repr(unranked_tensor.type))
```

This functionality immediately extends to typed attributes (i.e., `attr.type`).

The diff also implements similar functionality for `mlir_type_subclass`es but in a slightly different way - for such types (which have no cpp corresponding `class` or `struct`) the user must provide a type caster in python (similar to how `AttrBuilder` works) or in cpp as a `py::cpp_function`.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D150927

* [VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic SelectionDAG support.

The generic implementation is umin(TC, VF * vscale).

Lowering to vsetvli for RISC-V will come in a future patch.

This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D149916
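
The generic expansion is simple enough to sketch in Python, together with the kind of strip-mined loop it is meant to serve. This is an illustration only: `strip_mine` is a hypothetical helper, `min` stands in for the unsigned minimum, and `vf` and `vscale` are assumed to be positive.

```python
def get_vector_length(trip_count, vf, vscale):
    """Generic expansion from the commit message: umin(TC, VF * vscale)."""
    return min(trip_count, vf * vscale)


def strip_mine(n, vf, vscale):
    """Split n iterations into chunks no larger than VF * vscale.

    Assumes vf and vscale are positive, so every step makes progress.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        vl = get_vector_length(remaining, vf, vscale)
        chunks.append(vl)
        remaining -= vl
    return chunks


# 10 iterations with VF * vscale = 4: two full chunks and a tail of 2.
print(strip_mine(10, vf=2, vscale=2))
```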

* [flang] Retain the sign of the argument for the result of fraction(0)

The f18 clause 16.9.80 description of the FRACTION(X) intrinsic states:

    Result Value. The result has the value ....
    If X has the value zero, the result is zero.
    If X is an IEEE NaN, the result is that NaN.
    If X is an IEEE infinity, the result is an IEEE NaN.

This clause does not specify whether fraction(-0.0) should be -0.0 or +0.0.
However, a folded result and a runtime result should be consistent, and
returning -0.0 is more in line with the result for fraction(NaN).

For this test:

  print '(2f6.1)', 0.0, fraction(0.0)
  call f(0.0)
  print '(2f6.1)', -0.0, fraction(-0.0)
  call f(-0.0)
  end

  subroutine f(x)
    print '(2f6.1)', x, fraction(x)
  end

Current output is:

   0.0   0.0
   0.0   0.0
  -0.0  -0.0
  -0.0   0.0

Change that to:

   0.0   0.0
   0.0   0.0
  -0.0  -0.0
  -0.0  -0.0
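
For comparison, the intended behavior can be sketched in Python. This is not the flang runtime code; it assumes FRACTION(X) corresponds to the mantissa returned by `math.frexp`, which passes zeros and NaNs through unchanged in CPython, so the sign of zero is retained.

```python
import math


def fraction(x):
    """Sketch of FRACTION(X) built on math.frexp (an assumption of this example)."""
    if math.isinf(x):
        return math.nan  # an IEEE infinity yields an IEEE NaN
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1;
    # zeros and NaNs come back unchanged, so -0.0 keeps its sign.
    mantissa, _ = math.frexp(x)
    return mantissa


print(fraction(0.0), fraction(-0.0))
```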

* [lldb] Remove reproducer test suite (NFC)

The reproducer feature as well as the ability to capture or replay tests
with reproducers was removed. This removes the corresponding CMake
target.

* hwasan: enable mmap interception (no tagging used)

This enables HWASan interception for mmap, to prevent users from allocating in the shadow memory regions. For compatibility, it does not use pointer tagging, nor does it allow MAP_FIXED with a tagged address.

This patch initializes the common interceptors, but that should be a no-op (except for the mmap interceptor), due to the disable-by-default nature of hwasan_platform_interceptors.h (from D150708). As the first patch to utilize this common interceptor machinery for HWASan, it also defines some macros (e.g., COMMON_INTERCEPT_FUNCTION) that will be useful as future interceptors are enabled.

TestCases/Posix/mmap_write_exec.cpp now passes for HWASan.

Reviewed By: kstoimenov, vitalybuka

Differential Revision: D151262

* [Driver][X86] Reject unsupported value for -mabi=

-mabi= was incorrectly claimed before D134671. -mabi=sysv appears to be
somewhat common in open-source packages, even if it was not intended to
be supported by Clang.
(For common options supported by multiple architectures, it's easy to
forget to report an error on unsupported targets. Unfortunately
the driver infrastructure doesn't make this less error-prone.)

On x86, support -mabi=sysv for non-Windows targets and -mabi=ms for Windows,
and remove the spurious -Wunused-command-line-argument warning.

With this change, all popular architectures claim -mabi=, so we don't
have to worry much about -Wunused-command-line-argument for other
architectures.

Differential Revision: https://reviews.llvm.org/D151509

* [mlir] [sparse] [gpu] adding transpose support to spmm spmv

Reviewed By: aartbik, wrengr

Differential Revision: https://reviews.llvm.org/D151259

* Fix wrong error message when compiling C source code

Currently the following error is emitted for uses_allocators(alloc(traits)):

called object type 'omp_allocator_handle_t' (aka
'enum omp_allocator_handle_t') is not a function or function pointer

To fix this, since "alloc" is an id-expression (spec 5.2), the parser
(in ParseOpenMP.cpp) now uses tryParseCXXIdExpression instead of
ParseExpression for C.

Differential Revision: https://reviews.llvm.org/D151517

* [libc++][ci] Install ccache in the Docker image

This will allow using ccache in the jobs that build Clang, which
should speed up those jobs.

Differential Revision: https://reviews.llvm.org/D150907

* [clang-tidy] Optimize misc-confusable-identifiers

The main performance issues in this check were caused by many
calls to getPrimaryContext and by constantly walking up the declaration
contexts using getParent. There was also an issue with forallBases,
which is slow.

Profiled with perf and tested on the open-source project Cataclysm-DDA.
Before the changes the check took 27320 seconds; after the changes, 3682 seconds.
That's an 86.5% reduction. More optimizations are still possible in this
check.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D151051
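
The reduction figure follows directly from the two timings quoted above; a short Python check (variable names are just for this example):

```python
before_s = 27320  # seconds before the change
after_s = 3682    # seconds after the change

reduction = (before_s - after_s) / before_s
speedup = before_s / after_s

print(f"reduction: {reduction:.1%}")
print(f"speedup:   {speedup:.1f}x")
```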

* SelectionDAG: Teach ComputeKnownBits about VSCALE

This reverts commit 9b92f70d4758f75903ce93feaba5098130820d40.  The issue
with the re-applied change was an implicit truncation due to the
multiplication.  Although the operations were converted to `APInt`, the
values were implicitly converted to `long` due to the typing rules.

Fixes: #59594

Differential Revision: https://reviews.llvm.org/D140347

* [libc++][PSTL] Add a test to make sure that customization points work properly

Reviewed By: #libc, ldionne

Spies: ldionne, libcxx-commits

Differential Revision: https://reviews.llvm.org/D151257

* [lldb][NFCI] Include <cstdio> in SBDefines for FILE * definition

There are a few API headers that use FILE * but do not include the
correct header for their definition. Instead of including <cstdio> in each
of the headers manually, it seems easiest to include it in SBDefines to
get them all at once.

rdar://109579348

Differential Revision: https://reviews.llvm.org/D151381

* [mlir][sparse][gpu] fix merge conflict

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151574

* [SLP][NFC]Add a test for spill cost, NFC.

* [lldb] Pass CMAKE_SYSROOT through to LLDB shell tests

This allows the LLDB Shell tests to succeed in (e.g. CI) environments where
system libraries are provided hermetically as a sysroot.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D151269

* [RISCV] Tighten type constraint for RISCVISD::FCLASS_VL.

* Add fastmath attributes to llvm.call_intrinsic

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D151492

* [RISCV] Remove extra MVT::Other result from creation of RISCVISD::FCLASS_VL.

* [RISCV] Simplify code in LowerIS_FPCLASS. NFC

* [SLP]Fix getSpillCost functions.

There are several issues in the current implementation. The instructions
are not properly ordered: if they are placed in different basic blocks,
the order of the blocks needs to be reversed. The function also needs to
exclude non-vectorizable nodes and check for CallBase rather than
CallInst; otherwise invoke calls are not handled correctly.

* [libc++][NFC] Add additional test case for modules issue in Objective-C++

Differential Revision: https://reviews.llvm.org/D151467

* [MLIR][python bindings] Fix inferReturnTypes + AttrSizedOperandSegments for optional operands

Right now `inferTypeOpInterface.inferReturnTypes` fails because there's a cast in there to `py::sequence` which throws a `TypeError` when it tries to cast the `None`s. Note `None`s are inserted into `operands` for omitted operands passed to the generated builder:

```
    operands.append(_get_op_result_or_value(start) if start is not None else None)
    operands.append(_get_op_result_or_value(stop) if stop is not None else None)
    operands.append(_get_op_result_or_value(step) if step is not None else None)
```

Note also that skipping appending to the list operands doesn't work either because [[ https://github.com/llvm/llvm-project/blob/27c37327da67020f938aabf0f6405f57d688441e/mlir/lib/Bindings/Python/IRCore.cpp#L1585 | build generic ]] checks against the number of operand segments expected.

Currently the only way around is to handroll through `ir.Operation.create`.

Reviewed By: rkayaith

Differential Revision: https://reviews.llvm.org/D151409

* [PhaseOrdering] Add test for loop over span with hardened libc++.

Add a slightly reduced test case for a loop iterating over a std::span
with libc++ hardening.

See https://godbolt.org/z/cKerYq9fY.

* [PseudoProbe] Do not force the callsite debug loc to inlined probes from __nodebug__ functions.

For pseudo probes we would like to keep their original dwarf discriminator (either a zero or null) until the first FS-discriminator pass. The inliner violates that, because it assigns the callsite's debug location to inlinee instructions that have no debug info. This patch disables that behavior.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D151568

* Factor out xcrun into a function (NFC)

* Make function static (NFC)

* [lldb] Skip instead of XFAIL TestInteractiveScriptedProcess

The test is failing on x86_64 but passing on arm64. Skip until Ismail
can investigate this further.

* Revert "Make function static (NFC)"

This reverts commit cefd2802aa49274942da87edf5019b5a23315f01.

* Revert "Factor out xcrun into a function (NFC)"

This reverts commit 97ca34996dbe5a61e79d7c559af7b15dc39c08a5.

* [mlir] Use std::optional instead of llvm::Optional (NFC)

This is part of an effort to migrate from llvm::Optional to std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

* [clang-tidy] Check for specific return types on all functions

Extend the check to all functions with return types like
std::error_code, std::expected, boost::system::error_code, absl::Status...

Resolves issue https://github.com/llvm/llvm-project/issues/62884

Reviewed By: PiotrZSL

Differential Revision: https://reviews.llvm.org/D151383

* [HWASan] use hwasan linker for Android 14+

This allows binaries compiled with HWASan to run on a
non-HWASan system image.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D151388

* [llvm-debuginfod][NFC] Switch to OptTable

Reviewed By: mysterymath

Differential Revision: https://reviews.llvm.org/D151273

* [Dexter] Don't hardcode x86_64 as the default architecture

Use platform.machine() as the default architecture instead of hardcoding
it to x86_64.
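
A minimal sketch of the fallback described above, assuming a hypothetical helper named `default_arch` (Dexter's actual option handling is not shown in this message):

```python
import platform


def default_arch(requested=None):
    """Return the requested architecture, or the host's if none was given.

    `default_arch` is a hypothetical name used only for illustration.
    """
    # platform.machine() reports the host machine type, for example
    # "x86_64" or "arm64", so the default follows the machine running
    # the tests instead of a hardcoded value.
    return requested if requested else platform.machine()
```

An explicit request still wins; only the unset case changes.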

* [clang][modules] NFCI: Distinguish as-written and effective umbrella directories

For modules with umbrellas, we track how they were written in the module map. Unfortunately, the getter for the umbrella directory conflates the "as written" directory and the "effective" directory (either the written one or the parent of the written umbrella header).

This patch makes the distinction between "as written" and "effective" umbrella directories clearer. No functional change intended.

Reviewed By: benlangmuir

Differential Revision: https://reviews.llvm.org/D151581

* Rewrite load-store-vectorizer.

The motivation for this change is a workload generated by the XLA compiler
targeting NVIDIA GPUs.

This kernel has a few hundred i8 loads and stores.  Merging is critical for
performance.

The current LSV doesn't merge these well because it only considers instructions
within a block of 64 loads+stores.  This limit is necessary to contain the
O(n^2) behavior of the pass.  I'm hesitant to increase the limit, because this
pass is already one of the slowest parts of compiling an XLA program.

So we rewrite basically the whole thing to use a new algorithm.  Before, we
compared every load/store to every other to see if they're consecutive.  The
insight (from tra@) is that this is redundant.  If we know the offset from PtrA
to PtrB, then we don't need to compare PtrC to both of them in order to tell
whether C may be adjacent to A or B.

So that's what we do.  When scanning a basic block, we maintain a list of
chains, where we know the offset from every element in the chain to the first
element in the chain.  Each instruction gets compared only to the leaders of
all the chains.

In the worst case, this is still O(n^2), because all chains might be of length
1.  To prevent compile time blowup, we only consider the 64 most recently used
chains.  Thus we do no more comparisons than before, but we have the potential
to make much longer chains.
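
The chain-leader idea can be sketched roughly as follows. This is a toy model in Python, not the pass itself: `build_chains`, `known_offset` and the pointer representation are all hypothetical, and only the cap of 64 chains comes from the description above.

```python
from collections import OrderedDict

MAX_CHAINS = 64  # cap on live chains, as stated in the description


def build_chains(accesses, known_offset, max_chains=MAX_CHAINS):
    """Group accesses into chains, comparing each one only to chain leaders.

    accesses: hashable pointer ids in program order.
    known_offset(leader, ptr): offset from leader to ptr, or None if unknown.
    Returns a list of chains; each chain is a list of (ptr, offset_from_leader).
    """
    chains = OrderedDict()  # leader -> chain, most recently used last
    retired = []            # chains evicted once the cap was exceeded
    for ptr in accesses:
        # Try the most recently used leaders first.
        for leader in reversed(list(chains)):
            off = known_offset(leader, ptr)
            if off is not None:
                chains[leader].append((ptr, off))
                chains.move_to_end(leader)  # mark as most recently used
                break
        else:
            # No leader has a known offset to ptr: start a new chain.
            chains[ptr] = [(ptr, 0)]
            if len(chains) > max_chains:
                _, oldest = chains.popitem(last=False)
                retired.append(oldest)
    return retired + list(chains.values())


def same_base_offset(leader, ptr):
    """Toy oracle: pointers are (base, offset) pairs; offsets are known
    only between pointers that share a base."""
    return ptr[1] - leader[1] if ptr[0] == leader[0] else None


accesses = [("a", 0), ("b", 8), ("a", 1), ("a", 2), ("b", 9)]
chains = build_chains(accesses, same_base_offset)
```

Each access costs at most `max_chains` comparisons, yet a chain can keep growing for as long as its leader stays among the most recently used.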

This rewrite affects many tests.  The changes to tests fall into two
categories.

1. The old code had what appears to be a bug when deciding whether a misaligned
   vectorized load is fast.  Suppose TTI reports that load <4 x i32> align 4
   has relative speed 1, and suppose that load i32 align 4 has relative speed
   32.

   The intent of the code seems to be that we prefer the scalar load, because
   it's faster.  But the old code would choose the vectorized load.
   accessIsMisaligned would set RelativeSpeed to 0 for the scalar load (and not
   even call into TTI to get the relative speed), because the scalar load is
   aligned.

   After this patch, we will prefer the scalar load if it's faster.

2. This patch changes the logic for how we vectorize.  Usually this results in
   vectorizing more.
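
The bug in category 1 can be illustrated with a toy model. The functions below are hypothetical and only mirror the behavior described above; they are not the pass's code.

```python
def old_choice(vector_speed, scalar_speed, scalar_is_misaligned):
    """Old behavior as described: an aligned scalar load was never looked up
    in TTI, so its relative speed stayed at 0."""
    effective_scalar = scalar_speed if scalar_is_misaligned else 0
    return "vector" if vector_speed > effective_scalar else "scalar"


def new_choice(vector_speed, scalar_speed):
    """Intended behavior: prefer the scalar load when it is actually faster."""
    return "scalar" if scalar_speed > vector_speed else "vector"


# Numbers from the example: vector load has speed 1, scalar load has speed 32.
before = old_choice(1, 32, scalar_is_misaligned=False)
after = new_choice(1, 32)
```

With the numbers from the example, `before` is the vectorized load and `after` is the scalar load.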

Explanation of changes to tests:

 - AMDGPU/adjust-alloca-alignment.ll: #1
 - AMDGPU/flat_atomic.ll: #2, we vectorize more.
 - AMDGPU/int_sideeffect.ll: #2, there are two possible locations for the call to @foo, and the pass is brittle to this.  Before, we'd vectorize in case 1 and not case 2.  Now we vectorize in case 2 and not case 1.  So we just move the call.
 - AMDGPU/adjust-alloca-alignment.ll: #2, we vectorize more
 - AMDGPU/insertion-point.ll: #2 we vectorize more
 - AMDGPU/merge-stores-private.ll: #1 (undoes changes from git rev 86f9117d476, which appear to have hit the bug from #1)
 - AMDGPU/multiple_tails.ll: #1
 - AMDGPU/vect-ptr-ptr-size-mismatch.ll: Fix alignment (I think related to #1 above).
 - AMDGPU CodeGen: I have difficulty commenting on these changes, but many of them look like #2, we vectorize more.
 - NVPTX/4x2xhalf.ll: Fix alignment (I think related to #1 above).
 - NVPTX/vectorize_i8.ll: We don't generate <3 x i8> vectors on NVPTX because they're not legal (and eventually get split)
 - X86/correct-order.ll: #2, we vectorize more, probably because of changes to the chain-splitting logic.
 - X86/subchain-interleaved.ll: #2, we vectorize more
 - X86/vector-scalar.ll: #2, we can now vectorize scalar float + <1 x float>
 - X86/vectorize-i8-nested-add-inseltpoison.ll: Deleted the nuw test because it was nonsensical.  It was doing `add nuw %v0, -1`, but this is equivalent to `add nuw %v0, 0xffff'ffff`, which is equivalent to asserting that %v0 == 0.
 - X86/vectorize-i8-nested-add.ll: Same as nested-add-inseltpoison.ll
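
The claim about the deleted nuw test can be checked with a few lines of Python. This is only an arithmetic illustration; `add_nuw_overflows` is a hypothetical helper and the 32-bit width follows the `0xffff'ffff` constant quoted above.

```python
def add_nuw_overflows(v0, const, bits):
    """True if the unsigned sum v0 + const wraps at the given bit width."""
    mask = (1 << bits) - 1
    return (v0 & mask) + (const & mask) > mask


BITS = 32
MINUS_ONE = -1  # the same bit pattern as 0xffff_ffff at 32 bits

# Values of v0 (from a small sample) for which the add does not wrap.
ok = [v for v in range(1000) if not add_nuw_overflows(v, MINUS_ONE, BITS)]
```

Only `v0 == 0` survives, so `add nuw %v0, -1` is well defined only when `%v0` is zero.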

Differential Revision: https://reviews.llvm.org/D149893

* [Driver] Reject AIX-specific link options on non-AIX targets

Follow-up to D106688 and D146431.

Without a linking phase, -b leads to a -Wunused-command-line-argument warning
instead of an error.
-mxcoff-build-id= leads to a -Wunused-command-line-argument warning instead of
an error.

* [clang][modules] NFCI: Use `DirectoryEntryRef` for umbrella directory

This removes some deprecated uses of `DirectoryEntry::getName()`.

Depends on D151581.

Differential Revision: https://reviews.llvm.org/D151584

* Fix test failure after 2be0abb7fe7 (caused by bad merge, sorry).

* Revert "[lldb] Disable variable watchpoints when going out of scope"

Reverting https://reviews.llvm.org/D151366 until Ismail has a chance
to look at the ubuntu CI test failures and can reland.

This reverts commit 7c847ac4bd1bd8a89c7fbb4581328fa8cb0498f1.

* [Dexter] XFAIL Dexter tests for Apple Silicon (arm64)

* Fix runtime crash inside __kmpc_init_allocator

It seems the load of traits.addr should be passed to the runtime call.  Currently
the load of the load of traits.addr gets passed, which causes the runtime to fail.

To fix this, skip the call to EmitLoadOfScalar for the extra load.

Differential Revision: https://reviews.llvm.org/D151576

* Fix -Wsign-compare from D149893.

* [mlir][spirv] Enhance folding capability of spirv::CompositeExtractOp::fold

This PR improves the `spirv::CompositeExtractOp::fold` function by adding a backtracking mechanism.
The updated function can now traverse a chain of `CompositeInsertOp`s to find a match.
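
A rough sketch of the backtracking idea, in Python rather than MLIR C++. The `Insert` class and `fold_extract` are hypothetical, and giving up on partially overlapping indices is an assumption of this sketch, not something stated in the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Insert:
    """Toy stand-in for a CompositeInsertOp."""
    value: Any                # object being inserted
    composite: Any            # composite being updated (another Insert or opaque)
    indices: Tuple[int, ...]  # position of the inserted object


def fold_extract(composite: Any, indices: Tuple[int, ...]) -> Optional[Any]:
    """Walk a chain of inserts looking for the value stored at `indices`."""
    node = composite
    while isinstance(node, Insert):
        if node.indices == indices:
            return node.value  # exact match: the extract folds to this value
        shorter = min(len(node.indices), len(indices))
        if node.indices[:shorter] == indices[:shorter]:
            # Overlapping but not identical positions: this sketch gives up.
            return None
        node = node.composite  # unrelated position: keep walking backwards
    return None  # reached an opaque composite without a match


c1 = Insert("x", "base", (0,))
c2 = Insert("y", c1, (1,))
folded = fold_extract(c2, (0,))
```

Here the extract at index 0 skips the unrelated insert at index 1 and folds to `"x"`.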

Patch By: nbpatel
Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D151536

* skip test run on amdgcn-amd-amdhsa

* [RISCV] Add isel patterns to form tail undisturbed vfwadd.wv from fpextend_vl+vfwadd_vl+vp_merge.

We use special TIED instructions for vfwadd.wv to avoid an
earlyclobber constraint preventing the first source and the destination
from being the same register.

This prevents our normal post-processing for forming TU instructions.
Add manual isel patterns instead. This matches what we do for FMA,
for example.

* [mlir][spirv][NFC] Clean up SPIR-V canonicalization

Follow best practices. Use llvm helper functions for readability.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D151600

* [Vectorize] Fix a warning

This patch fixes:

  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:1429:23:
  error: comparison of integers of different signs: 'int' and 'const
  size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare]

* [Vectorize] Fix warnings

This patch fixes:

  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:140:20: error:
  unused function 'operator<<' [-Werror,-Wunused-function]

  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:176:6: error:
  unused function 'dumpChain' [-Werror,-Wunused-function]

* [MemProf] Clean up MemProf instrumentation pass invocation

First, removes the invocation of the memprof instrumentation passes from
the end of the module simplification pass builder, where it doesn't
really belong. However, it turns out that this was never being invoked,
as it is guarded by an internal option not used anywhere (even tests).

These passes are actually added via clang under the -fmemory-profile
option. Changed this to add via the EP callback interface, similar to
the sanitizer passes. They are added to the EP for the end of the
optimization pipeline, which is roughly where they were being added
already (end of the pre-LTO link pipelines and non-LTO optimization
pipeline).

Ideally we should plumb the output file through to LLVM and set it up
there, so I have added a TODO.

Differential Revision: https://reviews.llvm.org/D151593

* [MLIR] Add native Bytecode support for properties

This adds a new interface (`BytecodeOpInterface`) that lets operations opt in
to skipping the conversion to attributes and serializing their properties as
native bytecode.

The scheme relies on a new section where properties are stored in sequence

  { size, serialize_properties }, ...

Each operation stores the index of its properties; a table of offsets is
built when the properties section is first loaded.
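
The layout can be sketched as below. This is a toy model only: `write_section` and `PropertiesSection` are hypothetical, and a fixed 4-byte length stands in for the variable-length integers the real bytecode reader parses.

```python
import struct


def write_section(blobs):
    """Serialize blobs back to back as { size, bytes } records."""
    out = bytearray()
    for blob in blobs:
        out += struct.pack("<I", len(blob)) + blob
    return bytes(out)


class PropertiesSection:
    """Reads records by index, building the offset table on first use."""

    def __init__(self, data):
        self.data = data
        self._offsets = None  # built lazily

    def _build_offsets(self):
        offsets, pos = [], 0
        while pos < len(self.data):
            offsets.append(pos)
            (size,) = struct.unpack_from("<I", self.data, pos)
            pos += 4 + size  # skip the size field and the payload
        self._offsets = offsets

    def get(self, index):
        if self._offsets is None:
            self._build_offsets()
        pos = self._offsets[index]
        (size,) = struct.unpack_from("<I", self.data, pos)
        return self.data[pos + 4 : pos + 4 + size]


section = PropertiesSection(write_section([b"ab", b"", b"xyz"]))
third = section.get(2)
```

Because each operation only stores an index, the section is scanned once and every later lookup is a direct jump through the offset table.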

This is a re-commit of 837d1ce0dc which conflicted with another patch upgrading
the bytecode and the collision wasn't properly resolved before.

Differential Revision: https://reviews.llvm.org/D151065

* [CodeGen][NFC] Declare copy constructor & copy assignment as deleted for ScheduleDAG

ScheduleDAG has derived classes ScheduleDAGVLIW and ScheduleDAGRRList,
which own resources that are freed in their destructors. The static analyzer
warns because they do not have user-written copy constructors.

According to the design of ScheduleDAG, it seems that it should always
be passed by reference. So I declare them as deleted in this patch.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D151538

* [mlir] Fix non-const lvalue reference to type 'uint64_t' cannot bind to type 'size_t' error (NFC)

/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1007:39: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
    if (failed(propReader.parseVarInt(count)))
                                      ^~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:191:39: note: passing argument to parameter 'result' here
  LogicalResult parseVarInt(uint64_t &result) {
                                      ^
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1020:44: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
      if (failed(offsetsReader.parseVarInt(dataSize)) ||
                                           ^~~~~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:191:39: note: passing argument to parameter 'result' here
  LogicalResult parseVarInt(uint64_t &result) {
                                      ^
2 errors generated.

* [Driver][test] Replace legacy -target with --target=

* [hwasan] support hwasan-match-all-tag flag for callback memory access instrumentation

Currently, hwasan-match-all-tag flag is supported in inline memory access instrumentation and outline memory access instrumentation, but not supported in callback memory access instrumentation.

- For inline memory access instrumentation: a hwasan-match-all-tag check is added following the tag-mismatch check. If the tag from the pointer does not match the tag from shadow memory and the pointer tag is not equal to hwasan-match-all-tag, a tag mismatch is reported.
- For outline memory access instrumentation: MatchAllTag is encoded in AccessInfo. When emitting HWASAN memaccess symbols, the asm-printer emits assembly instructions to check whether the tag from the pointer is equal to hwasan-match-all-tag.
- For callback memory access instrumentation: hwasan-match-all-…
Noxime pushed a commit to Noxime/llvm-project that referenced this pull request Jun 16, 2023
The proposal of '.option arch' directive is riscv-non-isa/riscv-asm-manual#67

Note: For '.option arch, +/-' directive, version number is not yet supported.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D123515
vlc-mirrorer pushed a commit to videolan/dav1d that referenced this pull request Feb 21, 2024
Support for '.option arch' directive [0] was added to binutils in
d3ffd7f77654adafe5f1989bdfdbe4a337ff2e8b [1] and in llvm in
9e8ed3403c191ab9c4903e8eeb8f732ff8a43cb4 [2].

[0] riscv-non-isa/riscv-asm-manual#67
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d3ffd7f77654adafe5f1989bdfdbe4a337ff2e8b
[2] llvm/llvm-project@9e8ed34
veselypeta pushed a commit to veselypeta/cherillvm that referenced this pull request Aug 26, 2024
The proposal of '.option arch' directive is riscv-non-isa/riscv-asm-manual#67

Note: For '.option arch, +/-' directive, version number is not yet supported.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D123515