Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate architecture synchronization with latest LLVM #1803

Closed
wants to merge 19 commits into from

Conversation

Phosphorus15
Copy link

What is Auto-Sync ?

Capstone Auto-Sync is an initiative to partly automate the synchronization of certain architectures to the latest.

Most of the Capstone's .inc files a generated from LLVM's TableGen backend and processed by python scrips in suite/synctools into C-compatible files, which leads to the problem that with LLVM's update, it's not always (hardly, in fact) possible to use the synctools without patch in regard to LLVM's upstream change.

This syncing tools, however, using a custom-made LLVM TableGen backend (here) to generate .inc files natively usable by Capstone. With certain adaptations in Capstone's structure, it is possible to consistently automate large parts of the work on keeping up with LLVM's latest Target (i.e. .td files) update, and optimally, there could be zero-overhead in the process of updating(see this patch on missing bcxf instructions)

https://github.com/Phosphorus15/Capstone/blob/1bca4211bd7c132c57d0006a26de57d42a4bcdb9/sync/SYNCING.md?plain=1#L1-L17

What is done in this PR?

In this PR, we adapted following architectures with the Auto-Sync structure base on the next branch, and made sure they pass the test suite given by Capstone, and also by the analysis tools rizin (which also uses Capstone):

  • Mips
  • ARM
  • AArch64
  • Riscv
  • PowerPC
  • Sparc
  • SystemZ
  • XCore

What is yet to be done?

Capstone's instructions mapping info (like enums) exposed to various bindings (Python bindings, etc.) are not affected by Auto-Sync procedure, and so have to be manually edited if anything like new instructions/operands type was added by LLVM.

Phosphorus15 and others added 19 commits November 30, 2021 16:13
MIPS Mapper Integration

unify format style, closes rizinorg#2

Restored readme.md location

ARM Arch & Synctools Usage Docs

fix redundant header

auto sync MIPS integration

apply clang-format

ignore .idea folder

apply `black` reformat
SystemZ & XCore & Sparc & RISCV & PPC

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Sparc PPC XCore SystemZ

RISCV Instructions Printing & Mapping

AArch64&Rizin disasm passing (with some target-specific generated table still needed - left unchecked for now)

ARM&Rizin disasm passing

Mips&Rizin disasm passing
Update generated files to match newest generator

Invocation simplification, w.r.t. rizinorg#5

Turn on the  build target
Inst Printer Predicate fix

architecture integrity check & ppc naming patch

fixes mips naming & addressing issues

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Cleanups & warning elimination

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Minor syncing README updates
@aquynh
Copy link
Collaborator

aquynh commented Dec 8, 2021 via email

@Phosphorus15
Copy link
Author

Phosphorus15 commented Dec 8, 2021

Very cool! Did you try this on the latest LLVM version, and was anything broken?

Actually yes, the generated .inc files were based on the latest commit of llvm-project, and we tested it against cstest so it works very well on most test cases, except that LLVM drops PowerPC QPX support (so this table could not be gen-ed), and a couple of thumb2 instructions are missing.

@aquynh
Copy link
Collaborator

aquynh commented Dec 8, 2021 via email

@aquynh
Copy link
Collaborator

aquynh commented Dec 9, 2021

did you base this PR on some development snapshot of llvm?

since development snapshot of llvm may include some bugs or WIP, I think it is better to base on stable version, which is llvm 13.0 at the moment. i am just wondering if that is a lot of work to PR based that version now?

@@ -891,8 +896,9 @@ size_t CAPSTONE_API cs_disasm(csh ud, const uint8_t *buffer, size_t size, uint64
mci.flat_insn->mnemonic[0] = '\0';
mci.flat_insn->op_str[0] = '\0';
#endif

debugln("trying into dissasm");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove the debug code?

@aquynh
Copy link
Collaborator

aquynh commented Dec 9, 2021

please move sync/ directory into suite/

@XVilka
Copy link
Contributor

XVilka commented Dec 9, 2021

@Phosphorus15 could you please address these warnings too?

In file included from /src/capstonenext/arch/AArch64/AArch64InstPrinter.c:225:
/src/capstonenext/arch/AArch64/AArch64GenDisassemblerTables.inc:157863:19: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printZPRasFPR(MI, OpIdx, OS, 128);
                  ^~
/src/capstonenext/arch/AArch64/AArch64InstPrinter.c:187:35: note: passing argument to parameter 'MI' here
static void printZPRasFPR(MCInst *MI, unsigned OpNum, SStream *O, int Width);
                                  ^
In file included from /src/capstonenext/arch/AArch64/AArch64InstPrinter.c:225:
/src/capstonenext/arch/AArch64/AArch64GenDisassemblerTables.inc:157867:19: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printZPRasFPR(MI, OpIdx, OS, 32);
                  ^~
/src/capstonenext/arch/AArch64/AArch64InstPrinter.c:187:35: note: passing argument to parameter 'MI' here
static void printZPRasFPR(MCInst *MI, unsigned OpNum, SStream *O, int Width);
                                  ^
In file included from /src/capstonenext/arch/AArch64/AArch64InstPrinter.c:225:
/src/capstonenext/arch/AArch64/AArch64GenDisassemblerTables.inc:157871:27: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printMatrixTileVector(MI, OpIdx, OS, 0);

/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:5847:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:8244:1: warning: shift count >= width of type [-Wshift-count-overflow]
DecodeToMCInst(decodeToMCInst, fieldFromInstruction, uint32_t)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:7664:35: note: expanded from macro 'DecodeToMCInst'
    tmp |= fieldname(insn, 16, 5) << 34;\
                                  ^  ~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:8244:1: warning: shift count >= width of type [-Wshift-count-overflow]
DecodeToMCInst(decodeToMCInst, fieldFromInstruction, uint32_t)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:7673:35: note: expanded from macro 'DecodeToMCInst'
    tmp |= fieldname(insn, 16, 5) << 34;\
                                  ^  ~~
/src/capstonenext/arch/PowerPC/PPCGenDisassemblerTables.inc:8244:1: warning: shift count >= width of type [-Wshift-count-overflow]
DecodeToMCInst(decodeToMCInst, fieldFromInstruction, uint32_t)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/src/capstonenext/arch/Mips/MipsInstPrinter.c:33:38: note: passing argument to parameter 'MI' here
static void printUnsignedImm(MCInst *MI, int opNum, SStream *O);
                                     ^
In file included from /src/capstonenext/arch/Mips/MipsInstPrinter.c:102:
/src/capstonenext/arch/Mips/MipsGenDisassemblerTables.inc:57288:21: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printMemOperand(MI, OpIdx, OS, "");
                    ^~
/src/capstonenext/arch/Mips/MipsInstPrinter.c:90:37: note: passing argument to parameter 'MI' here
static void printMemOperand(MCInst *MI, int opNum, SStream *O, char *);
                                    ^
/src/capstonenext/arch/Mips/MipsInstPrinter.c:185:9: warning: initializing 'char *' with an expression of type 'const char *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  char *Name = Mips_reg_name(0, RegNo);
        ^      ~~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
/src/capstonenext/arch/Sparc/SparcInstPrinter.c:62:53: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  unsigned Imm = MCOperand_getImm(MCInst_getOperand(MI, opNum));
                                                    ^~
/src/capstonenext/arch/Sparc/../../MCInst.h:137:38: note: passing argument to parameter 'inst' here
MCOperand *MCInst_getOperand(MCInst *inst, unsigned i);
                                     ^
1 warning generated.
In file included from /src/capstonenext/arch/SystemZ/SystemZInstPrinter.c:460:
/src/capstonenext/arch/SystemZ/SystemZGenDisassemblerTables.inc:60029:23: warning: passing 'const MCInst *' (aka 'const struct MCInst *') to parameter of type 'MCInst *' (aka 'struct MCInst *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
    printU4ImmOperand(MI, OpIdx, OS);
                      ^~
/src/capstonenext/arch/SystemZ/SystemZInstPrinter.c:163:39: note: passing argument to parameter 'MI' here
static void printU4ImmOperand(MCInst *MI, int OpNum, SStream *O) {
                                      ^
1 warning generated.
[ 33%] Building C object CMakeFiles/capstone-static.dir/arch/XCore/XCoreDisassembler.c.o
In file included from /src/capstonenext/arch/XCore/XCoreDisassembler.c:45:
In file included from /src/capstonenext/arch/XCore/CapstoneXCoreModule.h:120:
/src/capstonenext/arch/XCore/XCoreGenDisassemblerTables.inc:2624:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
1 warning generated.
[ 63%] Linking C executable fuzz_disasm
/usr/bin/ld: libcapstone.a(AArch64InstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(MipsInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(PPCInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(SparcInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(SystemZInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(RISCVInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
/usr/bin/ld: libcapstone.a(MCInstPrinter.c.o):(.bss+0x0): multiple definition of `MRI'; libcapstone.a(ARMInstPrinter.c.o):(.bss+0x0): first defined here
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/fuzz_disasm.dir/build.make:130: fuzz_disasm] Error 1
make[1]: *** [CMakeFiles/Makefile2:129: CMakeFiles/fuzz_disasm.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

@@ -29,7 +29,7 @@ option(CAPSTONE_BUILD_CSTOOL "Build cstool" ON)
option(CAPSTONE_USE_DEFAULT_ALLOC "Use default memory allocation functions" ON)
option(CAPSTONE_ARCHITECTURE_DEFAULT "Whether architectures are enabled by default" ON)
option(CAPSTONE_DEBUG "Whether to enable extra debug assertions" OFF)
option(CAPSTONE_INSTALL "Generate install target" OFF)
Copy link
Contributor

@mrexodia mrexodia Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done like this on purpose to support FetchContent workflows without doing custom hackery on the cache. Likely a good solution is to set it to CAPSTONE_ROOT_PROJECT (and also set the other sensible defaults):

if(CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_SOURCE_DIR)
	set(CAPSTONE_ROOT_PROJECT ON)
endif()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, the change seems unrelated to the goal of this pr, so it should not be part of it.

@Phosphorus15
Copy link
Author

did you base this PR on some development snapshot of llvm?

since development snapshot of llvm may include some bugs or WIP, I think it is better to base on stable version, which is llvm 13.0 at the moment. i am just wondering if that is a lot of work to PR based that version now?

This surely could be (by rebasing the tablegen backend onto llvm 13.0) done with no much of a problem, just that we'd lose the update of some new instructions on ARMv9 in the lastest dev branch of LLVM.

@Phosphorus15
Copy link
Author

Can you provide some quick summary on what changed for x86, arm & arm64 with this PR? This brings latest instruction update to all of them?

x86 has not been included because it has a quite different tablegen backend implementation than other archs., also what is the change specifically, instructions change or changes on Capstone code structure? I can write a small documentation about it.

@XVilka
Copy link
Contributor

XVilka commented Dec 13, 2021

It also should be rebased since there are now conflicts, @Phosphorus15

@kabeor
Copy link
Member

kabeor commented Dec 13, 2021

@XVilka emm..PR to 'next' branch is right, why need to rebase?

@XVilka
Copy link
Contributor

XVilka commented Dec 13, 2021

@kabeor because there were recent commits in the next, and there is a conflict in arch/AArch64/AArch64InstPrinter.c: https://github.com/capstone-engine/capstone/commits/next

@kabeor
Copy link
Member

kabeor commented Dec 13, 2021

@XVilka we will solve the conflict. 'next' branch is main branch for now.

@kabeor
Copy link
Member

kabeor commented Dec 22, 2021

@Phosphorus15 Hello, is there any progress? or some thing I can help?

@XVilka
Copy link
Contributor

XVilka commented Jan 23, 2022

@Phosphorus15 ping?

@FlyGoat
Copy link

FlyGoat commented Jan 23, 2022

I'm trying to introduce MIPS R6 support and just found this PR had done that :-)

In case the original author stalled, is it possible to get this merged by maintainer edit?

Or I can try to rebase and address these review comments, but how to credit the author properly?

@XVilka
Copy link
Contributor

XVilka commented Jan 23, 2022

@FlyGoat I think you can do a new PR, it would be the best way to handle this. You can just keep original copyrights and add yours. You also could link to the original PR. Since the patched LLVM resides in our repository, feel free to send PRs here as well: https://github.com/rizinorg/llvm-capstone

@FlyGoat
Copy link

FlyGoat commented Jan 23, 2022

@XVilka Thanks for the information. I realized that a huge amount of diffs in this patch is doing clang-format so I would like to get a confirmation from maintainers before I mess with them.
@aquynh
Is formatting necessary, or welcomed?

@kabeor
Copy link
Member

kabeor commented Jan 24, 2022

@FlyGoat Hello, we must keep the original code style, so we won't accept code formatting changes in this PR.

In this PR, what we need to do are:

  1. create a new PR rebase to new branch based on next branch, we're considering merging it until it's stable.
  2. restore all code formatting changes
  3. do not add any .inc file in this PR, we should PR that alone
  4. the code should based on LLVM latest released version, but not latest commit(because there could be more bugs there)

@XVilka For more convenient maintenance in the future, we really hope to move llvm-capstone repo to Capstone Organization. Maybe we can talk more about this.

@XVilka
Copy link
Contributor

XVilka commented Jan 24, 2022

@kabeor yes, of course, that was the original intention once the PR is in the better shape to get merged.

@kabeor
Copy link
Member

kabeor commented Jan 24, 2022

@kabeor yes, of course, that was the original intention once the PR is in the better shape to get merged.

Cool!

@FlyGoat FlyGoat mentioned this pull request Jan 25, 2022
@XVilka
Copy link
Contributor

XVilka commented Jan 26, 2022

@kabeor I transferred the repository to you since cannot transfer to the capstone-engine directly.

@kabeor
Copy link
Member

kabeor commented Jan 26, 2022

@XVilka ok, I'll transfer now, thx

@kabeor
Copy link
Member

kabeor commented Jan 26, 2022

@kabeor
Copy link
Member

kabeor commented Jan 26, 2022

@XVilka hi, Would you like to maintain llvm-capstone continuing with us together? I can give you guys the access.

@FlyGoat
Copy link

FlyGoat commented Jan 26, 2022

For llvm-capestone repo I guess what we need to do is squash TableGen backend into a commit and rebase to llvm-13 stable branch?

@kabeor
Copy link
Member

kabeor commented Jan 26, 2022

@FlyGoat yes

@XVilka
Copy link
Contributor

XVilka commented Jan 26, 2022

@kabeor yes, please do.

@kabeor
Copy link
Member

kabeor commented Jan 26, 2022

@kabeor yes, please do.

done :)

@Rot127
Copy link
Collaborator

Rot127 commented Aug 8, 2022

I would take this PR and #1831 over and finish them up. If anyone already started on this as well in the last last 5 months please ping me here.
The plan is to extend it also by the LANAI and LoongArich archs. But of cause the modernization of this PR comes first.

@XVilka
Copy link
Contributor

XVilka commented May 4, 2023

Probably could be closed in favor of #1949
@kabeor

@kabeor kabeor closed this May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants