Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tricore register 0 asserts #2474

Closed
trufae opened this issue Sep 8, 2024 · 18 comments · Fixed by #2523
Closed

tricore register 0 asserts #2474

trufae opened this issue Sep 8, 2024 · 18 comments · Fixed by #2523
Labels
bug Something is not working as it should TriCore Arch
Milestone

Comments

@trufae
Copy link
Contributor

trufae commented Sep 8, 2024

Work environment

Questions Answers
OS/arch/bits Debian arm 64, MacOS AArch64, MacOS x86, Windows x86 etc.
Architecture ppc, x86, cortexm, armv8 etc.
Source of Capstone git clone, brew, pip, release binaries etc.
Version/git commit v5.0.3,

Instruction bytes giving faulty results

tricore disassembly doesnt match metadata inside. this is the disassembly:

252$ ./cstool/cstool tc162  9a000028
 0  9a 00  add	d15, d0, #0
0$ ./cstool/cstool -d tc162  9a000028
 0  9a 00  add	d15, d0, #0
	ID: 31 (add)
	op_count: 2
		operands[0].type: REG = d0
			.access: WRITE
		operands[1].type: IMM = 0x0
			.access: READ
	Registers modified: d0

as you can see there are 3 operands, 1st and 2nd are registers, 3rd is immediate. but decoding shows only two.

Expected results

things not asserting if i parse the 2nd operand as a register

Assertion failed: (RegNo && RegNo < 61 && "Invalid register number!"), function getRegisterName, file TriCoreGenAsmWriter.inc, line 3603.

Steps to get the wrong result

With r2:

[0x00000000]> wx 9a000028
[0x00000000]> pd 2
Assertion failed: (RegNo && RegNo < 61 && "Invalid register number!"), function getRegisterName, file TriCoreGenAsmWriter.inc, line 3603.
Process 30566 stopped

this is not crashing with cstool because its ignoring the 2nd operand.

@cqke
Copy link

cqke commented Sep 8, 2024

Pushing up this issue, breaking r2 TriCore features part !
Thanks guys !

--
EDIT
the instruction add d15, d0, #0 is not valid for the TriCore architecture (TC1767 as ex)

In the TriCore architecture, the add instruction typically involves register-to-register operations and does not directly support an immediate value (like #0).

Maybe if we try with another instruction sets versio ?

EDIT2 It seems that changing tc version doesn't change anything. In tc162 IS, we can do an Add d0, d1, #0

@Rot127
Copy link
Collaborator

Rot127 commented Sep 8, 2024

@imbillow Could you take a look at this one please?

@Rot127 Rot127 added bug Something is not working as it should TriCore Arch labels Sep 9, 2024
@Rot127 Rot127 added this to the v6 - Alpha milestone Sep 9, 2024
@cqke
Copy link

cqke commented Sep 9, 2024

@Rot127 do you think it's possible to hotfix it in V5 too ? Thanks

@Rot127
Copy link
Collaborator

Rot127 commented Sep 9, 2024

Yes of course! Actually, let me change the milestone to v5.

@Rot127 Rot127 modified the milestones: v6 - Alpha, v5.0.4 Sep 9, 2024
@trufae
Copy link
Contributor Author

trufae commented Sep 9, 2024

more wrong stuff

$ ./cstool/cstool -d tc162 da00a0f5
 0  da 00  mov	d15, #0
	ID: 233 (mov)
	op_count: 1
		operands[0].type: IMM = 0x0
			.access: READ
	Groups: (null)

 2  a0 f5  mov.a	a5, #0xf
	ID: 230 (mov.a)
	op_count: 2
		operands[0].type: REG = a5
			.access: WRITE
		operands[1].type: IMM = 0xf
			.access: READ
	Registers modified: a5

0$

@csarn
Copy link

csarn commented Sep 16, 2024

Looking at the tricore instruction set docs, it seems that capstone just does not consider registers that are implicit in the opcode as operands.
i.e. your first example, "9a000028" is actually just a 16-bit instruction "9a00" followed by an invalid instruction.
In "9a00", "9a" is the opcode for an operation that adds a 4-bit constant to a data register and stores the result in d15. The d15 output register is not encoded as an "f" anywhere, but is just the implicit output of the instruction with that opcode (9a):
image

Capstone then also does not show "d15" as an operand.
The same thing is happening with the "more wrong stuff" examples.

I also would like to have this bug fixed. It leads to wrong information, like showing no written registers if the output registers are implicit. When searching for instructions that modify a specific register, I need that information to be available in the disassembly.

@imbillow
Copy link
Contributor

@Rot127 @trufae

This is mostly about what to do with those implicit register accesses.
If we want to add those implicit register accesses to the instruction meta information then I need to do quite a bit of special handling. If it's just turning off incorrect assertions then that's pretty simple.
So what should I do?

@csarn
Copy link

csarn commented Sep 19, 2024

So from a user's perspective it would be best to add the meta-information to the instructions/opcodes.
Is there maybe something similar in other supported architectures, where some of the "special handling" code could be copied/adjusted from?
In x86, push cs would be an example:

$ cstool -d x32 0e
 0  0e                                               push       cs
        ID: 609 (push)
        Prefix:0x00 0x00 0x00 0x00 
        Opcode:0x0e 0x00 0x00 0x00 
        rex: 0x0
        addr_size: 4
        modrm: 0x0
        disp: 0x0
        sib: 0x0
        op_count: 1
                operands[0].type: REG = cs
                operands[0].size: 2
        Groups: not64bitmode 

Here capstone tells us that there is one operand, the cs register, which is implicit here.

@Rot127
Copy link
Collaborator

Rot127 commented Sep 20, 2024

@imbillow Sorry, for answering so late.

So mov.a is defined here: https://github.com/TriDis/llvm-tricore/blob/4bfc5ee073becdcf799978431a9c03045b1091a2/lib/Target/TriCore/TriCoreInstrInfo.td#L547

If all mov instructions implicitly set the d15 register, they should be added there like this (also check the parent or child classes):

let Defs = [D15], Uses = [] in {
  class MOV_RR<bits<8> op1, bits<8> op2, string opstr,
  RegisterClass outregClass, RegisterClass inregClass>
...
}

If you generate the tables again, you get implicitly defined regs.
Also you can open a PR with those changes in the TriDis repo.

The generated tables changed a little. You maybe have to apply only relevant changes. E.g. like this:

git diff -U0 | grepdiff -E '<PATTERN>' --output-matching=hunk | git apply --cached --unidiff-zero

@Rot127
Copy link
Collaborator

Rot127 commented Sep 20, 2024

@csarn I am not sure if you understand you correctly. So here the general operand classification for Auto-Sync archs (TriCore is one of them):

  • explicit operands: Any operand you see in the asm text should be in the details. Only if you choose to get the real operands for an alias instruction (via cs_tool -r or CS_OPT_DETAIL_REAL) the asm text and detail operands differ.
  • implicit operands: Anything effectively used by the instruction, but not shown in the alias or real asm text (as this D15 access from above). Those ones are in cs_detail->regs_read/write.

If there is an instruction which doesn't give these results, it is considered a bug.

@csarn
Copy link

csarn commented Sep 20, 2024

@Rot127 Ok, then I mis-used the word "implicit".
All the examples (tricore, and the x86 "push cs") have the relevant operands explicit in the asm text.
I was thinking on opcode level, where the encoded instruction would have no bits indicating the operand, because for that operand there is a special opcode.

So you are confirming that this issue is actually a bug.
Going back to the first example:

cstool -d tc162 9a00
 0  9a 00  add  d15, d0, #0
        ID: 31 (add)
        op_count: 2
                operands[0].type: REG = d0
                        .access: WRITE
                operands[1].type: IMM = 0x0
                        .access: READ
        Registers modified: d0

the disassembled asm string "add d15, d0, #0" is correct, and shows 3 operands.
The details only show the last two of those, "d15" is missing.

@Rot127
Copy link
Collaborator

Rot127 commented Sep 20, 2024

Yes, it is a bug.
@imbillow Sorry, I didn't look at the code so far. Just recognized with the last comment what you meant. I would need to check. But if the Printer just prints the hard-coded string of the register name (e.g. "d15") you would need to add it in some fixup function.

I can take a look tomorrow and give you more details.

@Rot127
Copy link
Collaborator

Rot127 commented Sep 21, 2024

@imbillow Just checked it. Yeah this is this annoying problem of people hard coding operands in the mnemonic.
Check out functions AArch64_insert_detail_op_reg_at() and how it is used. You can do something like this.

Rot127 added a commit to Rot127/capstone that referenced this issue Oct 6, 2024
@Rot127
Copy link
Collaborator

Rot127 commented Oct 6, 2024

@imbillow I started an attempt for this one here: #2502
There are problems though. While we can figure out the registers at position >0 (by checking the bits as in the PR), we cannot easily figure out the d15 and a15 regs at position 0.
Because they are emitted with the mnemonic. So no bits indicate their presence.

Also the attempt in the PR is flawed, because it might add registers also to instructions which have randomly the correct bits set at these positions.

Now, I would propose to fix for these instructions the td files.
I think we can add the d15/a15 register as implicit write. Then check in the fixup function, if d15/a15 is in the implicit write list AND in the asm string.
If yes, we remove it from the list and add it as register at an index.
We can determine the index by counting the ,.

Super ugly and resource intense because we check strings. For every instructions. But I cannot come up with another idea currently.
Except we want to go deep into LLVM logic, but I have not time for this unfortunately.

Another problem is the missing register access information. We maybe have to do a string search in the mnemonic again?

Or is there a way to check the instruction encoding? Do you know this?

@Rot127
Copy link
Collaborator

Rot127 commented Oct 6, 2024

I just saw you've solved the problem with the RzIL uplifting. So we could just copy the distinction from there.

@imbillow
Copy link
Contributor

But TriCore's RzIL code is also pretty much re-disassembled.

I think it might be more elegant to edit tricore's TableGen, if feasible.

@imbillow
Copy link
Contributor

However, these implicit registers may also be present in an memory operand.

image

In this case the metadata is completely wrong.

cstool -d tc162 c800
 0  c8 00  ld.a a0, [a15]#0
        ID: 165 (ld.a)
        op_count: 1
                operands[0].type: MEM
                        .mem.base: REG = a0
                        .mem.disp: 0x0
                        .access: WRITE
        Registers read: a0
        Registers modified: a0
        Groups: (null) 

@Rot127
Copy link
Collaborator

Rot127 commented Oct 17, 2024

Yeah, I figured this out later as well. Guess we really need to fix it in the td files.

@imbillow imbillow mentioned this issue Oct 22, 2024
5 tasks
@kabeor kabeor closed this as completed in f6f9679 Nov 1, 2024
kabeor added a commit that referenced this issue Nov 24, 2024
* Update changelog for V6.0.0-Alpha1 (#2493)

* update version to v6-alpha1

* update bindings const values

* Update changelog for V6.0.0-Alpha1

* Remove irrelevant changes. (#2495)

* Fixing UB santizer, `LITBASE` and assert errors. (#2499)

* Update labeler with Xtensa and v6 files. (#2500)

* Add hard asserts to all SStream functions and memset MCInst. (#2501)

* Only trigger on released action. (#2497)

* Fix cstest build with Ninja (#2506)

* Tricore EA calculation (#2504)

* Update libcyaml dependency in cstest to 1.4.2 (#2508)

* AArch64: Replace vararg add_cs_detail by multiple concrete functions

Fixes UB caused by various mismatches on how these arguments are passed
and read. This became visible when running on PowerPC hosts with e.g.
`cstool -d aarch64 204862f8`.
Apart from the UB fix, this is meant to be a pure refactor.

Partially addresses #2458

* xtensa: Fix Branch Target (#2516)

* xtensa: Fix Branch Target

* auto-sync: fix byte pattern

* xtensa: add branch insn tests

* Revert "auto-sync: fix byte pattern"

This reverts commit cf8e870.

* Fix #2509. (#2510)

Compatibility headers should always include the header in the same dir.

* Fix stringop-truncation warning some compilers raise. (#2522)

* Add CC and VAS compatibility macros (#2525)

* Fix endianess issue during assignment. (#2528)

* This time actually fix big endian issue. (#2530)

* tricore: fixes #2474 (#2523)

* tricore: fix auto-sync tricore

* tricore: fixes TriCoreGenCSMappingInsnName.inc

* tricore: fixes

* tricore: try fix ld.a SC

* tricore: fixes all

* Add TriCore to .github/workflows/auto-sync.yaml

* Add TriCore details tests(a15, d15, a10|sp)

* Change CI to create Debian Package to Release (#2521)

* Updating CI to create Debian package and version is assigned by tag
version. Also updating release CI to not use end-of-life workflows

* Clear up usage of static libraries.

- Python bindings only use the dynamic lib. But built and copied the static ones sometimes nonetheless.
- Add toggles to build only static, static/dyn or only dynamic.

---------

Co-authored-by: Rot127 <unisono@quyllur.org>

* Rename build arguments: (#2534)

- BUILD_SHARED_LIBS -> CAPSTONE_BUILD_SHARED_LIBS
- BUILD_STATIC_LIBS -> CAPSTONE_BUILD_STATIC_LIBS
- BUILD_STATIC_LIBS -> CAPSTONE_BUILD_STATIC_MSVC_RUNTIME

* xtensa: update to espressif/llvm-project (#2533)

* fix coverity (#2546)

- cid 514642

- cid 514643

- cid 514644

- cid 514645

* Move debian package generation to a dispatch only workflow (#2543)

* Move deb package gen files int package/deb

* Fix basename check

* Make debian package generation dispatch only

* Python package building rework (#2538)

* - Refactored setup.py to remove hacks regarding packaging of wheels for different platforms, improve and cleanup the code
- Updated README.txt
- Removed old Makefile and build_wheel.sh scripts
- Created a new workflow that takes care of building and testing python packages for different platforms/architectures/python versions

* Added SPDX headers to the setup.py

* - cstest_py: Fixed positional argument since it doesn't accept a `required` flag. It turns to have a mandatory tests folder path
- integration_tests.py: Use pathlib to determine the required path
- GitHub action: Simplified the tests execution command

* GitHub Actions: Run python 3.8 (lowest) and 3.13 (current highest) for native runners only during testings and the rest during tag release

* GitHub Action:
- Fixed the cibw_build matrix element
- Added a step to prepare artifact name

* GitHub Action: Added run_tests.py script to run all tests during CI workflow

* - Added SPDX headers to the run_tests.py script and to the build-wheels-publish.yml workflow file
- Minor fixes to the workflow as pointed out in the PR review
- Updated MANIFEST.in to reflect the actual libraries built during python wheel creation process
- Use subprocess.run in place of os.system in run_tests.py script

* GitHub Action:
- Run qemu step only if non-native Linux runner
- Added arch:universal2 matrix element for macos-latest runner

* Python bindings: Refreshed the list of files needed to be copied for sdist archive

* GitHub Action: Commented out arch:x86 matrix elements

* GitHub Action: Run qemu step only if non-native Linux runner

* GitHub Action: Minor fixes

* Python bindings: Added missing .in pattern when collecting src files for sdist archive

* Auto-Sync reproducability + ARM update (#2532)

* fix xtensa DecodeMR23RegisterClass and add tests for MAC16 instru… (#2551)

* fix xtensa `DecodeMR23RegisterClass` and add tests for `MAC16` instructions

* revert

* Prepare for update (#2552)

* Bindings(chore): Fix DeprecationWarning

* Version(upgrade): update bindings const

* Fix(chore): Fix ARMCC_Invalid is not defined

* Update Changelog Version to 6.0.0-Alpha2 (#2553)

* Bindings(chore): Fix DeprecationWarning

* Version(upgrade): update bindings const

* Fix(chore): Fix ARMCC_Invalid is not defined

* Changelog: Update to version 6.0.0-Alpha2

---------

Co-authored-by: Rot127 <45763064+Rot127@users.noreply.github.com>
Co-authored-by: Florian Märkl <info@florianmaerkl.de>
Co-authored-by: billow <billow.fun@gmail.com>
Co-authored-by: Andrew <afq2101@columbia.edu>
Co-authored-by: Rot127 <unisono@quyllur.org>
Co-authored-by: @Antelox <anteloxrce@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working as it should TriCore Arch
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants