Update BPF arch #2568

Roeegg2 · 2024-12-03T17:48:16Z

Your checklist for this pull request

I've documented or updated the documentation of every API function and struct this PR changes.
I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

Full details in commit messages.

Resources used:

ref1
ref2
LLVM objdump
Linux BPF disassembler
GNU binutils objdump (has quite a few errors; only used for syntax reference)

In case of inconsistencies, I followed the resources in the order listed.

Some important notes:

In general I followed the syntax of GNU objdump (since that is what has mostly been used so far), but in some cases I felt the syntax was misleading or inaccurate, so I modified it a bit:

Instead of `movs32`/`movs` (for any of s8, s16, s32), `movsx`/`movsx64` is used, where `x` is the size of the move.
Instead of `acmp`, we use `acmpxchg`. Naming it `acmp` is misleading, since this instruction also swaps the data if the comparison succeeds.
The reason for using na/64 instead of 32/na is simply that this is what has been used so far. It doesn't really matter IMO (and it makes more sense anyway, since 32-bit is the default instruction class and the 64-bit ones were added by eBPF).

The specifications didn't contain information on exactly how the packet load instructions look and work, so I followed what llvm-objdump and the Linux disassembler showed (GNU objdump doesn't parse them correctly).
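To make "packet load instruction" concrete at the encoding level, here is a minimal sketch that recognises the legacy absolute and indirect packet loads from the opcode byte alone. The bit values match the `BPF_LD`, `BPF_ABS`, `BPF_IND` and size constants in the Linux UAPI BPF headers; every `sketch_` name is invented for this example and is not part of Capstone, and the relation to the PR's `is_pkt` field is only an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode sub-fields, as laid out in the Linux UAPI BPF headers. */
#define SKETCH_CLASS(op) ((op) & 0x07)
#define SKETCH_SIZE(op)  ((op) & 0x18)
#define SKETCH_MODE(op)  ((op) & 0xe0)

#define SKETCH_CLASS_LD  0x00 /* BPF_LD  */
#define SKETCH_MODE_ABS  0x20 /* BPF_ABS */
#define SKETCH_MODE_IND  0x40 /* BPF_IND */

/* Hypothetical helper: true for LDABS* and LDIND* style instructions. */
static bool sketch_is_packet_load(uint8_t opcode)
{
    if (SKETCH_CLASS(opcode) != SKETCH_CLASS_LD)
        return false;
    return SKETCH_MODE(opcode) == SKETCH_MODE_ABS ||
           SKETCH_MODE(opcode) == SKETCH_MODE_IND;
}

/* Hypothetical helper: access width in bytes of a packet load. */
static int sketch_packet_load_size(uint8_t opcode)
{
    /* Only meaningful for packet loads. */
    assert(sketch_is_packet_load(opcode));
    switch (SKETCH_SIZE(opcode)) {
    case 0x00: return 4; /* BPF_W  */
    case 0x08: return 2; /* BPF_H  */
    case 0x10: return 1; /* BPF_B  */
    default:   return 8; /* BPF_DW */
    }
}
```

With these definitions, opcode `0x28` is classified as a 2-byte absolute packet load, while `0x18` (the 64-bit immediate load) and `0x61` (a regular register-based load) are not packet loads.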

Test plan

All tests green.
...

Closing issues

...

One of the main improvements in recent eBPF verifiers is the ability to jump backwards in code. This means that the OFF field of the instruction can be negative, and thus should be interpreted as signed.

eBPF used to have load instructions specific to loading data from a socket buffer. Today they are deprecated, but it is still good practice to support them, as older programs are very likely to use them.

Some of the newer instructions added by the specification require additional fields to dictate the exact operation to be performed (e.g. CALL requires 'src' to decide whether 'imm' should be interpreted as a BTF function ID or as an offset from IP; MOV requires the 'off' field to decide whether a regular MOV or a MOVSX should be performed).

* Move BPF opcode setting to the disassembler
* Change BPF JMP instruction naming to conventional JA
* Fix the sign interpretation of 'off', 'imm', etc
* Implement LDIND* and LDABS* legacy packet instructions
* Implement JUMP32 instruction class
* Correct and remove branch + load tests, and add more exhaustive ones
* Correct and remove malformed LD tests, and add legacy packet tests
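The points about signed fields and about 'src' and 'off' selecting the operation can be illustrated with a small decoder. This is a sketch only, not the code from this PR: it assumes the little-endian instruction layout (opcode byte, then dst in the low nibble and src in the high nibble, then a 16-bit `off` and a 32-bit `imm`), and all `sketch_` names are hypothetical. The opcode values and the meaning of `src` for CALL follow the BPF instruction set specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative decoded form of one 8-byte eBPF instruction. */
struct sketch_insn {
    uint8_t opcode;
    uint8_t dst;
    uint8_t src;
    int16_t off; /* signed: backward jumps have a negative offset */
    int32_t imm; /* signed as well */
};

/* Portable two's-complement reinterpretation helpers. */
static int16_t sketch_s16(uint16_t v)
{
    return (int16_t)(v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v);
}

static int32_t sketch_s32(uint32_t v)
{
    return (int32_t)(v >= 0x80000000u ? (int64_t)v - 0x100000000LL
                                      : (int64_t)v);
}

/* Decode assuming little-endian byte order (an assumption of this sketch). */
static struct sketch_insn sketch_decode(const uint8_t *b)
{
    struct sketch_insn in;
    assert(b != NULL);
    in.opcode = b[0];
    in.dst = b[1] & 0x0f;
    in.src = b[1] >> 4;
    in.off = sketch_s16((uint16_t)(b[2] | (b[3] << 8)));
    in.imm = sketch_s32((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                        ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return in;
}

/* Jump targets are relative to the next instruction, in instruction units. */
static int64_t sketch_jump_target(int64_t index, const struct sketch_insn *in)
{
    return index + 1 + in->off;
}

/* MOV with a register source becomes MOVSX when 'off' is 8, 16 or 32. */
static bool sketch_is_movsx(const struct sketch_insn *in)
{
    uint8_t cls = in->opcode & 0x07;
    bool is_alu = cls == 0x04 || cls == 0x07;      /* ALU or ALU64 */
    bool is_mov_reg = (in->opcode & 0xf8) == 0xb8; /* MOV, register source */
    return is_alu && is_mov_reg &&
           (in->off == 8 || in->off == 16 || in->off == 32);
}

enum sketch_call_kind {
    SKETCH_NOT_A_CALL,
    SKETCH_CALL_HELPER_ID,   /* src == 0: imm is a static helper ID */
    SKETCH_CALL_PC_RELATIVE, /* src == 1: imm is an offset from IP  */
    SKETCH_CALL_BTF_ID,      /* src == 2: imm is a BTF function ID  */
    SKETCH_CALL_UNKNOWN
};

static enum sketch_call_kind sketch_call_kind(const struct sketch_insn *in)
{
    if (in->opcode != 0x85) /* JMP class, CALL */
        return SKETCH_NOT_A_CALL;
    switch (in->src) {
    case 0: return SKETCH_CALL_HELPER_ID;
    case 1: return SKETCH_CALL_PC_RELATIVE;
    case 2: return SKETCH_CALL_BTF_ID;
    default: return SKETCH_CALL_UNKNOWN;
    }
}
```

For instance, the bytes `05 00 fe ff 00 00 00 00` decode to a JA with `off == -2`; treating `off` as unsigned would instead yield a forward jump of 65534 instructions.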

* Implement `movs`, `sdiv`, `div` instructions
* Implement atomic ALU and complex operations (atomic `add`, `or`, `and`, `xor`, and `cmp`, `cmpxchg` respectively)
* Remove outdated `xadd` tests, and add new ones for the new instructions
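For context on the atomic operations: in the BPF instruction set the concrete atomic operation is selected by the `imm` field of an atomic store-class instruction, with bit 0 acting as a "fetch" modifier. The sketch below decodes that field using the values from the BPF instruction set specification. The enum and function names are hypothetical labels for this example, not Capstone instruction IDs or mnemonics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptive labels only; not the identifiers used in Capstone. */
enum sketch_atomic_op {
    SKETCH_ATOMIC_ADD,
    SKETCH_ATOMIC_OR,
    SKETCH_ATOMIC_AND,
    SKETCH_ATOMIC_XOR,
    SKETCH_ATOMIC_XCHG,
    SKETCH_ATOMIC_CMPXCHG,
    SKETCH_ATOMIC_INVALID
};

/* True for STX-class instructions in atomic mode (opcodes 0xc3 and 0xdb). */
static bool sketch_is_atomic(uint8_t opcode)
{
    return (opcode & 0x07) == 0x03 && (opcode & 0xe0) == 0xc0;
}

/* Decode 'imm' into the operation; *fetch reports the fetch modifier. */
static enum sketch_atomic_op sketch_atomic_decode(int32_t imm, bool *fetch)
{
    assert(fetch != NULL);
    *fetch = (imm & 0x01) != 0;
    switch (imm & ~0x01) {
    case 0x00: return SKETCH_ATOMIC_ADD;
    case 0x40: return SKETCH_ATOMIC_OR;
    case 0x50: return SKETCH_ATOMIC_AND;
    case 0xa0: return SKETCH_ATOMIC_XOR;
    /* Exchange operations are only defined with the fetch bit set. */
    case 0xe0: return *fetch ? SKETCH_ATOMIC_XCHG : SKETCH_ATOMIC_INVALID;
    case 0xf0: return *fetch ? SKETCH_ATOMIC_CMPXCHG : SKETCH_ATOMIC_INVALID;
    default:   return SKETCH_ATOMIC_INVALID;
    }
}
```

An `imm` of `0xf1` therefore denotes compare-and-exchange, which both compares and conditionally swaps; that is the behaviour behind the `acmp` versus `acmpxchg` naming discussion in this PR.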

Rot127

Excellent job! Thanks a lot for all the effort!

I added quite a few comments. Most of them are pretty simple.

In addition to them, please:

Add your copyright in SPDX style to the file headers. E.g.:

// SPDX-FileCopyrightText: 2023 Rot127 <unisono@quyllur.org>
// SPDX-License-Identifier: BSD-3

Please run `clang-format -i arch/bpf/*` on the files.

Please add some tests to cover:

* `is_signed` and `is_pkt`. You have to add the fields in `test_detail_bpf.h/c` and `details.py::test_expected_bpf()`.
  (You can check how `imm` is handled across all files.)
  They can be tested with `compare_tbool()`. In the yaml files you set the values with: 1 == true, 0 == unset, -1 == false.
* The new groups you added.
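The tri-state convention for boolean test fields (1 == true, 0 == unset, -1 == false) can be sketched as follows. This is not the actual `compare_tbool()` from the test suite, whose exact signature is not shown in this thread; `sketch_tbool_matches` is a hypothetical stand-in, and it assumes that "unset" means the field is simply not checked.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Compare a decoded boolean against a tri-state expectation:
 *    1 -> the field must be true
 *   -1 -> the field must be false
 *    0 -> unset; ASSUMPTION: the field is not checked at all
 */
static bool sketch_tbool_matches(bool actual, int expected)
{
    assert(expected >= -1 && expected <= 1);
    if (expected == 0)
        return true;
    return actual == (expected > 0);
}
```

Under this reading, a line such as `is_signed: 0` in a yaml test would not assert that the field is false; it would need `-1` for that.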

I know this is tedious stuff, but doing it now will save us from many bugs in the future.

I also added quite a few comments about using the helper functions we already have.
This is important IMO, because those helper functions are better tested, have more asserts, and are easier for other devs to work with,
since one doesn't have to check how "this specific BPF function" works. Also single point of failure etc.

Regarding the name changes:

> The specifications didn't contain information on exactly how the packet load instructions look and work, so I followed what llvm-objdump and the Linux disassembler showed (GNU objdump doesn't parse them correctly).

This is ok.

For the others: this kind of problem is usually solved by enabling/disabling features (for x86, for example, it is Intel or AT&T syntax).

The problem with changing names is that people can't google the mnemonics anymore. And when they try, they can't be sure whether they found the correct definition of the instruction or are dealing with some weird undocumented edge case.
In general it is OK to change syntax for better readability. But please make it opt-in via a flag.
By default the syntax should always be as close as possible to the ISA (or RFC in this case).

You can add a CS_OPT_SYNTAX_BPF_CS in capstone.h::cs_opt_value and handle it in BPF_option().
Check the syntax flag in the printer with MI->csh->syntax & CS_OPT_SYNTAX_BPF_CS and set the alternative mnemonic if true.

To extend cstool, just add the option to cstool.c::all_opts.
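The opt-in mechanism described above amounts to a bit test on the handle's syntax field when the printer picks a mnemonic. A minimal sketch with stand-in types follows; `sketch_csh`, `SKETCH_SYNTAX_BPF_CS` and its bit value are placeholders rather than Capstone definitions, and it assumes that the GNU-style `acmp` stays the default while the PR's `acmpxchg` becomes the opt-in spelling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the proposed CS_OPT_SYNTAX_BPF_CS; the value is made up. */
enum { SKETCH_SYNTAX_BPF_CS = 1 << 0 };

/* Stand-in for the Capstone handle; only the syntax bits matter here. */
struct sketch_csh {
    unsigned syntax;
};

/* Roughly what an option handler would do when the flag is toggled. */
static void sketch_set_syntax(struct sketch_csh *h, unsigned flag, bool on)
{
    assert(h != NULL);
    if (on)
        h->syntax |= flag;
    else
        h->syntax &= ~flag;
}

/* Printer side: choose the alternative mnemonic only when opted in. */
static const char *sketch_cmpxchg_mnemonic(const struct sketch_csh *h)
{
    assert(h != NULL);
    return (h->syntax & SKETCH_SYNTAX_BPF_CS) ? "acmpxchg" : "acmp";
}
```

Keeping the check in one small function means the default output stays searchable, and the alternative spelling only appears for users who explicitly asked for it.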

arch/BPF/BPFDisassembler.c

arch/BPF/BPFInstPrinter.c

include/capstone/bpf.h

tests/MC/BPF/extended-all.yaml

arch/BPF/BPFDisassembler.c

Rot127 · 2024-12-04T14:23:40Z

Ah, one last request. Please add a short description to the changes you made in docs/cs_v6_release_guide.md::New features. Also document the mnemonic change for the LLVM style load instructions there.

Roeegg2 · 2024-12-04T22:25:00Z

It appears the formatting is wrong even though I ran clang-format?

Rot127

> It appears the formatting is wrong even though I ran clang-format?

Most files are not formatted yet, but I try to keep the formatting divergence at an acceptable level. This is why I asked you to only format the BPF files :)

Please revert the formatting in the others.

arch/BPF/BPFDisassembler.c

suite/cstest/include/test_detail_bpf.h

tests/details/bpf.yaml

suite/cstest/src/test_detail_bpf.c

Rot127

Nice, almost done. Please just address the last few comments.

arch/BPF/BPFDisassembler.c

arch/BPF/BPFMapping.c

docs/cs_v6_release_guide.md

suite/cstest/include/test_detail_bpf.h

docs/cs_v6_release_guide.md

suite/cstest/include/test_detail_bpf.h

Roeegg2 · 2024-12-10T18:09:14Z

Any idea why the macOS tests fail now? I can't find why is_pkt is set to false.

tests/details/bpf.yaml

Rot127

~~I think this is the issue with the test. Just a guess though.~~ My mistake, the error is with is_pkt, not is_subtracted.

I guess you have to check with the debugger why it is set to true. But remove the is_signed: 0 line first or set it to -1. Maybe this is the issue for whatever reason.
I'll review one last time tomorrow.

XVilka · 2024-12-14T06:00:14Z

@kabeor

kabeor

Great, thanks for your contribution.

Roeegg2 added 2 commits December 3, 2024 19:20

Add missing new eBPF instructions

4adc79f

github-actions bot added BPF Arch python bindings labels Dec 3, 2024

Rot127 requested changes Dec 4, 2024

Roeegg2 force-pushed the bpf branch from 3e4bea7 to 68223f4 Compare December 4, 2024 22:23

github-actions bot added the CS-core-files auto-sync label Dec 4, 2024

Roeegg2 force-pushed the bpf branch 6 times, most recently from 7e5bbb7 to 75fa4a4 Compare December 6, 2024 09:55

github-actions bot added the Documentation label Dec 6, 2024

Roeegg2 force-pushed the bpf branch from a4f351a to edb371e Compare December 6, 2024 10:09

Rot127 requested changes Dec 7, 2024

Roeegg2 force-pushed the bpf branch 4 times, most recently from bda1cf3 to 163c545 Compare December 9, 2024 12:11

Rot127 requested changes Dec 10, 2024

Document BPF changes

2c87f8a

Roeegg2 force-pushed the bpf branch from 163c545 to d9b929f Compare December 10, 2024 17:53

Rot127 reviewed Dec 10, 2024

Rot127 reviewed Dec 10, 2024

Fix comments

d2e0e92

Roeegg2 force-pushed the bpf branch from d9b929f to d2e0e92 Compare December 10, 2024 21:52

Rot127 mentioned this pull request Dec 11, 2024

auto-sync progress tracker: Refactor and implement architectures #2015

Correct BPF bindings error

7dfbe55

Rot127 approved these changes Dec 12, 2024

XVilka mentioned this pull request Dec 14, 2024

Add support for BPF disassembly rizinorg/rizin#4757

kabeor approved these changes Dec 15, 2024

kabeor merged commit 812e654 into capstone-engine:next Dec 15, 2024
20 checks passed

Roeegg2 deleted the bpf branch December 16, 2024 16:14
