Update BPF arch #2568

Merged
merged 5 commits into from
Dec 15, 2024

Contributor

@Roeegg2 Roeegg2 commented Dec 3, 2024

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

Full details in commit messages.

Resources used:

  1. ref1
  2. ref2
  3. LLVM objdump
  4. Linux BPF disassembler
  5. GNU binutils objdump (has quite a few errors. Only used for syntax reference)

In case of inconsistencies, I followed the resources in the order listed.

Some important notes:

  • In general I followed GNU objdump's syntax (since it is what has mostly been used so far), but in a few cases I felt the syntax was misleading or inaccurate, so I modified it slightly:
  1. Instead of movs32/movs (for any of s8, s16, s32), movsx/movsx64 is used, where x is the size of the move.
  2. Instead of acmp, we use acmpxchg. Naming it acmp is misleading, since the instruction also swaps the data if the comparison succeeds.
  3. The na/64 suffix scheme is used instead of 32/na simply because that is what has been used so far. It doesn't really matter IMO (and it makes more sense anyway, since 32-bit is the default instruction class and the 64-bit instructions are the ones eBPF added).
  • The specifications don't describe exactly how the packet load instructions look and work, so I followed what llvm-objdump and the Linux disassembler show (GNU objdump doesn't parse them correctly).
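To illustrate the naming point about acmp versus acmpxchg, here is a minimal, non-atomic model of what the compare-and-exchange instruction does to its operands. This is a sketch only: `model_cmpxchg` is a hypothetical name, and real hardware or a real VM performs these steps atomically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-atomic model of BPF compare-and-exchange (hypothetical helper).
 * mem: the memory word addressed by dst + off
 * r0:  register R0, which holds the expected value on entry
 * src: the replacement value */
static void model_cmpxchg(uint64_t *mem, uint64_t *r0, uint64_t src)
{
	assert(mem != NULL && r0 != NULL);
	uint64_t old = *mem;
	if (old == *r0)
		*mem = src; /* comparison matched: the swap happens */
	*r0 = old;          /* R0 always receives the previous memory value */
}
```

Because memory is overwritten on a match, the instruction is more than a comparison, which is the reason given above for preferring the longer mnemonic.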

Test plan

All tests green.
...

Closing issues

...

One of the main improvements recent eBPF verifiers have is the ability to jump backwards in code.
This means that the OFF field of the opcode can be negative, and thus should be interpreted as signed.
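A small sketch of the signed interpretation described above, assuming the usual 8-byte eBPF instruction layout with the 16-bit offset stored little-endian in bytes 2 and 3. The function names are hypothetical and are not taken from the Capstone sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 16-bit 'off' field (bytes 2..3, little-endian assumed)
 * and interpret it as a signed value. */
static int16_t bpf_insn_off(const uint8_t *insn)
{
	assert(insn != NULL);
	uint16_t raw = (uint16_t)(insn[2] | (insn[3] << 8));
	/* Convert explicitly so the result does not rely on
	 * implementation-defined narrowing. */
	return (raw & 0x8000u) ? (int16_t)((int32_t)raw - 0x10000)
			       : (int16_t)raw;
}

/* Jump targets are counted in instructions, relative to the
 * instruction that follows the jump. */
static int64_t bpf_jump_target(int64_t insn_index, int16_t off)
{
	return insn_index + 1 + off;
}
```

With an offset field of 0xfffe, an unsigned reading would jump 65534 instructions forward, while the signed reading gives -2, i.e. a backward jump.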

eBPF used to have load instructions specific to loading data from a
socket buffer. They are deprecated today, but it is still good practice
to support them, since older programs are likely to use them.
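As a rough sketch, the legacy packet loads can be recognized from the opcode byte alone: they use the LD class together with the ABS or IND addressing mode. The bit masks below follow the classic BPF opcode encoding. The macro and function names are hypothetical and do not correspond to Capstone identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CLASS(op) ((op) & 0x07) /* low 3 bits: instruction class */
#define SKETCH_SIZE(op)  ((op) & 0x18) /* bits 3..4: access size */
#define SKETCH_MODE(op)  ((op) & 0xe0) /* high 3 bits: addressing mode */

enum { SKETCH_LD = 0x00, SKETCH_ABS = 0x20, SKETCH_IND = 0x40 };

/* True for LD class instructions in ABS or IND mode. */
static bool is_legacy_packet_load(uint8_t opcode)
{
	return SKETCH_CLASS(opcode) == SKETCH_LD &&
	       (SKETCH_MODE(opcode) == SKETCH_ABS ||
		SKETCH_MODE(opcode) == SKETCH_IND);
}

/* Access width in bytes of a packet load. */
static unsigned packet_load_width(uint8_t opcode)
{
	assert(is_legacy_packet_load(opcode));
	switch (SKETCH_SIZE(opcode)) {
	case 0x00: return 4; /* word */
	case 0x08: return 2; /* half word */
	case 0x10: return 1; /* byte */
	default:   return 8; /* 0x18: double word */
	}
}
```

For example, opcode 0x28 is an absolute half-word load, while 0x18 (the 64-bit immediate load) uses the IMM mode and is not a packet instruction.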

Some of the newer instructions added by the specification require additional fields to determine
the exact operation to be performed (e.g. CALL requires 'src' to decide whether 'imm' should be interpreted as a BTF function ID
or as an offset from IP; MOV requires the 'off' field to decide whether a regular MOV or a MOVSX should be performed).
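The following sketch shows how such secondary fields could select the operation. It reflects my reading of the BPF ISA specification (src 0 for a helper called by static ID, 1 for a program-local call relative to the instruction pointer, 2 for a helper called by BTF ID; MOV offsets 8, 16 and 32 selecting a sign-extending move). All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum call_kind {
	CALL_HELPER_ID,   /* imm is a static helper function ID */
	CALL_PC_RELATIVE, /* imm is an offset from the instruction pointer */
	CALL_BTF_ID,      /* imm is a BTF function ID */
	CALL_INVALID
};

/* Decide how 'imm' of a CALL is to be read, based on the 4-bit 'src' field. */
static enum call_kind classify_call(uint8_t src_reg)
{
	assert(src_reg <= 0x0f);
	switch (src_reg) {
	case 0:  return CALL_HELPER_ID;
	case 1:  return CALL_PC_RELATIVE;
	case 2:  return CALL_BTF_ID;
	default: return CALL_INVALID;
	}
}

/* For a MOV: returns 0 for a regular move, the source width in bits
 * (8, 16 or 32) for a sign-extending move, or -1 for an invalid 'off'. */
static int movsx_width(int16_t off)
{
	switch (off) {
	case 0:  return 0;
	case 8:
	case 16:
	case 32: return off;
	default: return -1;
	}
}
```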

* Move BPF opcode setting to the disassembler
* Change BPF JMP instruction naming to conventional JA
* Fix the sign interpretation of 'off', 'imm', etc
* Implement LDIND* and LDABS* legacy packet instructions
* Implement JUMP32 instruction class
* Move BPF opcode setting to disassembler
* Correct and remove branch + load tests, and add more exhaustive ones
* Correct and remove malformed LD tests, and add legacy packet tests
* Implement `movs`, `sdiv`, `div` instructions
* Implement atomic ALU and complex operations (atomic `add`,
  `or`, `and`, `xor`, and `cmp`, `cmpxchg` respectively)
* Remove outdated `xadd` tests, and add new ones for the new
  instructions
@github-actions github-actions bot added BPF Arch python bindings labels Dec 3, 2024
Collaborator

@Rot127 Rot127 left a comment

Excellent job! Thanks a lot for all the effort!

I added quite a few comments. Most of them are pretty simple.

In addition to them, please:

  • Add your copyright in SPDX style to the file headers. E.g.:
    // SPDX-FileCopyrightText: 2023 Rot127 <unisono@quyllur.org>
    // SPDX-License-Identifier: BSD-3
    
  • Please run clang-format -i arch/bpf/* on the files.

Please add some tests to cover:

  • is_signed and is_pkt. You have to add the fields in test_detail_bpf.h/c and details.py::test_expected_bpf().
    (You can check how imm is handled across these files.)
    They can be tested with compare_tbool(). In the yaml files you set the values with: 1 == true, 0 == unset, -1 == false.
  • The new groups you added.
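The tri-state convention for the yaml values can be sketched as follows. This is not the actual compare_tbool() helper from the test suite, only a hypothetical stand-in (`tbool_matches`) that assumes an unset expectation means the field is not checked at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* expected: 1 == true, 0 == unset, -1 == false (as in the yaml files).
 * Assumption: an unset expectation matches any actual value. */
static bool tbool_matches(bool actual, int8_t expected)
{
	assert(expected >= -1 && expected <= 1);
	if (expected == 0)
		return true; /* unset: nothing to compare */
	return actual == (expected > 0);
}
```

Under this assumption, writing `is_pkt: 0` in a test case does not assert that the flag is false. An explicit `-1` is needed for that.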

I know this is tedious stuff, but doing it now will save us from many bugs in the future.

I also added quite a few comments about using the helper functions we already have.
This is important IMO, because those helper functions are better tested, have more asserts, and are easier for other devs to work with,
because one doesn't have to check how "this specific BPF function" works. Also single point of failure etc.

Regarding the name changes:

The specifications didn't contain information regarding how exactly the packet load instructions look and work, so I followed what llvm-objdump and the Linux disassembler showed (GNU objdump doesn't parse it correctly)

This is ok.

For the others: this kind of problem is usually solved by enabling/disabling features (for x86, for example, it is Intel or AT&T syntax).

The problem with changing names is that people can't google the mnemonics anymore. And when they try, they can't be sure whether they found the correct definition of the instruction or are dealing with some weird undocumented edge case.
In general it is ok to change syntax for better readability. But please make it opt-in via a flag.
By default the syntax should always be as close as possible to the ISA (or RFC in this case).

You can add a CS_OPT_SYNTAX_BPF_CS in capstone.h::cs_opt_value and handle it in BPF_option().
Check the syntax flag in the printer with MI->csh->syntax & CS_OPT_SYNTAX_BPF_CS and set the alternative mnemonic if true.

To extend cstool, just add the option to cstool.c::all_opts.
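A self-contained sketch of the opt-in pattern described here. The flag value and function below are hypothetical stand-ins, not the real Capstone definitions, and which spelling counts as the default is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a syntax option bit such as CS_OPT_SYNTAX_BPF_CS. */
enum { SKETCH_SYNTAX_BPF_CS = 1 << 0 };

/* Return the ISA-style name unless the alternative syntax was requested. */
static const char *select_mnemonic(unsigned syntax, const char *isa_name,
				   const char *alt_name)
{
	assert(isa_name != NULL && alt_name != NULL);
	return (syntax & SKETCH_SYNTAX_BPF_CS) ? alt_name : isa_name;
}
```

A printer could call something like `select_mnemonic(syntax, "acmp", "acmpxchg")`, so that users who never set the option keep seeing the names they can look up in the specification.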

Collaborator

Rot127 commented Dec 4, 2024

Ah, one last request. Please add a short description of the changes you made to docs/cs_v6_release_guide.md::New features. Also document the mnemonic change for the LLVM style load instructions there.

Contributor Author

Roeegg2 commented Dec 4, 2024

It appears the formatting is wrong even though I ran clang-format?

Collaborator

@Rot127 Rot127 left a comment

It appears the formatting is wrong even though I ran clang-format?

Most files are not formatted yet, but I try to keep the formatting divergence at an acceptable level. This is why I asked you to format only the bpf files :)

Please revert the formatting in the others.

@Roeegg2 Roeegg2 force-pushed the bpf branch 4 times, most recently from bda1cf3 to 163c545 Compare December 9, 2024 12:11
Collaborator

@Rot127 Rot127 left a comment

Nice, almost done. Please just address the last few comments.

Contributor Author

Roeegg2 commented Dec 10, 2024

Any idea why the macOS tests fail now? I can't find why is_pkt is set to false.

Collaborator

@Rot127 Rot127 left a comment

I think this is the issue with the test. Just a guess though. My mistake, the error is with is_pkt, not is_subtracted.

I guess you have to check with the debugger why it is set to true. But first remove the is_signed: 0 line or set it to -1. Maybe this is the issue, for whatever reason.
I'll review one last time tomorrow.

Contributor

XVilka commented Dec 14, 2024

@kabeor

Member

@kabeor kabeor left a comment

Great, thanks for your contribution.

@kabeor kabeor merged commit 812e654 into capstone-engine:next Dec 15, 2024
20 checks passed
@Roeegg2 Roeegg2 deleted the bpf branch December 16, 2024 16:14