-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update BPF arch #2568
Update BPF arch #2568
Conversation
One of the main improvements recent eBPF verifiers have is the ability to jump backwards in code. This means that the OFF field of the opcode can be negative, and thus should be interpreted as signed. eBPF used to have load instructions, specific to loading data from socket buffer. Today, they are deprecated, but it is still good practice to support them, as it's very likely older programs might use them. Some of the newer instructions added by the specification require additional fields to dictate what is the exact operation to be performed (eg. CALL requires 'src' to decide whether 'imm' should interpreted as BTF function ID, or as offset from IP. MOV requires 'off' field in order to decide whether a regular MOV or rather a MOVSX should be performed) * Move BPF opcode setting to the disassembler * Change BPF JMP instruction naming to conventional JA * Fix the sign interpretation of 'off', 'imm', etc * Implement LDIND* and LDABS* legacy packet instructions * Implement JUMP32 instruction class * Move BPF opcode setting to disassembler * Correct and remove branch + load tests, and add more exhaustive ones * Correct and remove malformed LD tests, and add legacy packet tests
* Implement `movs`, `sdiv`, `div` instructions * Implement atomic ALU and complex operations (atomic `add`, `or`, `and`, `xor`, and `cmp`, `cmpxchg` respectively) * Remove outdated `xadd` tests, and add new ones for the new instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent job! Thanks a lot for all the effort!
I added quite some comments. Most of them are pretty simple.
Additionally to them, please:
- Add your copyright in SPDX style to the file headers. E.g.:
// SPDX-FileCopyrightText: 2023 Rot127 <unisono@quyllur.org> // SPDX-License-Identifier: BSD-3
- please run
clang-format -i arch/bpf/*
on the files.
Please add some tests to cover:
is_signed
andis_pkt
. You have to add the fields intest_detail_bpf.h/c
anddetails.py::test_expected_bpf()
.
(You can check what how theimm
is hanndled over all files).
Use be tested withcompare_tbool()
. In the yaml files you set the values with: 1 == true, 0 == unset, -1 == false.- The new groups you added.
I know this is tedious stuff, but doing it now, will save us from many bugs in the future.
I also added quite some comments for using the helper functions we already have.
This is important IMO, because those helper functions are better tested, have more asserts and working with them is easier for other devs.
Because one doesn't have to check how "this specific BPF function" works. Also single point of failure etc.
Regarding the name changes:
The specifications didn't contain information regarding how exactly the packet load instructions look and work, so I followed what llvm-objdump and the Linux dissembler showed (GNU objdump doesn't parse it correctly)
This is ok.
For the others. The problem is usually solved via enabling/disabling features (for x86 for example it is Intel or ATT syntax).
The problem with changing names is, people can't google the mnemonics anymore. But when they attempt to, one can't be sure if they found the correct definition of the instruction or deals with some weird undocumented edge case.
In general it is ok to change syntax for better readability. But please make it opt-in via a flag.
By default the syntax should always be as close as possible to the ISA (or RFC in this case).
You can add a CS_OPT_SYNTAX_BPF_CS
in capstone.h::cs_opt_value
and handle it in BPF_option()
.
Check the syntax flag in the printer with MI->csh->syntax & CS_OPT_SYNTAX_BPF_CS
and set the alternative mnemonic if true.
To extend cstool
, just add the option to cstool.c::all_opts
.
Ah, one last request. Please add a short description to the changes you made in |
It appears the formatting is wrong even though I ran |
7e5bbb7
to
75fa4a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears the formatting is wrong even though I ran clang-format?
Most files are not formatted yet. But I try to keep the formatting diversion on an acceptable level. This is why I asked you to only format the bpf
files :)
Please revert the formatting in the others.
bda1cf3
to
163c545
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, almost done. Please just address the last few comments.
Any idea why the macOS tests fail now? I find why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the issue with the test. Just a guess though. My mistake, the error is with is_pkt
. Not is_subtracted
.
Guess you have to check with the debugger why it is set to true. But remove the is_signed: 0
line first or set it to -1
. Maybe this is the issue for whatever reason.
I review a last time tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for your contribution.
Your checklist for this pull request
Detailed description
Full details in commit messages.
Resources used:
In case of inconsistencies I followed the resources in the order listed
Some important notes:
movs32/movs
(for any s8,s16,s32),movsx\movsx64
was used, wherex
is the size of the moveacmp
, we useacmpxchg
. It's misleading naming itacmp
, since this instruction also swaps the data if the comparison is correct.na/64
instead of32/na
is simply because that's what is used so far. Doesn't really matter IMO, (and anyway it makes more sense, since32
bit is the default instruction class, and64
are the eBPF added ones.)llvm-objdump
and the Linux dissembler showed (GNU objdump doesn't parse it correctly)Test plan
All tests green.
...
Closing issues
...