-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ovl_create_or_link inlining leads to false positive 'off' flag corruption for dockerd #215
Comments
Thank you for reporting this, @ajakk! Is this issue new with 0.9.4 or does it also occur with 0.9.3? Either way, can you please try uncommenting |
Looks like the reported flag value is exactly 9x the normalization multiplier:
This is close to the maximum Another possibility is we unexpectedly see this in Anyway, @ajakk can you try the below patch and see what happens? - @@ -970,7 +970,7 @@ inline void p_validate_off_flag(struct p_ed_process *p_source, long p_val, int *
while (p_val > p_global_cnt_cookie) {
p_val -= p_global_cnt_cookie;
- if (unlikely(p_val > (p_global_cnt_cookie << 3)))
+ if (unlikely(p_val > (p_global_cnt_cookie << 8)))
break;
}
This isn't a full/proper fix anyhow. If we do need to support |
Sorry, that very patch won't even work because of the integer wraparound I mentioned. The maximum that would work reliably (regardless of what the random values are) is diff --git a/src/modules/exploit_detection/p_exploit_detection.c b/src/modules/exploit_detection/p_exploit_detection.c
index f0e987d..9907018 100644
--- a/src/modules/exploit_detection/p_exploit_detection.c
+++ b/src/modules/exploit_detection/p_exploit_detection.c
@@ -970,7 +970,7 @@ inline void p_validate_off_flag(struct p_ed_process *p_source, long p_val, int *
while (p_val > p_global_cnt_cookie) {
p_val -= p_global_cnt_cookie;
- if (unlikely(p_val > (p_global_cnt_cookie << 3)))
+ if (unlikely(p_val > (p_global_cnt_cookie << 8)))
break;
}
diff --git a/src/modules/exploit_detection/p_exploit_detection.h b/src/modules/exploit_detection/p_exploit_detection.h
index 21e3837..97927b4 100644
--- a/src/modules/exploit_detection/p_exploit_detection.h
+++ b/src/modules/exploit_detection/p_exploit_detection.h
@@ -239,10 +239,10 @@ struct p_seccomp {
#ifdef CONFIG_X86_64
#define P_NORMALIZE_LONG 0x0101010101010101
- #define P_MASK_COUNTER 0x07FFFFFFFFFFFFFF
+ #define P_MASK_COUNTER 0x003FFFFFFFFFFFFF
#else
#define P_NORMALIZE_LONG 0x01010101
- #define P_MASK_COUNTER 0x07FFFFFF
+ #define P_MASK_COUNTER 0x003FFFFF
#endif
#ifdef P_LKRG_TASK_OFF_DEBUG |
I believe this has been an issue since I started using lkrg, around 0.9.2. So, certainly not a recent regression. The first patch works! The second patch seems to be an improvement, but it produces inconsistent results:
|
That's weird. I'd suspect that if the second patch doesn't consistently fix the issue, the first patch would even more likely/frequently not do so. Maybe you need to unload/reload LKRG with the first patch a few times to trigger it failing too. The random values used in When you get that Would you please also try |
So, with this patch:
I've reloaded lkrg a few times, and run |
This time we have:
and it's 13x:
I really don't see why the loop in Further debugging output suggests this involved There were also stack traces apparently omitted from the log because of grepping for |
Sorry, didn't realize! This is generated with |
Thanks. Can you try increasing The overrides look mostly balanced (interleaved on/off), but not fully, and perhaps this adds up... |
At the same time, you can revert to default |
@ajakk would you be able to answer a few questions?
|
@Adam-pi3 Let's also hear from @ajakk, but meanwhile I can answer some of these: The kernel is Per the logs the That commit you reference is very recent. @ajakk had already answered my similar question as follows:
|
@ajakk Adam's questions reminded me, do you by any chance get this message triggered? - if (p_install_ovl_create_or_link_hook(1)) {
p_print_log(P_LOG_FAULT,
"OverlayFS is being loaded but LKRG can't hook 'ovl_create_or_link' function. "
"It is very likely that LKRG will produce false positives. Please reload LKRG.");
} If you do, please try to reload LKRG as it says and see if the problem persists. |
Also, upon reloading do you possibly get the message from here? - /* OverlayFS
*
* OverlayFS might not be installed in that system - it is not critical
* scenario. If OverlayFS is installed, used but not found (unlikely)
* in worst case, we might have FP. Continue...
*/
{ "ovl_create_or_link",
p_install_ovl_create_or_link_hook,
p_uninstall_ovl_create_or_link_hook,
0,
"Can't hook 'ovl_create_or_link' function. This is expected when OverlayFS is not used.",
1
}, Despite of the If so, that would indicate that aggressive optimizations of the kernel prevented |
Linux sol 5.15.52-gentoo-dist-hardened #1 SMP Fri Jul 8 13:10:28 CDT 2022 x86_64 AMD Ryzen 7 5800X 8-Core Processor AuthenticAMD GNU/Linux I've relatively recently switched back down to upstream-LTS from upstream-stable kernels, though.
This is Gentoo's https://github.com/mgorny/gentoo-kernel-config/blob/master/hardened-base.config Additionally, I've done some mild meddling in
For good measure, I've attached my config.gz.
Nothing that I'm aware of.
I'm unsure, but I don't think I've done any meddling related to overlayfs.
Not a 0.9.4 regression, so I'm not sure this would be informative.
This is what I (reproducibly) see on reloading lkrg with a
|
@Adam-pi3
|
Can you also check |
Unrelated to this issue, but that setting is otherwise problematic with LKRG, see #79. |
Sorry, here's that too (with kernel logs from the start of lkrg's first ALERT):
|
Just a few notes, I've just checked the recent Ubuntu LTS 22.04:
@ajakk Thanks for all the information, the problem is likely related to the following line:
@solardiz We would need to do the homework to verify what can or cannot be hooked. Be aware that 'static' can be added anyway to other functions which we choose (it happens all the time).
|
|
Sorry, I was wrong about using default |
@ajakk Can you do a test where you modify the following file in the linux kernel sources? Execute the following command:
next, recompile |
@Adam-pi3 I think that test could be rather time-consuming for @ajakk if he's not used to that on this system (uses a distro kernel). Maybe it's better use of everyone's time to ask @ajakk to experiment with a possible LKRG fix instead. Looking at the code, I think simply hooking |
@solardiz From the top of my head I don't even remember why this specific function must be hooked to double allign the flag. From my understanding somewhere down-the path, there is 'forgotten' call to the 'revert' or called it twice. I would like to first find out what was exactly the case, if nothing has been changed in the kernel since we developed this hook and the we can look for possible solutions. Sure we can do this 'blind' check/fix but we would not know side effects. |
@ajakk Can you try this LKRG hack, please? - diff --git a/src/modules/exploit_detection/p_exploit_detection.c b/src/modules/exploit_detection/p_exploit_detection.c
index f0e987d..fb462bd 100644
--- a/src/modules/exploit_detection/p_exploit_detection.c
+++ b/src/modules/exploit_detection/p_exploit_detection.c
@@ -304,7 +304,7 @@ static const struct p_functions_hooks {
* scenario. If OverlayFS is installed, used but not found (unlikely)
* in worst case, we might have FP. Continue...
*/
- { "ovl_create_or_link",
+ { "security_dentry_create_files_as",
p_install_ovl_create_or_link_hook,
p_uninstall_ovl_create_or_link_hook,
0,
diff --git a/src/modules/exploit_detection/syscalls/override/overlayfs/p_ovl_create_or_link/p_ovl_create_or_link.c b/src/modules/exp
loit_detection/syscalls/override/overlayfs/p_ovl_create_or_link/p_ovl_create_or_link.c
index bb4767e..5f64625 100644
--- a/src/modules/exploit_detection/syscalls/override/overlayfs/p_ovl_create_or_link/p_ovl_create_or_link.c
+++ b/src/modules/exploit_detection/syscalls/override/overlayfs/p_ovl_create_or_link/p_ovl_create_or_link.c
@@ -25,7 +25,7 @@
char p_ovl_create_or_link_kretprobe_state = 0;
static struct kretprobe p_ovl_create_or_link_kretprobe = {
- .kp.symbol_name = "ovl_create_or_link",
+ .kp.symbol_name = "security_dentry_create_files_as",
.handler = p_ovl_create_or_link_ret,
.entry_handler = p_ovl_create_or_link_entry,
.data_size = sizeof(struct p_ovl_create_or_link_data),
@@ -37,7 +37,7 @@ static struct kretprobe p_ovl_create_or_link_kretprobe = {
void p_reinit_ovl_create_or_link_kretprobe(void) {
memset(&p_ovl_create_or_link_kretprobe,0x0,sizeof(struct kretprobe));
- p_ovl_create_or_link_kretprobe.kp.symbol_name = "ovl_create_or_link";
+ p_ovl_create_or_link_kretprobe.kp.symbol_name = "security_dentry_create_files_as";
p_ovl_create_or_link_kretprobe.handler = p_ovl_create_or_link_ret;
p_ovl_create_or_link_kretprobe.entry_handler = p_ovl_create_or_link_entry;
p_ovl_create_or_link_kretprobe.data_size = sizeof(struct p_ovl_create_or_link_data); |
@Adam-pi3 Sure we need fresh understanding of the underlying issue before fixing it for real.
There are many other |
BTW, that the call to |
@ajakk In fact, we can try different approach. Would you be able to run the following command? Example of output
If you see the same function as being visible in your kernel (
|
@Adam-pi3 Yes, let's try that. However, don't we even more importantly need to update the hooked function name in |
@solardiz this is just a temp patch to verify if @ajakk environment will be correctly handling it. We do not change any function names here, just inside of the function body we changed the logic and name of the hooked function (but still using old LKRG's function name). |
It's not too terribly time consuming, there's a binary distribution kernel package ( In any case, the latest patch seems to work! Happy to test the kernel patch too, if necessary. LKRG doesn't produce any log messages if I try to run Docker containers, and the only messages I see when lkrg stops/starts are:
|
@Adam-pi3 Sure. I wrongly thought we actually used function names from the Great news that the patch worked. Meanwhile, I was thinking - maybe this issue is a reminder for us to reconsider and instead of insisting that |
@solardiz I look at this problem a bit opposite. Instead of 'weakening' |
Summary of evaluation:
|
Thanks for your analysis, @Adam-pi3. I've confirmed some of this (looking at those kernels' code) and didn't find any discrepancies with your description. Meanwhile, I got the below thoughts:
I don't mind implementing a fix like Adam seems to have suggested above for now, but longer-term we really might want to reconsider whether we possibly incur a lot of complexity and kernel incompatibility risks for too little gain in protection. |
I guess you meant between 4.7 and 4.9? |
Yes, 4.7 until 4.9 |
The original logic was hooking 'ovl_create_or_link' function but it could be inlined. This commit changes it by hooking 'ovl_dentry_is_whiteout' when possible. Fixes #215
@ajakk We've just merged Adam's fix for this issue. We'd appreciate you testing. I might make a follow-up PR cleaning up this fix, so you might want to test before and/or after that. Thanks! |
I'm running lkrg-0.9.4 on Gentoo. Docker is at 20.10.16.
This seems to be the interesting part of dmesg, with log level set to 6:
The text was updated successfully, but these errors were encountered: