Skip to content

Conversation

@fntlnz
Copy link
Contributor

@fntlnz fntlnz commented Jan 14, 2021

Reading the buffer data from the iovec struct requires to go take a specific area in our buffer to perform the read into the cpu data->buf
map.

New kernels implement the __check_mem_access which takes care
of checking if the accessed memory is valid or not in the current
context.

In our case since our scratch buffer is split in a half with two null
terminators, we only need to read the area until the null terminator
happens or the verifier will alert.

Here's the verifier check 0 for this case:

    bool size_ok = size > 0 || (size == 0 && zero_size_allowed);
    struct bpf_reg_state *reg;

    if (off >= 0 && size_ok && (u64)off + size <= mem_size)
	    return 0;

Refs #1658

Co-Authored-By: Leonardo Di Donato leodidonato@gmail.com
Signed-off-by: Lorenzo Fontana lo@linux.com

reading readv/writev iovec struct

Raeding the buffer data from the iovec struct requires to go take a
specific area in our buffer to perform the read into the cpu data->buf
map.

New kernels implement the `__check_mem_access` which takes care
of checking if the accessed memory is valid or not in the current
context.

In our case since our scratch buffer is split in a half with two null
terminators, we only need to read the area until the null terminator
happens or the verifier will alert.

Here's the verifier check [0] for this case:

```
    bool size_ok = size > 0 || (size == 0 && zero_size_allowed);
    struct bpf_reg_state *reg;

    if (off >= 0 && size_ok && (u64)off + size <= mem_size)
	    return 0;
```

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/bpf/verifier.c?id=541c3bad8dc51b253ba8686d0cd7628e6b9b5f4c#n2552

Co-Authored-By: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Lorenzo Fontana <lo@linux.com>
@nathan-b
Copy link
Contributor

nathan-b commented Jan 14, 2021

I was having some trouble understanding exactly what effect this change might have, so I wrote a scratch program. I'm mapping the difference in read offset between the old code and the new code:

0:
 New: 0
 Old: 0
1:
 New: 1
 Old: 0
2:
 New: 2
 Old: 2
1000:
 New: 1000
 Old: 1000
2321:
 New: 2321
 Old: 2320
131069: (= SCRATCH_SIZE_HALF - 2)
 New: 131069
 Old: 131068
131070: (= SCRATCH_SIZE_HALF - 1)
 New: 131070
 Old: 131070
131071: (= SCRATCH_SIZE_HALF)
 New: 131071
 Old: 131070
131072: (= SCRATCH_SIZE_HALF + 1)
 New: 0
 Old: 0

My concern is that when off == 1, the old code will read from offset 0 but the new code will read from offset 1, which could have very real repercussions on the data.

And of course, it's not just 1. Any value of off which has the least significant bit set will now be masked differently.

Maybe I just don't understand the intent of this fix, but I'm worried it will cause the probe to start emitting incorrect data!

@fntlnz
Copy link
Contributor Author

fntlnz commented Jan 15, 2021

The key to understanding the change here is that data->buf is the destination map where the data from the kernel is copied. So the data buffer is not cut, we are just making sure that the buffer size does not exceed the scratch location when writing into it.

In any case, even if the source buffer was bigger, our architecture does not support reading more than SCRATCH_SIZE_HALF and, even if it could sinsp and scap have hard limits on what is brought into userspace and ultimately at the presentation layer.

@fntlnz
Copy link
Contributor Author

fntlnz commented Jan 15, 2021

Moreover, a bug in the offsets here would only affect readv and writev syscalls.

#ifdef BPF_FORBIDS_ZERO_ACCESS
if (to_read)
if (bpf_probe_read(&data->buf[off & SCRATCH_SIZE_HALF],
if (bpf_probe_read(&data->buf[off & SCRATCH_SIZE_HALF - 1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parentheses? Is this expression (off & SCRATCH_SIZE_HALF) - 1 or off & (SCRATCH_SIZE_HALF - 1)?

Copy link
Contributor Author

@fntlnz fntlnz Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- has precedence over & and I confirm that the generated bpf bytecode is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what you wrote above, is it possible that the intent of the code is (off & SCRATCH_SIZE_HALF) - 1?

@leodido
Copy link
Contributor

leodido commented Jan 15, 2021

Hold until we test this on older verifiers (ie., older kernels).

fntlnz added a commit that referenced this pull request Feb 3, 2021
relocations for reads.

This commit also contains the verifier fixes from #1733 and #1730
to get to a loadable state, will likely need to remove them before merging.

Signed-off-by: Lorenzo Fontana <lo@linux.com>
@fntlnz
Copy link
Contributor Author

fntlnz commented Feb 3, 2021

This can be closed, I will post updates on this fix later, debugged the kernel a bit more and understood why this is happening. Stay tuned.

@fntlnz fntlnz closed this Feb 3, 2021
@fntlnz fntlnz mentioned this pull request Feb 4, 2021
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants