Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large libbpf messages are truncated #300

Closed
javierhonduco opened this issue Mar 15, 2023 · 6 comments · Fixed by #301
Closed

Large libbpf messages are truncated #300

javierhonduco opened this issue Mar 15, 2023 · 6 comments · Fixed by #301
Assignees
Labels
bug Something isn't working

Comments

@javierhonduco
Copy link
Contributor

javierhonduco commented Mar 15, 2023

Hi,

Recently we've seen that long verifier messages that used to be displayed in their entirety are now truncated. This unfortunately hinders debugging verifier problems. An example in https://github.com/javierhonduco/parca-agent/tree/truncated-logs-repro, built and ran with:

$ make && make debug/build && sudo dist/parca-agent-debug --node=test --log-level="debug"  --verbose-bpf-logging

fails with these truncated logs

libbpf: prog 'walk_user_stacktrace_impl': BPF program load failed: Invalid argument
libbpf: prog 'walk_user_stacktrace_impl': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; int walk_user_stacktrace_impl(struct bpf_perf_event_data *ctx) {
0: (bf) r9 = r1                       ; R1=ctx(off=0,imm=0) R9_w=ctx(off=0,imm=0)
; u64 pid_tgid = bpf_get_current_pid_tgid();
1: (85
libbpf: prog 'walk_user_stacktrace_impl': failed to load: -22
libbpf: failed to load object 'parca'

I believe this behaviour was introduced in e3f69db. The buffer is now 300 chars (including nul).

out = (char *)calloc(1, 300);

>>> len("libbpf: prog 'walk_user_stacktrace_impl': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; int walk_user_stacktrace_impl(struct bpf_perf_event_data *ctx) { 0: (bf) r9 = r1                       ; R1=ctx(off=0,imm=0) R9_w=ctx(off=0,imm=0) ; u64 pid_tgid = bpf_get_current_pid_tgid(); 1: (85")
299

cc/ @kakkoyun @Sylfrena @brancz

@geyslan
Copy link
Member

geyslan commented Mar 15, 2023

@javierhonduco thanks for bringing that. Can you please test increasing that buffer from your end?

@javierhonduco
Copy link
Contributor Author

javierhonduco commented Mar 15, 2023

With a buffer size of 1000:

-  out = (char *)calloc(1, 300);
+  out = (char *)calloc(1, 1000);
  if (!out)
    return -ENOMEM;

  va_copy(check, args);
-  ret = vsnprintf(out, 300, format, check);
+  ret = vsnprintf(out, 1000, format, check);
  va_end(check);

we get

libbpf: prog 'profile_cpu': relo #11: patched insn #387 (LDX/ST/STX) off 128 -> 128
libbpf: prog 'walk_user_stacktrace_impl': BPF program load failed: Invalid argument
libbpf: prog 'walk_user_stacktrace_impl': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; int walk_user_stacktrace_impl(struct bpf_perf_event_data *ctx) {
0: (bf) r9 = r1                       ; R1=ctx(off=0,imm=0) R9_w=ctx(off=0,imm=0)
; u64 pid_tgid = bpf_get_current_pid_tgid();
1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
2: (bf) r7 = r0                       ; R0_w=scalar(id=1) R7_w=scalar(id=1)
3: (b7) r1 = 0                        ; R1_w=0
; u64 zero = 0;
4: (7b) *(u64 *)(r10 -48) = r1        ; R1_w=P0 R10=fp0 fp-48_w=00000000
5: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
; 
6: (07) r2 += -48                     ; R2_w=fp-48
; unwind_state_t *unwind_state = bpf_map_lookup_elem(&heap, &zero);
7: (18) r1 = 0xffff98dabd8e0400       ; R1_w=map_ptr(off=0,ks=4,vs=1056,imm=0)
9: (85) call bpf_map_lookup_elem#1    ; R0=map_value_or_null(id=2,off=0,ks=4,vs=1056,imm=0)
10: (bf) r8 = r0                      ; R0=map_value_or_null(id=2,off=0,ks=4,vs=10
libbpf: prog 'walk_user_stacktrace_impl': failed to load: -22
libbpf: failed to load object 'parca'

Which is 998 chars + nul for the BPF verifier errors.

Our program is rather large so the message can be many thousands of lines long, so dynamically allocating memory for the whole buffer might not be possible/practical

@geyslan
Copy link
Member

geyslan commented Mar 15, 2023

Well, I suppose we can make two calls to vsnprintf, being the first to get the buffer required size:

int required_size = vsnprintf(NULL, 0, format, check) + 1;

and later:

  out = (char*)malloc(required_size);
  ...
  ret = vsnprintf(out, required_size, format, args);

@geyslan
Copy link
Member

geyslan commented Mar 19, 2023

@javierhonduco Please go get ...@commit and let me know if this is fixed on your end. Thanks.

@javierhonduco
Copy link
Contributor Author

@geyslan Testing it, thanks! Right off the bat, I am a bit afraid of potentially allocating a very large buffer and having the size of the buffer depend on an external source (even if we decide to "trust" the kernel!)

@geyslan
Copy link
Member

geyslan commented Mar 20, 2023

@geyslan Testing it, thanks! Right off the bat, I am a bit afraid of potentially allocating a very large buffer and having the size of the buffer depend on an external source (even if we decide to "trust" the kernel!)

I get your concern, we are just doing this copy to expose the filtering mechanism in an easier way (via golang). Since we can't truncate the original logs, what do you suggest?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants