Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH] Compression module memory corruption #2602

Closed
john08burke opened this issue Aug 17, 2021 · 11 comments · May be fixed by #2632
Closed

[CRASH] Compression module memory corruption #2602

john08burke opened this issue Aug 17, 2021 · 11 comments · May be fixed by #2632
Labels

Comments

@john08burke
Copy link
Contributor

john08burke commented Aug 17, 2021

OpenSIPS version you are running

version: opensips 3.1.3 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
main.c compiled on 19:32:34 Aug 12 2021 with gcc 8

This is with nightly 3.1 (right before 3.1.4 was tagged). I have been chasing a segfault for quite some time now in our production environment and haven't been able to reproduce in our lab / dev environments. The crash is very intermittent (a month between crashes) but always triggers a similar set of logs:

[120] CRITICAL:core:build_res_buf_from_sip_res: 
>>> len mismatch : calculated 541, written 569

It seems you have hit a programming bug.
Please help us make OpenSIPS better by reporting it at https://github.com/OpenSIPS/opensips/issues

Jun 24 14:39:39 [117] CRITICAL:core:fm_status: different free frag. count: 10!=9 for hash  33
Jun 24 14:39:39 [117] CRITICAL:core:fm_status: different free frag. count: 0!=1 for hash  54
Jun 24 14:39:39 [118] CRITICAL:core:fm_status: different free frag. count: 4!=3 for hash  33
Jun 24 14:39:39 [118] CRITICAL:core:fm_status: different free frag. count: 1!=2 for hash  53
Jun 24 14:39:39 [116] CRITICAL:core:fm_status: different free frag. count: 4!=3 for hash  33
Jun 24 14:39:39 [116] CRITICAL:core:fm_status: different free frag. count: 42!=43 for hash  49

The logs seem to indicate some memory corruption, so I ran with Q_MALLOC + DBG and within a few calls OpenSIPS crashed with SIGABRT and seems the allocator detected memory issue with the compression module:

CRITICAL:core:qm_debug_frag:  qm_*: prev. fragm. tail overwritten(c0c0c0c0c0c0c00a, abcdefedabcdefed)[0x7f75c2e64068:0x7f75c2e64098] (wrap_realloc, compression_helpers.c:374)!

We only use the mc_compact method, so it seems that the mc_compact_cb function is likely the culprit.

Here is the core dump that was produced (purged of sensitive info). I can provide the full dump if needed via email.

Let me know if you need any further info!

To Reproduce
The crash is random and I haven't been able to reproduce.

OS/environment information

  • Operating System: debian 10
  • OpenSIPS installation: build from nightly 3.1
  • other relevant information:
@john08burke
Copy link
Contributor Author

The overwritten tail from the dbg logs is c0c0c0c0c0c0c0c0 != c0c0c0c0c0c0c00a... which seems to indicate that the compacted message buffer from mc_compact_cb is causing the LF to overflow.

From frame 2 of the backtrace we have the fragment of issue
#2 0x0000563098f42bc2 in qm_debug_frag (f=f@entry=0x7f75c2e64068, qm=<optimized out>) at mem/q_malloc.c:122

Is possible to print the contents of this fragment? I was unsuccessful, but maybe this will shed some light into the unexpected buffer.

Any other ideas for debugging the issue?

@github-actions
Copy link

github-actions bot commented Sep 3, 2021

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@john08burke
Copy link
Contributor Author

Hey team! This one was tricky to identify, but is related to miscalculated buffer length due to header sanitization. If you apply the first commit from the PR that I just submitted (a5b6d2e) and send a request with some non-compacted header that will be sanitized when parsed then you will see the new LM_BUG report an overflow.

Example:

# set header with extra leading whitespace
Min-SE:  3600

Sep 22 00:40:21 [118] CRITICAL:compression:mc_compact_cb: 
>>> buffer overflow: calculated=1357, actual=1358

It seems you have hit a programming bug.
Please help us make OpenSIPS better by reporting it at https://github.com/OpenSIPS/opensips/issues

In the PR that I submitted, I removed the shortcutting on header assembly. Let me know if you have a better solution!

@stale stale bot removed the stale label Sep 22, 2021
@bogdan-iancu
Copy link
Member

Hi @john08burke , thanks for your work here !! Are you talking about #2632 , right ?

@john08burke
Copy link
Contributor Author

Hey @bogdan-iancu, yes that's the one! Let me know if you need more info to reproduce. You should be able to trigger the overflow log if you cherry-pick a5b6d2e. Ideally Q_MALLOC+DBG should catch it too, but in my test env I could never get it to dump!

@john08burke
Copy link
Contributor Author

I also realized that I detailed the bug in my commit (ba89136), but didn't really put the info here! The cause of the overflow seems to be with the shortcutting of non-compact SIP headers when re-assembling after compaction, especially in case when "sanitization" was performed. The calculation for memory allocation is always done with the hdr_field.body field which removes leading white spaces (and probably more cleanups, but I didn't check). However, when re-assembling the compact headers the module uses the hdr_field.name field which is the raw header. Normally this is OK, but in case of sanitization this leads to buffer overflow.

For example, something like:

Min-SE:  3600

where there is an extra space will lead to allocation len being 1 byte less than what will be written to buffer!

@github-actions
Copy link

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Oct 23, 2021
@john08burke
Copy link
Contributor Author

keepalive, I can provide an env to replicate this if that would help!

@stale stale bot removed the stale label Oct 24, 2021
@john08burke
Copy link
Contributor Author

Hey @bogdan-iancu! Any chance we can try to get this one reviewed and closed out? I've been running with the submitted patch for ~2months and haven't seen the crash since.

@stale
Copy link

stale bot commented Jan 9, 2022

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@stale stale bot added the stale label Jan 9, 2022
@stale
Copy link

stale bot commented Apr 19, 2022

Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.

@stale stale bot closed this as completed Apr 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants