
Segmentation fault for very big data on 32-bit linux build #218

Closed
synacker opened this issue Apr 2, 2020 · 34 comments

@synacker

synacker commented Apr 2, 2020

I get a segmentation fault only in the Release build when I try to work with very big data (about 1 GB arrays) on a 32-bit Linux OS (mimalloc 1.6.1).
I built the app with additional debug info, and here is the core dump backtrace:

received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xe358bb40 (LWP 16919)]
mi_segment_init (segment=0x94400000, segment@entry=0x0, required=required@entry=0, page_kind=page_kind@entry=MI_PAGE_SMALL, page_shift=15, tld=0xd856b644, os_tld=0xd856b684)
    at /home/sies_unit/.conan/data/mimalloc/1.6.1/sies/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:635
635     /home/sies_unit/.conan/data/mimalloc/1.6.1/sies/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c: No such file or directory.
(gdb) bt -full
#0  mi_segment_init (segment=0x94400000, segment@entry=0x0, required=required@entry=0, page_kind=page_kind@entry=MI_PAGE_SMALL, page_shift=15, tld=0xd856b644, os_tld=0xd856b684)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:635
        memid = 1088
        mem_large = false
        capacity = 64
        info_size = 4096
        pre_size = 8192
        segment_size = <optimized out>
        eager_delayed = <optimized out>
        eager = <optimized out>
        commit = true
        pages_still_good = false
        is_zero = true
#1  0xf7fb10b2 in mi_segment_alloc (os_tld=0xd856b684, tld=0xd856b644, page_shift=15, page_kind=MI_PAGE_SMALL, required=0)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:682
        No locals.
#2  mi_segment_reclaim_or_alloc (heap=0xd856b000, block_size=8192, page_kind=MI_PAGE_SMALL, page_shift=15, tld=<optimized out>, os_tld=0xd856b684)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:1201
        segment = 0x0
        reclaimed = false
        page_shift = 15
        heap = 0xd856b000
        os_tld = 0xd856b684
        tld = 0xd856b644
        page_kind = MI_PAGE_SMALL
        block_size = 8192
        segment = 0x0
        reclaimed = <optimized out>
#3  0xf7fb12ff in mi_segment_page_alloc (heap=heap@entry=0xd856b000, block_size=block_size@entry=8192, kind=kind@entry=MI_PAGE_SMALL, page_shift=15, tld=0xd856b644, os_tld=0xd856b684)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:1234
        segment = <optimized out>
        free_queue = <optimized out>
        page = <optimized out>
#4  0xf7fb1745 in mi_segment_small_page_alloc (os_tld=0xd856b684, tld=0xd856b644, block_size=8192, heap=0xd856b000)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:1251
        No locals.
#5  _mi_segment_page_alloc (heap=0xd856b000, block_size=8192, tld=0xd856b644, os_tld=0xd856b684)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/segment.c:1321
        page = <optimized out>
#6  0xf7fb1c1b in mi_page_fresh_alloc (heap=heap@entry=0xd856b000, pq=pq@entry=0xd856b3e8, block_size=8192)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/page.c:247
        page = <optimized out>
#7  0xf7fb31ca in mi_page_fresh (pq=0xd856b3e8, heap=0xd856b000)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/page.c:264
        page = <optimized out>
        page = <optimized out>
#8  mi_page_queue_find_free_ex (heap=0xd856b000, pq=0xd856b3e8, first_try=<optimized out>)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/page.c:679
        count = <optimized out>
        page = 0x0
#9  0xf7fb33c2 in mi_find_free_page (size=8192, heap=0xd856b000)
    at /home/my_app/.conan/data/mimalloc/1.6.1/project/testing/build/a98ffa6b232ae76f738ec2524a9fc9ba058249ac/source_subfolder/src/page.c:713
        pq = 0xd856b3e8
        page = 0x9f400c0c
        pq = <optimized out>

As you can see, the problem is that the allocated segment pointer references corrupted data.
In a Debug build there is no problem, so I suspect the issue is in the padding of allocated memory.

I tried to localize the problem, but I still can't do it. Also, I can't share my production code.

I will keep trying to catch this problem with a unit test, but I hope this information provides some ideas about this segmentation fault.

Thank you for your attention!

@daanx
Collaborator

daanx commented Apr 6, 2020

Ah that is not good :-( Thank you for the detailed trace! It is a very strange place to go wrong as it just allocated a new segment from the OS and then it segfaults writing to it. Not sure how this can happen. Can you try to repro with the latest dev branch -- it contains some fixes related to padding and perhaps it fixes it?

@synacker
Author

synacker commented Apr 6, 2020

@daanx thank you for your attention! I will check the dev changes tomorrow.

@daanx
Collaborator

daanx commented Apr 6, 2020

This may be related to issue #221 so I hope that commit will fix your troubles too.

@synacker
Author

synacker commented Apr 7, 2020

I tried the latest dev changes - the result is the same.
Note that the problem does not occur with a debug build.

@synacker
Author

synacker commented Apr 7, 2020

Interesting thing: the bad segment address is always 0x94400000.

@daanx
Collaborator

daanx commented Apr 7, 2020

Ah, very strange. I need more information; what is the OS you are using? Also, does this happen on 64-bit as well? Is it statically or dynamically linked?

  • If you build without debugging but with MI_SECURE=ON (on cmake) does it still happen?
  • Also, can you build with debug mode but edit the mimalloc-types.h file to remove
    #if !defined(MI_PADDING) && (MI_DEBUG>=1)
    #define MI_PADDING  1
    #endif
    
    so it gets built without padding in debug mode. That will show us whether it is related to padding.

Also, 0x94400000 is a rather high address for a 32-bit address space... when I google it, various drivers and OSs seem to reserve it? Are you reserving a lot of memory? I wonder if we didn't check an error condition for out-of-memory correctly.

@synacker
Author

synacker commented Apr 7, 2020

  1. The OS is 32-bit Debian 9. I can share a Docker container for building.
  2. Ok, I will try to disable secure mode and remove MI_PADDING in debug.
  3. I allocate very big data, approximately close to the maximum 4 GB size.

@daanx
Collaborator

daanx commented Apr 7, 2020

FYI, I just pushed a commit to dev that allows you to disable the padding using the -DMI_PADDING=OFF flag to cmake.

@synacker
Author

synacker commented Apr 7, 2020

Thank you, I will try it

@synacker
Author

synacker commented Apr 9, 2020

I tried a build without padding - the result is the same and, as always, the bad segment has the 0x94400000 address:
mi_segment_init (segment=0x94400000, segment@entry=0x0, required=required@entry=0, page_kind=page_kind@entry=MI_PAGE_SMALL, page_shift=15, tld=0xd857264c, os_tld=0xd857268c)
I have an Intel processor with an MMU, so this is a virtual address. Maybe mimalloc or something else allocates another heap at this address?
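
One way to check what is actually mapped at that address is to dump /proc/self/maps from inside the process just before the crash site (or from a debugger). A minimal sketch, assuming a Linux /proc filesystem; the helper below is illustrative and not part of the thread:

// Sketch: print the process memory map so we can see what (if anything)
// is mapped around the faulting address 0x94400000.
#include <fstream>
#include <iostream>
#include <string>

static void dump_memory_map() {
  std::ifstream maps("/proc/self/maps");   // one line per mapping
  std::string line;
  while (std::getline(maps, line)) {
    std::cout << line << '\n';             // e.g. "94400000-94800000 rw-p ..."
  }
}

int main() {
  dump_memory_map();
  return 0;
}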

@synacker
Author

synacker commented Apr 9, 2020

I found the problem. It occurs in our mimalloc fork:
dev...synacker:dev

This fork is needed for clearing memory before it is freed to the OS.
In our case, the user cleanup function is:

#include <cstring>  // std::memset

// Cleanup hook: zero the block before it is returned to the OS.
static void MemorySimpleCleanUp(void*, void* ptr, size_t size) noexcept
{
    std::memset(ptr, 0, size);
}

It means that the _mi_call_user_cleanup function clears the memory before it is freed to the OS.
Of course, this is our fork's problem, but it looks like it exposes a segmentation problem in mimalloc.
Because our fork zeroes the memory, the problem becomes visible: mimalloc apparently touches an already freed, but not cleared, segment, and that is why we get the segmentation fault in mi_segment_init.

@synacker
Author

I added a pull request with a demonstration of the problem. But this request does not contain a bug fix; additional investigation is needed for that.

@daanx
Collaborator

daanx commented Apr 20, 2020

Thanks again for your detailed report -- glad you found the issue. As you say though, it should in principle be ok to memset memory that is just about to be freed. I will investigate this further; it might be that the memory was reset? or perhaps decommitted before being freed (through the MIMALLOC_RESET_DECOMMITS flag?)

@daanx
Collaborator

daanx commented Apr 20, 2020

Ah, I think I found the error; In your fork, replace in os.c,

 _mi_call_user_cleanup(addr, size);

with

if (was_committed) { _mi_call_user_cleanup(addr, size); }

or otherwise you might try to memset memory that was already decommitted.
Hope that fixes the issue for you.
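
As a minimal standalone sketch (not mimalloc code; user_cleanup and was_committed below are stand-ins for the fork's _mi_call_user_cleanup and the flag in os.c), this illustrates why the guard matters: once a region has been decommitted, for example made inaccessible with mprotect(PROT_NONE) as an allocator may do, any memset into it faults.

// Standalone illustration: writing into decommitted memory faults,
// so a cleanup memset must be guarded by a "committed" flag.
#include <cstdio>
#include <cstring>
#include <sys/mman.h>

static void user_cleanup(void* ptr, std::size_t size) {  // stand-in for the fork's cleanup hook
  std::memset(ptr, 0, size);
}

int main() {
  const std::size_t size = 1u << 20;
  void* addr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED) return 1;

  // "decommit" the region the way an allocator might: make it inaccessible
  mprotect(addr, size, PROT_NONE);
  bool was_committed = false;

  // user_cleanup(addr, size);                        // unguarded: would SIGSEGV here
  if (was_committed) { user_cleanup(addr, size); }    // guarded: safe, the call is skipped
  munmap(addr, size);
  std::printf("done\n");
  return 0;
}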

@synacker
Author

synacker commented Apr 20, 2020

@daanx Thank you for the reply!
The latest change in the os.c file already checks the was_committed flag:
https://github.com/microsoft/mimalloc/pull/225/files#diff-a96f1cd811d3fd179514d3781b5c0bb6R218.

Yes, I think the problem is in the memory reset case. In that case you get zeroed memory in my fork, but the original memory in your repo. That is why my fork detects the problem of reusing memory that must not be used anymore.

@daanx
Collaborator

daanx commented Apr 20, 2020

Ah, so that wasn't it. Have you tried running with MIMALLOC_PAGE_RESET=0 ? I wonder if that is it.
Also, do you use large OS pages? Ah, and I see that on 32-bit OSs MIMALLOC_RESET_DECOMMITS=1 by default; perhaps running with MIMALLOC_RESET_DECOMMITS=0 will fix your problem? (and if so, that may help me pinpoint why was_committed was not set correctly :-) )
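
As a side note, those options can also be switched off from code, early in main before the workload allocates, instead of via environment variables. A minimal sketch, assuming the mi_option_* API from mimalloc.h with the option names of the 1.6 series:

// Sketch: programmatic equivalent of MIMALLOC_PAGE_RESET=0 and
// MIMALLOC_RESET_DECOMMITS=0 (mimalloc 1.6.x option names assumed).
#include <mimalloc.h>

int main() {
  mi_option_disable(mi_option_page_reset);        // MIMALLOC_PAGE_RESET=0
  mi_option_disable(mi_option_reset_decommits);   // MIMALLOC_RESET_DECOMMITS=0
  // ... run the large-allocation workload here ...
  return 0;
}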

@synacker
Author

Ok, I will try to set MIMALLOC_RESET_DECOMMITS to 0 and will return to this issue tomorrow. Thank you for your attention!

@daanx
Collaborator

daanx commented Apr 20, 2020

Hmm, I might have found the issue but I am not sure; can you try to change in segment.c, in mi_segment_os_free, the line (for me line 478):

if (segment->page_kind >= MI_PAGE_LARGE && !mi_option_is_enabled(mi_option_eager_page_commit)) {

to

if (segment->page_kind < MI_PAGE_LARGE && !mi_option_is_enabled(mi_option_eager_page_commit)) {

i.e. >= to <. Not 100% sure as it has been a while, but it might be it.

@synacker
Author

Ok, I will try, thank you )

@daanx
Collaborator

daanx commented Apr 20, 2020

Thanks; I know it is difficult to debug this way but would be nice to figure out what is going on :-)

@synacker
Author

@daanx I tried replacing the condition - c1f07a7 - and setting reset_decommits to zero - 64e722f

The result in my production code is the same - a segmentation fault at the 0x94400000 segment address.
But replacing the condition improved the pipeline in my fork with memory zeroing - https://github.com/microsoft/mimalloc/pull/225/checks; now only the Windows debug build fails.

@synacker
Author

synacker commented Apr 21, 2020

Btw, I can share a Docker container for 32-bit; does it make sense to add it to the main mimalloc pipeline?

@daanx
Collaborator

daanx commented Apr 21, 2020

> @daanx can you get a core dump for the https://dev.azure.com/Daan0324/mimalloc/_build/results?buildId=477&view=logs&j=d0a0256f-db7e-5456-d704-4bb13fa5c757&t=660cfb98-735a-5968-1ec2-aaac3a1c9b80&l=16 pipeline build? Maybe this is the same situation.
>
> The main difference from the mimalloc test in the main repo is memory zeroing - https://github.com/microsoft/mimalloc/pull/225/files#diff-75b0bfb1a198ffa3dad04768fdd1f857

Ah, very interesting. Just to make sure I understand: this is the regular mimalloc failing on the stress test, except for one change, memory zeroing just before free? (Or did it also include the change to the condition that I suggested?) If so, I might be able to repro locally -- I will try.

@daanx
Collaborator

daanx commented Apr 21, 2020

> Btw, I can share a Docker container for 32-bit; does it make sense to add it to the main mimalloc pipeline?

That would be great, the more testing the better :-) However, if you add it to the mimalloc pipeline I think it would trigger on every commit we make; is that ok with you? I have no idea how to set these things up, so you would need to submit a PR for the yaml :-)

@synacker
Author

@daanx ok. I don't have experience with Azure Pipelines, but I will try )

@synacker
Author

> @daanx can you get a core dump for the https://dev.azure.com/Daan0324/mimalloc/_build/results?buildId=477&view=logs&j=d0a0256f-db7e-5456-d704-4bb13fa5c757&t=660cfb98-735a-5968-1ec2-aaac3a1c9b80&l=16 pipeline build? Maybe this is the same situation.
> The main difference from the mimalloc test in the main repo is memory zeroing - https://github.com/microsoft/mimalloc/pull/225/files#diff-75b0bfb1a198ffa3dad04768fdd1f857

> Ah, very interesting. Just to make sure I understand: this is the regular mimalloc failing on the stress test, except for one change, memory zeroing just before free? (Or did it also include the change to the condition that I suggested?) If so, I might be able to repro locally -- I will try.

This commit includes the zeroing and the condition replacement - c1f07a7. It fixed most of the pipeline builds - https://github.com/microsoft/mimalloc/pull/225/checks?sha=c1f07a74a1889e76ed7b9adfd71b5a600e1c0c16

This commit contains only the zeroing - 99ea9e6, and several builds fail - https://github.com/microsoft/mimalloc/pull/225/checks?sha=99ea9e6c6080dab2c8065ffe425c97c6b0ccbb63

@daanx
Collaborator

daanx commented Apr 21, 2020

Ah I see. Thanks for the clarification. I will try to reproduce and test more somewhere this week as I think it does point to an error in mimalloc itself (that normally does not show up but is triggered by the memory cleanup addition)

@daanx
Collaborator

daanx commented Apr 24, 2020

Perhaps 7123efb fixes the issue. It was not the condition check for sure, that was correct. Let me know how it goes.

@synacker
Author

@daanx thank you, I will try to check it in the next few days!

@synacker
Author

@daanx I tried it and the result is the same.
I think I need to create a proof of concept:

  1. A web application based on the cpprestsdk HTTP listener
  2. A global mimalloc override
  3. A 32-bit Debian OS in Docker

Then I will be able to demonstrate my problem with big data over HTTP with mimalloc on a 32-bit OS.
I need time for this.
Also, I think this demo will be useful for presenting mimalloc use cases.
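
As a rough sketch of the workload shape only (not the actual proof of concept; the sizes and loop count are made up, and it uses the explicit mi_malloc/mi_free API instead of the global override), something like this exercises the same pattern of repeated near-1 GB allocations in a 32-bit process:

// Sketch: repeatedly allocate, touch and free buffers approaching 1 GB
// on a 32-bit process; the real application receives such data over HTTP.
#include <mimalloc.h>
#include <cstdio>
#include <cstring>

int main() {
  const std::size_t big = 1024u * 1024u * 1024u;   // ~1 GB, near the 32-bit limit
  for (int i = 0; i < 8; i++) {
    void* p = mi_malloc(big);
    if (p == nullptr) { std::printf("allocation %d failed\n", i); continue; }
    std::memset(p, 0xAB, big);                     // touch every page
    mi_free(p);
  }
  return 0;
}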

@daanx
Collaborator

daanx commented Apr 28, 2020

Darn; I will try more to replicate. If you can have me debug on a remote machine that would be ok. Thanks.

@synacker
Author

@daanx sorry, but I can't. I will try to create an open-source sample project.

@daanx
Collaborator

daanx commented Jul 27, 2020

I believe this was eventually caused by the hooks in PR #254 so I am closing this issue.
