dparse: Fix memory corruption#445
Conversation
| newLength <<= 1; | ||
| auto oldPtr = arr.ptr; | ||
| Mallocator.instance.reallocate(arr, newLength); | ||
| GC.removeRange(oldPtr); |
There was a problem hiding this comment.
I think there is a race condition here, which is unavoidable with the reallocate API.
We need to call removeRange after the data is copied, and after calling addRange on the new copy, but before the old buffer is deallocated. reallocate does both the copy and the deallocation together.
A bit surprising that stdx.allocator doesn't have a layer which calls addRange/removeRange, though maybe that kind of defeats the point of using custom allocators.
Also note that we can't make this call conditional on hasIndirections!T because T is just for the value being allocated right now, we don't know what the rest of the buffer contains, nor can we know if the buffer will be completely used for types without indirections.
Based upon your description of the debugging, if we could get the GC to free ASAP and not reuse memory, then ldc's address sanitizer should do the job for this problem. |
To do that, you would need either something like reference counting or performing a full mark&sweep every statement. |
|
That's not what I meant. I meant instead of setting it as free internally when ready to free, just free that memory range. So that ASAN knows that the memory should not be accessible. |
|
In this case, that wouldn't help at all. You would just get the ASAN violation instead of the |
WebFreak001
left a comment
There was a problem hiding this comment.
Awesome! Great to see the GC issue could be fixed by changing the allocators. I wonder how it would be possible to avoid such an issue in the future? Should all custom allocators forbid allocating data with indirections unless GC addRange/removeRange has been added properly? Could we add interfaces or patterns that could enforce that statically?
Anyway would love to get this in soon and apply updates to DScanner, dfmt, DCD, libddoc
Sure, that's one way. I suggested another way (only allow non-GC pointers, and verify that this is true at runtime in debug builds).
It would certainly be possible with a custom pointer type, which indicates that the memory is managed manually and not owned by the GC, and having the allocator types only accept values which have no indirections except via this wrapper. But I can't predict if the syntactic overhead would be bearable. I don't think we can generally achieve memory safety using static checks at this scale, without becoming Rust. :) |
|
need someone else to merge, can't merge until travis is done, which I think we can wait for forever since travis-ci.org is gone. |
|
Did the force push, and made |
Following up to the DLF meeting yesterday. I don't really know much about libdparse's memory handling philosophy to know if this is an idiomatic fix, but it does definitely fix #387 (comment) by addressing the root cause (GC pointers in non-GC memory).
An alternative approach would be to add
debugchecks which walk every type added to one of thestdx.allocator-based constructs, and assert at runtime that all pointers within the value don't point into GC-owned memory; and, of course, actually moving all data to allocator-owned memory (which in this case probably means interning the doc string).In this case, the GC-owned memory was allocated here:
Details
And the last point where it was still seen by the GC was here:
Details
For the record, here is the strategy I used to pinpoint the problem:
Disable ASLR (determinism)
Disable GC threads (more determinism)
Fast builds: disable optimization in Phobos/Druntime, build only shared libraries, link dscanner with shared library
Find the address of the lost memory, i.e. the buffer that ends up containing bad UTF8 (add
writelnbefore the code that throwsUTFException)Add GC hook to trap when the address is allocated and when it's freed
LeakDetector. This is still currently a bit more complicated than necessary, will upstream some fixes soonasm{int 3;}and ran in GDB when I needed a stack trace.Examine the lifetime of all values that share the problematic address
In this case, it was:
std.array.appenderbuffer is allocatedstd.array.appenderbuffer is wrongly freedappenderbuffer is accessed, causing the UTF errorBisect the point in the program's execution where the value is last visible in the GC.
The general strategy is that you want to call
GC.collectas frequently as possible (though not that frequently that the program now takes forever to run). I did this in three steps:fullcollect()toGcx.alloc)rt.trace, and callingGC.collect()when the call counter is within the bisected certain range)GC.collect()calls in Dscanner's code in functions identified by the above step.We definitely would benefit from better tools to debug such problems; though, in this case, an educated guess might have sufficed.