Skip to content

dparse: Fix memory corruption#445

Merged
Geod24 merged 1 commit intodlang-community:masterfrom
CyberShadow:pull-20210724-133216
Jul 25, 2021
Merged

dparse: Fix memory corruption#445
Geod24 merged 1 commit intodlang-community:masterfrom
CyberShadow:pull-20210724-133216

Conversation

@CyberShadow
Copy link
Member

@CyberShadow CyberShadow commented Jul 24, 2021

Following up to the DLF meeting yesterday. I don't really know much about libdparse's memory handling philosophy to know if this is an idiomatic fix, but it does definitely fix #387 (comment) by addressing the root cause (GC pointers in non-GC memory).

An alternative approach would be to add debug checks which walk every type added to one of the stdx.allocator-based constructs, and assert at runtime that all pointers within the value don't point into GC-owned memory; and, of course, actually moving all data to allocator-owned memory (which in this case probably means interning the doc string).

In this case, the GC-owned memory was allocated here:

Details
#0  0x00007ffff7dfb9d4 in core.internal.gc.impl.conservative.gc.Gcx.alloc(ulong, ref ulong, uint, const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#1  0x00007ffff7df9723 in core.internal.gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#2  0x00007ffff7e005f2 in core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#3  0x00007ffff7df97c7 in core.internal.gc.impl.conservative.gc.ConservativeGC.qalloc(ulong, uint, scope const(TypeInfo)) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#4  0x00007ffff7df8d79 in gc_qalloc () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#5  0x00007ffff7be9ff8 in std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong).__lambda9() () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#6  0x00007ffff7be9f28 in std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#7  0x00007ffff7be9d55 in std.array.Appender!(immutable(char)[]).Appender.reserve(ulong) () from /home/vladimir/work/extern/D/phobos/generated/linux/release/64/libphobos2.so.0.97
#8  0x00000000009ed5ba in dparse.trivia.MultiLineCommentHelper!(immutable(char)).MultiLineCommentHelper.process!(std.array.Appender!(immutable(char)[]).Appender).process(ref std.array.Appender!(immutable(char)[]).Appender) (this=..., outbuffer=...) at libdparse/src/dparse/trivia.d:469
#9  0x00000000009eaa1e in dparse.trivia.unDecorateComment!(std.array.Appender!(immutable(char)[]).Appender).unDecorateComment(immutable(char)[], ref std.array.Appender!(immutable(char)[]).Appender) (outputRange=..., comment=...) at libdparse/src/dparse/trivia.d:108
#10 0x00000000009ea744 in dparse.trivia.extractDdocFromTrivia!(immutable(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[]).extractDdocFromTrivia(immutable(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[]) (tokens=...) at libdparse/src/dparse/trivia.d:531
#11 0x00000000009db7b2 in dparse.trivia.extractLeadingDdoc(const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)) (token=...) at libdparse/src/dparse/trivia.d:562
#12 0x00000000008a6a8d in std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure.__mixin9.comment() const (this=...) at libdparse/src/dparse/lexer.d:155
#13 0x00000000009f9d3c in dparse.parser.Parser.parseDeclaration(bool, bool, bool) (this=0x7ffff74630b0, inTemplateDeclaration=false, mustBeDeclaration=true, strict=true) at libdparse/src/dparse/parser.d:2135
#14 0x0000000000a05c4c in dparse.parser.Parser.parseModule() (this=0x7ffff74630b0) at libdparse/src/dparse/parser.d:4847
#15 0x0000000000a51ecc in dparse.parser.parseModule!().parseModule(dparse.parser.ParserConfig) (parserConfig=...) at libdparse/src/dparse/parser.d:71
#16 0x0000000000a51df6 in dparse.parser.parseModule!(void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate).parseModule(const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], immutable(char)[], dparse.rollback_allocator.RollbackAllocator*, void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, messageFuncOrDg=..., allocator=0x7fffffffb8a0, fileName=..., tokens=...) at libdparse/src/dparse/parser.d:103
#17 0x0000000000b88112 in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, ref dparse.lexer.StringCache, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, dlgMessage=..., tokens=..., cache=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:273
#18 0x0000000000b8821a in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, immutable(char)[], ref dparse.lexer.StringCache, bool, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, tokens=..., report=false, cache=..., errorFormat=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:286
#19 0x0000000000b87e76 in dscanner.analysis.run.analyze(immutable(char)[][], const(dscanner.analysis.config.StaticAnalysisConfig), immutable(char)[], ref dparse.lexer.StringCache, ref dsymbol.modulecache.ModuleCache, bool) (staticAnalyze=true, moduleCache=..., cache=..., errorFormat=..., config=..., fileNames=...) at src/dscanner/analysis/run.d:243
#20 0x0000000000c010ee in D main (args=...) at src/dscanner/main.d:263

And the last point where it was still seen by the GC was here:

Details
#1  0x0000000000a443b7 in dparse.parser.Parser.parseDeclaration(bool, bool, bool) (this=0x7ffff74630b0, inTemplateDeclaration=false, mustBeDeclaration=true, strict=true) at libdparse/src/dparse/parser.d:2138
#2  0x0000000000a4f4b8 in dparse.parser.Parser.parseModule() (this=0x7ffff74630b0) at libdparse/src/dparse/parser.d:4847
#3  0x0000000000aad06c in dparse.parser.parseModule!().parseModule(dparse.parser.ParserConfig) (parserConfig=...) at libdparse/src/dparse/parser.d:71
#4  0x0000000000aacf72 in dparse.parser.parseModule!(void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate).parseModule(const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], immutable(char)[], dparse.rollback_allocator.RollbackAllocator*, void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, messageFuncOrDg=..., allocator=0x7fffffffb8a0, fileName=..., tokens=...) at libdparse/src/dparse/parser.d:103
#5  0x0000000000c18b52 in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, ref dparse.lexer.StringCache, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], void(immutable(char)[], ulong, ulong, immutable(char)[], bool) delegate, ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, dlgMessage=..., tokens=..., cache=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:273
#6  0x0000000000c18c7e in dscanner.analysis.run.parseModule(immutable(char)[], ubyte[], dparse.rollback_allocator.RollbackAllocator*, immutable(char)[], ref dparse.lexer.StringCache, bool, ref const(std.experimental.lexer.TokenStructure!(ubyte, "import dparse.lexer:TokenTriviaFields; mixin TokenTriviaFields;").TokenStructure)[], ulong*, uint*, uint*) (warningCount=0x7fffffffb8ac, errorCount=0x7fffffffb8a8, linesOfCode=0x0, tokens=..., report=false, cache=..., errorFormat=..., p=0x7fffffffb8a0, code=..., fileName=...) at src/dscanner/analysis/run.d:286
#7  0x0000000000c18892 in dscanner.analysis.run.analyze(immutable(char)[][], const(dscanner.analysis.config.StaticAnalysisConfig), immutable(char)[], ref dparse.lexer.StringCache, ref dsymbol.modulecache.ModuleCache, bool) (staticAnalyze=true, moduleCache=..., cache=..., errorFormat=..., config=..., fileNames=...) at src/dscanner/analysis/run.d:243
#8  0x0000000000cb407c in D main (args=...) at src/dscanner/main.d:263

For the record, here is the strategy I used to pinpoint the problem:

  1. Disable ASLR (determinism)

  2. Disable GC threads (more determinism)

  3. Fast builds: disable optimization in Phobos/Druntime, build only shared libraries, link dscanner with shared library

  4. Find the address of the lost memory, i.e. the buffer that ends up containing bad UTF8 (add writeln before the code that throws UTFException)

  5. Add GC hook to trap when the address is allocated and when it's freed

    • I did this by adding code to the stub LeakDetector. This is still currently a bit more complicated than necessary, will upstream some fixes soon
    • For the trap I used asm{int 3;} and ran in GDB when I needed a stack trace.
  6. Examine the lifetime of all values that share the problematic address

    In this case, it was:

    1. std.array.appender buffer is allocated
    2. std.array.appender buffer is wrongly freed
    3. An associative array internal structure is allocated
    4. (the associative array clobbers the buffer memory - this is where the UTF error is introduced)
    5. The associative array internal structure is freed (not that it matters for us)
    6. The appender buffer is accessed, causing the UTF error
  7. Bisect the point in the program's execution where the value is last visible in the GC.

    The general strategy is that you want to call GC.collect as frequently as possible (though not that frequently that the program now takes forever to run). I did this in three steps:

    1. Coarse: every allocation (by adding fullcollect() to Gcx.alloc)
    2. Fine: every function call (by stubbing then adding a call counter to the trace hooks in rt.trace, and calling GC.collect() when the call counter is within the bisected certain range)
    3. Very fine: by manually adding GC.collect() calls in Dscanner's code in functions identified by the above step.

We definitely would benefit from better tools to debug such problems; though, in this case, an educated guess might have sufficed.

newLength <<= 1;
auto oldPtr = arr.ptr;
Mallocator.instance.reallocate(arr, newLength);
GC.removeRange(oldPtr);
Copy link
Member Author

@CyberShadow CyberShadow Jul 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a race condition here, which is unavoidable with the reallocate API.

We need to call removeRange after the data is copied, and after calling addRange on the new copy, but before the old buffer is deallocated. reallocate does both the copy and the deallocation together.

A bit surprising that stdx.allocator doesn't have a layer which calls addRange/removeRange, though maybe that kind of defeats the point of using custom allocators.

Also note that we can't make this call conditional on hasIndirections!T because T is just for the value being allocated right now, we don't know what the rest of the buffer contains, nor can we know if the buffer will be completely used for types without indirections.

@rikkimax
Copy link
Contributor

We definitely would benefit from better tools to debug such problems; though, in this case, an educated guess might have sufficed.

Based upon your description of the debugging, if we could get the GC to free ASAP and not reuse memory, then ldc's address sanitizer should do the job for this problem.

@CyberShadow
Copy link
Member Author

if we could get the GC to free ASAP

To do that, you would need either something like reference counting or performing a full mark&sweep every statement.

@rikkimax
Copy link
Contributor

That's not what I meant. I meant instead of setting it as free internally when ready to free, just free that memory range. So that ASAN knows that the memory should not be accessible.

@CyberShadow
Copy link
Member Author

In this case, that wouldn't help at all. You would just get the ASAN violation instead of the UTFException, in the exact same circumstances, and with no other clues to help.

Copy link
Member

@WebFreak001 WebFreak001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Great to see the GC issue could be fixed by changing the allocators. I wonder how it would be possible to avoid such an issue in the future? Should all custom allocators forbid allocating data with indirections unless GC addRange/removeRange has been added properly? Could we add interfaces or patterns that could enforce that statically?

Anyway would love to get this in soon and apply updates to DScanner, dfmt, DCD, libddoc

@CyberShadow
Copy link
Member Author

CyberShadow commented Jul 24, 2021

I wonder how it would be possible to avoid such an issue in the future? Should all custom allocators forbid allocating data with indirections unless GC addRange/removeRange has been added properly?

Sure, that's one way.

I suggested another way (only allow non-GC pointers, and verify that this is true at runtime in debug builds).

Could we add interfaces or patterns that could enforce that statically?

It would certainly be possible with a custom pointer type, which indicates that the memory is managed manually and not owned by the GC, and having the allocator types only accept values which have no indirections except via this wrapper. But I can't predict if the syntactic overhead would be bearable.

I don't think we can generally achieve memory safety using static checks at this scale, without becoming Rust. :)

@WebFreak001
Copy link
Member

need someone else to merge, can't merge until travis is done, which I think we can wait for forever since travis-ci.org is gone.

@Geod24 Geod24 merged commit 2772fc8 into dlang-community:master Jul 25, 2021
@Geod24
Copy link
Member

Geod24 commented Jul 25, 2021

Did the force push, and made travis-ci optional so it won't block you in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants