Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge 2.078.1 front-end and stdlibs #2486

Merged
merged 32 commits into from
Jan 28, 2018
Merged

Merge 2.078.1 front-end and stdlibs #2486

merged 32 commits into from
Jan 28, 2018

Conversation

kinke
Copy link
Member

@kinke kinke commented Jan 6, 2018

No description provided.

@kinke
Copy link
Member Author

kinke commented Jan 6, 2018

The ICE when compiling LDC on Windows is bad; it's a front-end bug for MSVC targets which has been (wrongly) worked around only since 2.078.0 itself AFAICT, i.e., all current host compilers on Windows will crash. If we're lucky, an older version (ltsmaster) doesn't feature that particular MSVC-specific check.

The bug is here:

ldc/ddmd/func.d

Lines 664 to 674 in 13d5cd6

final BaseClass* overrideInterface()
{
ClassDeclaration cd = parent.isClassDeclaration();
foreach (b; cd.interfaces)
{
auto v = findVtblIndex(&b.sym.vtbl, cast(int)b.sym.vtbl.dim);
if (v >= 0)
return b;
}
return null;
}

FuncDeclaration.parent is a TemplateMixin (dmd.transitivevisitor.ParseVisitMethods mixed into dmd.visitor.SemanticTimeTransitiveVisitor) while the code assumes it's a valid ClassDeclaration (and since 2.078.0 wrongly aborts the check if it isn't). [DMD host compilers < 2.078.0 crash too with -m32mscoff and -m64; -m32 doesn't perform that check.]

=> DMD issue #18200

@rainers
Copy link
Contributor

rainers commented Jan 6, 2018

The crash happens since dmd 2.071, see https://issues.dlang.org/show_bug.cgi?id=18093.

If my tests were correct, the workaround to build with pre dmd 2.078 versions could be to add extern (C++) to the overloads in the template mixins.

@kinke
Copy link
Member Author

kinke commented Jan 6, 2018

Ah thanks, so it's known already.

LDC 1.0.0 (2.070) manages to compile the new front-end, but the LDC then immediately fails when compiling Phobos (visit() function not hit, most likely due to another bug wrt. overload resolution or vtable... - edit: 2.070 also fails to compile dmd.asttypename, which luckily turned out to be unused).
I don't see how an additional extern(C++) in the mixin could solve this; I tried slapping an extern(C++): on top of the > 1k lines mixin, to no avail. IIRC, the problem weren't the overrides, but new virtual functions, for which there's this special check for MSVC targets:

ldc/ddmd/dsymbolsem.d

Lines 4528 to 4543 in 13d5cd6

/* These quirky conditions mimic what VC++ appears to do
*/
if (global.params.mscoff && cd.cpp &&
cd.baseClass && cd.baseClass.vtbl.dim)
{
/* if overriding an interface function, then this is not
* introducing and don't put it in the class vtbl[]
*/
funcdecl.interfaceVirtual = funcdecl.overrideInterface();
if (funcdecl.interfaceVirtual)
{
//printf("\tinterface function %s\n", toChars());
cd.vtblFinal.push(funcdecl);
goto Linterfaces;
}
}

@rainers
Copy link
Contributor

rainers commented Jan 6, 2018

Adding extern(C++) worked in the reduced test case in the bug report (no idea why), but doesn't help in dmd.
Instead, it seems to be ok to make all non-override functions private: the check of the base class vtable is skipped in that case:

 src/dmd/transitivevisitor.d | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/dmd/transitivevisitor.d b/src/dmd/transitivevisitor.d
index 19ee198..8f8fba4 100644
--- a/src/dmd/transitivevisitor.d
+++ b/src/dmd/transitivevisitor.d
@@ -56,7 +56,7 @@ package mixin template ParseVisitMethods(AST)
         }
     }
 
-    void visitVarDecl(AST.VarDeclaration v)
+    private void visitVarDecl(AST.VarDeclaration v)
     {
         //printf("Visiting VarDeclaration\n");
         if (v.type)
@@ -168,7 +168,7 @@ package mixin template ParseVisitMethods(AST)
             s.elsebody.accept(this);
     }
 
-    void visitArgs(AST.Expressions* expressions, AST.Expression basis = null)
+    private void visitArgs(AST.Expressions* expressions, AST.Expression basis = null)
     {
         if (!expressions || !expressions.dim)
             return;
@@ -298,7 +298,7 @@ package mixin template ParseVisitMethods(AST)
             imp.accept(this);
     }
 
-    void visit(AST.Catch c)
+    private void visit(AST.Catch c)
     {
         //printf("Visiting Catch\n");
         if (c.type)
@@ -310,7 +310,7 @@ package mixin template ParseVisitMethods(AST)
 //   Type Nodes
 //============================================================
 
-    void visitType(AST.Type t)
+    private void visitType(AST.Type t)
     {
         //printf("Visiting Type\n");
         if (!t)
@@ -324,7 +324,7 @@ package mixin template ParseVisitMethods(AST)
             t.accept(this);
     }
 
-    void visitFunctionType(AST.TypeFunction t, AST.TemplateDeclaration td)
+    private void visitFunctionType(AST.TypeFunction t, AST.TemplateDeclaration td)
     {
         if (t.next)
             visitType(t.next);
@@ -336,7 +336,7 @@ package mixin template ParseVisitMethods(AST)
         visitParameters(t.parameters);
     }
 
-    void visitParameters(AST.Parameters* parameters)
+    private void visitParameters(AST.Parameters* parameters)
     {
         if (parameters)
         {
@@ -405,7 +405,7 @@ package mixin template ParseVisitMethods(AST)
         visitFunctionType(cast(AST.TypeFunction)t.next, null);
     }
 
-    void visitTypeQualified(AST.TypeQualified t)
+    private void visitTypeQualified(AST.TypeQualified t)
     {
         //printf("Visiting TypeQualified\n");
         foreach (id; t.idents)
@@ -481,7 +481,7 @@ package mixin template ParseVisitMethods(AST)
 
 //      Declarations
 //=========================================================
-    void visitAttribDeclaration(AST.AttribDeclaration d)
+    private void visitAttribDeclaration(AST.AttribDeclaration d)
     {
         if (d.decl)
             foreach (de; *d.decl)
@@ -570,7 +570,7 @@ package mixin template ParseVisitMethods(AST)
         visitAttribDeclaration(cast(AST.AttribDeclaration)d);
     }
 
-    void visitFuncBody(AST.FuncDeclaration f)
+    private void visitFuncBody(AST.FuncDeclaration f)
     {
         //printf("Visiting funcBody\n");
         if (!f.fbody)
@@ -582,7 +582,7 @@ package mixin template ParseVisitMethods(AST)
         f.fbody.accept(this);
     }
 
-    void visitBaseClasses(AST.ClassDeclaration d)
+    private void visitBaseClasses(AST.ClassDeclaration d)
     {
         //printf("Visiting ClassDeclaration\n");
         if (!d || !d.baseclasses.dim)
@@ -591,7 +591,7 @@ package mixin template ParseVisitMethods(AST)
             visitType(b.type);
     }
 
-    bool visitEponymousMember(AST.TemplateDeclaration d)
+    private bool visitEponymousMember(AST.TemplateDeclaration d)
     {
         //printf("Visiting EponymousMember\n");
         if (!d.members || d.members.dim != 1)
@@ -647,7 +647,7 @@ package mixin template ParseVisitMethods(AST)
         return false;
     }
 
-    void visitTemplateParameters(AST.TemplateParameters* parameters)
+    private void visitTemplateParameters(AST.TemplateParameters* parameters)
     {
         if (!parameters || !parameters.dim)
             return;
@@ -669,7 +669,7 @@ package mixin template ParseVisitMethods(AST)
             s.accept(this);
     }
 
-    void visitObject(RootObject oarg)
+    private void visitObject(RootObject oarg)
     {
         if (auto t = AST.isType(oarg))
         {
@@ -687,7 +687,7 @@ package mixin template ParseVisitMethods(AST)
         }
     }
 
-    void visitTiargs(AST.TemplateInstance ti)
+    private void visitTiargs(AST.TemplateInstance ti)
     {
         //printf("Visiting tiargs\n");
         if (!ti.tiargs)

@kinke
Copy link
Member Author

kinke commented Jan 6, 2018

Thanks, I'll give that a shot too. I'm currently stumbling from one problem to the next; DMD 2.078.0 doesn't link out of the box in an MSVC 2017 environment, I thought that was one of the news? Error: can't run 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\\bin\link.exe', check PATH (cmd here).

@rainers
Copy link
Contributor

rainers commented Jan 6, 2018

I'm currently stumbling from one problem to the next; DMD 2.078.0 doesn't link out of the box in an MSVC 2017 environment, I thought that was one of the news?

Unfortunately a corresponding installer PR didn't make it into the release, so you have to nuke most settings in sc.ini. For auto detection the Environment64 section should look this:

[Environment64]
LIB=%@P%\..\lib64
DFLAGS=%DFLAGS% -L/OPT:NOICF

@rainers
Copy link
Contributor

rainers commented Jan 6, 2018

The LINKCMD settings is propably the only harmful setting in the zip file version. You could filter that out with sed -i /LINKCMD/d sc.ini.

@kinke
Copy link
Member Author

kinke commented Jan 6, 2018

Thanks; tweaking the ini locally worked, but I forgot about the cmdline parsing issue (dmd ... "-LC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\DIA SDK\lib\amd64\diaguids.lib" => LINK : fatal error LNK1181: cannot open input file 'C:\Program.obj') rendering DMD unusable for compiling LDC with an LLVM incl. LLD...

Making the previously new virtual functions private works indeed, i.e., gets rid of the segfault, but the visit() issue persists (so that wasn't 2.070-specific - making sense, as ltsmaster works as host compiler for Travis). Maybe an MSVC-specific divergence between D and C++ headers...

@kinke
Copy link
Member Author

kinke commented Jan 6, 2018

The issue seems to boil down to vtable layout. Excerpt of stack trace with debug build of LDC:

ldc2.exe!Visitor::visit(Statement * __formal) Line 306	C++	Symbols loaded.
ldc2.exe!Visitor::visit(ForStatement * s) Line 406	C++	Symbols loaded.
ldc2.exe!dmd.dimport.Import.accept() Line 311	D	Symbols loaded.

Import.accept(Visitor v) doesn't forward to Visitor.visit(Import), but to Visitor.visit(ForStatement), both miles apart in dmd/visitor.h.

@kinke
Copy link
Member Author

kinke commented Jan 6, 2018

Oh this appears to be fixed by fixing visitor.h and letting class Visitor derive from a ParseTimeVisitor base class, as in D code. Visual C++ probably uses a reverse layout...

…tor)

Apparently crucial for correct MSVC++ vtable layout.
Now all done in the front-end, no need for us to handle it manually
anymore. This also fixes tests/semantic/dcompute.d.
Using the existing pragma(LDC_global_crt_{c,d}tor [, priority])
infrastructure.
By enforcing a -singleobj build in that case, treating the special __main
module as a regular additional source module.

This fixes the new test fail_compilation/test11006 although this check
most likely wasn't intended. It previously errored out with:

Error: Output file '...' for module `__main` collides with previous module `test11006`. See the -oq option
@kinke
Copy link
Member Author

kinke commented Jan 7, 2018

Green on Linux. :)

@kinke
Copy link
Member Author

kinke commented Jan 10, 2018

Can anyone on OSX check the issue with shared libs?

@JohanEngelen
Copy link
Member

I have had a number of things in the pipeline for a long time so I want to spend my time on that first in the next two weeks or so, sorry.

@joakim-noah
Copy link
Contributor

joakim-noah commented Jan 13, 2018

Tried this pull out on Android/ARM and ran the stdlib tests, have 6 phobos modules that segfault when running the tests, all related to optimization codegen as they only fail at -O1 or higher. All but one are related to regex, as they segfault in the Thompson regex engine, only std.net.curl ends up in libcrypto unrelated to regex.

First time I've had issues on Android/ARM since late last summer, #2024, will investigate further.

Conflicts:
	driver/main.cpp
	gen/cl_helpers.h
	gen/declarations.cpp
	runtime/druntime
	tests/d2/dmd-testsuite
@kinke
Copy link
Member Author

kinke commented Jan 20, 2018

@klickverbot: Can you be bothered? ;) - By the looks of it, each shared unittest fails, so possibly even a hello world due to some fundamental issue during druntime initialization. So building this branch, a hello world with -link-defaultlib-shared -link-defaultlib-debug and running it with a debugger might be enough to get an idea of what's happening. I had a second look at the rt.sections_elf_shared diff, but didn't find a likely cause. As there's no output in the CI logs, I can't do much myself.

@kinke
Copy link
Member Author

kinke commented Jan 20, 2018

[This somehow doesn't suffer from the new dmd-testsuite timeouts for CircleCI Linux and the master branch. What seems to be hanging there is 32-bit runnable/pi.d, but that only started about a week ago or so. So merging this soon would fix the Circle issue, which is required for automated builds.]

@joakim-noah
Copy link
Contributor

@kinke, do you have shell access rights for the Travis Mac CI? That could be a way for you to get into a Mac.

@kinke kinke changed the title Merge 2.078.0 front-end and stdlibs Merge 2.078.1 front-end and stdlibs Jan 22, 2018
DMD actually expects a relative one.

It's required now on Windows with new test runnable/test13742.sh
due to the limited string support of rt.config for runtime command-line
options like `--DRT-covopt=dstpath:C:\dir` (colon in C:\ treated as
separator).
@kinke
Copy link
Member Author

kinke commented Jan 24, 2018

I was a bit reluctant at first but got SSH set up with Circle, works like a charm. So the OSX shared lib issue should be fixed.

2.078.1 came with a new test, runnable/test13742.sh. It tests emission of coverage files; 2 modules in a static lib, which is then linked with a third module, all compiled with -cov. We don't emit the coverage file for the first module; it only contains a template. In emitCoverageLinecountInc(), there's this comment:

// Only emit coverage increment for locations in the source of the current module
// (for example, 'inlined' methods from other source files should be skipped).
if (!global.params.cov || !loc.linnum || !loc.filename || !m->d_cover_data ||
    strcmp(m->srcfile->name->toChars(), loc.filename) != 0) {
  return;
}

So this means that a template which isn't emitted in its declaring module won't have any coverage, although it may be instantiated and emitted in other modules, and seems like the obvious reason for the test failure. DMD apparently now decorates the coverage data symbol names with the module ID and makes them global, so we should probably do the same.

This is what DMD does and crucial for `ldmd2 -od=objects foo.d` with
relative -od path (and no -of). The object file path will be
`objects/foo.o[bj]`. As it's relative, LDC used to prepend the objects
dir (again) in LDMD mode, resulting in the inferred executable file name
`objects/objects/foo[.exe]`.

So while this is a breaking change, it fixes DMD compatibility of LDMD
and makes a lot of sense for 'pure' LDC too IMO (use the first object's
file name, replace the extension and save it in the working dir, not in
the directory containing the first object file).

This fixes the dmd-testsuite regressions with relative RESULTS_DIR and
a few long-standing non-fatal dmd-testsuite errors (failing file
removals).
@kinke
Copy link
Member Author

kinke commented Jan 27, 2018

@joakim-noah: Any insights? I'm about to merge this to restore Circle builds.

@joakim-noah
Copy link
Contributor

Been looking into ARM, believe it's a Phobos regression related to some std.regex refactoring Dmitry did recently. I will submit a pull once I track it down, likely upstream, please go ahead and merge this pull whenever you're ready.

@kinke kinke merged commit 6334c69 into master Jan 28, 2018
@kinke kinke deleted the merge-2.078 branch January 28, 2018 00:41
@joakim-noah
Copy link
Contributor

Spent several hours in a debugger trying to track down the ARM regression in std.regex, one of the recurring segfaults happens here, because t is misaligned by the time it gets there. That suggests earlier memory corruption, which doesn't happen without any llvm optimizations, ie only at -O1 or higher. Also, if I revert Dmitry's std.regex redesign, dlang/phobos#5722, the tests start working again at any optimization level, as that's just where std.regex was with ldc 1.7 anyway.

@DmitryOlshansky, the main big change in std.regex.internal.thompson is that ThompsonMatcher goes from a struct to a class, not sure if that affects the code generation somehow. Before I spend several more hours tracking down the source of the memory corruption, I thought I'd run it by you and Martin to see if you guys have an inkling of where that pull might be throwing off ARM. Let me know if you have an idea.

@JohanEngelen
Copy link
Member

@joakim-noah Do you have a dustmite session running in the meanwhile to reduce the problem case?

@kinke
Copy link
Member Author

kinke commented Feb 13, 2018

Puh, that code is no easy read, I quickly gave up. ;)

the main big change in std.regex.internal.thompson is that ThompsonMatcher goes from a struct to a class, not sure if that affects the code generation somehow.

What comes to mind is that we emit classes as packed IR structs, meaning that there's no trailing padding, conforming to the class-instance-size trait AFAICT. I guess that's fine though (i.e., the GC allocator taking care of alignment), and there shouldn't be any arrays of consecutive object payloads (as opposed to object references).
Does the misaligned pointer point into heap or stack?

@kinke
Copy link
Member Author

kinke commented Feb 13, 2018

Well looking at the PR is much more pleasant, and I strongly suspect the manual memory management via GenericFactory to be the culprit:

   override Matcher!Char dup(Matcher!Char engine, in Char[] input) const @trusted
    {
        immutable size = EngineType!Char.initialMemory(engine.pattern) + classSize;
        auto memory = enforce(malloc(size), "malloc failed")[0 .. size];
        scope(failure) free(memory.ptr);
        auto copy = construct(engine.pattern, input, memory);
        GC.addRange(memory.ptr, classSize);
        engine.dupTo(copy, memory[classSize .. size]); // <-- memory.ptr + classSize may likely be unaligned
        assert(copy.refCount == 1);
        return copy;
    }

I didn't dig deep, but it looks like a class payload and some other thing are stored consecutively in memory, exactly without padding bytes inbetween, so that the 2nd payload is likely to be misaligned. [Also involved: {Runtime,Ctfe}Factory.construct()]

@joakim-noah: You might want to give enum classSize = (__traits(classInstanceSize, EngineType!Char) + size_t.sizeof - 1) & ~(size_t.sizeof - 1); a quick shot and see if that fixes it.

@joakim-noah
Copy link
Contributor

@JohanEngelen, never tried dustmite, will it narrow down Phobos regressions?

@kinke, anyway, dustmite doesn't matter now, because you got it. Adding that padding fixes the problems across all six modules, which were all related to this std.regex change. I focused too closely on the Thompson engine where the segfaults were all happening, and not on the rest of that giant patch. However, even if I had looked at the whole pull as you did, I wouldn't have narrowed it down to that memory allocation issue just by looking at the patch.

Go ahead and commit your std.regex fix and we'll be back to only a single stdlib test failing on ARM again: the EH inlining codegen bug remaining at #2024.

@JohanEngelen
Copy link
Member

never tried dustmite, will it narrow down Phobos regressions?

Sure! https://github.com/CyberShadow/DustMite/wiki#minimizing-the-standard-library

@kinke
Copy link
Member Author

kinke commented Feb 15, 2018

Thx for verifying, Joakim. I pushed the hotfix directly to master; the upstream PR is dlang/phobos#6185.

@joakim-noah
Copy link
Contributor

joakim-noah commented Feb 16, 2018

Maybe time to release a 1.8 beta now? Btw, a dmd bugfix update was just released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants