Skip to content

Fix issue 16431 - Speed-up rdmd for single file builds#191

Closed
wilzbach wants to merge 1 commit intodlang:masterfrom
wilzbach:speed-up-rdmd
Closed

Fix issue 16431 - Speed-up rdmd for single file builds#191
wilzbach wants to merge 1 commit intodlang:masterfrom
wilzbach:speed-up-rdmd

Conversation

@wilzbach
Copy link
Contributor

Hey there,

while looking a bit at the run time of the DLang Tour builds I discovered that using single-file DUB builds is faster than rdmd. I digged a bit into this and the problem is that rdmd runs dmd first to get a list of all imports, however this comes at a huge cost (~40% of total runtime).

I use rdmd a lot every day (e.g. every time I run tests) and I expect other to do so too, hence if we can agree on a conservative estimate that covers most cases we could achieve quite a huge benefit. I know of course one could run dmd <file> && ./<file>, but many of our tutorials advise to use rdmd and it's convenient not to have yet another alias for single file builds.

I think the tricky part here is to make a conservative guess without false-positives. I don't think my initial idea is perfect, but maybe someone else has a better idea?

Benchmarks

In any case here the results when benchmarked with this file - with other files one gets similar results.

Currently

> python -m timeit -n 10 -s 'import os' 'os.system("rdmd --force foo.d > /dev/null")'
5 loops, best of 3: 727 msec per loop

With this PR

> python -m timeit -v -n 10 -s 'import os' 'os.system("./rdmd --force foo.d > /dev/null")'
10 loops, best of 3: 440 msec per loop

For reference, DUB:

> python -m timeit -n 10 -s 'import os' 'os.system("dub run --single --force foo.d > /dev/null 2>&1")'
10 loops, best of 3: 568 msec per loop

@dlang-bot
Copy link
Contributor

dlang-bot commented Aug 25, 2016

Fix Bugzilla Description
16431 rdmd runs dmd twice for single-files with no dependencies

@dlang-bot
Copy link
Contributor

@wilzbach, thanks for your PR! By analyzing the annotation information on this pull request, we identified @CyberShadow, @andralex and @leandro-lucarella-sociomantic to be potential reviewers. @CyberShadow: The PR was automatically assigned to you, please reassign it if you were identified mistakenly.

(The DLang Bot is under development. If you experience any issues, please open an issue at its repo.)

@CyberShadow
Copy link
Member

Well, this is one ugly hack :)

dub cheats in that it does not do any dependency discovery. It simply compiles everything under src/.

The correct solution would be to move rdmd into dmd. IIRC Andrej attempted this a while ago, but hit a snag. Can't find the PR at the moment unfortunately.

rdmd.d Outdated
{
foreach (mod; match[1].splitter(","))
{
if (!(mod.startsWith("std") || mod.startsWith("std")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you meant to write if (!(mod.startsWith("std") || mod.startsWith("core")))

@CyberShadow
Copy link
Member

Anyway, from a first impression this seems rather harmless... false positives simply bring back compilation as it was, and false negatives seem unlikely, I think - it can be made to break with shenanigans like mixin("imp" ~ "ort foobar;");, but that's probably not of much concern.

rdmd.d Outdated
// In most cases we don't have any dependencies except the standard library
auto file = readText(rootModule);
bool needInspection = false;
outer: foreach (match; file.matchAll(ctRegex!`import\s+(.*);`))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import name = std.string; // false positive ;)

@CyberShadow
Copy link
Member

Did you check if the test suite passes? Currently it's not covered by our CI.

{
string[string] noDeps;
return noDeps;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return null?

@wilzbach
Copy link
Contributor Author

Did you check if the test suite passes? Currently it's not covered by our CI.

Well, it fails even on master for me, so maybe we should enable something simple as Travis here ;-)

core.exception.AssertError@rdmd_test.d(137): /tmp/.rdmd-1000/eval.B69BD6FDCFC0950D6C336FE7B8FEF8B5.d(14): Error: module syserror is in file 'std/syserror.d' which cannot be read
import path[0] = /tmp/.rdmd-1000
import path[1] = /usr/include/dlang/dmd
Failed: ["dmd", "-de", "-d", "-v", "-o-", "/tmp/.rdmd-1000/eval.B69BD6FDCFC0950D6C336FE7B8FEF8B5.d", "-I/tmp/.rdmd-1000"]

@CyberShadow
Copy link
Member

Actually, there is a problem: the discovered dependencies don't just consist of the imports - but also the compiler and library file. Currently, if the compiler is upgraded, a second rdmd run will rebuild the program, but this breaks that contract.

rdmd.d Outdated
string[string] noDeps;
return noDeps;
}
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the added and removed code is identical in functionality. An associative array is initialized to null by default.

@CyberShadow
Copy link
Member

Hmm, I liked this a little better when it was a small patch. Regardless, the above mentioned breakage is a blocker, and I think slapping on more band-aids e.g. by doubling those checks in rdmd for this edge case would not be a good way forward. I don't think the inconsistencies and extra code it brings justify the gains.

The travis configuration should probably be in its own PR, and it should probably run the rdmd test suite, which could then run rdmd unit tests if so needed (so users don't have to test both separately).

@CyberShadow
Copy link
Member

Perhaps there is an opportunity for optimization for successive runs, i.e. when the source has changed. Instead of assuming the worst case (that the dependency list is outdated if any sources are), rdmd could assume the best case that the dependency list didn't change (but still collect a new dependency list from the compiler output). Best case: it actually didn't change, and the program is rebuilt in one compiler invocation. Worst case: the program fails to link (and we have to suppress the linker output), or something like we give the compiler a stale dependency that was on a network drive that's now timing out. It also probably means that we will need two compiler invocations on any failed compilation, incl. syntax errors, because we can't distinguish a compiler error due to a stale dependency from a compiler error due to there being a mistake in the source code.

// run dmd with an empty file to get it's configuration
rootModule = buildPath(workDir, "emptyFile.d");
rootModule.writeFile("void main(){}");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting :)

Is there an opportunity for code reuse with --main?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then I have to modify the lines below. As we have to write an empty file anyways (DMD expects at least one file) , I thought this was the option with the least changes.

Copy link
Contributor Author

@wilzbach wilzbach Aug 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw the additional cost to write the file & call DMD with an empty file seems to be very small:

> python -m timeit -n 10 -r 3 -s 'import os' 'os.system("rdmd --force foo.d > /dev/null")'
10 loops, best of 3: 744 msec per loop
> python -m timeit -n 10 -r 3 -s 'import os' 'os.system("./rdmd --force foo.d > /dev/null")'
10 loops, best of 3: 455 msec per loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the main function because you don't want to link anyway. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the main function because you don't want to link anyway. Right?

Oh nice! I didn't realize dmd would compile an empty file :)

@wilzbach
Copy link
Contributor Author

Hmm, I liked this a little better when it was a small patch.

Well I thought that it might be better to incorporate the full spec, s.t. it's not consider "ugly", but just "hack".

Regardless, the above mentioned breakage is a blocker,

I think I found yet another hack to solve your mentioned problem. On my machine rdmd finds the following dependencies when given only imports from Phobos / Runtime:

["/etc/dmd.conf":"", "/usr/bin/dmd":""]

One way would be to reimplement dinifile.d and find the config file. However that's complicated and a lot of work to maintain (especially with Windows etc.). A quicker solution is to run dmd with an empty file and we will get these dependency information for free (that's what the last amendment does).

and I think slapping on more band-aids e.g. by doubling those checks in rdmd for this edge case would not be a good way forward. I don't think the inconsistencies and extra code it brings justify the gains.

Well to be honest I never thought that this "ugly hack" (it's one) would even be considered, but directly closed.
From my perspective the two hacks that I used are as far as we can go without modifying dmd, if you consider this too hacky, that's a very valid point. I hope you know that I am not angry or resentful if you close this.

rdmd.d Outdated
assert(!"import math = std.math, stdio = std.stdio: writeln, dump = write;".needsInspection);
}

// false positive imports from the third-party libraries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"false positive" implies a wrong result. So do the unit tests check against false positives/negatives, or do they ensure that the tests correctly return true positives/negatives?

Copy link
Contributor Author

@wilzbach wilzbach Aug 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: As unittest don't allow failures this testblock, the terminology isn't very good as this blocks ensure that false tests are classified correctly whereas the block above checks for all correct cases, so this block should be better just named "third-party import"

@CyberShadow
Copy link
Member

I think the idea to consider programs without external imports semantically identical w.r.t. dependencies with a static program is quite clever. Would be nice to get more eyes on this to see if there are other corner cases, but otherwise I'm in favor of merging this. If we're going all the way, though, I'd also make it check for the mixin keyword. It should already trip (and disable itself) for import(...) expressions, right?

Well I thought that it might be better to incorporate the full spec, s.t. it's not consider "ugly", but just "hack".

Well, it's not very important to worry too much about false positives, since they are semantically harmless, right? So you have to weigh the value of enabling the optimization for rarer cases like renamed imports versus the maintenance burden and chance of bugs in the additional code. Anyway, I'm not saying it should be removed at this point.

@CyberShadow
Copy link
Member

@MartinNowak @s-ludwig Any thoughts?

@wilzbach
Copy link
Contributor Author

wilzbach commented Aug 26, 2016

If we're going all the way, though, I'd also make it check for the mixin keyword. It should already trip (and disable itself) for import(...) expressions, right?

Sounds like a very good idea 👍 . Will add it soon :)

Well, it's not very important to worry too much about false positives, since they are semantically harmless, right? So you have to weigh the value of enabling the optimization for rarer cases like renamed imports versus the maintenance burden and chance of bugs in the additional code. Anyway, I'm not saying it should be removed at this point.

I added it because with the simple patch the following would not trigger a dependency check:

import stdio = my.core;

So I think to be sure to avoid corner cases, the hack should at least detect all possible cases from the spec.

Edit: btw the function itself is short - I just added twice the amount of tests with the intention to lower the maintenance burden

rdmd.d Outdated
else
mod = importText.strip;

if (!(mod.startsWith("std.") || mod.startsWith("core.")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!mod.startsWith("std.", "core."))

rdmd.d Outdated
{
import std.utf : byChar;

foreach (match; file.matchAll(ctRegex!(`import[^a-zA-Z]*(.*);`, "s")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is rather greedy and matches multiple imports at once. For example, "import foo; import bar;" is matched as [["import foo; import bar;", "foo; import bar"]]. As far as I see, the code below expects one import per match. Exclude ';' from the (.*) part.

@andralex
Copy link
Member

Nice trick. There is an alternative that I recall was @CyberShadow's idea (he might have also worked on it and ran into a snag, let us know): apparently it is possible to run dmd in normal compilation mode and to simultaneously output dependencies. Then, for a single-file build the strategy would be to just compile the file to generate both dependencies and the .o file.

If no foreign dependencies found, great. Otherwise, there are two possibilities: hold on to that .o file and compile everything else in another .o file, then link both. Or (simpler with what we currently have), throw away the .o file and build everything in one shot.

Thoughts?

rdmd.d Outdated
{
import std.utf : byChar;

foreach (match; file.matchAll(ctRegex!(`import[^a-zA-Z]*([^;]*);`, "s")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want import to not be preceded by an alphanumeric and always followed by at least one whitespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want import to not be preceded by an alphanumeric and always followed by at least one whitespace?

After refreshing my regex fuu , I realized we can simply use (?<=^|\s)import\s+( to match the import identifier

@andralex
Copy link
Member

Oh, I realized there is a very simple trick for files with no dependencies:

  1. Collect dependencies just like before, cache them
  2. Next build look at the cached dependencies file.
    2.1. If not empty, do the whole shebang.
    2.2. If empty, compare the date of the cached dependencies file with the date of the source file. If the former is newer, then no need to compute dependencies at all.

The invariant here (which we didn't get in rdmd so far) is that a file with no dependencies that doesn't change continues to have no dependencies. The converse is not true: a file with at least one dependency may change transitive dependencies even if itself doesn't change.

@CyberShadow
Copy link
Member

If empty, compare the date of the cached dependencies file with the date of the source file. If the former is newer, then no need to compute dependencies at all.

Well, doesn't this describe the case where nothing changed and thus no compiler invocations are needed at all?

@andralex
Copy link
Member

@CyberShadow hmmm you're right. If nothing changes we only stat all dependent files (the one passed in the cmdline plus the 0 or more transitive dependencies) but never recompute dependencies, is that right?

@wilzbach
Copy link
Contributor Author

wilzbach commented Aug 26, 2016

@CyberShadow hmmm you're right. If nothing changes we only stat all dependent files (the one passed in the cmdline plus the 0 or more transitive dependencies) but never recompute dependencies, is that right?

I just had a short look at the source code, that's the important bit:

auto deps = readDepsFile();
auto allDeps = chain(rootModule.only, deps.byKey).array;
bool mustRebuildDeps = allDeps.anyNewerThan(depsT);

edit: we don't need the array allocation here -> follow-up PR #192

@andralex
Copy link
Member

Yah look like I was wrong, sorry for the distraction. @CyberShadow please advise about running dmd to produce the .o file and collect dependencies simultaneously. Thx!

@AndrejMitrovic
Copy link
Contributor

IIRC Andrej attempted this a while ago, but hit a snag. Can't find the PR at the moment unfortunately.

@CyberShadow: I think it just ended up being a very stale PR and bitrotted, I'm not sure if there were any blockers though. It might be worth looking into resurrecting it.

The issues & pull:

https://issues.dlang.org/show_bug.cgi?id=9896
dlang/dmd#1861

If I recall correctly the build times were quite faster with this support.

// an import statement can be at the beginning of a line or preceded by
// at least one whitespace
// a whitespace after `import` is mandatory
foreach (match; file.matchAll(ctRegex!(`(?<=^|\s)import\s+([^;]*);`, "ms")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import can also appear after these: ;{}. There may be more. Same with mixin below.

@AndrejMitrovic
Copy link
Contributor

It's a neat hack. I think we could use this now and think about https://issues.dlang.org/show_bug.cgi?id=9896 which is more involved at a later point.

@AndrejMitrovic
Copy link
Contributor

I think we might be missing a simple check. If there's any -I flags on the command-line then there's a very high chance that the module does have 3rd party dependencies.

Speaking of which, you've done benchmarks on single-module builds which have no 3rd party dependencies. But what about those that do? The extra file parsing might slow down build times for all scripts which do have 3rd party deps so it's important that we benchmark this too.

@aG0aep6G
Copy link
Contributor

Nice trick. There is an alternative that I recall was @CyberShadow's idea (he might have also worked on it and ran into a snag, let us know): apparently it is possible to run dmd in normal compilation mode and to simultaneously output dependencies. Then, for a single-file build the strategy would be to just compile the file to generate both dependencies and the .o file.

If no foreign dependencies found, great. Otherwise, there are two possibilities: hold on to that .o file and compile everything else in another .o file, then link both. Or (simpler with what we currently have), throw away the .o file and build everything in one shot.

Thoughts?

I like that. Implemented in #194.

@wilzbach
Copy link
Contributor Author

I like that. Implemented in #194.

We now have two ways to reuse the output of the first compilation (#194, #195).
I think the regex hack is inferior to them and the benchmarks on #195 show the same ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

Comments