Fix issue 16431 - Speed-up rdmd for single file builds #191
wilzbach wants to merge 1 commit into dlang:master
@wilzbach, thanks for your PR! By analyzing the annotation information on this pull request, we identified @CyberShadow, @andralex and @leandro-lucarella-sociomantic to be potential reviewers. @CyberShadow: The PR was automatically assigned to you, please reassign it if you were identified mistakenly. |
|
Well, this is one ugly hack :) dub cheats in that it does not do any dependency discovery. It simply compiles everything under The correct solution would be to move |
rdmd.d
Outdated
| { | ||
| foreach (mod; match[1].splitter(",")) | ||
| { | ||
| if (!(mod.startsWith("std") || mod.startsWith("std"))) |
I'm guessing you meant to write `if (!(mod.startsWith("std") || mod.startsWith("core")))`
|
Anyway, from a first impression this seems rather harmless... false positives simply bring back compilation as it was, and false negatives seem unlikely, I think - it can be made to break with shenanigans like |
rdmd.d
Outdated
| // In most cases we don't have any dependencies except the standard library | ||
| auto file = readText(rootModule); | ||
| bool needInspection = false; | ||
| outer: foreach (match; file.matchAll(ctRegex!`import\s+(.*);`)) |
`import name = std.string;` // false positive ;)
|
Did you check if the test suite passes? Currently it's not covered by our CI. |
| { | ||
| string[string] noDeps; | ||
| return noDeps; | ||
| } |
Well, it fails even on master for me, so maybe we should enable something as simple as Travis here ;-) |
|
Actually, there is a problem: the discovered dependencies don't just consist of the imports - but also the compiler and library file. Currently, if the compiler is upgraded, a second rdmd run will rebuild the program, but this breaks that contract. |
rdmd.d
Outdated
| string[string] noDeps; | ||
| return noDeps; | ||
| } | ||
| return null; |
To clarify, the added and removed code is identical in functionality. An associative array is initialized to null by default.
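A minimal sketch of that equivalence, runnable with `rdmd --main -unittest` (the function names `viaLocal` and `viaNull` are made up for this illustration and do not appear in rdmd):

```d
// Hypothetical helpers, named only for this illustration.
string[string] viaLocal()
{
    string[string] noDeps; // default-initialized, never assigned
    return noDeps;
}

string[string] viaNull()
{
    return null; // null converts implicitly to an associative array type
}

unittest
{
    // Both forms hand back a null (and therefore empty) associative array.
    assert(viaLocal() is null);
    assert(viaNull() is null);
    assert(viaLocal().length == 0 && viaNull().length == 0);
}
```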
|
Hmm, I liked this a little better when it was a small patch. Regardless, the above-mentioned breakage is a blocker, and I think slapping on more band-aids, e.g. by doubling those checks in rdmd for this edge case, would not be a good way forward. I don't think the inconsistencies and extra code it brings justify the gains. The Travis configuration should probably be in its own PR, and it should probably run the rdmd test suite, which could then run rdmd unit tests if so needed (so users don't have to test both separately). |
|
Perhaps there is an opportunity for optimization for successive runs, i.e. when the source has changed. Instead of assuming the worst case (that the dependency list is outdated if any sources are), rdmd could assume the best case that the dependency list didn't change (but still collect a new dependency list from the compiler output). Best case: it actually didn't change, and the program is rebuilt in one compiler invocation. Worst case: the program fails to link (and we have to suppress the linker output), or something like we give the compiler a stale dependency that was on a network drive that's now timing out. It also probably means that we will need two compiler invocations on any failed compilation, incl. syntax errors, because we can't distinguish a compiler error due to a stale dependency from a compiler error due to there being a mistake in the source code. |
| // run dmd with an empty file to get it's configuration | ||
| rootModule = buildPath(workDir, "emptyFile.d"); | ||
| rootModule.writeFile("void main(){}"); | ||
| } |
Interesting :)
Is there an opportunity for code reuse with --main?
Well, then I have to modify the lines below. As we have to write an empty file anyway (DMD expects at least one file), I thought this was the option with the fewest changes.
Btw the additional cost to write the file & call DMD with an empty file seems to be very small:
> python -m timeit -n 10 -r 3 -s 'import os' 'os.system("rdmd --force foo.d > /dev/null")'
10 loops, best of 3: 744 msec per loop
> python -m timeit -n 10 -r 3 -s 'import os' 'os.system("./rdmd --force foo.d > /dev/null")'
10 loops, best of 3: 455 msec per loop
You don't need the main function because you don't want to link anyway. Right?
> You don't need the main function because you don't want to link anyway. Right?
Oh nice! I didn't realize dmd would compile an empty file :)
Well, I thought that it might be better to incorporate the full spec, so that it's not considered "ugly", but just a "hack".
I think I found yet another hack to solve your mentioned problem. On my machine rdmd finds the following dependencies when given only imports from Phobos / Runtime: One way would be to reimplement
Well, to be honest, I never thought that this "ugly hack" (it is one) would even be considered; I expected it to be closed directly. |
rdmd.d
Outdated
| assert(!"import math = std.math, stdio = std.stdio: writeln, dump = write;".needsInspection); | ||
| } | ||
|
|
||
| // false positive imports from the third-party libraries |
"false positive" implies a wrong result. So do the unit tests check against false positives/negatives, or do they ensure that the tests correctly return true positives/negatives?
Edit: since unittest blocks don't allow failures, the terminology isn't very good: this block ensures that the negative cases are classified correctly, whereas the block above checks all the positive cases. So this block would be better named just "third-party import".
|
I think the idea to consider programs without external imports semantically identical w.r.t. dependencies with a static program is quite clever. Would be nice to get more eyes on this to see if there are other corner cases, but otherwise I'm in favor of merging this. If we're going all the way, though, I'd also make it check for the
Well, it's not very important to worry too much about false positives, since they are semantically harmless, right? So you have to weigh the value of enabling the optimization for rarer cases like renamed imports versus the maintenance burden and chance of bugs in the additional code. Anyway, I'm not saying it should be removed at this point. |
|
@MartinNowak @s-ludwig Any thoughts? |
Sounds like a very good idea 👍 . Will add it soon :)
I added it because with the simple patch the following would not trigger a dependency check: `import stdio = my.core;` So I think, to be sure to avoid corner cases, the hack should at least detect all possible cases from the spec. Edit: btw the function itself is short - I just added twice the amount of tests with the intention of lowering the maintenance burden |
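To illustrate the renamed-import corner case, here is a rough sketch of pulling the module names out of the text between `import` and `;`. It is not the PR's `needsInspection`; the helper name `importedModules` is hypothetical, and the sketch assumes the spec's rule that everything after the first `:` is a selective binding list rather than a module:

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : findSplit;
import std.string : strip;

// Hypothetical helper: module names from the text between `import` and `;`.
string[] importedModules(string importList)
{
    // Everything after the first ':' names imported symbols, not modules.
    auto modulesPart = importList.findSplit(":")[0];
    string[] result;
    foreach (item; modulesPart.splitter(","))
    {
        // `name = pkg.mod` is a renamed import: keep the right-hand side.
        auto parts = item.findSplit("=");
        result ~= (parts[1].length ? parts[2] : parts[0]).strip;
    }
    return result;
}

unittest
{
    // The alias `stdio` must not be mistaken for a standard library module.
    assert(importedModules("stdio = my.core") == ["my.core"]);
    assert(importedModules("math = std.math, stdio = std.stdio: writeln, dump = write")
        == ["std.math", "std.stdio"]);
}
```

With the module names separated from aliases and bindings, the prefix check runs on `my.core` rather than on `stdio = my.core`.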
rdmd.d
Outdated
| else | ||
| mod = importText.strip; | ||
|
|
||
| if (!(mod.startsWith("std.") || mod.startsWith("core."))) |
if (!mod.startsWith("std.", "core."))
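For reference, a small sketch of how the variadic `startsWith` from `std.algorithm.searching` behaves: it returns 0 when no prefix matches, otherwise the 1-based position of the prefix that did. The wrapper name `isStandardModule` and the module name `stdx.data.json` are only illustrative:

```d
import std.algorithm.searching : startsWith;

// Hypothetical wrapper around the suggested one-liner.
bool isStandardModule(string mod)
{
    // Non-zero means one of the listed prefixes matched.
    return mod.startsWith("std.", "core.") != 0;
}

unittest
{
    assert(isStandardModule("std.stdio"));
    assert(isStandardModule("core.thread"));
    assert(!isStandardModule("my.core"));
    // The trailing dot matters: a bare "std" prefix would also accept this one.
    assert(!isStandardModule("stdx.data.json"));
    assert("stdx.data.json".startsWith("std"));
}
```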
rdmd.d
Outdated
| { | ||
| import std.utf : byChar; | ||
|
|
||
| foreach (match; file.matchAll(ctRegex!(`import[^a-zA-Z]*(.*);`, "s"))) |
This regex is rather greedy and matches multiple imports at once. For example, "import foo; import bar;" is matched as [["import foo; import bar;", "foo; import bar"]]. As far as I see, the code below expects one import per match. Exclude ';' from the (.*) part.
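A short sketch of the difference, using simplified patterns (the original `import\s+(.*);` and a variant with `[^;]*`) rather than the exact regex in the diff; `importLists` is a hypothetical helper that collects capture group 1 of every match:

```d
import std.regex : matchAll, regex;

// Hypothetical helper: capture group 1 of every match of `pattern`.
string[] importLists(string source, string pattern)
{
    string[] result;
    foreach (m; source.matchAll(regex(pattern)))
        result ~= m[1];
    return result;
}

unittest
{
    enum src = "import foo; import bar;";
    // Greedy `.*` runs up to the last ';' and swallows the second import.
    assert(importLists(src, `import\s+(.*);`) == ["foo; import bar"]);
    // Excluding ';' from the capture yields one match per import.
    assert(importLists(src, `import\s+([^;]*);`) == ["foo", "bar"]);
}
```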
|
Nice trick. There is an alternative that I recall was @CyberShadow's idea (he might have also worked on it and ran into a snag, let us know): apparently it is possible to run dmd in normal compilation mode and to simultaneously output dependencies. Then, for a single-file build the strategy would be to just compile the file to generate both dependencies and the If no foreign dependencies found, great. Otherwise, there are two possibilities: hold on to that Thoughts? |
rdmd.d
Outdated
| { | ||
| import std.utf : byChar; | ||
|
|
||
| foreach (match; file.matchAll(ctRegex!(`import[^a-zA-Z]*([^;]*);`, "s"))) |
Don't we want import to not be preceded by an alphanumeric and always followed by at least one whitespace?
> Don't we want import to not be preceded by an alphanumeric and always followed by at least one whitespace?
After refreshing my regex-fu, I realized we can simply use (?<=^|\s)import\s+( to match the import identifier
|
Oh, I realized there is a very simple trick for files with no dependencies:
The invariant here (which we didn't get in rdmd so far) is that a file with no dependencies that doesn't change continues to have no dependencies. The converse is not true: a file with at least one dependency may change transitive dependencies even if itself doesn't change. |
Well, doesn't this describe the case where nothing changed and thus no compiler invocations are needed at all? |
|
@CyberShadow hmmm you're right. If nothing changes we only stat all dependent files (the one passed in the cmdline plus the 0 or more transitive dependencies) but never recompute dependencies, is that right? |
I just had a short look at the source code, that's the important bit:
auto deps = readDepsFile();
auto allDeps = chain(rootModule.only, deps.byKey).array;
bool mustRebuildDeps = allDeps.anyNewerThan(depsT);
Edit: we don't need the array allocation here -> follow-up PR #192 |
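For readers unfamiliar with that check, here is a simplified, hypothetical stand-in for it (the name `anySourceNewerThan` is made up, and this is a sketch rather than rdmd's actual `anyNewerThan`). It assumes only that staleness means "some file's modification time is later than the timestamp of the dependency list":

```d
import std.algorithm.searching : any;
import std.datetime : SysTime;
import std.file : timeLastModified;

// Hypothetical, simplified staleness check; not rdmd's implementation.
bool anySourceNewerThan(string[] files, SysTime target)
{
    // A missing file reports SysTime.min, so it never counts as newer.
    return files.any!(f => f.timeLastModified(SysTime.min) > target);
}

unittest
{
    import std.file : remove, tempDir, write;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "rdmd_sketch_dep.d");
    write(path, "void main(){}");
    scope (exit) remove(path);

    assert(anySourceNewerThan([path], SysTime.min));  // any real file is newer than the minimum
    assert(!anySourceNewerThan([path], SysTime.max)); // nothing is newer than the maximum
    assert(!anySourceNewerThan([path ~ ".missing"], SysTime.min));
}
```

If no file is newer, only `stat` calls happen and the compiler is never invoked, which is the case discussed above.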
|
Yah, looks like I was wrong, sorry for the distraction. @CyberShadow please advise about running dmd to produce the .o file and collect dependencies simultaneously. Thx! |
@CyberShadow: I think it just ended up being a very stale PR and bitrotted; I'm not sure if there were any blockers though. It might be worth looking into resurrecting it. The issues & pull: https://issues.dlang.org/show_bug.cgi?id=9896 If I recall correctly, the build times were quite a bit faster with this support. |
28e1eb1 to 296a25e
| // an import statement can be at the beginning of a line or preceded by | ||
| // at least one whitespace | ||
| // a whitespace after `import` is mandatory | ||
| foreach (match; file.matchAll(ctRegex!(`(?<=^|\s)import\s+([^;]*);`, "ms"))) |
import can also appear after these: ;{}. There may be more. Same with mixin below.
|
It's a neat hack. I think we could use this now and think about https://issues.dlang.org/show_bug.cgi?id=9896 which is more involved at a later point. |
|
I think we might be missing a simple check. If there's any Speaking of which, you've done benchmarks on single-module builds which have no 3rd party dependencies. But what about those that do? The extra file parsing might slow down build times for all scripts which do have 3rd party deps so it's important that we benchmark this too. |
I like that. Implemented in #194. |
Hey there,
while looking a bit at the run time of the DLang Tour builds I discovered that using single-file DUB builds is faster than rdmd. I dug into this a bit, and the problem is that `rdmd` runs `dmd` first to get a list of all imports; however, this comes at a huge cost (~40% of total runtime).
I use `rdmd` a lot every day (e.g. every time I run tests) and I expect others do so too, hence if we can agree on a conservative estimate that covers most cases we could achieve quite a huge benefit. I know of course one could run `dmd <file> && ./<file>`, but many of our tutorials advise using `rdmd`, and it's convenient not to have yet another alias for single-file builds.
I think the tricky part here is to make a conservative guess without false positives. I don't think my initial idea is perfect, but maybe someone else has a better idea?
Benchmarks
In any case, here are the results when benchmarked with this file - with other files one gets similar results.
Currently
With this PR
For reference, DUB: