Add experimental -hash-threshold option to hash very long symbol names. #1445
Correction: it is ~5322 bytes large
I was just confused by this, maybe you don't mind the extra 4 chars and make it
Hm, reading through our other options, I guess
ping @klickverbot @redstar
Why didn't you implement it in D? That way, there would be a chance of submitting it to upstream… ;)
name = namebuf.ptr;
sprintf(name, "_D%lluTypeInfo_%.*s6__initZ", cast(ulong)9 + hashedname.length, hashedname.length, hashedname.ptr);
}
else
This seems rather unfortunate. Perhaps we should ditch IN_LLVM for such cases, or replace it by an enum set from the version (so you can do if (IN_LLVM && …). It doesn't seem like we would ever want to try using LDC's front end sources to build DMD…
OK. We already have the enum IN_LLVM, so I'll use that. I also think the copying is ugly/stupid.
I will indent the DDMD source, so that we are notified of (perhaps relevant) changes by merge errors.
Personally, I'd probably keep the indentation as it is, so that the diff is kept tidy and the LDC-specific part is made painfully obvious when browsing the source. But I guess one could always use diff -w for the former…
Many (aggravating) reasons for it.
If the switch is marked as experimental and it is made clear that it might disappear/behave differently in the future (release notes, …), it should probably be fine to add it as-is. I would very much hope that in the long term this remains a quick band-aid fix for Weka, though, until we get a proper upstream solution.
Merging when green on testers. Ideas for future work:
Would it make sense to use DMD's approach of only hashing the names before emitting them to object files? This way, the Phobos code relying on
Hm?
Ah, sorry, I had misremembered that (and the location of the diff in mtype.d).
This adds MD5 hashing of symbol names that are longer than the threshold set by -hashthres.
What is very unfortunate is that std.traits depends on the mangled name: it parses the mangled names of symbols as strings to obtain symbol traits. This means that mangling cannot be changed dramatically (for example, by hashing) at a high level, so the hashing has to be done at a lower level.
Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 34. The identifier is foo and is contained in the struct or class Result.
Symbols that may be hashed:
The feature is experimental, and has been tested on Weka.io's codebase. Compilation with -hashthres=1000 results in a binary that is less than half the size of the original (201MB vs. 461MB). I did not observe a significant difference in total build times. A hash threshold of 8000 gives a 229MB binary and a threshold of 800 gives 195MB, so lowering the threshold further yields little additional gain.
Linking Weka's code fails with a threshold of 500: Phobos contains a few large symbols (one larger than 8kb!) and this PR currently does not disable hashing of symbols that are inside Phobos, hence "experimental". Future work could try to figure out whether a symbol is inside Phobos or not.
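
To make the hashed-symbol layout above concrete, here is a minimal Python sketch. It is an illustration only, not the code in this PR: the helper names (`mangle_ident`, `hashed_symbol`, `split_components`) are hypothetical, and hashing the full original mangled name is an assumption. The sketch only mirrors the shape of the example symbol: length-prefixed module parts, an `L<line>` component, an underscore followed by the 32-character MD5 hex digest, then the parent and identifier.

```python
import hashlib

DIGITS = "0123456789"


def mangle_ident(ident: str) -> str:
    """Length-prefix one identifier, e.g. 'foo' -> '3foo'."""
    return f"{len(ident)}{ident}"


def hashed_symbol(original: str, module: str, line: int,
                  parent: str, ident: str, threshold: int) -> str:
    """Return `original` unchanged unless it is longer than `threshold`.

    Assumption: the MD5 digest is taken over the full original mangled
    name. The PR description does not spell out the exact hash input.
    """
    if len(original) <= threshold:
        return original
    digest = hashlib.md5(original.encode("utf-8")).hexdigest()
    # The leading underscore keeps the hash component from starting with a digit.
    parts = module.split(".") + [f"L{line}", "_" + digest, parent, ident]
    return "_D" + "".join(mangle_ident(p) for p in parts) + "Z"


def split_components(mangled: str) -> list:
    """Split a simple '_D<len><name>...Z' symbol into its components.

    Handles only plain length-prefixed identifiers, as in the example
    above; it is not a general D demangler.
    """
    if not (mangled.startswith("_D") and mangled.endswith("Z")):
        raise ValueError("not a simple _D...Z symbol")
    body = mangled[2:-1]
    parts = []
    i = 0
    while i < len(body):
        j = i
        while j < len(body) and body[j] in DIGITS:
            j += 1
        if j == i:
            raise ValueError("expected a length prefix")
        n = int(body[i:j])
        parts.append(body[j:j + n])
        i = j + n
    return parts


example = "_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ"
print(".".join(split_components(example)))
# prints one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
```

Splitting the example reproduces the ddemangle output quoted above. Real mangled names also carry type information after the identifier, which this sketch deliberately ignores.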