-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encode codemap and span information in crate metadata. #22235
Conversation
Do you know if it would be easy enough to track which spans are associated with generic functions and which aren't? It'd be nice to just cut down on the metadata size slightly (4mb for libcore is pretty big) |
I'll see what I can do to omit span data for types and other stuff that is not used later anyway. I'll also experiment with encoding spans in LEB128 and see if that helps in addition to the LZ77 encoding we are doing anyway. I'll be on vacation for the rest of the week, however, so please don't expect any updates before next week. |
Needs a rebase, sorry. |
So, I've been investigating different options of reducing space usage for spans but I've not been really successful.
A maybe acceptable intermediate solution is to store spans as one The real problem though is how metadata is encoded in general (an issue already recorded in #21482). It's super-verbose and I doubt that all this tagging really has much of a benefit. Coming up with a better data format will be necessary sooner or later. |
Wow that is... depressing. It's not necessarily the end of the world to inflate metadata size right now, it's just something to think about :). If you've got a smaller format working though, that sounds great! |
That's one way to see it. I see the current metadata encoding as a lustrous garden of low-hanging fruit, overripe and ready for the picking. How often does one get the chance to shave off a third of the space consumption at no cost at all. We'll have to pin down the requirements for the metadata format though before the feast can start. |
15d0206
to
855fe9f
Compare
OK, this is updated now. Numbers are better than before.
|
Ok, this is all looking good to me, nice work @michaelwoerister! I'd like a second set of eyes on this, however. |
I need to give this a proper read, but I think the idea is good. One question: what about macro expansions? I guess we don't use them in debug info? Is it ever possible we'd want to have this information cross-crate? |
Yes, macro expansion information is not used in debuginfo. DWARF has some provisions for recording C-macro definitions (not expansions) but that's not even supported by LLVM. I guess this won't be a topic for us in the foreseeable future. About having this information cross-crate in general: It would be easy to implement, I think, but it costs additional space and at the moment I don't know of a use case. For now, I'd just leave it out. |
} | ||
} | ||
|
||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary return
r+ with the nits addressed |
Thanks for the review. I work in your comments on Wednesday. Don't worry about nittiness |
The space optimization for filemap may have to be revisited after #22971 lands. The relevant commit is fe73d38, which makes the tag represent the size of following integers. If you keep individual tags in the filemap then you don't need separate |
☔ The latest upstream changes (presumably #22971) made this pull request unmergeable. Please resolve the merge conflicts. |
This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, line-beginnings, etc. but not the actual source code itself. We are thus missing the opportunity of making Rust the first "open-source-only" programming language out there. Pity.
855fe9f
to
2f88655
Compare
I think I've addressed all comments. The encoding of the line data in filemaps is not optimal yet. I started implementing a better version using |
@bors r=nrc |
…oerister This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, positions of line-beginnings, etc. but not the actual source code itself. Crate metadata size is increased by this change because spans in the encoded ASTs take up space now: ``` BEFORE AFTER libcore 36 MiB 39.6 MiB +10% libsyntax 51.1 MiB 60.5 MiB +18.4% libcollections 11.2 MiB 12.8 MiB +14.3% ``` This only affects binaries containing metadata (rlibs and dylibs), executables should not be affected in size. Fixes #19228 and probably #22226.
This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, positions of line-beginnings, etc. but not the actual source code itself.
Crate metadata size is increased by this change because spans in the encoded ASTs take up space now:
This only affects binaries containing metadata (rlibs and dylibs), executables should not be affected in size.
Fixes #19228 and probably #22226.