Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode codemap and span information in crate metadata. #22235

Merged
merged 1 commit into from
Mar 4, 2015

Conversation

michaelwoerister
Copy link
Member

This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, positions of line-beginnings, etc. but not the actual source code itself.

Crate metadata size is increased by this change because spans in the encoded ASTs take up space now:

                BEFORE    AFTER
libcore         36 MiB    39.6 MiB    +10%
libsyntax       51.1 MiB  60.5 MiB    +18.4%
libcollections  11.2 MiB  12.8 MiB    +14.3%

This only affects binaries containing metadata (rlibs and dylibs), executables should not be affected in size.

Fixes #19228 and probably #22226.

@alexcrichton
Copy link
Member

Do you know if it would be easy enough to track which spans are associated with generic functions and which aren't? It'd be nice to just cut down on the metadata size slightly (4mb for libcore is pretty big)

@michaelwoerister
Copy link
Member Author

I'll see what I can do to omit span data for types and other stuff that is not used later anyway. I'll also experiment with encoding spans in LEB128 and see if that helps in addition to the LZ77 encoding we are doing anyway.

I'll be on vacation for the rest of the week, however, so please don't expect any updates before next week.

@Manishearth
Copy link
Member

Needs a rebase, sorry.

@michaelwoerister
Copy link
Member Author

So, I've been investigating different options of reducing space usage for spans but I've not been really successful.

  • At the moment it's not possible to omit some spans from metadata but not others. The reason is that spans occur in the AST and we use derive-generated Encode/Decode implementations for these data structures. If we had a #[dont_serialize_me] attributes for fields, we could do something here. However, I didn't want to add this feature in a rudimentary, ad-hoc fashion, just for this one use case.
  • I made the compiler store spans in the variable-sized LEB128 format also used in DWARF. This however didn't give the expected space reduction because our RBML writer is super-wasteful, tagging each u8 I emit with a type-tag and a byte storing the size of the number. This results in at least 3 bytes and up to 12 bytes per BytePos and between 6 and 24 bytes per Span which totally defeats the purpose of storing things in LEB128.

A maybe acceptable intermediate solution is to store spans as one u64 in order to cut down on RBML tagging overhead. In a preliminary test this worked quite well under the circumstances. I'll push a cleaned up version of this the day after tomorrow.

The real problem though is how metadata is encoded in general (an issue already recorded in #21482). It's super-verbose and I doubt that all this tagging really has much of a benefit. Coming up with a better data format will be necessary sooner or later.

@alexcrichton
Copy link
Member

Wow that is... depressing. It's not necessarily the end of the world to inflate metadata size right now, it's just something to think about :). If you've got a smaller format working though, that sounds great!

@michaelwoerister
Copy link
Member Author

Wow that is... depressing.

That's one way to see it. I see the current metadata encoding as a lustrous garden of low-hanging fruit, overripe and ready for the picking. How often does one get the chance to shave off a third of the space consumption at no cost at all.

We'll have to pin down the requirements for the metadata format though before the feast can start.

@michaelwoerister michaelwoerister force-pushed the cross-crate-spans branch 6 times, most recently from 15d0206 to 855fe9f Compare February 26, 2015 13:24
@michaelwoerister
Copy link
Member Author

OK, this is updated now. Numbers are better than before.

                BEFORE    AFTER
libcore         39   MiB  41   MiB    +5%
libsyntax       54.6 MiB  61.2 MiB    +12%
libcollections  14.3 MiB  15.3 MiB    +7%

@alexcrichton
Copy link
Member

Ok, this is all looking good to me, nice work @michaelwoerister!

I'd like a second set of eyes on this, however.

r? @nrc or @brson

@nrc
Copy link
Member

nrc commented Feb 27, 2015

I need to give this a proper read, but I think the idea is good.

One question: what about macro expansions? I guess we don't use them in debug info? Is it ever possible we'd want to have this information cross-crate?

@michaelwoerister
Copy link
Member Author

Yes, macro expansion information is not used in debuginfo. DWARF has some provisions for recording C-macro definitions (not expansions) but that's not even supported by LLVM. I guess this won't be a topic for us in the foreseeable future.

About having this information cross-crate in general: It would be easy to implement, I think, but it costs additional space and at the moment I don't know of a use case. For now, I'd just leave it out.

}
}

return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary return

@nrc
Copy link
Member

nrc commented Mar 1, 2015

r+ with the nits addressed

@michaelwoerister
Copy link
Member Author

Thanks for the review. I work in your comments on Wednesday. Don't worry about nittiness :)

@lifthrasiir
Copy link
Contributor

The space optimization for filemap may have to be revisited after #22971 lands. The relevant commit is fe73d38, which makes the tag represent the size of following integers. If you keep individual tags in the filemap then you don't need separate bytes_per_diff; otherwise you will need Opaque to elide tags at all.

@bors
Copy link
Contributor

bors commented Mar 3, 2015

☔ The latest upstream changes (presumably #22971) made this pull request unmergeable. Please resolve the merge conflicts.

This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics).
Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, line-beginnings, etc. but not the actual source code itself. We are thus missing the opportunity of making Rust the first "open-source-only" programming language out there. Pity.
@michaelwoerister
Copy link
Member Author

I think I've addressed all comments. The encoding of the line data in filemaps is not optimal yet. I started implementing a better version using emit_opaque() as @lifthrasiir suggested but it got so ugly and unwieldy that I decided to do this in a separate PR later.

@michaelwoerister
Copy link
Member Author

@bors r=nrc

@bors
Copy link
Contributor

bors commented Mar 4, 2015

@bors r=michaelwoerister 2f88655

@bors
Copy link
Contributor

bors commented Mar 4, 2015

⌛ Testing commit 2f88655 with merge 3b3bb0e...

bors added a commit that referenced this pull request Mar 4, 2015
…oerister

This allows to create proper debuginfo line information for items inlined from other crates (e.g. instantiations of generics). Only the codemap's 'metadata' is stored in a crate's metadata. That is, just filename, positions of line-beginnings, etc. but not the actual source code itself.

Crate metadata size is increased by this change because spans in the encoded ASTs take up space now:
```
                BEFORE    AFTER
libcore         36 MiB    39.6 MiB    +10%
libsyntax       51.1 MiB  60.5 MiB    +18.4%
libcollections  11.2 MiB  12.8 MiB    +14.3%
```
This only affects binaries containing metadata (rlibs and dylibs), executables should not be affected in size. 

Fixes #19228 and probably #22226.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could not debug into a generic method defined in an external crate
6 participants