Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use fully qualified class/interface name for TypeInfo_Class.name & mangled name for TypeInfo_Struct.name #11450

Closed
wants to merge 2 commits into from

Conversation

kinke
Copy link
Contributor

@kinke kinke commented Jul 24, 2020

No description provided.

@kinke
Copy link
Contributor Author

kinke commented Jul 24, 2020

[Just checking how bad the breakage would be.]

@rainers
Copy link
Member

rainers commented Jul 24, 2020

I don't expect much breakage.

While I agree this is the proper fix, it can result in huge strings for pretty rare use cases. We have avoided emitting similar long strings with the back references in the mangling.

To be pedantic, the same change should be made for TypeInfo_Interface and TypieInfo_Struct (the latter would probably expose the long string issue). Maybe we could switch to using the mangled name, but that would really be a breaking change.

@kinke
Copy link
Contributor Author

kinke commented Jul 24, 2020

I don't expect much breakage.

I expected at least some, but it's looking pretty good.

Maybe we could switch to using the mangled name, but that would really be a breaking change.

I'm not too fond of long strings either. If switching to the mangled name, we'd definitely need to demangle it for the exception/backtrace string.

@rmanthorpe
Copy link
Contributor

Could you leave name as it is and store the mangle too for equality comparison?

@kinke
Copy link
Contributor Author

kinke commented Jul 24, 2020

That's what I've done in ldc-developers/ldc#3517 (restricted to MSVC targets and Throwables only for now).

@rainers
Copy link
Member

rainers commented Jul 25, 2020

I have built the phobos unittest.exe with a few variations for Win64:

  • master: 46_856_192 bytes
  • +fqn TypeInfo_Class: 46_871_552 bytes
  • +fqn TypeInfo_Interface: 46_872_064 bytes
  • +fqn TypeInfo_Struct: 47_119_872 bytes

Given that the name in struct typeinfo is not used by the runtime for dynamic typing as for classes and interfaces, and assuming that long chains of template instantiations are less likely for the latter two, I'd be fine with using toPrettyChars(true) for classes and interfaces.

@rmanthorpe
Copy link
Contributor

rmanthorpe commented Jul 25, 2020

Given that the name in struct typeinfo is not used by the runtime for dynamic typing

It is used for things like Variant though. I think you need to do them all, otherwise things will continue to broken in some cases. How does this compare to storing a mangle as well or storing the mangle and demangling on demand?

@kinke
Copy link
Contributor Author

kinke commented Jul 25, 2020

Thx for the numbers, Rainer. I expected a bigger overhead, as we've seen old mangled names > 64k before you've implemented the backrefs (and I expect the FQN to be only slightly shorter than those old mangles).

It is used for things like Variant though.

Grepping for .name in std.variant yields no results.

@rmanthorpe
Copy link
Contributor

rmanthorpe commented Jul 25, 2020

Grepping for .name in std.variant yields no results.

std.variant uses TypeInfo to work out what type it has internally. TypeInfo.opEquals falls back to comparing names as a (broken until dlang/druntime#2874 + this) fix for the Windows multiple TypeInfos from different DLLs problem.

I appreciate it's not going to include the worst case scenarios but in the phobos unittests above adding struct FQN only adds 0.5% to the size of the binary - that's not bad. If it's worst case size that matters most then storing the mangle alongside the name as you're already doing on Windows in LDC is going to win eventually, at the cost of slightly inflated binaries for "normal" cases. To me the mangle makes sense as the thing to compare since it is designed to be a unique identifier in the first place.

@kinke
Copy link
Contributor Author

kinke commented Aug 15, 2020

So what is it to be then? My first choice would be storing the mangle only and demangling on-the-fly on demand (e.g., when printing an exception), but as Object.toString() defaults to typeid(this).name, that's not really an option IMO. Storing the almost-fully-qualified name and the mangle is probably not much better or even worse, size-wise, than storing the FQN alone, which is IMO also nicer for user-facing messages, so I tend to the latter.

Still using the almost-fully-qualified name for structs would IMO be inconsistent and, as pointed out, be a potential problem for std.variant etc.
Edit: Apparently there are almost no usages of TypeInfo_Struct.name in druntime/Phobos, so maybe we could switch to the mangled name (only) for structs, which should mainly only affect TypeInfo_Struct.toString() (and where we could potentially demangle it to be halfway backwards-compatible).

I.e., also fully qualify template arguments, to make these names truly
unique per class/interface declaration.
To make them truly unique while preventing possibly very long human-
readable fully-qualified names.

`TypeInfo_Struct.toString()` returns this string; this change is
expected to be the main user-visible effect.
@kinke
Copy link
Contributor Author

kinke commented Aug 15, 2020

Okay, implemented as proposed above, and it looks like we could get away without demangling in TypeInfo_Struct.toString(), as all Buildkite projects except druntime pass.

@kinke kinke changed the title [WIP] TypeInfo_Class.name: Use fully qualified class name [WIP] Use fully qualified class/interface name for TypeInfo_Class.name & mangled name for TypeInfo_Struct.name Aug 15, 2020
@rmanthorpe
Copy link
Contributor

I like this approach, although I do think druntime to should be updated to demangle toString. The value of toString might not form any part of any tests in buildkite but it is probably going to garble someone's output and cause confusion.

import dmd.root.outbuffer : OutBuffer;
import dmd.dmangle : mangleToBuffer;
OutBuffer nameBuf;
mangleToBuffer(sd, &nameBuf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the mangled name already available in tc.deco?

@rainers
Copy link
Member

rainers commented Aug 16, 2020

I like this approach, although I do think druntime to should be updated to demangle toString

I agree. It might even make sense to rename the name field of TypeInfo_Struct to mangledName and add a property function name() that does the demangling. Not sure how much effort has to be put into making it efficient (e.g. by avoiding multiple demangling of the same string by using some cache), as there are attempt at converting TypeInfo into a ibrary defined template anyway.

@kinke
Copy link
Contributor Author

kinke commented Aug 16, 2020

Yeah, I wouldn't be too fond of an extra slice as cache inside TypeInfo_Struct - it would also prevent garbage collection if it's only used once. The GC required for demangling is actually my main concern - e.g., the druntime integration tests fail due to -profile=gc using TypeInfo.toString(). When straight-forwardly demangling in TypeInfo_Struct.toString(), the GC stats include GC actions for the -profile=gc output containing demangled struct names (edit: and there, we might demangle the same names over and over)...

@rainers
Copy link
Member

rainers commented Aug 16, 2020

I was rather thinking of a global AA string[TypeInfo], which could also use one of the manually memory managing containers in druntime. When given a sufficently large buffer, the GC will only kick in when demangling rather long names. I don't think the garbage is too bad, if the alternative is to generate the demangled name into the executable unconditionally.

@kinke
Copy link
Contributor Author

kinke commented Aug 16, 2020

Hmm, a global AA would probably come with sync overhead for thread-safety...

I don't think the garbage is too bad, if the alternative is to generate the demangled name into the executable unconditionally.

Yeah right.

@rainers
Copy link
Member

rainers commented Aug 17, 2020

Hmm, a global AA would probably come with sync overhead for thread-safety...

Could be thread-local, too. But if you are allocating, you are using possibly blocking operations anyway.

@kinke
Copy link
Contributor Author

kinke commented Aug 19, 2020

I rather tend to a slice in TypeInfo_Struct then, that's simple and fast to check/set.

But how is one supposed to overcome the compiler/druntime coupling in this case here upstream without a druntime git submodule? I really don't want to prepare druntime for an upcoming compiler change, supporting both old and new TypeInfo_Struct.name, then merge this and later adapt druntime again, throwing away the old stuff.
Edit: Well with an extra slice in TypeInfo_Struct, I'd probably have to adapt the compiler first (supporting both TypeInfo_Struct layouts), then druntime, then the compiler again and then druntime again - erm, no thanks, way simpler for LDC.

@kinke
Copy link
Contributor Author

kinke commented Jul 27, 2021

Superseded by #12928.

@kinke kinke closed this Jul 27, 2021
@kinke kinke deleted the patch-1 branch July 27, 2021 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants