Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Local GC] Handle table and the DAC #8322

Closed
swgillespie opened this issue Jun 8, 2017 · 8 comments
Closed

[Local GC] Handle table and the DAC #8322

swgillespie opened this issue Jun 8, 2017 · 8 comments

Comments

@swgillespie
Copy link
Contributor

This is an issue that @adityamandaleeka ran into a little while ago. The DAC has deep knowledge of the GC's handle table:

  • It is aware of the global handle table map instance, g_HandleTableMap
  • It calls GetDependentHandleSecondary on handles in the target process
  • It calls HndGetHandleExtraInfo on handles in the target process

Today, this is not a problem; the GC DAC build compiles the above two functions into the DAC and the DAC is free to call them. For a standalone GC, this is more problematic; as it stands today the DAC is not permitted to dispatch virtually on the target's IGCHeap instance, which it would need to do for these two functions. Even if it could, it would not be calling the DACCESS_COMPILE versions of these functions, which could result in strange behavior.

This commit (swgillespie/coreclr@1eef14c) is a start - it fixes the first bullet point and ifdef's out the bottom two bullet points. We'll need to come up with some sort of strategy GetDependentHandleSecondary and HndGetHandleExtraInfo - we can allow the debugger to dispatch virtually on the debugee's IGCHeap instance, duplicate the code, or do some other solution.

This issue is the last remaining use of the GC DAC build and, once it is complete, the GC DAC build can be removed completely which should be a nice DAC binary size win and compile time improvement.

@swgillespie
Copy link
Contributor Author

swgillespie commented Jun 15, 2017

I had an idea about this today - as long as we link in a GC to the CoreCLR DLL, we do not have to eliminate our dependency on GetDependentHandleSecondary and HndGetHandleExtraInfo, as long as a standalone GC does not modify those functions. In theory, we can use the linked-in DAC-ized versions of these functions, even if the debugee is running with a standalone GC - these two functions should be identical regardless of how the GC is built.

If we do this, we would include these two functions in the DAC API surface area of a standalone GC and would need to increment a major version number if we changed their behavior. That seems to me like it would be a rare occurrence, though.

@jkotas
Copy link
Member

jkotas commented Jun 15, 2017

long as we link in a GC to the CoreCLR DLL

I think you want to make this explicit, like refactor the function into gcinterface.inl. I do think you want to link in a whole GC with a fingers crossed that it will work out.

@swgillespie
Copy link
Contributor Author

Ok! (cc @sergiy-k, who I talked about this with)

To that end, here's the call graph of the two offending functions:

getdependenthandlesecondary_calltree

They refer to no globals, which is very fortunate. I feared that this would become very ugly but the size of this call graph and the fact that they don't refer to global state makes me feel much better.

@swgillespie
Copy link
Contributor Author

swgillespie commented Jun 21, 2017

Compiling with dotnet/coreclr#12389 has revealed two additional violations:

I will update the above call graph with these functions to estimate the damage. In the meantime, I have been prototyping the gcinterface.inl approach that Jan suggested above in this file, using templates to substitute the DAC/GC implementation of the TableSegment and TableSegmentHeader data structures depending on the side of the interface this file is compiled on. If anyone has any comments on this approach I'd be happy to hear it! The file is here: https://github.com/swgillespie/coreclr/blob/handletable-dac-2/src/gc/gcinterface.dac.inl

cc @Maoni0 @sergiy-k @adityamandaleeka if they haven't been pinged already by this thread

@swgillespie
Copy link
Contributor Author

swgillespie commented Jun 21, 2017

Updated call graph:

getdependenthandlesecondary_calltree2

Note that HndFetchHandle's two calls are debug only, otherwise it makes no calls.

@swgillespie
Copy link
Contributor Author

Thanks everyone for the offline feedback so far - unfortunately this is a thorny issue that does not appear to have an appealing solution. To summarize the feedback I've received (and to make sure we're all on the same page, since I've written a lot on this issue):

In the future, a build of the GC (possibly from another repo) may produce two build artifacts: first, a standalone GC linked as shared library that can be loaded by a runtime dynamically, and second a static archive of GC object files so that a runtime can choose to link the GC statically. Some number of source files will also be shipped alongside these two artifacts:

  • gcinterface.h, the definition of the interface that the GC implements and that the runtime consumes,
  • gcinterface.ee.h, the definition of the interface that the GC consumes and that the runtime implements,
  • gcinterface.dac.h, extra definitions that the runtime DAC uses to inspect the GC when debugging

These three source files represent the contract between the runtime and the GC and must be versioned accordingly. gcinterface.h and gcinterface.ee.h are versioned together, since they govern changes to the interface. gcinterface.dac.h is versioned separately, since it depends on implementation details of the GC.

It is desirable to keep the amount of code that we share like this to the absolute minimum, due to the harsh versioning constraints. The rest of this post makes the assumption that it is desirable to have a solution that that enables the GC source to be stored in a separate repro from the runtime, so that these three source files are the only source files required by a runtime to build and link against the GC.

The problem (this issue) is that the DAC uses six functions that are considered to be private to the GC (linked are examples of their usage):

The DAC requires definitions of these functions to link. Today, it gets these definitions from the object files produced by compiling the handle table source files as part of the runtime DAC DACCESS_COMPILE build.

Based on the feedback I've received, there are several possible solutions:

  1. Always link in to the runtime object files produced from a DAC-ized build of the handle table. Based on the above shipping configuration, that would imply that there would need to be a third build artifact produced by a GC build - an archive of objects intended to be linked into the target runtime's DAC, regardless of whether or not it is linking statically against the GC. Furthermore, this would imply that a GC build would need to be capable of producing binaries that the DAC can use - this would require a partial re-implementation of some DAC machinery, in particular the __DPtr template that reads the memory of the debugee and marshals the memory contents into the debugger process.
  2. Provide implementations for the transitive closure of the above six functions through use of a header file. This file will need to be distributed in the same manner as the gcinterface*.h files described in the above shipping configuration. This file will also need to be aggressively versioned in the same manner as gciterface.dac.h.
  3. Do nothing and break the DAC when not using a linked-in GC.

3 is not an acceptable solution, especially if we intend to ship runtimes without linked-in GCs. Both 1 and 2 have major cons.

Cons of 1:

  • It is potentially difficult to reimplement __DPtr since it will involve reimplementing things that the PAL already implements. It requires a handshake with the DAC, since the GC will need to be informed of the OS state required to read process memory (e.g. handle for the debuggee process)
  • In an ideal world the GC should not contain code for reading debuggee memory.
  • It prevents a "truly standalone" GC - there will always need to be some DAC-ized components of the GC linked in to the runtime, regardless of whether or not the GC is being dynamically loaded.
  • It is easy to accidentally introduce new GC/EE interface violations because a lot of symbols that should not be used will be contained in the DAC object file archive GC build artifact.

Cons of 2:

  • The header file that I prototyped (earlier in the thread) is ugly and not elegant.
  • The more source that is shared between the GC and the runtime, the more difficult it is to be correct when versioning.
  • The entire transitive closure of the six offending functions must be placed in this header file, which is potentially a lot of code.

Another major con shared by both approaches is that this is potentially a lot of effort expended on a small portion of code.

@swgillespie
Copy link
Contributor Author

Another note is that HndEnumHandles has a complex call graph that touches many details of the handle table and thus is potentially difficult to include in the header approach.

@swgillespie
Copy link
Contributor Author

Fixed by dotnet/coreclr#12458.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants