vm_space (boot image space) refactoring #415

Closed
qinsoon opened this issue Aug 17, 2021 · 5 comments
Labels
A-heap Area: Heap (including Mmapper, VMMap) C-cleanup Category: Cleanup F-question Call For Participation: Unanswered question (need more information)

Comments

@qinsoon
Member

qinsoon commented Aug 17, 2021

We currently hard-code the size of the VM space and assume that it sits at the beginning of our heap range. This is basically the setup inherited from JikesRVM. To make it more general, I think we should allow the binding to specify the start and the size of the VM space. We can still assume that the VM space lies within the range [HEAP_START, HEAP_END).
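As a rough illustration of the proposal above, the check that a binding-supplied range stays inside [HEAP_START, HEAP_END) could look like the sketch below. The constant values, the `VmSpaceRange` type, and its methods are all hypothetical names made up for this example; they are not MMTk's actual API.

```rust
/// Hypothetical heap bounds, for illustration only (not MMTk's real values).
pub const HEAP_START: usize = 0x2000_0000;
pub const HEAP_END: usize = 0x8000_0000;

/// A VM space request as a binding might supply it: a start address and a size in bytes.
/// This type is an assumption of the sketch, not an MMTk type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VmSpaceRange {
    pub start: usize,
    pub size: usize,
}

impl VmSpaceRange {
    /// Exclusive end address, or `None` if `start + size` overflows.
    pub fn end(&self) -> Option<usize> {
        self.start.checked_add(self.size)
    }

    /// True if the range is non-empty and lies entirely within [HEAP_START, HEAP_END).
    pub fn is_within_heap(&self) -> bool {
        match self.end() {
            Some(end) => self.size > 0 && self.start >= HEAP_START && end <= HEAP_END,
            None => false,
        }
    }
}
```

A binding would construct a `VmSpaceRange` from its boot image location and the check would reject ranges that start below the heap, run past its end, or overflow.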

@qinsoon qinsoon added C-cleanup Category: Cleanup A-heap Area: Heap (including Mmapper, VMMap) labels Aug 17, 2021
@qinsoon
Member Author

qinsoon commented Aug 18, 2022

With recent changes in #625 and #629, we should be able to allow a binding to specify VM space range through options.

There are a few questions we need to figure out before doing the change:

  • Can the VM space be discontiguous? I think we need to allow it.
  • I assume the VM space needs to be in the address range we use. Does it also need to be in the heap range?
  • What are the semantics of the VM space? Do we handle the tracing of objects in the VM space (including metadata for those objects)?
  • Do we still need the VM space, given that we have vm_trace_object() now?

@qinsoon qinsoon added the F-question Call For Participation: Unanswered question (need more information) label Aug 18, 2022
@qinsoon
Member Author

qinsoon commented Aug 19, 2022

We discussed this topic.

The original idea, which the current code reflects, is that we have one specific space called the VM space. It currently uses an ImmortalSpace, but it should be a special space whose semantics are defined by the VM. It is allocated and managed by the VM. We would want the binding to tell us its address range so that we can do dispatching. Other than dispatching, everything is up to the VM, including how to trace objects in the space, liveness, whether objects are movable, etc. In other words, MMTk does the dispatching, and the special VM space calls into the binding for its behaviour.

A possibly simpler and cleaner idea is that MMTk only needs to know its own spaces. Anything that is not in MMTk's spaces could be a VM-allocated object, a VM-managed pointer, or an invalid reference. So whenever we encounter an address that is not in MMTk's spaces, we just call the binding, and the binding can then decide whether it is a known address/object for the binding or a rogue/invalid pointer. The current vm_trace_object() reflects this design, and we may need more methods like vm_trace_object() for the other scenarios in which we encounter unknown references.
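The dispatch described in this paragraph can be sketched roughly as follows. Everything here is a simplified stand-in written for this comment: the `Binding` trait, `MmtkSpaces`, `TraceResult`, and `BootImageBinding` are hypothetical, and the `vm_trace_object` signature is an assumption that only borrows the name from the discussion.

```rust
/// Outcome of tracing one reference (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum TraceResult {
    TracedByMmtk,
    TracedByBinding,
    Invalid,
}

/// Stand-in for the binding side. The method name echoes `vm_trace_object()`,
/// but this signature is an assumption of the sketch.
pub trait Binding {
    /// Returns true if the binding recognises `addr` and has handled it.
    fn vm_trace_object(&self, addr: usize) -> bool;
}

/// Stand-in for MMTk's knowledge of its own spaces: half-open [start, end) ranges.
pub struct MmtkSpaces {
    pub ranges: Vec<(usize, usize)>,
}

impl MmtkSpaces {
    pub fn contains(&self, addr: usize) -> bool {
        self.ranges.iter().any(|&(start, end)| start <= addr && addr < end)
    }
}

/// MMTk handles addresses in its own spaces and defers everything else to the binding.
pub fn trace<B: Binding>(spaces: &MmtkSpaces, binding: &B, addr: usize) -> TraceResult {
    if spaces.contains(addr) {
        TraceResult::TracedByMmtk
    } else if binding.vm_trace_object(addr) {
        TraceResult::TracedByBinding
    } else {
        TraceResult::Invalid
    }
}

/// Example binding that only recognises addresses inside its boot image.
pub struct BootImageBinding {
    pub boot_image: (usize, usize),
}

impl Binding for BootImageBinding {
    fn vm_trace_object(&self, addr: usize) -> bool {
        self.boot_image.0 <= addr && addr < self.boot_image.1
    }
}
```

In this shape MMTk never needs to know where the boot image is; the cost is one call into the binding for every reference that falls outside MMTk's spaces.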

Though MMTk only knows its own spaces, we may allow VMs to create spaces in MMTk. They can implement their own policy and semantics, or reuse MMTk's policies, to create spaces. This means we can still do dispatching for objects in those spaces, and the semantics are defined by the VM. This is similar to the original idea, except that we expose a way for bindings to create spaces rather than exposing one specific VM space. We will need to deal with discontiguous spaces on 64-bit. For VM-managed spaces, the address range should not conflict with our heap range.

@qinsoon
Member Author

qinsoon commented May 5, 2023

> A possibly simpler and cleaner idea is that MMTk only needs to know its own spaces. Anything that is not in MMTk's spaces could be a VM-allocated object, a VM-managed pointer, or an invalid reference. So whenever we encounter an address that is not in MMTk's spaces, we just call the binding, and the binding can then decide whether it is a known address/object for the binding or a rogue/invalid pointer. The current vm_trace_object() reflects this design, and we may need more methods like vm_trace_object() for the other scenarios in which we encounter unknown references.

The issue with this approach is object metadata. If the VM uses side metadata, we need to be aware of the VM space so that we can mmap the side metadata for that region. Although most of the metadata is only used during GC (in trace_object), there are exceptions, such as the log bit used by write barriers.
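To make the dependency concrete, here is a minimal sketch of how the side-metadata pages that back a data region could be computed. The base address, the density of one metadata bit per 8 bytes of data, the page size, and the function name are all assumptions chosen for the example, not MMTk's actual layout.

```rust
/// Hypothetical base address of the side metadata table (an assumption of this sketch).
pub const SIDE_METADATA_BASE: usize = 0x1000_0000;
/// Assumed density: 1 metadata bit per 8 bytes of data, i.e. 1 metadata byte per 64 data bytes.
pub const LOG_DATA_BYTES_PER_META_BYTE: usize = 6;
/// Assumed page size used for mmap granularity.
pub const PAGE_SIZE: usize = 4096;

/// Returns the page-aligned, half-open range [start, end) of side metadata that
/// would have to be mapped before metadata for the given data region can be accessed.
pub fn side_metadata_pages(data_start: usize, data_size: usize) -> (usize, usize) {
    let data_end = data_start + data_size;
    let granule = 1usize << LOG_DATA_BYTES_PER_META_BYTE;
    // First metadata byte covering the region (rounded down).
    let meta_start = SIDE_METADATA_BASE + (data_start >> LOG_DATA_BYTES_PER_META_BYTE);
    // One past the last metadata byte covering the region (rounded up).
    let meta_end = SIDE_METADATA_BASE + ((data_end + granule - 1) >> LOG_DATA_BYTES_PER_META_BYTE);
    // Expand to whole pages, since mmap works at page granularity.
    let page_start = meta_start & !(PAGE_SIZE - 1);
    let page_end = (meta_end + PAGE_SIZE - 1) & !(PAGE_SIZE - 1);
    (page_start, page_end)
}
```

The point of the sketch is that the metadata range is derived from the data range, so a write barrier touching the log bit of a VM-space object would fault unless the VM space's range had been announced in advance.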

> Though MMTk only knows its own spaces, we may allow VMs to create spaces in MMTk. They can implement their own policy and semantics, or reuse MMTk's policies, to create spaces. This means we can still do dispatching for objects in those spaces, and the semantics are defined by the VM. This is similar to the original idea, except that we expose a way for bindings to create spaces rather than exposing one specific VM space. We will need to deal with discontiguous spaces on 64-bit. For VM-managed spaces, the address range should not conflict with our heap range.

Implementing a space can usually reuse many internal types, such as CommonSpace and PageResource. Those types are internal and not exposed to users. As a result, users would need to implement the VM space without reusing any MMTk internal types, which sounds like a major task. One way to solve this is to implement most of VMSpace inside MMTk and only forward certain calls to the binding.
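One possible shape for that split is sketched below: the generic bookkeeping (address ranges, membership checks) lives on the MMTk side, and only the VM-specific questions are forwarded through a small trait. The `VmSpaceHooks` trait, the `VmSpace` struct, and `ImmortalHooks` are hypothetical names for this sketch and do not correspond to existing MMTk types.

```rust
/// Calls forwarded to the binding (hypothetical trait, for illustration only).
pub trait VmSpaceHooks {
    /// Whether the object at `object` is considered live by the VM.
    fn is_live(&self, object: usize) -> bool;
    /// Whether objects in this space may be moved.
    fn is_movable(&self) -> bool;
}

/// MMTk-side part of the space: owns the range bookkeeping and delegates semantics.
pub struct VmSpace<H: VmSpaceHooks> {
    ranges: Vec<(usize, usize)>, // half-open [start, end) ranges, possibly discontiguous
    hooks: H,
}

impl<H: VmSpaceHooks> VmSpace<H> {
    pub fn new(hooks: H) -> Self {
        VmSpace { ranges: Vec::new(), hooks }
    }

    /// Registers one more address range as part of this space.
    pub fn add_range(&mut self, start: usize, size: usize) {
        self.ranges.push((start, start.saturating_add(size)));
    }

    /// Generic membership check, implemented once on the MMTk side.
    pub fn in_space(&self, addr: usize) -> bool {
        self.ranges.iter().any(|&(start, end)| start <= addr && addr < end)
    }

    /// Liveness is only asked of the binding for addresses that belong to this space.
    pub fn is_live(&self, object: usize) -> bool {
        self.in_space(object) && self.hooks.is_live(object)
    }

    pub fn is_movable(&self) -> bool {
        self.hooks.is_movable()
    }
}

/// Example hooks giving immortal, non-moving semantics.
pub struct ImmortalHooks;

impl VmSpaceHooks for ImmortalHooks {
    fn is_live(&self, _object: usize) -> bool {
        true
    }
    fn is_movable(&self) -> bool {
        false
    }
}
```

With this split, a binding only implements the small hooks trait, and the internal types stay private to MMTk.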

@qinsoon
Member Author

qinsoon commented May 5, 2023

#802 allows the runtime to specify the start and the size of a VM space, and allows the runtime to specify those after MMTk is initialized.

The PR assumes that the VM space range is outside the heap range we use for our internal spaces (from AVAILABLE_START to AVAILABLE_END). This makes things easier.

One leftover issue for the PR is that we need a way to tell whether an object is in MMTk's heap (in the internal spaces or the VM space). In Java MMTk, because the VM space is next to the internal spaces, a bounds check is possible: any object between HEAP_START and HEAP_END is in the MMTk heap. With MMTk core, a runtime can specify any address range as the VM space, so we can no longer use a bounds check. We could use the SFT or VMMap. I haven't checked if this works.
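To illustrate why a table lookup still works where a single bounds check does not, here is a small sketch of a chunk-granularity map from addresses to spaces. It is only loosely modelled on the idea of an SFT or VMMap lookup; the `ChunkMap` type, the `SpaceKind` enum, and the 4 MiB chunk size are assumptions of this example rather than MMTk's implementation.

```rust
use std::collections::HashMap;

/// Assumed chunk size of 4 MiB (2^22 bytes) for this sketch.
pub const LOG_BYTES_IN_CHUNK: usize = 22;

/// Which kind of space owns a chunk (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpaceKind {
    Internal,
    Vm,
}

/// Maps chunk indices to the space that owns them.
#[derive(Default)]
pub struct ChunkMap {
    chunks: HashMap<usize, SpaceKind>,
}

impl ChunkMap {
    /// Records every chunk touched by [start, start + size) as belonging to `kind`.
    pub fn register(&mut self, start: usize, size: usize, kind: SpaceKind) {
        if size == 0 {
            return;
        }
        let first = start >> LOG_BYTES_IN_CHUNK;
        let last = (start + size - 1) >> LOG_BYTES_IN_CHUNK;
        for chunk in first..=last {
            self.chunks.insert(chunk, kind);
        }
    }

    /// Looks up the owner of the chunk containing `addr`, if any.
    /// Note that the answer is only as precise as the chunk granularity.
    pub fn lookup(&self, addr: usize) -> Option<SpaceKind> {
        self.chunks.get(&(addr >> LOG_BYTES_IN_CHUNK)).copied()
    }

    /// Works for arbitrarily placed and discontiguous ranges, unlike a
    /// single HEAP_START..HEAP_END comparison.
    pub fn is_in_heap(&self, addr: usize) -> bool {
        self.lookup(addr).is_some()
    }
}
```

Because membership is decided per chunk, the VM space can sit anywhere in the address space, at the cost of a lookup instead of two comparisons.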

@qinsoon
Member Author

qinsoon commented Oct 26, 2023

#864 further allows discontiguous VM space.

> One leftover issue for the PR is that we need a way to tell whether an object is in MMTk's heap (in the internal spaces or the VM space). In Java MMTk, because the VM space is next to the internal spaces, a bounds check is possible: any object between HEAP_START and HEAP_END is in the MMTk heap. With MMTk core, a runtime can specify any address range as the VM space, so we can no longer use a bounds check. We could use the SFT or VMMap. I haven't checked if this works.

is_in_mmtk_spaces() can tell whether an object is in MMTk's heap (including the VM space).

This issue can be closed now.

@qinsoon qinsoon closed this as completed Oct 26, 2023