Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require object reference to be aligned #1159

Merged
merged 8 commits into from
Jul 11, 2024
Merged

Conversation

qinsoon
Copy link
Member

@qinsoon qinsoon commented Jun 27, 2024

This PR adds some assumptions about object reference, and the conversion between object reference and in-object address. To be specific, this PR requires the following:

  • Object reference needs to be word aligned
  • There must be an constant offset between object reference and in-object address.

With these requirements, we can do the following computation to find object reference for internal pointers:

  1. Use ref_to_address with the internal pointer to get the in-object address for the internal pointer.
  2. Find the last VO bit that is set.
  3. Calculate the data address for the bit. This address is aligned to the VO bit region.
  4. Use address_to_ref with the data address above. The return value is an object reference, but may not be a valid one.
  5. Use the word alignment to find the actual object reference
  6. Get the object size from the object reference, and make sure that the internal pointer is actually pointing into the object

Step 1 is necessary if the internal pointer is before the in-object address. If we concern about the invalid object reference in ref_to_address and address_to_ref, we could consider removing the two functions, and use a constant offset instead.

@qinsoon qinsoon added the PR-extended-testing Run extended tests for the pull request label Jun 27, 2024
@qinsoon
Copy link
Member Author

qinsoon commented Jul 2, 2024

The following is some discussion with Steve on this topic.

('in-object' address/mapping or 'in-allocation' address/mapping refers to the address returned by ref_to_address)

Steve Blackburn: I think we need to allow stronger assumptions about the alignment of the in-allocation address or the object reference (I think either will do). We cannot make assumptions about the start because it may be explicitly mis-aligned w.r.t. the object reference. Object references (meaning the value of the reference, not the location of the reference) usually have strong alignment assumptions in runtimes. The binding should declare what that alignment is. I'm not sure whether we should be using 'in-allocation' or 'object-reference'. The advantage of in-allocation is that it makes things simpler since the address (by definition) points into the object. But there's no natural reason for it to be aligned, so it might be a problem to demand an alignment. On the other hand object reference will in general require some arithmetic before its useful, but it generally will be naturally aligned since it is normal for them to have word alignment.

Steve Blackburn: If the minimum allocation alignment is 1 word or greater (which I expect it is), then it should not be hard to get the binding to make a VO -> in-object mapping, which would allow the mark bit to be set. For example, if the alignment was 1 word and the in-object mapping had a constant relationship to the object reference, then the in-object address would either be word aligned, or would have a known constant distance to word alignment. If so it should not be hard to make a binding function that sets a mark bit based on a VO location. e.g.: interior pointer -> VO bit -> in-object address -> mark bit.

Steve Blackburn: I assume that what makes most sense is to ask all language bindings to specify a minimum alignment on the object reference, and to require that the relationship between object reference and 'in-object pointer' be a constant. Thus the minimum alignment of the in-object pointer will be known (modulo the constant offset).

Steve Blackburn: We should probably have an MMTk-level discussion about formalizing this. I think it is unavoidable that we formalize this (I don't think we can continue with this issue being ill-defined).

There was a little bit more discussion about allocation alignment and payload.

Steve Blackburn: Each binding must specify an a minium alignment
Steve Blackburn: My understanding is that that is with respect to the "object", i.e. the payload.
Steve Blackburn: Payload means the non-header part of the object, ie its fields. So the start of the payload is the location of the first field.
Steve Blackburn: NOT the start of the allocation
Steve Blackburn: NOT the object reference
Yi Lin: (Quote) If the minimum allocation alignment is 1 word or greater (which I expect it is)
Yi Lin: We dont have this assumption yet. We allow alloctiaon alignment to be smaller than 1 word at the moment.
Steve Blackburn: Why?
Steve Blackburn: This seems very strange
Steve Blackburn: You just need to fix that, which should be straightforward

@qinsoon qinsoon marked this pull request as ready for review July 2, 2024 04:26
@qinsoon qinsoon requested a review from wks July 2, 2024 04:26
@wks
Copy link
Collaborator

wks commented Jul 2, 2024

Some high-level comments:

Requiring ObjectReference to be an address. This PR adds a requirement that ObjectReference must be an address (and it must be aligned, and a constant offset from some address inside the object). While I personally strongly support this change, we did have some previous discussion, and some of us argued the opposite. The opposite idea is that "we should give the VM binding more freedom of what ObjectReference is, including whether it may include tag bits (which makes it unaligned)." This is just a reminder. Related link: #1044

ObjectReference no longer needs to be what a reference field holds. If we require ObjectReference to be aligned, and a VM stores unaligned addresses in reference fields (due to offsets or tag bits), then such VM will require a conversion in Slot::load() to give MMTk-core an aligned address, and another conversion in Slot::store() to restore the unaligned-ness (by adding back the tag bits or offset) before writing back to the field. This is not hard to do, and some VMs are already doing this. But that leads to another question:

If ObjectReference is not what's in the field, and we do almost everything with objref.to_address(), why don't we define ObjectReference to be equal to the "in-object address"? In other word, should we require that ObjectReference must always be inside the allocation?

If we do this, we can unify to_raw_address and to_address. MMTk core almost never works with raw addresses. The only not-for-testing use case of raw address in the current mmtk-core is the forwarding pointer of moved object.

If we do this, we can further simplify operations on VO bits such that "VO bits are always set at the address of ObjectReference". That should greatly simplify things.

One drawback I can think of is debugging. Previously, ObjectReference could be the same as the contents in the field for VMs like JikesRVM, but it will no longer be. OpenJDK and Ruby are not impacted. V8 reference fields always has tag bits, so the value should never have been the same as ObjectReference.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 2, 2024

Requiring ObjectReference to be an address. This PR adds a requirement that ObjectReference must be an address (and it must be aligned, and a constant offset from some address inside the object). While I personally strongly support this change, we did have some previous discussion, and some of us argued the opposite. The opposite idea is that "we should give the VM binding more freedom of what ObjectReference is, including whether it may include tag bits (which makes it unaligned)." This is just a reminder. Related link: #1044

This is not introduced by this PR. We already stated this clearly in ObjectReference:
MMTk expects each object reference to have a pointer to the object (an address) in each object reference, and that address should be used for this ObjectReference type.

ObjectReference no longer needs to be what a reference field holds. If we require ObjectReference to be aligned, and a VM stores unaligned addresses in reference fields (due to offsets or tag bits), then such VM will require a conversion in Slot::load() to give MMTk-core an aligned address, and another conversion in Slot::store() to restore the unaligned-ness (by adding back the tag bits or offset) before writing back to the field. This is not hard to do, and some VMs are already doing this. But that leads to another question:

This is also not introduced by this PR. We have agreed that we should use Slot to deal with the tag.

If ObjectReference is not what's in the field, and we do almost everything with objref.to_address(), why don't we define ObjectReference to be equal to the "in-object address"? In other word, should we require that ObjectReference must always be inside the allocation?

If we do this, we can unify to_raw_address and to_address. MMTk core almost never works with raw addresses. The only not-for-testing use case of raw address in the current mmtk-core is the forwarding pointer of moved object.

If we do this, we can further simplify operations on VO bits such that "VO bits are always set at the address of ObjectReference". That should greatly simplify things.

One drawback I can think of is debugging. Previously, ObjectReference could be the same as the contents in the field for VMs like JikesRVM, but it will no longer be. OpenJDK and Ruby are not impacted. V8 reference fields always has tag bits, so the value should never have been the same as ObjectReference.

The idea of object reference and in-object address is not introduced in this PR, though this PR does add more documentation to clarify it.

Currently it is up to the binding. It is not as simple as what you propose, but it is still clear and works for different bindings.

  • If their object reference matches our object reference definition, and points to an address that is inside the object, they can use the same address for both object reference and the in-object address. Actually most of our bindings do that.
  • If their object reference matches our definition, and may not point to an addresss inside the object, they will have to use a different in-object address. JikesRVM is the example.

to_raw_address is mostly for bindings to safely convert between object reference and address. MMTk core probably only uses it in some corner cases (like check VO bit, or find base references).

@wks
Copy link
Collaborator

wks commented Jul 2, 2024

Some detailed comments. Let's assume we do what you described in this PR (and forget about my aggressive suggestion of making raw address == in-allocation address).

Alignment

Object reference needs to be word aligned

Should we require objref.to_address() to always return an aligned address, too? It should be a reasonable requirement, and it should simplify things (e.g. if we use that to set metadata, such as VO bit, we always set metadata at aligned addresses). This PR currently only requires it to be a "constant offset" from the object reference, but the "constant offset" is not said to be a multiple of word size.

... This address is aligned to the VO bit region. 4. Use address_to_ref with the data address above. The return value is an object reference, but may not be a valid one.

This means address_to_ref(addr) where addr is word-aligned may return an ObjectReference whose raw value is not word-aligned. This is a bit strange as it seems to imply the "constant offset" may not be a multiple of word size, and we need to "5. Use the word alignment to find the actual object reference".

Using address_to_ref on internal pointer

Use ref_to_address with the internal pointer to get the in-object address for the internal pointer.

We define an "in-object address of an ObjectReference" because an ObjectReference is not necessarily inside an object. But an internal pointer is always pointing at a field and, by definition, is always inside an object. I think it is pointless to call ref_to_adderss on an internal pointer. But the last paragraph described its purpose:

Step 1 is necessary if the internal pointer is before the in-object address

It's not sufficient. If the object layout is | GCHeader | MiscHeader | JavaHeader | length | elem0 | elem1 | ... |, each header is 8 byte, and ObjectReference is at start + 32 (after length), and its in-object address is at start + 24, this means ref_to_address(objref) == objref - 8, and address_to_ref(addr) = addr + 8. If we have an internal reference pointing at start + 0, then address_to_ref(start + 0) will be start + 8, which is still before the in-object address which is start + 24.

And if the internal pointer points to the very last word of an object, and the offset from in-object address to ObjectReference is positive, then the result will be outside the object.

I think we may need VM-specific knowledge to find ObjectReference from an internal pointer. If the in-object address is the same as the object start, we only need to search towards lower addresses. If, in a VM, an internal pointer always points to the payload, and never points to the header (which is not always true for VMs mainly implemented in C/C++ because the runtime may be fiddling with the header while triggering GC), and the in-object address is after the header like the example above, we will only need to search towards lower addresses, too. In general, we may need to search in both directions, but with different limits, and be careful not to go beyond page/block/chunk boundaries because metadata may not be mapped, and some spaces (such as MarkSweepSpace and ImmixSpace) never allocate objects across block boundaries.

But if we unify to_address and to_raw_address, this part can be simplified, too.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 2, 2024

For alignment, we can assume that most language use aligned object reference. So it is natural for us to require aligned object reference, rather than aligned in-object address.

For the corner case that the internal pointer is before the in-object address, you are right. We would have to search forwards if that happens. We should let bindings to tell us, and possibly we can leave it as unimplemented for a while.

@wks
Copy link
Collaborator

wks commented Jul 2, 2024

For alignment, we can assume that most language use aligned object reference. So it is natural for us to require aligned object reference, rather than aligned in-object address.

I mean, I suggest adding a requirement that "both ObjectReference and in-object address needs to be word-aligned". (That implies their difference must be a multiple of word size, too.) I think all VMs can satisfy this, and this will simplify the VO bit searching algorithm. Concretely, in step 4, if VO bit is set at an (aligned) address, then address_to_ref(addr) will be guaranteed to be aligned, too, and that must be a valid ObjectReference.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 2, 2024

For alignment, we can assume that most language use aligned object reference. So it is natural for us to require aligned object reference, rather than aligned in-object address.

I mean, I suggest adding a requirement that "both ObjectReference and in-object address needs to be word-aligned". (That implies their difference must be a multiple of word size, too.) I think all VMs can satisfy this, and this will simplify the VO bit searching algorithm. Concretely, in step 4, if VO bit is set at an (aligned) address, then address_to_ref(addr) will be guaranteed to be aligned, too, and that must be a valid ObjectReference.

Calling address_to_ref() does not add much complexity -- it is just applying an offset. If your concern is the invalid object references that is passed or returned from ref_to_address and address_to_ref, we can just remove those methods and ask the binding for a constant offset instead.

@wks
Copy link
Collaborator

wks commented Jul 2, 2024

I mean, I suggest adding a requirement that "both ObjectReference and in-object address needs to be word-aligned". (That implies their difference must be a multiple of word size, too.) I think all VMs can satisfy this, and this will simplify the VO bit searching algorithm. Concretely, in step 4, if VO bit is set at an (aligned) address, then address_to_ref(addr) will be guaranteed to be aligned, too, and that must be a valid ObjectReference.

Calling address_to_ref() does not add much complexity -- it is just applying an offset. If your concern is the invalid object references that is passed or returned from ref_to_address and address_to_ref, we can just remove those methods and ask the binding for a constant offset instead.

I am not talking about the complexity of calling address_to_ref() itself. I was talking about eliminating step 5 by requiring the result of address_to_ref to be aligned. Actually this PR requires any ObjectReference to be aligned. If we ever attempt to construct an unaligned ObjectReference, there will be an assertion error.

I don't understand the part "The return value is an object reference, but may not be a valid one." If the in-object address addr is aligned, and the result of address_to_ref(addr) is an ObjectReference which must also be aligned, why the resulting ObjectReference can still be invalid? What does "Use the word alignment to find the actual object reference" mean?

@qinsoon
Copy link
Member Author

qinsoon commented Jul 2, 2024

I don't understand the part "The return value is an object reference, but may not be a valid one." If the in-object address addr is aligned, and the result of address_to_ref(addr) is an ObjectReference which must also be aligned, why the resulting ObjectReference can still be invalid? What does "Use the word alignment to find the actual object reference" mean?

The in-object address addr may not be aligned.

@wks
Copy link
Collaborator

wks commented Jul 2, 2024

I don't understand the part "The return value is an object reference, but may not be a valid one." If the in-object address addr is aligned, and the result of address_to_ref(addr) is an ObjectReference which must also be aligned, why the resulting ObjectReference can still be invalid? What does "Use the word alignment to find the actual object reference" mean?

The in-object address addr may not be aligned.

This will be a problem. If, in a VM, the in-object address is not aligned, and the offset between ObjectReference and in-object address is a constant, then the offset must not be a multiple of word size. This means if addr is aligned in the first place (as returned from step 3, "This address is aligned to the VO bit region."), the result of address_to_obj(addr) will not be aligned. This will trigger an assertion error immediately because it attempts to construct an ObjectReference which is required to be aligned.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 2, 2024

I don't understand the part "The return value is an object reference, but may not be a valid one." If the in-object address addr is aligned, and the result of address_to_ref(addr) is an ObjectReference which must also be aligned, why the resulting ObjectReference can still be invalid? What does "Use the word alignment to find the actual object reference" mean?

The in-object address addr may not be aligned.

This will be a problem. If, in a VM, the in-object address is not aligned, and the offset between ObjectReference and in-object address is a constant, then the offset must not be a multiple of word size. This means if addr is aligned in the first place (as returned from step 3, "This address is aligned to the VO bit region."), the result of address_to_obj(addr) will not be aligned. This will trigger an assertion error immediately because it attempts to construct an ObjectReference which is required to be aligned.

I see. We will have to require both object reference and in-object address to be aligned then. It is not necessary to require in-object address to be aligned. When we get the VO bit address (aligned), we just need to apply an offset, and align up to find the correct object reference. Requiring in-object address to be aligned is not a prerequisite for us to compute the object reference from VO bit.

Requiring in-object address to be aligned can make things simpler. But we don't know concrete cases where this requirement can significantly simplify things. On the other hand, requiring aligned in-object address also imposes a risk that a binding may not be able to provide such an aligned in-object address. But, we also don't have concrete examples either. So at this point, I don't have think we have enough evidence to say we should or should not require aligned in-object address.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 5, 2024

We may want to replace ref_to_address and address_to_ref with a constant. This can avoid using and producing 'invalid object reference' for those two methods when we do computation for VO bit.

@qinsoon qinsoon removed the PR-extended-testing Run extended tests for the pull request label Jul 10, 2024
@qinsoon
Copy link
Member Author

qinsoon commented Jul 10, 2024

@wks Can you review this PR again? I am working on the binding PRs. But you can review this PR first -- the binding PRs will be straightforward.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 10, 2024

binding-refs
OPENJDK_BINDING_REPO=qinsoon/mmtk-openjdk
OPENJDK_BINDING_REF=update-mmtk-core-pr-1159
JIKESRVM_BINDING_REPO=qinsoon/mmtk-jikesrvm
JIKESRVM_BINDING_REF=update-mmtk-core-pr-1159
V8_BINDING_REPO=qinsoon/mmtk-v8
V8_BINDING_REF=update-mmtk-core-pr-1159
JULIA_BINDING_REPO=qinsoon/mmtk-julia
JULIA_BINDING_REF=update-mmtk-core-pr-1159
RUBY_BINDING_REPO=qinsoon/mmtk-ruby
RUBY_BINDING_REF=update-mmtk-core-pr-1159

src/vm/object_model.rs Outdated Show resolved Hide resolved
src/vm/object_model.rs Outdated Show resolved Hide resolved
src/vm/object_model.rs Outdated Show resolved Hide resolved
src/util/address.rs Outdated Show resolved Hide resolved
src/vm/object_model.rs Outdated Show resolved Hide resolved
@qinsoon qinsoon added the PR-extended-testing Run extended tests for the pull request label Jul 10, 2024
@qinsoon
Copy link
Member Author

qinsoon commented Jul 10, 2024

I have made changes based on the suggestions.

src/vm/object_model.rs Outdated Show resolved Hide resolved
src/util/address.rs Outdated Show resolved Hide resolved
src/util/address.rs Show resolved Hide resolved
Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wks
Copy link
Collaborator

wks commented Jul 10, 2024

The comments on the function memory_manager::is_mmtk_object should be updated, too. Because ObjectReference must be aligned, some of the description, such as ObjectReference in the range of low <= objref < high, is no longer relevant.

I also mentioned the possibility of moving is_mmtk_object into the module mmtk::util::is_mmtk_object and changing its return type in #1165 (comment). Since this PR also changes bindings, it is a good time to make the change.

@qinsoon
Copy link
Member Author

qinsoon commented Jul 11, 2024

The comments on the function memory_manager::is_mmtk_object should be updated, too. Because ObjectReference must be aligned, some of the description, such as ObjectReference in the range of low <= objref < high, is no longer relevant.

I also mentioned the possibility of moving is_mmtk_object into the module mmtk::util::is_mmtk_object and changing its return type in #1165 (comment). Since this PR also changes bindings, it is a good time to make the change.

I updated the doc for is_mmtk_object, but I didn't change the code or its return type. I will change the return type in the other PR https://github.com/mmtk/mmtk-core/pull/1165/files.

@qinsoon qinsoon enabled auto-merge July 11, 2024 03:28
@qinsoon qinsoon disabled auto-merge July 11, 2024 03:36
@qinsoon qinsoon added this pull request to the merge queue Jul 11, 2024
Merged via the queue into mmtk:master with commit a3a72f8 Jul 11, 2024
24 of 25 checks passed
@qinsoon qinsoon deleted the aligned-obj-ref branch July 11, 2024 07:00
github-merge-queue bot pushed a commit that referenced this pull request Jul 19, 2024
This PR adds internal pointer support. It supersedes
#1155 which provides a simple but
inefficient implementation for internal pointers. This PR is based on
#1159 which adds requirements for
object reference alignment.

This PR
* adds `memory_manager::find_object_from_internal_pointer`
  * The call is dispatched using SFT to each space.
* Large object space only checks the first word in VO bit for every
page.
* Mark sweep and immix space only searches for the max object size for
those spaces.
* Allow iterating side metadata bits.
* Allow loading raw byte/word in side metadata.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR-extended-testing Run extended tests for the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants