# Compressed Oops Support (#235)
This PR adds compressed oops support for mmtk-openjdk, and enables it by
default.

# Implementation strategy and workarounds

## Heap layout and compression policy

This PR uses the `Edge` type to abstract over the compressed edge and
uncompressed edge. Object field loads and stores for uncompressed edges
work as before. Loads and stores for compressed edges will involve an
additional compression and decompression step.

In general, this is the function to decode a 32-bit compressed pointer
to its uncompressed form:

```rust
fn decode(compressed_oop: u32) -> u64 {
    BASE + ((compressed_oop as u64) << SHIFT)
}
```

OpenJDK has a few optimizations to reduce the add and shift operations
in JIT-compiled code; this PR supports them all:

1. For heap <= 3G, if we set the heap range as
`0x4000_0000..0x1_0000_0000`, it is possible to totally remove the add
and the shift. The compressed and uncompressed forms are identical.
    * Set `BASE = 0` and `SHIFT = 0` for this case.
2. For heap <= 31G, if we set the heap range as
`0x4000_0000..0x8_0000_0000`, it is possible to remove the add.
    * Set `BASE = 0` and `SHIFT = 3` for this case.
3. For heap > 31G, the add and shift operations are still necessary.

For cases (1) and (2), the JIT-compiled code will contain fewer or even
no encoding/decoding instructions, which improves mutator performance.
However, in Rust code, we still do the add and shift unconditionally,
even when `BASE` or `SHIFT` is set to zero.
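The three cases above can be sketched as follows. This is a minimal illustration, not the binding's actual code: the `compression_policy` helper and its constants are hypothetical, and the real policy also constrains where the heap itself is placed.

```rust
const GIB: u64 = 1 << 30;

/// Pick (BASE, SHIFT) for a heap spanning `heap_start..heap_end`.
fn compression_policy(heap_start: u64, heap_end: u64) -> (u64, u64) {
    if heap_end <= 4 * GIB {
        (0, 0) // case 1: compressed and uncompressed forms are identical
    } else if heap_end <= 32 * GIB {
        (0, 3) // case 2: zero base; the shift absorbs the 8-byte alignment
    } else {
        (heap_start - 4096, 3) // case 3: non-zero base, add + shift needed
    }
}

fn encode(oop: u64, base: u64, shift: u64) -> u32 {
    ((oop - base) >> shift) as u32
}

fn decode(narrow: u32, base: u64, shift: u64) -> u64 {
    base + ((narrow as u64) << shift)
}
```

For any 8-byte-aligned oop inside the heap, `decode(encode(oop))` round-trips under all three policies.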

## NULL pointer checking

Generally, `BASE` can be any address, as long as the memory is not
reserved by anything else. However, `BASE` must be smaller than
`HEAP_START`; otherwise `HEAP_START` itself would be encoded as `0` and
be treated as a null pointer.

Following OpenJDK, we set `BASE` to `HEAP_START - 4096` to solve this
issue.
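A small sketch of why the offset matters (the `HEAP_START` value here is arbitrary, not the real layout):

```rust
const HEAP_START: u64 = 0x10_0000_0000;
const SHIFT: u64 = 3;

fn encode(oop: u64, base: u64) -> u32 {
    ((oop - base) >> SHIFT) as u32
}

// Naive choice BASE == HEAP_START: the first heap address encodes to 0
// and is indistinguishable from a null pointer. Offsetting BASE by one
// page (as OpenJDK does) keeps every valid encoding non-zero.
```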

## Type specialization

Since we only support one edge type per binding, providing two
`OpenJDKEdge` types in one `MMTK` instance is not possible.

This PR solves the issue by specializing almost all the types in the
binding with a `const COMPRESSED: bool` generic parameter. It provides
two `MMTK` singletons: `MMTK<OpenJDK<COMPRESSED = true>>` and
`MMTK<OpenJDK<COMPRESSED = false>>`. The former uses the
`OpenJDKEdge<COMPRESSED = true>` edge type, which performs the extra
pointer compression/decompression.

The two MMTK singletons are wrapped in two `lazy_static` global
variables. The binding initializes only one of them, depending on the
OpenJDK command-line arguments. Initializing the one that does not match
the `UseCompressedOops` flag will trigger an assertion failure.
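The scheme can be sketched as follows, with std's `OnceLock` standing in for `lazy_static`. All names here are illustrative stand-ins, not the binding's real API:

```rust
use std::sync::OnceLock;

struct OpenJDK<const COMPRESSED: bool>;

struct Mmtk<const COMPRESSED: bool> {
    _vm: OpenJDK<COMPRESSED>,
}

// Two singletons; only the one matching -XX:(+/-)UseCompressedOops is built.
static MMTK_COMPRESSED: OnceLock<Mmtk<true>> = OnceLock::new();
static MMTK_UNCOMPRESSED: OnceLock<Mmtk<false>> = OnceLock::new();

fn init(use_compressed_oops: bool) {
    if use_compressed_oops {
        let _ = MMTK_COMPRESSED.set(Mmtk { _vm: OpenJDK });
    } else {
        let _ = MMTK_UNCOMPRESSED.set(Mmtk { _vm: OpenJDK });
    }
}
```

Because `COMPRESSED` is a const generic, every specialized method is monomorphized, so the compressed and uncompressed paths carry no runtime dispatch cost.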

## Pointer tagging

When compressed oops is enabled, all object fields are guaranteed to
hold compressed oops. However, stack slots and other global roots may
still hold uncompressed pointers. The GC needs to handle both compressed
and uncompressed edges and be able to distinguish between them.

To support this, this PR treats all root `OpenJDKEdge<COMPRESSED =
true>`s as tagged pointers. If the 63rd bit is set, the edge points to a
64-bit uncompressed oop instead of a compressed oop, and the
`OpenJDKEdge::{load, store}` methods will skip the encoding/decoding
step.

For object field edges, the encoding/decoding is performed
unconditionally, without the pointer tag check.

When compressed oops is disabled, there is no pointer tag check either.
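A minimal sketch of a tagged root-edge load, assuming the bit-63 tag described above; the `BASE`/`SHIFT` values are illustrative. Bit 63 is never set in a real x86-64 user-space address, so it is free to use as a tag:

```rust
const TAG_UNCOMPRESSED: u64 = 1 << 63;
const BASE: u64 = 0;
const SHIFT: u64 = 3;

/// Load the oop referenced by a (possibly tagged) root edge.
///
/// # Safety
/// `edge` (with the tag cleared) must point to a valid slot.
unsafe fn load_root_edge(edge: u64) -> u64 {
    if edge & TAG_UNCOMPRESSED != 0 {
        // Tagged: the slot holds a full 64-bit uncompressed oop.
        *((edge & !TAG_UNCOMPRESSED) as *const u64)
    } else {
        // Untagged: the slot holds a 32-bit compressed oop; decode it.
        let narrow = *(edge as *const u32);
        BASE + ((narrow as u64) << SHIFT)
    }
}
```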

## Embedded pointers

Some (or possibly all) pointers embedded in code objects are also
compressed. On x64, such a pointer is always compressed to a `u32` that
sits at an unaligned memory location. This means we need to (1) treat
them as compressed oops, just like other roots, and (2) still perform
the loads and stores as unaligned accesses.

However, for other architectures, the compressed embedded pointers may
not be encoded as a `u32` anymore.
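A sketch of reading such an embedded pointer with an unaligned load (the helper name is hypothetical, not the binding's API):

```rust
/// Read a 32-bit compressed oop embedded at an arbitrary byte offset.
///
/// # Safety
/// `addr` must point to at least 4 readable bytes.
unsafe fn load_embedded_narrow_oop(addr: *const u8) -> u32 {
    // `read_unaligned` tolerates any alignment, unlike a plain dereference,
    // which is undefined behavior for a misaligned `*const u32`.
    (addr as *const u32).read_unaligned()
}
```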

## Compressed `Klass*` pointers

When `UseCompressedOops` is enabled, `UseCompressedClassPointers` is
also enabled by default, which compresses the `Klass*` pointer in the
object header to a `u32`. This PR supports class pointer compression as
well.

However, class pointer compression is only supported and tested when
compressed oops is enabled: the two flags must be enabled or disabled
together. Enabling only one of them is untested and unsupported, and
will trigger a runtime assertion failure.

---

# Performance results

[SemiSpace](http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/mm26Ra)
[Immix](http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/wEDPv4)

---------

Co-authored-by: Yi Lin <qinsoon@gmail.com>
wenyuzhao and qinsoon authored Oct 4, 2023
1 parent 9ab13ae commit f0ff0b5
Showing 24 changed files with 1,003 additions and 399 deletions.
6 changes: 3 additions & 3 deletions .github/scripts/ci-matrix-result-check.py
@@ -36,11 +36,11 @@ def read_in_plans():
value = m.group(1)
else:
raise ValueError(f"Cannot find a plan string in {prop}")

# Store the value in the dictionary
key = chr(97+i)
results[key] = value

return results

def read_in_actual_results(line, plan_dict):
@@ -144,7 +144,7 @@ def print_log(directory, search_string):
if expected[plan] == "ignore":
print(f"Result for {plan} is ignored")
continue

if expected[plan] != actual[plan]:
error_no = 1
if expected[plan] == "pass":
3 changes: 2 additions & 1 deletion .github/scripts/ci-test-assertions.sh
@@ -56,4 +56,5 @@ sudo sysctl -w vm.max_map_count=655300
export MMTK_PLAN=PageProtect

build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar fop
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar luindex
# Note: Disable compressed pointers for luindex as it does not work well with GC plans that use virtual memory excessively.
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -XX:-UseCompressedOops -XX:-UseCompressedClassPointers -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar luindex
2 changes: 1 addition & 1 deletion .github/scripts/ci-test-malloc-mark-sweep.sh
@@ -16,7 +16,7 @@ run_test() {
# Malloc marksweep is horribly slow. We just run fop.

# build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms500M -Xmx500M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar antlr
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms50M -Xmx50M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar fop
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:-UseCompressedOops -XX:-UseCompressedClassPointers -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms50M -Xmx50M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar fop
# build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms500M -Xmx500M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar luindex
# build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms500M -Xmx500M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar pmd
# build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms500M -Xmx500M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar hsqldb
164 changes: 164 additions & 0 deletions .github/scripts/ci-test-only-normal-no-compressed-oops.sh


11 changes: 0 additions & 11 deletions .github/scripts/ci-test-only-normal.sh
@@ -151,14 +151,3 @@ build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHea
# These benchmarks take 40s+ for slowdebug build, we may consider removing them from the CI
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -XX:TieredStopAtLevel=1 -Xms500M -Xmx500M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar hsqldb
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -XX:TieredStopAtLevel=1 -Xms500M -Xmx500M -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar eclipse

# --- PageProtect ---
# Make sure this runs last in our tests unless we want to set it back to the default limit.
sudo sysctl -w vm.max_map_count=655300

export MMTK_PLAN=PageProtect

build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar antlr
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar fop
build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar luindex
# build/linux-x86_64-normal-server-$DEBUG_LEVEL/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -Xms4G -Xmx4G -jar $DACAPO_PATH/dacapo-2006-10-MR2.jar pmd
2 changes: 2 additions & 0 deletions .github/scripts/ci-test.sh
@@ -6,6 +6,8 @@ cd $cur
cd $cur
./ci-test-only-normal.sh
cd $cur
./ci-test-only-normal-no-compressed-oops.sh
cd $cur
./ci-test-only-weak-ref.sh
cd $cur
./ci-test-assertions.sh
2 changes: 2 additions & 0 deletions mmtk/Cargo.lock


2 changes: 2 additions & 0 deletions mmtk/Cargo.toml
@@ -24,6 +24,8 @@ openjdk_version = "28e56ee32525c32c5a88391d0b01f24e5cd16c0f"
libc = "0.2"
lazy_static = "1.1"
once_cell = "1.10.0"
atomic = "0.5.1"
memoffset = "0.9.0"
# Be very careful to commit any changes to the following mmtk dependency, as our CI scripts (including mmtk-core CI)
# rely on matching these lines to modify them: e.g. comment out the git dependency and use the local path.
# These changes are safe:
124 changes: 98 additions & 26 deletions mmtk/src/abi.rs
@@ -1,10 +1,14 @@
use crate::UPCALLS;
use super::UPCALLS;
use crate::OpenJDKEdge;
use atomic::Atomic;
use atomic::Ordering;
use mmtk::util::constants::*;
use mmtk::util::conversions;
use mmtk::util::ObjectReference;
use mmtk::util::{Address, OpaquePointer};
use std::ffi::CStr;
use std::fmt;
use std::sync::atomic::AtomicUsize;
use std::{mem, slice};

#[repr(i32)]
@@ -80,7 +84,7 @@ impl Klass {
pub const LH_HEADER_SIZE_SHIFT: i32 = BITS_IN_BYTE as i32 * 2;
pub const LH_HEADER_SIZE_MASK: i32 = (1 << BITS_IN_BYTE) - 1;
pub unsafe fn cast<'a, T>(&self) -> &'a T {
&*(self as *const _ as usize as *const T)
&*(self as *const Self as *const T)
}
/// Force slow-path for instance size calculation?
const fn layout_helper_needs_slow_path(lh: i32) -> bool {
@@ -168,7 +172,7 @@ impl InstanceKlass {
const VTABLE_START_OFFSET: usize = Self::HEADER_SIZE * BYTES_IN_WORD;

fn start_of_vtable(&self) -> *const usize {
unsafe { (self as *const _ as *const u8).add(Self::VTABLE_START_OFFSET) as _ }
(Address::from_ref(self) + Self::VTABLE_START_OFFSET).to_ptr()
}

fn start_of_itable(&self) -> *const usize {
@@ -263,24 +267,53 @@ impl InstanceRefKlass {
}
*DISCOVERED_OFFSET
}
pub fn referent_address(oop: Oop) -> Address {
oop.get_field_address(Self::referent_offset())
pub fn referent_address<const COMPRESSED: bool>(oop: Oop) -> OpenJDKEdge<COMPRESSED> {
oop.get_field_address(Self::referent_offset()).into()
}
pub fn discovered_address(oop: Oop) -> Address {
oop.get_field_address(Self::discovered_offset())
pub fn discovered_address<const COMPRESSED: bool>(oop: Oop) -> OpenJDKEdge<COMPRESSED> {
oop.get_field_address(Self::discovered_offset()).into()
}
}

#[repr(C)]
union KlassPointer {
/// uncompressed Klass pointer
klass: &'static Klass,
/// compressed Klass pointer
narrow_klass: u32,
}

#[repr(C)]
pub struct OopDesc {
pub mark: usize,
pub klass: &'static Klass,
klass: KlassPointer,
}

static COMPRESSED_KLASS_BASE: Atomic<Address> = Atomic::new(Address::ZERO);
static COMPRESSED_KLASS_SHIFT: AtomicUsize = AtomicUsize::new(0);

/// When enabling compressed pointers, the class pointers are also compressed.
/// The c++ part of the binding should pass the compressed klass base and shift to rust binding, as object scanning will need it.
pub fn set_compressed_klass_base_and_shift(base: Address, shift: usize) {
COMPRESSED_KLASS_BASE.store(base, Ordering::Relaxed);
COMPRESSED_KLASS_SHIFT.store(shift, Ordering::Relaxed);
}

impl OopDesc {
pub fn start(&self) -> Address {
unsafe { mem::transmute(self) }
}

pub fn klass<const COMPRESSED: bool>(&self) -> &'static Klass {
if COMPRESSED {
let compressed = unsafe { self.klass.narrow_klass };
let addr = COMPRESSED_KLASS_BASE.load(Ordering::Relaxed)
+ ((compressed as usize) << COMPRESSED_KLASS_SHIFT.load(Ordering::Relaxed));
unsafe { &*addr.to_ptr::<Klass>() }
} else {
unsafe { self.klass.klass }
}
}
}

impl fmt::Debug for OopDesc {
@@ -292,8 +325,24 @@ impl fmt::Debug for OopDesc {
}
}

/// 32-bit compressed klass pointers
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct NarrowKlass(u32);

pub type Oop = &'static OopDesc;

/// 32-bit compressed reference pointers
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct NarrowOop(u32);

impl NarrowOop {
pub fn slot(&self) -> Address {
Address::from_ref(self)
}
}

/// Convert ObjectReference to Oop
impl From<ObjectReference> for &OopDesc {
fn from(o: ObjectReference) -> Self {
@@ -323,8 +372,8 @@ impl OopDesc {
}

/// Calculate object instance size
pub unsafe fn size(&self) -> usize {
let klass = self.klass;
pub unsafe fn size<const COMPRESSED: bool>(&self) -> usize {
let klass = self.klass::<COMPRESSED>();
let lh = klass.layout_helper;
// The (scalar) instance size is pre-recorded in the TIB?
if lh > Klass::LH_NEUTRAL_VALUE {
@@ -336,7 +385,7 @@ } else if lh <= Klass::LH_NEUTRAL_VALUE {
} else if lh <= Klass::LH_NEUTRAL_VALUE {
if lh < Klass::LH_NEUTRAL_VALUE {
// Calculate array size
let array_length = self.as_array_oop().length();
let array_length = self.as_array_oop().length::<COMPRESSED>();
let mut size_in_bytes: usize =
(array_length as usize) << Klass::layout_helper_log2_element_size(lh);
size_in_bytes += Klass::layout_helper_header_size(lh) as usize;
@@ -356,34 +405,57 @@ pub struct ArrayOopDesc(OopDesc);
pub type ArrayOop = &'static ArrayOopDesc;

impl ArrayOopDesc {
const LENGTH_OFFSET: usize = mem::size_of::<Self>();
fn length_offset<const COMPRESSED: bool>() -> usize {
let klass_offset_in_bytes = memoffset::offset_of!(OopDesc, klass);
if COMPRESSED {
klass_offset_in_bytes + mem::size_of::<NarrowKlass>()
} else {
klass_offset_in_bytes + mem::size_of::<KlassPointer>()
}
}

fn element_type_should_be_aligned(ty: BasicType) -> bool {
ty == BasicType::T_DOUBLE || ty == BasicType::T_LONG
}

fn header_size(ty: BasicType) -> usize {
let typesize_in_bytes =
conversions::raw_align_up(Self::LENGTH_OFFSET + BYTES_IN_INT, BYTES_IN_LONG);
fn header_size<const COMPRESSED: bool>(ty: BasicType) -> usize {
let typesize_in_bytes = conversions::raw_align_up(
Self::length_offset::<COMPRESSED>() + BYTES_IN_INT,
BYTES_IN_LONG,
);
if Self::element_type_should_be_aligned(ty) {
conversions::raw_align_up(typesize_in_bytes / BYTES_IN_WORD, BYTES_IN_LONG)
} else {
typesize_in_bytes / BYTES_IN_WORD
}
}
fn length(&self) -> i32 {
unsafe { *((self as *const _ as *const u8).add(Self::LENGTH_OFFSET) as *const i32) }
fn length<const COMPRESSED: bool>(&self) -> i32 {
unsafe { (Address::from_ref(self) + Self::length_offset::<COMPRESSED>()).load::<i32>() }
}
fn base(&self, ty: BasicType) -> Address {
let base_offset_in_bytes = Self::header_size(ty) * BYTES_IN_WORD;
Address::from_ptr(unsafe { (self as *const _ as *const u8).add(base_offset_in_bytes) })
fn base<const COMPRESSED: bool>(&self, ty: BasicType) -> Address {
let base_offset_in_bytes = Self::header_size::<COMPRESSED>(ty) * BYTES_IN_WORD;
Address::from_ref(self) + base_offset_in_bytes
}
// This provides an easy way to access the array data in Rust. However, the array data
// is Java types, so we have to map Java types to Rust types. The caller needs to guarantee:
// 1. <T> matches the actual Java type
// 2. <T> matches the argument, BasicType `ty`
pub unsafe fn data<T>(&self, ty: BasicType) -> &[T] {
slice::from_raw_parts(self.base(ty).to_ptr(), self.length() as _)
/// This provides an easy way to access the array data in Rust. However, the array data
/// is Java types, so we have to map Java types to Rust types. The caller needs to guarantee:
/// 1. `<T>` matches the actual Java type
/// 2. `<T>` matches the argument, BasicType `ty`
pub unsafe fn data<T, const COMPRESSED: bool>(&self, ty: BasicType) -> &[T] {
slice::from_raw_parts(
self.base::<COMPRESSED>(ty).to_ptr(),
self.length::<COMPRESSED>() as _,
)
}

pub unsafe fn slice<const COMPRESSED: bool>(
&self,
ty: BasicType,
) -> crate::OpenJDKEdgeRange<COMPRESSED> {
let base = self.base::<COMPRESSED>(ty);
let start = base;
let lshift = OpenJDKEdge::<COMPRESSED>::LOG_BYTES_IN_EDGE;
let end = base + ((self.length::<COMPRESSED>() as usize) << lshift);
(start..end).into()
}
}

25 changes: 12 additions & 13 deletions mmtk/src/active_plan.rs
@@ -1,6 +1,5 @@
use crate::MutatorClosure;
use crate::OpenJDK;
use crate::SINGLETON;
use crate::UPCALLS;
use mmtk::util::opaque_pointer::*;
use mmtk::vm::ActivePlan;
@@ -9,12 +8,12 @@ use mmtk::Plan;
use std::collections::VecDeque;
use std::marker::PhantomData;

struct OpenJDKMutatorIterator<'a> {
mutators: VecDeque<&'a mut Mutator<OpenJDK>>,
struct OpenJDKMutatorIterator<'a, const COMPRESSED: bool> {
mutators: VecDeque<&'a mut Mutator<OpenJDK<COMPRESSED>>>,
phantom_data: PhantomData<&'a ()>,
}

impl<'a> OpenJDKMutatorIterator<'a> {
impl<'a, const COMPRESSED: bool> OpenJDKMutatorIterator<'a, COMPRESSED> {
fn new() -> Self {
let mut mutators = VecDeque::new();
unsafe {
@@ -29,8 +28,8 @@ impl<'a> OpenJDKMutatorIterator<'a> {
}
}

impl<'a> Iterator for OpenJDKMutatorIterator<'a> {
type Item = &'a mut Mutator<OpenJDK>;
impl<'a, const COMPRESSED: bool> Iterator for OpenJDKMutatorIterator<'a, COMPRESSED> {
type Item = &'a mut Mutator<OpenJDK<COMPRESSED>>;

fn next(&mut self) -> Option<Self::Item> {
self.mutators.pop_front()
@@ -39,24 +38,24 @@ impl<'a> Iterator for OpenJDKMutatorIterator<'a> {

pub struct VMActivePlan {}

impl ActivePlan<OpenJDK> for VMActivePlan {
fn global() -> &'static dyn Plan<VM = OpenJDK> {
SINGLETON.get_plan()
impl<const COMPRESSED: bool> ActivePlan<OpenJDK<COMPRESSED>> for VMActivePlan {
fn global() -> &'static dyn Plan<VM = OpenJDK<COMPRESSED>> {
crate::singleton::<COMPRESSED>().get_plan()
}

fn is_mutator(tls: VMThread) -> bool {
unsafe { ((*UPCALLS).is_mutator)(tls) }
}

fn mutator(tls: VMMutatorThread) -> &'static mut Mutator<OpenJDK> {
fn mutator(tls: VMMutatorThread) -> &'static mut Mutator<OpenJDK<COMPRESSED>> {
unsafe {
let m = ((*UPCALLS).get_mmtk_mutator)(tls);
&mut *m
&mut *(m as *mut Mutator<OpenJDK<COMPRESSED>>)
}
}

fn mutators<'a>() -> Box<dyn Iterator<Item = &'a mut Mutator<OpenJDK>> + 'a> {
Box::new(OpenJDKMutatorIterator::new())
fn mutators<'a>() -> Box<dyn Iterator<Item = &'a mut Mutator<OpenJDK<COMPRESSED>>> + 'a> {
Box::new(OpenJDKMutatorIterator::<COMPRESSED>::new())
}

fn number_of_mutators() -> usize {