
Refactor the collector to use reference counting #74

Open
wants to merge 1 commit into base: master

Conversation

Owner

@Others Others commented Nov 28, 2021

This is around a ~4x speedup, but required a lot of code changes.
In terms of public API, only `AtomicGc` has a significant diff.
(The old API would still work, but the changes make it work
more efficiently with the new internals.)

Comment on lines +36 to +37
[profile.release]
debug = true
Owner Author

This needs to be commented back out.


use crate::collector::GcData;
use crate::concurrency::cross_thread_buffer::CrossThreadBuffer;

type DropBuffer = CrossThreadBuffer<Arc<GcData>>;

pub(crate) struct BackgroundDropper {
Owner Author

The changes in here are pure premature optimization, but I think it's better than the previous design.

@jrmuizel

A 4x speedup on what?

@Others
Owner Author

Others commented Nov 30, 2021

@jrmuizel a 4x speedup on the stress test I use to benchmark the collector:
https://github.com/Others/shredder/blob/master/tests/stress.rs
On my machine it goes from ~12 to ~3 seconds.

This benchmark kinda blows, and I've had an issue open to create new ones for a while (#7). But I'm pretty sure this new design is significantly better on a number of axes anyway, as it removes the dumb part of the collector where we keep a list of every `Gc` in the program.

@Others
Owner Author

Others commented Dec 1, 2021

See #74

@pcodes left a comment

Probably not the most thorough/helpful for critical feedback, but I was able to ask a bunch of questions 😄

//
// Taken together, this means that this pointer is always valid when executing a method on
// `AtomicGc`.
atomic_ptr: AtomicPtr<GcData>,

Not sure I fully follow this change: why is `atomic_ptr` no longer an `Arc`? Did the underlying assumption about what the `GcData` is change?

Owner Author

I've changed the semantics of `AtomicGc` to not be shareable by default. Now you need to do `DerefGc<AtomicGc<T>>` if you want to share it. This is much more flexible, and equivalently performant.

///
/// `AtomicGc` should be fairly fast, but you may not assume it does not block. In fact in the
/// presence of an active garbage collection operation, all operations will block. Otherwise
/// it shouldn't block.
#[derive(Clone, Debug)]
#[derive(Debug)]

Why did you remove the Clone trait?

Owner Author

Because the auto-derive doesn't work anymore, due to the changed type of `atomic_ptr`. Will add a manual impl.

let data_ptr = Arc::as_ptr(data_arc);

let atomic_ptr = Arc::new(AtomicPtr::new(data_ptr as _));
let atomic_ptr = AtomicPtr::new(Arc::as_ptr(data_arc) as _);

What does the `as _` mean? My Rust is (very) rusty :P

Owner Author

"cast this to the right type but don't make me type out what type I need"
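
A minimal sketch of that pattern, using a stand-in payload type rather than the PR's actual `GcData`: `Arc::as_ptr` yields a `*const T`, while `AtomicPtr::new` wants a `*mut T`, and `as _` lets the compiler infer the cast target from context.

```rust
use std::sync::atomic::AtomicPtr;
use std::sync::Arc;

// `u32` stands in for the real `GcData` payload.
fn to_atomic(data: &Arc<u32>) -> AtomicPtr<u32> {
    // `Arc::as_ptr` returns `*const u32`; `as _` casts it to the
    // `*mut u32` that `AtomicPtr::new` expects, with the target
    // type inferred instead of spelled out.
    AtomicPtr::new(Arc::as_ptr(data) as _)
}

fn main() {
    let data = Arc::new(42u32);
    let atomic = to_atomic(&data);
    // Safe to read back here: `data` is still alive, so the pointer
    // is valid and nothing mutates through it.
    assert_eq!(unsafe { *atomic.into_inner() }, 42);
}
```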

let atomic_ptr = Arc::new(AtomicPtr::new(data_ptr as _));
let atomic_ptr = AtomicPtr::new(Arc::as_ptr(data_arc) as _);
// Forget the initial data, we will absorb its reference counts
data.drop_preserving_reference_counts();

Why are you dropping the initial data?

Owner Author

Basically this method turns a `Gc` into an `AtomicGc`, and we need to get rid of the old `Gc` to do that. `Gc` is reference counted, but instead of incrementing and then decrementing, we can just absorb the reference count. This idea is used throughout this file.
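
A hedged sketch of the "absorb" idea, using `Arc` as a stand-in (the real collector manages its own counts, not `Arc`'s): consuming the old handle without running its destructor means its +1 on the count carries over to the new representation, so no increment/decrement pair is needed.

```rust
use std::sync::Arc;

// Turn a counted handle into a raw pointer WITHOUT decrementing the
// count: the pointer "absorbs" the handle's reference.
fn absorb(old: Arc<u32>) -> *const u32 {
    // `Arc::into_raw` consumes the Arc but leaves the strong count
    // untouched, exactly the absorb-don't-decrement move.
    Arc::into_raw(old)
}

fn main() {
    let a = Arc::new(5u32);
    let b = Arc::clone(&a); // strong count = 2
    let raw = absorb(b);
    // Still 2: `b` was consumed but its count was absorbed, not dropped.
    assert_eq!(Arc::strong_count(&a), 2);
    // Reconstituting and dropping finally releases the absorbed count.
    let restored = unsafe { Arc::from_raw(raw) };
    assert_eq!(*restored, 5);
    drop(restored);
    assert_eq!(Arc::strong_count(&a), 1);
}
```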

ptr = gc_data.scan_ptr().cast();
internal_handle = COLLECTOR.handle_from_data(gc_data);
// The count of the data going out decreases
COLLECTOR.decrement_reference_count(&old_arc);

Does the count decrease because you've absorbed the other pointer?


// Then clear and recycle the buffer
to_drop.clear();
// ignore recycling failures

Why can you ignore them?

Owner Author

Because if the channel is full, we don't want this whole thread to block indefinitely.
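
A std-only sketch of that non-blocking pattern (illustrative names, not the PR's actual channel type): with a bounded channel, `try_send` fails immediately instead of blocking when the channel is full, and the failure can be ignored because dropping the buffer instead of recycling it is a correct fallback.

```rust
use std::sync::mpsc::sync_channel;

fn main() {
    // A bounded channel of size 1, mirroring a RECYCLING_CHANNEL_SIZE of 1.
    let (tx, rx) = sync_channel::<Vec<u32>>(1);

    // First buffer fills the channel.
    tx.try_send(vec![1, 2, 3]).unwrap();

    // Channel is full: `try_send` returns an error instead of blocking.
    let result = tx.try_send(vec![4, 5, 6]);
    // Ignore the failure; the un-recycled buffer is simply dropped.
    assert!(result.is_err());

    // The consumer still gets the buffer that did fit.
    assert_eq!(rx.recv().unwrap(), vec![1, 2, 3]);
}
```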

/// Indicates to the `BackgroundDropper` that it should sync up with the calling code
SyncUp(Sender<()>),
}

impl BackgroundDropper {
const RECYCLING_CHANNEL_SIZE: usize = 1;

Can you summarize the changes you've made here? I don't understand the overarching idea

Owner Author

Basically this class has learned how to recycle its buffers so the collector can get them back (rather than allocating a big buffer every time the collector runs).

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub(crate) enum RefCountPolicy {
// A transient handle doesn't increment or decrement the reference count
TransientHandle,

What is the point of this?

Owner Author

It's used in the `Scan` implementations of `AtomicGc` / `DerefGc` to create handles we can report to the collector without having them mess up our reference counts.

Comment on lines +5 to +10
// `total_handles` = count_positive + count_negative
// `count_positive` is always >= the actual count >= 0
// At the beginning of every collection we recalculate `count_positive`
count_positive: AtomicI64,
// `count_negative` is always <= 0
count_negative: AtomicI64,

Can you clarify this?

Owner Author

Not a specific enough question. Basically this is a complex way of letting `count_positive` always be equal to or greater than the real count. The reason it is split is so that threads can decrement without taking a lock.
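
A hedged sketch of that split-counter idea (names mirror the diff; the struct is a stand-in, not the PR's actual type): increments only ever touch `count_positive` and decrements only ever touch `count_negative`, so the true count is their sum, `count_positive` alone is always an over-approximation, and both sides are lock-free atomics.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

struct SplitCount {
    // Only ever incremented, so it's always >= the real count.
    count_positive: AtomicI64,
    // Only ever decremented, so it's always <= 0.
    count_negative: AtomicI64,
}

impl SplitCount {
    fn new() -> Self {
        Self {
            count_positive: AtomicI64::new(0),
            count_negative: AtomicI64::new(0),
        }
    }
    fn increment(&self) {
        self.count_positive.fetch_add(1, Ordering::Relaxed);
    }
    // Any thread can decrement without a lock: it never races with
    // the collector recalculating `count_positive`.
    fn decrement(&self) {
        self.count_negative.fetch_sub(1, Ordering::Relaxed);
    }
    // The real count: count_positive + count_negative.
    fn total(&self) -> i64 {
        self.count_positive.load(Ordering::Relaxed)
            + self.count_negative.load(Ordering::Relaxed)
    }
}

fn main() {
    let c = SplitCount::new();
    c.increment();
    c.increment();
    c.decrement();
    assert_eq!(c.total(), 1);
    // `count_positive` (2) over-approximates the real count (1).
    assert!(c.count_positive.load(Ordering::Relaxed) >= c.total());
}
```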

use std::cell::RefCell;
use thread_local::ThreadLocal;

pub(crate) struct CrossThreadBuffer<T: Send> {

What is this used for? (not sure if I missed its usage somewhere else in the PR)

Owner Author

It's used for the buffer that stores dead data in the collector.
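
A rough std-only sketch of the idea behind such a buffer (the real `CrossThreadBuffer` uses the `thread_local` crate; this stand-in keeps whole per-thread batches behind one mutex so producer pushes are batched and the collector drains everything at once):

```rust
use std::sync::Mutex;

struct CrossThreadBufferSketch<T: Send> {
    // Each producer hands over a whole segment, amortizing the lock.
    segments: Mutex<Vec<Vec<T>>>,
}

impl<T: Send> CrossThreadBufferSketch<T> {
    fn new() -> Self {
        Self { segments: Mutex::new(Vec::new()) }
    }
    // A thread contributes a batch of (e.g. dead) entries.
    fn push_batch(&self, batch: Vec<T>) {
        self.segments.lock().unwrap().push(batch);
    }
    // The collector drains everything every thread buffered.
    fn drain_all(&self) -> Vec<T> {
        self.segments.lock().unwrap().drain(..).flatten().collect()
    }
}

fn main() {
    let buf = CrossThreadBufferSketch::new();
    std::thread::scope(|s| {
        let buf = &buf;
        for start in [0u32, 10] {
            s.spawn(move || buf.push_batch(vec![start, start + 1]));
        }
    });
    let mut all = buf.drain_all();
    all.sort();
    assert_eq!(all, vec![0, 1, 10, 11]);
}
```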


Comment on lines +77 to +79
// No need for collection blocker, as we're not modifying the graph
// This is safe. See comment on the `atomic_ptr` field
let gc_data_ptr = self.atomic_ptr.load(ordering);
Owner Author

I think this is a data race. What happens if we:

  1. Read the data, but don't increment the count yet (count = 1)
  2. Switch to another thread that overwrites the data and decrements the count
  3. The collector runs and cleans up the "dead" data
  4. The read completes, incrementing the count, but the data is already cleaned up and invalid


3 participants