How to deal with Garbage Collection #363

jasonwilliams · 2020-05-02T15:13:26Z

Current state of play

We currently use GC to wrap objects which need to live as long as something is referencing it. These are all JSValues. To make things easier we wrap all JS Values in a GC and give that the type Value.

Why not use RC?

Javascript has a lot of circular references, we could end up with memory leaks because objects will always have a reference count of 1 or more.
Constructors have a property to the prototype, and the prototype object has a property pointing back to the constructor.

More info:
https://doc.rust-lang.org/book/ch15-06-reference-cycles.html

Environments

Environments also need to be wrapped in a Gc because multiple function objects can reference the same environment, any function declaration references the outer env. So declaring more than 1 function in the same scope means they'll both have the same parent.
If a closure is returned the outer function environment needs to stay alive throughout the duration of the returned function.

@Razican has some ideas on how to improve Environment GC.

The text was updated successfully, but these errors were encountered:

Razican · 2020-05-02T16:34:14Z

Yep, so around 4% of our instructions in the code are due to us using a garbage collector. The GC will once in a while for through all the garbage collected objects and decide if they should be freed or not, which takes long: for structures/enums with members, it has to go through all its mebers one by one.

This can be alleviated firs by only deriving Trace in types where a GC'd object might leave, so all Copy types should not be derived, but we should implement an empty trace on them, for example. This could be made easier once rust-gc implements Manishearth/rust-gc#87.

Then, environments have a clear hierarchy, so we don't really need a GC. Once the parent environment goes out of the scope, all inner environments no longer need to exist. If we use a GC, they will at some point be freed by the GC, but we don't need to wait for that, this can be reasoned at compile time by Rust by using Rc (or Arc if we go multithreaded at one point).

In a hierarchical tree, the Rust guide for Rc gives us a nice usage:

A cycle between Rc pointers will never be deallocated. For this reason, Weak is used to break cycles. For example, a tree could have strong Rc pointers from parent nodes to children, and Weak pointers from children back to their parents.

So, we can remove our GC'd environments in favour of a RC'd environment. This would make it easier to implement the string interner (#279) too, since Rust now expects circular references in environments, which would make lifetimes be 'static. Not nice for passing around a reference to an interner.

jasonwilliams · 2020-05-02T17:40:52Z

Then, environments have a clear hierarchy, so we don't really need a GC. Once the parent environment goes out of the scope, all inner environments no longer need to exist.

Using the diagram above, its possible for outer() to return inner2 as a closure.
In which case outer will go out of scope whilst inner2 is still in-play, have we thought about this?

Razican · 2020-05-02T17:54:22Z

Using the diagram above, its possible for outer() to return inner2 as a closure.
In which case outer will go out of scope whilst inner2 is still in-play, have we thought about this?

In that case, would the parent environment of the function change? or do we still need the information of the previous environment?

Razican · 2020-05-03T11:12:23Z

I implemented Rc'd environments in https://github.com/jasonwilliams/boa/tree/rc_env.This was working perfectly, but when I tried to re-base with the new function objects, this gives issues with the environment in Function.

It seems this is the only object that contains an environment. One small question about this. Is this the inner environment or the outer environment?

To implement Trace on it, we need for the environment to implement Trace, but this is not possible with Rc or Weak unless rust-gc implements it for them, and I don't think it's possible to do it.

How is this environment used?

jasonwilliams · 2020-05-03T11:34:41Z

It seems this is the only object that contains an environment. One small question about this. Is this the inner environment or the outer environment?

From what I understand (please take a second look at the spec as I may be wrong) Function’s environment property is a reference to the outer environment.

The inner environment doesn’t exist until the function is actually called. When the inner function is created, it’s parent is set to function[[Environment]].

This allows callbacks to happen at a later point whilst stil maintaining lexical scope, however it also means parent environments could be referenced long after theyve gone out of scope. (Diagram above: outer finishes but inner1 is a callback somewhere)

Because functions are “run-to-completion” there’s no need to store the inner environment. However with generators I think that changes.

Razican · 2020-05-09T12:40:47Z

Looking into this, I noticed we have the InternalStateCell type, that implements Trace for a Gc.

That Gc can hold Any type, which I think makes it unsound :/ maybe @Manishearth can give us some hints about this.

Do we really need it to be of Any type? Can't we at least reduce its scope a bit?

In any case, implementing Trace for Rc would be very nice, since we could just implement Trace for a Rc pointer to an environment in a Function object.

Manishearth · 2020-05-09T13:40:33Z

You should hold a Trace instead of an Any

Razican · 2020-05-09T14:57:29Z

You should hold a Trace instead of an Any

Would this require a special Trace implementation for the InternalStateCell, or would an empty trace be enough?

In any case, this brings the extra problem, since we can no longer use downcast_ref/mut() on state, since they are not Any. I tried implementing it in #387 but I'm having issues.

Manishearth · 2020-05-10T01:27:21Z

You can have a trait that inherits from Any and Trace and make a trait object of it

Razican · 2020-05-10T10:40:46Z

You can have a trait that inherits from Any and Trace and make a trait object of it

It seems that this is not enough, or at least I'm not able to make it work. I get the errors listed in https://github.com/jasonwilliams/boa/pull/387/files#r422625290

Manishearth · 2020-05-11T15:17:08Z

Right, you can't cast between trait objects directly. But in this case what you need is for that trait to inherit from Trace as well.

Manishearth · 2020-05-11T15:18:54Z

To fix those errors you just need to cast to InternalState, not Any. If that's not enough you might have to <_ as Any>::downcast_ref(&thing)

sphinxc0re · 2020-05-24T11:19:23Z

I don't know if it applies here, but did you take a look at this: https://github.com/kyren/gc-arena
It seems like a promising technology.

Manishearth · 2020-05-24T14:28:21Z

Trait object issues are going to occur in any type that is of the form Gc<T> where T: SomeTraceTrait. The issues boa is hitting are fundamental Rust issues, not Gc issues.

sphinxc0re · 2020-06-12T15:25:19Z

Okay, but how about https://github.com/Others/shredder ? It looks like it supports cyclic structures as well

jasonwilliams · 2020-06-12T15:38:37Z

@sphinxc0re shredder looks very interesting and i would be happy for us to try it out

Manishearth · 2020-06-12T17:41:12Z

Shredder forces concurrency, which a JS runtime will likely not want since JS objects are single threaded

sphinxc0re · 2020-06-14T12:28:04Z

There is no information on "forcing" concurrency. The README states: ready for fearless concurrency: works in multi-threaded contexts. "Works in" sounds to me like it's optional. Could you elaborate where you found that information?

jasonwilliams · 2020-06-14T13:50:19Z

I think @Manishearth is referring to the fact that Shredder is built on top of Arc where as rust-gc is built on top of Rc. There's pros and cons to both.

If you don't need multithreading you're still using Arc under the hood, which will be slower than RC because it generates atomic instructions for access. Its not easy to know how much time is lost until you can measure the before and after.

Like wise if you stick with RC, it makes it difficult to move GC'd objects across threads (Which Boa may need to do at some point).

Shredder forces concurrency, which a JS runtime will likely not want since JS objects are single threaded

I don't know if this is strictly true, from what i understand JS Objects can move across realms, and each realm can be backed by a thread.
http://speakingjs.com/es5/ch17.html

Manishearth · 2020-06-14T16:16:04Z

it makes it difficult to move GC'd objects across threads (Which Boa may need to do at some point)

Not if you want to be spec-compliant 😄

each realm can be backed by a thread.

This is incorrect. The javascript execution model is single-threaded; all realms which may share objects (not including SharedArrayBuffer, which I'll get to) must have their event loops run on the same thread.

The specification calls this an "agent", annoyingly I cannot find any text in the specification that explains this clearly, rather, the spec doesn't really work without this. I might file an issue to get this clarified. Regardless, the HTML specification explains this a bit clearer: objects that can share objects must be in the same agent, and thus, the same event loop.

It's pretty clear why the specification is single-threaded from one observation: aside from SharedArrayBuffer and atomic operations (only used within SharedArrayBuffer), there are no locks in spec text anywhere. There are multiple algorithms which mutate state, but none of them have locks between the checks where they see if the state is allowed to be mutated or not, and when they mutate the state. For example the indexed setter algorithm for objects like arrays must check if the buffer is detached before writing, but if objects could be shared across threads, it would be very easy for an array buffer to get detached (which nulls out the internal buffer!) between these two steps by a different thread.

You can have multiple agents running at once; this is what happens when you use a Worker. Workers typically have their own thread/event loop, and, crucially, they do not share garbage collected objects with a different thread. Instead, objects are sent to them via postMessage(), which uses a "structured serialize" algorithm that basically performs a deep copy, with some exceptions. The exceptions are that regular array buffers have their buffers "detached" and moved over to the other thread (preventing sharing of the buffer), and shared array buffers have new SharedArrayBuffer objects created on the other side, which share the internal buffer. Since this is a new object, it has independent GC semantics, and the internal buffer can get collected when all of its parent SharedArrayBuffers are collected. In other words, the "shared" part of SharedArrayBuffer is a job for an Arc.

You can, of course, choose to allow an agent to use multiple threads for efficiency and put in some painstaking effort to ensure that all the single threaded happens-before relationships are still maintained for any data dependency. This would be an immense amount of work and multithreaded garbage collection is the least of your worries. JavaScript does not like being put on multiple threads.

HalidOdat added help wanted Extra attention is needed memory PRs and Issues related to the memory management or memory footprint. performance Performance related changes and issues labels May 2, 2020

Razican mentioned this issue May 9, 2020

Mitigate unsoundness in InternalState #387

Closed

Razican mentioned this issue Aug 13, 2020

Refactor Function #626

Merged

boa-dev locked and limited conversation to collaborators Jan 31, 2022

Razican converted this issue into discussion #1812 Jan 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

How to deal with Garbage Collection #363

How to deal with Garbage Collection #363

jasonwilliams commented May 2, 2020

Razican commented May 2, 2020

jasonwilliams commented May 2, 2020

Razican commented May 2, 2020

Razican commented May 3, 2020 •

edited

Loading

jasonwilliams commented May 3, 2020 •

edited

Loading

Razican commented May 9, 2020 •

edited

Loading

Manishearth commented May 9, 2020

Razican commented May 9, 2020

Manishearth commented May 10, 2020

Razican commented May 10, 2020

Manishearth commented May 11, 2020

Manishearth commented May 11, 2020

sphinxc0re commented May 24, 2020

Manishearth commented May 24, 2020

sphinxc0re commented Jun 12, 2020

jasonwilliams commented Jun 12, 2020

Manishearth commented Jun 12, 2020

sphinxc0re commented Jun 14, 2020

jasonwilliams commented Jun 14, 2020

Manishearth commented Jun 14, 2020 •

edited

Loading

This issue was moved to a discussion.

This issue was moved to a discussion.

How to deal with Garbage Collection #363

How to deal with Garbage Collection #363

Comments

jasonwilliams commented May 2, 2020

Current state of play

Why not use RC?

Environments

Razican commented May 2, 2020

jasonwilliams commented May 2, 2020

Razican commented May 2, 2020

Razican commented May 3, 2020 • edited Loading

jasonwilliams commented May 3, 2020 • edited Loading

Razican commented May 9, 2020 • edited Loading

Manishearth commented May 9, 2020

Razican commented May 9, 2020

Manishearth commented May 10, 2020

Razican commented May 10, 2020

Manishearth commented May 11, 2020

Manishearth commented May 11, 2020

sphinxc0re commented May 24, 2020

Manishearth commented May 24, 2020

sphinxc0re commented Jun 12, 2020

jasonwilliams commented Jun 12, 2020

Manishearth commented Jun 12, 2020

sphinxc0re commented Jun 14, 2020

jasonwilliams commented Jun 14, 2020

Manishearth commented Jun 14, 2020 • edited Loading

This issue was moved to a discussion.

Razican commented May 3, 2020 •

edited

Loading

jasonwilliams commented May 3, 2020 •

edited

Loading

Razican commented May 9, 2020 •

edited

Loading

Manishearth commented Jun 14, 2020 •

edited

Loading