Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect collection_pages calculation in mem balancer #918

Closed
qinsoon opened this issue Aug 23, 2023 · 5 comments · Fixed by #1050
Closed

Incorrect collection_pages calculation in mem balancer #918

qinsoon opened this issue Aug 23, 2023 · 5 comments · Fixed by #1050
Labels
A-heap Area: Heap (including Mmapper, VMMap) C-bug Category: Bug P-normal Priority: Normal.

Comments

@qinsoon
Copy link
Member

qinsoon commented Aug 23, 2023

The mem balancer paper uses a metric called 'bytes collected during GC'. I interpreted as 'bytes reclaimed'. But it turned out to be incorrect. It should actually be 'bytes traversed during GC'.

Basically our calculation for the following field is wrong. We should fix this.

collection_pages: f64,

As we use pages for our calculation (such as live pages, and allocation pages), the question for us is that how we can get the number of pages traversed by the GC. Earlier we added a way to count traversed bytes (#768), but it counts bytes not pages. We probably can calculate traversed pages from live pages and promoted pages.

@qinsoon qinsoon added C-bug Category: Bug A-heap Area: Heap (including Mmapper, VMMap) labels Aug 23, 2023
@wks
Copy link
Collaborator

wks commented Aug 24, 2023

Alternatively, we can count non-empty pages during the release stage. This can avoid introducing cost during the transitive closure.

@qinsoon
Copy link
Member Author

qinsoon commented Aug 24, 2023

We have reserved pages, which is non-empty pages. But that does not always equal to pages we traversed in a GC. There are memory that is alive and reserved, but is not traversed in a GC. Like unreachable objects in immortal spaces, or mature objects in a nursery GC.

@k-sareen
Copy link
Collaborator

k-sareen commented Aug 24, 2023

Can you point to the page in the paper which mentions it's traversed bytes and not collected bytes?

@qinsoon
Copy link
Member Author

qinsoon commented Aug 24, 2023

In Fig 4 of the paper, when it refers to 'bytes collected during collection', it means 'bytes traversed by the GC' rather than 'bytes freed by the GC' (we checked with the author).

@k-sareen
Copy link
Collaborator

Ah right. This is just a heuristic anyway, so maybe it's fine to just use live bytes instead of traversed bytes.

@qinsoon qinsoon moved this to 🔖 Ready in v0.20 (20230929) Aug 31, 2023
@qinsoon qinsoon added the P-normal Priority: Normal. label Oct 25, 2023
github-merge-queue bot pushed a commit that referenced this issue Jan 16, 2024
This PR fixes #918. We
misinterpreted 'bytes collected during collection' in the paper, and
used reclaimed pages for it. It actually means 'bytes that are traversed
during collection'. We now use live pages to estimate the value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-heap Area: Heap (including Mmapper, VMMap) C-bug Category: Bug P-normal Priority: Normal.
Projects
No open projects
Status: 🔖 Ready
Development

Successfully merging a pull request may close this issue.

3 participants