
tracing-subscriber: big RAM usage #1005

Closed
vorot93 opened this issue Sep 30, 2020 · 2 comments · Fixed by #1062
Assignees: hawkw
Labels: crate/subscriber (Related to the `tracing-subscriber` crate), kind/bug (Something isn't working)

Comments

vorot93 commented Sep 30, 2020

fn main() {
    // Install a `fmt` subscriber with an `EnvFilter` built from `RUST_LOG`,
    // plus an explicit `scratchpad=info` directive.
    tracing_subscriber::fmt()
        .with_env_filter(
            tracing_subscriber::EnvFilter::from_default_env()
                .add_directive("scratchpad=info".parse().unwrap()),
        )
        .init();

    // Park the thread so the process stays alive while its memory usage is inspected.
    std::thread::park();
}

This short program consumes 9 MB of resident memory on my Mac. It really shouldn't.

vorot93 added the crate/subscriber and kind/bug labels Sep 30, 2020
hawkw self-assigned this Sep 30, 2020
hawkw commented Oct 15, 2020

@vorot93 were you looking at virtual memory or RSS when you observed this issue? I think I have a potential fix, but while verifying it, I noticed that on my machine (running Linux), your repro uses about 12 KB of resident set size and 15 MB of virtual memory. Is that in line with your observations on the Mac?
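
For reference, a minimal sketch (Linux only, assuming a /proc filesystem; just an illustration, not part of the repro above) of one way to print both numbers for the current process:

use std::fs;

fn main() {
    // /proc/self/status reports VmSize (virtual) and VmRSS (resident), in kB.
    let status = fs::read_to_string("/proc/self/status").expect("only available on Linux");
    for line in status.lines() {
        if line.starts_with("VmSize") || line.starts_with("VmRSS") {
            println!("{}", line);
        }
    }
}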

vorot93 commented Oct 15, 2020

For me it's 9 MB resident and 4 GB virtual.

hawkw added a commit to hawkw/sharded-slab that referenced this issue Oct 15, 2020
Currently, creating a new `Slab` allocates an array of `Shard`
structs. The `Shard` struct itself owns two boxed arrays of local and
shared metadata for each page on that shard. Even though we don't
allocate the actual storage arrays for those pages until they are
needed, allocating the shard metadata eagerly means that a completely
empty slab results in a fairly large memory allocation up front. This is
especially the case when used with the default `Config`, which (on
64-bit machines) allows up to 4096 threads. On a 64-bit machine, the
`Shared` page metadata is 4 words, so 32 bytes, and the `Local` metadata
is another word. At 33 bytes * 32 pages per shard, that's 1056 bytes, a
little over 1 KB per shard. This means that the default config eagerly
allocates 4096 shards * 1056 bytes, about 4 MB of metadata, even when
the program only has one or two threads in it, and the remaining
4000-some possible threads will never allocate their shards.

When most of the shards are empty because there are very few threads in
the program, most of this allocated memory is not *resident*, and gets
paged out by the operating system, but it results in a very surprising
amount of allocated virtual memory. This is the cause of issues like
tokio-rs/tracing#1005.

Furthermore, allocating all of this means that actually _constructing_ a
slab takes a pretty long time. In `tracing-subscriber`, this is normally
not a major issue, since subscribers tend to be created on startup and
live for the entire lifetime of the program. However, in some use-cases,
like creating a separate subscriber for each test, the performance
impact of allocating all that metadata is quite significant. See, for
example:
rust-lang/rust-analyzer#5792 (comment)

This branch fixes this by allocating the shard metadata only when a new
shard is actually needed by a new thread. The shard array is now an
array of `AtomicPtr`s to shards, and shards are only allocated the first
time they are `insert`ed to. Since each thread can only insert to its
own shard, the synchronization logic for this is fairly simple. However,
since the shards are morally, although not actually, _owned_ by these
`AtomicPtr`s, there is the potential for leaks when a slab is dropped,
if we don't also ensure that all the shards it creates are also dropped.
Therefore, we use `loom::alloc::Track` for leak detection in tests.
Fortunately, the logic for ensuring these are deallocated is not too
complex.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
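
As a rough sketch of the lazy-allocation pattern the commit describes (a hypothetical `Shard` type and a simplified shards array, not the actual sharded-slab code):

use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Shard {
    // Per-shard page metadata would live here in the real slab.
    id: usize,
}

struct Shards {
    // One slot per potential thread; every slot starts out null and is only
    // filled in when that thread actually inserts for the first time.
    slots: Vec<AtomicPtr<Shard>>,
}

impl Shards {
    fn new(max_threads: usize) -> Self {
        Self {
            slots: (0..max_threads)
                .map(|_| AtomicPtr::new(ptr::null_mut()))
                .collect(),
        }
    }

    /// Returns the shard for `thread_id`, allocating it on first use.
    fn get_or_alloc(&self, thread_id: usize) -> &Shard {
        let slot = &self.slots[thread_id];
        let mut shard = slot.load(Ordering::Acquire);
        if shard.is_null() {
            let new = Box::into_raw(Box::new(Shard { id: thread_id }));
            // In the real slab each thread only ever touches its own shard, so
            // there is no race here; compare_exchange keeps the sketch safe anyway.
            match slot.compare_exchange(ptr::null_mut(), new, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => shard = new,
                Err(existing) => {
                    // Lost a race: free our allocation and use the winner's shard.
                    unsafe { drop(Box::from_raw(new)) };
                    shard = existing;
                }
            }
        }
        unsafe { &*shard }
    }
}

impl Drop for Shards {
    fn drop(&mut self) {
        // The shards are only "morally" owned by the pointers, so they must be
        // freed explicitly here to avoid leaking them when the slab is dropped.
        for slot in &self.slots {
            let shard = slot.load(Ordering::Acquire);
            if !shard.is_null() {
                unsafe { drop(Box::from_raw(shard)) };
            }
        }
    }
}

fn main() {
    let shards = Shards::new(4);
    // The first access for a given thread id allocates its shard; later accesses reuse it.
    assert_eq!(shards.get_or_alloc(0).id, 0);
}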
hawkw added a commit that referenced this issue Oct 22, 2020
## Motivation

hawkw/sharded-slab#45 changes `sharded-slab` so that the per-shard
metadata is allocated only when a new shard is created, rather than all
up front when the slab is created. This fixes the very large amount of
memory allocated by simply creating a new `Registry` without actually
collecting any traces.

## Solution

This branch updates `tracing-subscriber` to depend on `sharded-slab`
0.1.0, which includes the upstream fix.

In addition, this branch switches the registry from using `sharded_slab::Slab` to
`sharded_slab::Pool`. This allows us to clear hashmap allocations for
extensions in place, retaining the already-allocated maps. This should
improve `new_span` performance a bit.

Fixes #1005
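
To illustrate the kind of in-place reuse this enables, here is a toy pool of extension maps (hypothetical types, not the actual `sharded_slab::Pool` or registry `Extensions` API):

use std::any::{Any, TypeId};
use std::collections::HashMap;

// A very simplified "extensions" map, keyed by type, like the one stored per span.
type Extensions = HashMap<TypeId, Box<dyn Any + Send + Sync>>;

/// A toy pool that hands out cleared-but-still-allocated maps for reuse.
struct ExtensionsPool {
    free: Vec<Extensions>,
}

impl ExtensionsPool {
    fn new() -> Self {
        Self { free: Vec::new() }
    }

    /// Get a map for a new span, reusing a previously allocated one if possible.
    fn checkout(&mut self) -> Extensions {
        self.free.pop().unwrap_or_default()
    }

    /// Return a map once its span closes. `clear` drops the entries but keeps
    /// the map's capacity, so the next span avoids re-allocating the buckets.
    fn checkin(&mut self, mut map: Extensions) {
        map.clear();
        self.free.push(map);
    }
}

fn main() {
    let mut pool = ExtensionsPool::new();

    let mut ext = pool.checkout();
    ext.insert(TypeId::of::<u64>(), Box::new(42u64));
    pool.checkin(ext); // cleared in place; its allocation is retained

    // The next checkout reuses the same backing storage.
    let reused = pool.checkout();
    assert!(reused.is_empty());
}
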
hawkw added a commit that referenced this issue Oct 22, 2020
This backports #1062 to v0.1.6. This has already been approved on
master.

hawkw/sharded-slab#45 changes `sharded-slab` so that the per-shard
metadata is allocated only when a new shard is created, rather than all
up front when the slab is created. This fixes the very large amount of
memory allocated by simply creating a new `Registry` without actually
collecting any traces.

This branch updates `tracing-subscriber` to depend on `sharded-slab`
0.1.0, which includes the upstream fix.

In addition, this branch switches the registry from using `sharded_slab::Slab` to
`sharded_slab::Pool`. This allows us to clear hashmap allocations for
extensions in place, retaining the already-allocated maps. This should
improve `new_span` performance a bit.

Fixes #1005

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
kaffarell pushed a commit to kaffarell/tracing that referenced this issue May 22, 2024