Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store aggregate read/execute count statistics #8286

Merged
merged 3 commits into from
Jul 2, 2024
Merged

Conversation

bgw
Copy link
Member

@bgw bgw commented Jun 4, 2024

Why?

I want to determine percent "cache hit" rates for tasks. Tasks with very low task hit rates should likely have their annotations removed.

Eventually, we might be able to use this information in a more automated way, by leaving the annotation in, but skipping the caching for low-cache-hit tasks.

What?

This implementation only logs persistent tasks, which should compromise all or the majority of tasks we care about for memory usage.

The implementation should bail out quickly if caching is disabled, so it should be okay to leave in release builds, which is important for making it easy to gather statistics from willing users.

Testing

Run included unit tests!

This is used as part of vercel/next.js@canary...bgw/cache-hit-stats

Copy link

vercel bot commented Jun 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
examples-nonmonorepo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 2, 2024 6:37pm
rust-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 2, 2024 6:37pm
8 Skipped Deployments
Name Status Preview Comments Updated (UTC)
examples-basic-web ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-designsystem-docs ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-gatsby-web ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-kitchensink-blog ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-native-web ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-svelte-web ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-tailwind-web ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm
examples-vite-web ⬜️ Ignored (Inspect) Visit Preview Jul 2, 2024 6:37pm

@bgw bgw changed the title Add helper module for storing and serializing statistics Add logic for optionally storing and serializing aggregate read/execute count statistics Jun 4, 2024
Copy link
Contributor

github-actions bot commented Jun 4, 2024

⚠️ This change may fail to build next-swc.

Logs

error: failed to select a version for `triomphe`.
    ... required by package `hstr v0.2.8`
    ... which satisfies dependency `hstr = "^0.2.8"` (locked to 0.2.8) of package `swc_atoms v0.6.7`
    ... which satisfies dependency `swc_atoms = "^0.6.5"` (locked to 0.6.7) of package `swc_core v0.95.4`
    ... which satisfies dependency `swc_core = "^0.95.4"` (locked to 0.95.4) of package `wasm v0.0.0 (/root/actions-runner/_work/turbo/turbo/packages/next-swc/crates/wasm)`
versions that meet the requirements `^0.1.11` (locked to 0.1.11) are: 0.1.11

all possible versions conflict with previously selected packages.

  previously selected package `triomphe v0.1.13`
    ... which satisfies dependency `triomphe = "^0.1.13"` of package `turbo-tasks v0.1.0 (https://github.com/vercel/turbo?rev=476427d7198747336185feb7d720d6798e886943#5aba12db)`
    ... which satisfies git dependency `turbo-tasks` (locked to 0.1.0) of package `next-build-test v0.1.0 (/root/actions-runner/_work/turbo/turbo/packages/next-swc/crates/next-build-test)`

failed to select a version for `triomphe` which could resolve this conflict

See job summary for details

Copy link
Contributor

github-actions bot commented Jun 4, 2024

🟢 Turbopack Benchmark CI successful 🟢

Thanks

Copy link
Contributor

github-actions bot commented Jun 4, 2024

⚠️ CI failed ⚠️

The following steps have failed in CI:

  • Turbopack Rust tests (mac/win, non-blocking)

See workflow summary for details

@bgw bgw changed the title Add logic for optionally storing and serializing aggregate read/execute count statistics Store aggregate read/execute count statistics Jun 4, 2024
@bgw bgw force-pushed the bgw/cache-hit-stats branch from 56b7767 to 2aa90d9 Compare June 6, 2024 06:05
@bgw bgw force-pushed the bgw/cache-hit-stats branch from 740eba3 to abcb446 Compare June 20, 2024 08:51
@bgw bgw changed the base branch from main to bgw/exit-handler June 20, 2024 08:51
@bgw bgw force-pushed the bgw/cache-hit-stats branch from abcb446 to 8d59e79 Compare June 20, 2024 14:22
@bgw bgw assigned kdy1 Jun 24, 2024
.cloned()
.expect("No arguments for trait call");
let stats = Arc::clone(stats);
turbo_tasks.run_once(Box::pin(async move {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could use a tokio::task::spawn instead. That's much more lightweight

Copy link
Member Author

@bgw bgw Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't do this because resolve_trait_method needs to be run within the context of turbo_tasks:

thread 'test_dyn_trait_methods' panicked at crates/turbo-tasks/src/manager.rs:1301:17:
cannot access a task-local storage value without setting it first

And it looks like passing through the task-local context isn't safe either: https://github.com/vercel/turbo/blob/main/crates/turbo-tasks/src/manager.rs#L1323-L1329

@bgw bgw force-pushed the bgw/cache-hit-stats branch from 7f34477 to 5c82e4d Compare July 2, 2024 18:34
@bgw bgw merged commit 99d5435 into main Jul 2, 2024
54 of 56 checks passed
Copy link
Member Author

bgw commented Jul 2, 2024

Merge activity

  • Jul 2, 3:03 PM EDT: @bgw merged this pull request with Graphite.

@bgw bgw deleted the bgw/cache-hit-stats branch July 2, 2024 19:03
bgw added a commit to vercel/next.js that referenced this pull request Jul 3, 2024
Tobias Koppers - fix typo (vercel/turborepo#8619)
Benjamin Woodruff - Store aggregate read/execute count statistics
(vercel/turborepo#8286)
Tobias Koppers - box InProgress task state (vercel/turborepo#8644)
Tobias Koppers - Task Edges Set/List (vercel/turborepo#8624)
Benjamin Woodruff - Memory: Use `triomphe::Arc` for `SharedReference`
(vercel/turborepo#8622)
Will Binns-Smith - chore: release npm packages (vercel/turborepo#8614)
Will Binns-Smith - devlow-bench: add git branch and sha to datapoints
(vercel/turborepo#8602)

---

Fixes a `triomphe` package version conflict between turbopack and swc by
bumping it from 0.1.11 to 0.1.13.
bgw added a commit to vercel/next.js that referenced this pull request Jul 9, 2024
…67164)

Writes out the statistics gathered via vercel/turborepo#8286 to a file.

Stats can be formatted and sorted with jq:

```
jq 'to_entries | sort_by(.value.cache_miss) | reverse | from_entries' <input.json >output.json
```

Output looks something like this (once formatted and sorted):
https://gist.github.com/bgw/4e2df35b9e410bf71fe51ecaffc3c7bf

Without #67165, this requires that SIGINT be sent directly to the
`next-server` process to allow a clean exit:

```
pkill -INT next-server
```

But with #67165, a simple <kbd>ctrl</kbd>+<kbd>c</kbd> works.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 25, 2024
…epo#8547)

## Why?

I need a better system for handling process exit to handle flushing to cache hit statistics (vercel/turborepo#8286).

## What?

- Supports letting another thing listen for <kbd>ctrl</kbd> + <kbd>c</kbd>. In the case of next-server, we need to let node.js own and configure these signal handlers.
- Supports setting up a <kbd>ctrl</kbd> + <kbd>c</kbd> handler for you, but now explicitly panics if you would try to set up multiple global <kbd>ctrl</kbd> + <kbd>c</kbd> listeners.
- Allows async work to happen during cleanup. This wasn't possible using `Drop` because `Drop` isn't async.
- Allows cleanup work to be scheduled after application initialization, potentially making the API a bit more flexible.

## Testing Instructions

```
cargo test -p turbopack-trace-utils
```

Add some `println!()` logging to the end of the `on_exit` future in turbopack-cli.

```
TURBOPACK_TRACING=overview cargo run -p turbopack-cli -- dev
```

Hit `ctrl+c` and see that my `println!()` runs, so I know the tracing flush finishes.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 25, 2024
### Why?

I want to determine percent "cache hit" rates for tasks. Tasks with very
low task hit rates should likely have their annotations removed.

Eventually, we might be able to use this information in a more automated
way, by leaving the annotation in, but skipping the caching for
low-cache-hit tasks.

### What?

This implementation only logs persistent tasks, which should compromise
all or the majority of tasks we care about for memory usage.

The implementation should bail out quickly if caching is disabled, so it
should be okay to leave in release builds, which is important for making
it easy to gather statistics from willing users.

### Testing

Run included unit tests!

This is used as part of
canary...bgw/cache-hit-stats
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
…epo#8547)

## Why?

I need a better system for handling process exit to handle flushing to cache hit statistics (vercel/turborepo#8286).

## What?

- Supports letting another thing listen for <kbd>ctrl</kbd> + <kbd>c</kbd>. In the case of next-server, we need to let node.js own and configure these signal handlers.
- Supports setting up a <kbd>ctrl</kbd> + <kbd>c</kbd> handler for you, but now explicitly panics if you would try to set up multiple global <kbd>ctrl</kbd> + <kbd>c</kbd> listeners.
- Allows async work to happen during cleanup. This wasn't possible using `Drop` because `Drop` isn't async.
- Allows cleanup work to be scheduled after application initialization, potentially making the API a bit more flexible.

## Testing Instructions

```
cargo test -p turbopack-trace-utils
```

Add some `println!()` logging to the end of the `on_exit` future in turbopack-cli.

```
TURBOPACK_TRACING=overview cargo run -p turbopack-cli -- dev
```

Hit `ctrl+c` and see that my `println!()` runs, so I know the tracing flush finishes.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
### Why?

I want to determine percent "cache hit" rates for tasks. Tasks with very
low task hit rates should likely have their annotations removed.

Eventually, we might be able to use this information in a more automated
way, by leaving the annotation in, but skipping the caching for
low-cache-hit tasks.

### What?

This implementation only logs persistent tasks, which should compromise
all or the majority of tasks we care about for memory usage.

The implementation should bail out quickly if caching is disabled, so it
should be okay to leave in release builds, which is important for making
it easy to gather statistics from willing users.

### Testing

Run included unit tests!

This is used as part of
canary...bgw/cache-hit-stats
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
…epo#8547)

## Why?

I need a better system for handling process exit to handle flushing to cache hit statistics (vercel/turborepo#8286).

## What?

- Supports letting another thing listen for <kbd>ctrl</kbd> + <kbd>c</kbd>. In the case of next-server, we need to let node.js own and configure these signal handlers.
- Supports setting up a <kbd>ctrl</kbd> + <kbd>c</kbd> handler for you, but now explicitly panics if you would try to set up multiple global <kbd>ctrl</kbd> + <kbd>c</kbd> listeners.
- Allows async work to happen during cleanup. This wasn't possible using `Drop` because `Drop` isn't async.
- Allows cleanup work to be scheduled after application initialization, potentially making the API a bit more flexible.

## Testing Instructions

```
cargo test -p turbopack-trace-utils
```

Add some `println!()` logging to the end of the `on_exit` future in turbopack-cli.

```
TURBOPACK_TRACING=overview cargo run -p turbopack-cli -- dev
```

Hit `ctrl+c` and see that my `println!()` runs, so I know the tracing flush finishes.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
### Why?

I want to determine percent "cache hit" rates for tasks. Tasks with very
low task hit rates should likely have their annotations removed.

Eventually, we might be able to use this information in a more automated
way, by leaving the annotation in, but skipping the caching for
low-cache-hit tasks.

### What?

This implementation only logs persistent tasks, which should compromise
all or the majority of tasks we care about for memory usage.

The implementation should bail out quickly if caching is disabled, so it
should be okay to leave in release builds, which is important for making
it easy to gather statistics from willing users.

### Testing

Run included unit tests!

This is used as part of
canary...bgw/cache-hit-stats
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 1, 2024
…epo#8547)

## Why?

I need a better system for handling process exit to handle flushing to cache hit statistics (vercel/turborepo#8286).

## What?

- Supports letting another thing listen for <kbd>ctrl</kbd> + <kbd>c</kbd>. In the case of next-server, we need to let node.js own and configure these signal handlers.
- Supports setting up a <kbd>ctrl</kbd> + <kbd>c</kbd> handler for you, but now explicitly panics if you would try to set up multiple global <kbd>ctrl</kbd> + <kbd>c</kbd> listeners.
- Allows async work to happen during cleanup. This wasn't possible using `Drop` because `Drop` isn't async.
- Allows cleanup work to be scheduled after application initialization, potentially making the API a bit more flexible.

## Testing Instructions

```
cargo test -p turbopack-trace-utils
```

Add some `println!()` logging to the end of the `on_exit` future in turbopack-cli.

```
TURBOPACK_TRACING=overview cargo run -p turbopack-cli -- dev
```

Hit `ctrl+c` and see that my `println!()` runs, so I know the tracing flush finishes.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 1, 2024
### Why?

I want to determine percent "cache hit" rates for tasks. Tasks with very
low task hit rates should likely have their annotations removed.

Eventually, we might be able to use this information in a more automated
way, by leaving the annotation in, but skipping the caching for
low-cache-hit tasks.

### What?

This implementation only logs persistent tasks, which should compromise
all or the majority of tasks we care about for memory usage.

The implementation should bail out quickly if caching is disabled, so it
should be okay to leave in release builds, which is important for making
it easy to gather statistics from willing users.

### Testing

Run included unit tests!

This is used as part of
canary...bgw/cache-hit-stats
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 14, 2024
…67164)

Writes out the statistics gathered via vercel/turborepo#8286 to a file.

Stats can be formatted and sorted with jq:

```
jq 'to_entries | sort_by(.value.cache_miss) | reverse | from_entries' <input.json >output.json
```

Output looks something like this (once formatted and sorted):
https://gist.github.com/bgw/4e2df35b9e410bf71fe51ecaffc3c7bf

Without #67165, this requires that SIGINT be sent directly to the
`next-server` process to allow a clean exit:

```
pkill -INT next-server
```

But with #67165, a simple <kbd>ctrl</kbd>+<kbd>c</kbd> works.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 15, 2024
…67164)

Writes out the statistics gathered via vercel/turborepo#8286 to a file.

Stats can be formatted and sorted with jq:

```
jq 'to_entries | sort_by(.value.cache_miss) | reverse | from_entries' <input.json >output.json
```

Output looks something like this (once formatted and sorted):
https://gist.github.com/bgw/4e2df35b9e410bf71fe51ecaffc3c7bf

Without #67165, this requires that SIGINT be sent directly to the
`next-server` process to allow a clean exit:

```
pkill -INT next-server
```

But with #67165, a simple <kbd>ctrl</kbd>+<kbd>c</kbd> works.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 16, 2024
Tobias Koppers - fix typo (vercel/turborepo#8619)
Benjamin Woodruff - Store aggregate read/execute count statistics
(vercel/turborepo#8286)
Tobias Koppers - box InProgress task state (vercel/turborepo#8644)
Tobias Koppers - Task Edges Set/List (vercel/turborepo#8624)
Benjamin Woodruff - Memory: Use `triomphe::Arc` for `SharedReference`
(vercel/turborepo#8622)
Will Binns-Smith - chore: release npm packages (vercel/turborepo#8614)
Will Binns-Smith - devlow-bench: add git branch and sha to datapoints
(vercel/turborepo#8602)

---

Fixes a `triomphe` package version conflict between turbopack and swc by
bumping it from 0.1.11 to 0.1.13.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 16, 2024
…67164)

Writes out the statistics gathered via vercel/turborepo#8286 to a file.

Stats can be formatted and sorted with jq:

```
jq 'to_entries | sort_by(.value.cache_miss) | reverse | from_entries' <input.json >output.json
```

Output looks something like this (once formatted and sorted):
https://gist.github.com/bgw/4e2df35b9e410bf71fe51ecaffc3c7bf

Without #67165, this requires that SIGINT be sent directly to the
`next-server` process to allow a clean exit:

```
pkill -INT next-server
```

But with #67165, a simple <kbd>ctrl</kbd>+<kbd>c</kbd> works.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants