Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support cargo clean --workspace #14720

Open
xxchan opened this issue Oct 23, 2024 · 6 comments
Open

Support cargo clean --workspace #14720

xxchan opened this issue Oct 23, 2024 · 6 comments
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-clean S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request.

Comments

@xxchan
Copy link
Contributor

xxchan commented Oct 23, 2024

Problem

We have a large workspace (~500k loc). target dir bloats quickly, but cargo clean will let us recompile all dependencies (We have >1000 dependencies, and some dependencies take minutes to compile).

Recompiling dependencies is unnecessary, and it wastes time and energy.

We only want to clean build artifacts of "our own code". Actually it also takes up most of the space:

1.1G   │ ┌── risingwave
1.4G   │ ├── build
343M   │ │   ┌── s-h14437pi5v-13pt08j-dwag002k0abj4ylcfstlvyjeo
343M   │ │ ┌─┴ risingwave_stream-2qi109ipc00fa
361M   │ │ │ ┌── s-h1442qartc-12uu8de-1oe9soksc9wjl6inun2bsr3ts
361M   │ │ ├─┴ risingwave_expr_impl-2y2nvlr6oosas
384M   │ │ │ ┌── s-h1443fexwf-1wq0ljy-0ixglmmttspe0nfoa0e2nexap
384M   │ │ ├─┴ risingwave_frontend-00fb3jeuoq12s
3.4G   │ ├─┴ incremental
320M   │ │ ┌── librisingwave_connector-59acd07e4d1dce73.rlib
381M   │ │ ├── librisingwave_storage-26c2772e948fe4df.rlib
430M   │ │ ├── librisingwave_meta-e1e3edc30da6a88b.rlib
601M   │ │ ├── librisingwave_frontend-5241cdc2e280b9c9.rlib
1.1G   │ │ ├── librisingwave_stream-0896a913c629f6c3.rlib
1.1G   │ │ ├── risingwave-393af56edf3487b4
 47G   │ ├─┴ deps
 53G   ├─┴ debug
 70G ┌─┴ target

Current workarounds

Crates in my workspace have a common prefix, like risingwave-xxx. So I could just find & delete:

> find target -name '*risingwave*' -type d -exec rm -rf {} + -o -name '*risingwave*' -delete

A more cargo native solution (mentioned by @weihanglo):

cargo metadata | \
  jq -r '.workspace_members[]' | \
  xargs -n1 cargo clean -p

But this turns out to be much slower than the brute force way above. @weihanglo says: we should make -p in cargo-clean also aware of glob syntax. Then it might be faster. But I'm thinking sth like --workspace-members is more straightforward, and I'm not sure whether there's other cases we want a regexp clean.

Proposed Solution

Add a flag --workspace-members for cargo clean, which is essentially

cargo metadata | \
  jq -r '.workspace_members[]' | \
  xargs -n1 cargo clean -p

(but more efficient)

Notes

Original posted at https://x.com/yayale_umi/status/1848921059011268944

@xxchan xxchan added C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage. labels Oct 23, 2024
@epage
Copy link
Contributor

epage commented Oct 23, 2024

For what to name the flag, we generally call it --workspace. Most commands that have that flag have a different default package selection policy. However, we've already broken from that with cargo update and cargo clean would closely match cargo updates behavior, so maybe the precedence is strong enough to not worry about.

With the cargo update --workspace precedence, personally I don't have too much of a concern with adding this flag. @weihanglo thoughts?

However, I'm not too clear on the use case. Why do you need to clear things out? I'm assuming running the next build will cause these files to just come back, meaning there isn't a significant savings but the next build will be slower (because of the lack of incremental build cache).

Also, I wonder if there is a way for the incremental cache to be smaller. I didn't see any existing open issues for the size

@epage epage added S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. and removed S-triage Status: This issue is waiting on initial triage. labels Oct 23, 2024
@weihanglo
Copy link
Member

weihanglo commented Oct 23, 2024

I don't have too much of a concern with adding this flag. @weihanglo thoughts?

I don't either, though might be worth considering the interaction with per-user caches #5931 together.

But this turns out to be much slower than the brute force way above

Just note that cargo clean has rooms to improve its performance #10552.

@xxchan
Copy link
Contributor Author

xxchan commented Oct 24, 2024

However, I'm not too clear on the use case. Why do you need to clear things out?

I remember the space can keep going up to like 100 or 200G. After the clean, the space can go down a lot. I can come back later to show what takes up the space when it bloats again.

@xxchan xxchan changed the title Support cargo clean --workspace-members Support cargo clean --workspace Oct 24, 2024
@xxchan
Copy link
Contributor Author

xxchan commented Oct 29, 2024

Update about the motivation:
Here we can see target/debug bloated from 53G to 160G, and most of the space is taken by deps (instead of incremental), which seems to contain multiple copies of the lib.

> dust target/debug --no-percent-bars
1.1G   ┌── risingwave                          
1.7G   ├── build
 33G   ├── incremental
431M   │ ┌── librisingwave_meta-e1e3edc30da6a88b.rlib
431M   │ ├── librisingwave_meta-6ce8ff9a028925c4.rlib
596M   │ ├── librisingwave_frontend-70f7a1eca352e8b3.rlib
598M   │ ├── librisingwave_frontend-0919931cb5871639.rlib
600M   │ ├── librisingwave_frontend-5241cdc2e280b9c9.rlib
603M   │ ├── librisingwave_frontend-33c6b4256c2d1366.rlib
603M   │ ├── librisingwave_frontend-5b4cbef1e446fb94.rlib
604M   │ ├── librisingwave_frontend-b94bacdba99adcb7.rlib
661M   │ ├── risingwave_stream-efca7a89d40c9acb
1.1G   │ ├── librisingwave_stream-0896a913c629f6c3.rlib
1.1G   │ ├── librisingwave_stream-3226d3c244877ed6.rlib
1.1G   │ ├── librisingwave_stream-7ade0b9a950a6a60.rlib
1.1G   │ ├── librisingwave_stream-626b097fc7b0fa93.rlib
1.1G   │ ├── risingwave-393af56edf3487b4
1.1G   │ ├── risingwave-db2dc932131a258f
1.1G   │ ├── risingwave-1df79d8641d47cff
123G   ├─┴ deps
160G ┌─┴ debug

After the delete, it goes down to 63G (Although still larger than original 53G)

> find target -name '*risingwave*' -type d -exec rm -rf {} + -o -name '*risingwave*' -delete                                                                
> dust target/debug --no-percent-bars
194M   ┌── sqlsmith-reducer                  
202M   ├── sqlsmith
1.4G   ├── build
1.8G   ├── incremental
115M   │ ┌── libdeltalake_core-0fa7243c683d6812.rlib
115M   │ ├── libdeltalake_core-0a2cedc34fa89d57.rlib
115M   │ ├── libdeltalake_core-9cdb697f0519b26c.rlib
115M   │ ├── libdeltalake_core-22c1682567a42244.rlib
128M   │ ├── integration_tests-64e64fbdd2ab965e
194M   │ ├── sqlsmith_reducer-5e38db86f5aa2b3c
202M   │ ├── sqlsmith-56bf4e30d0e5cfac
230M   │ ├── integration_tests-8d725fbcaec7a6ad
244M   │ ├── libaws_sdk_glue-665d21f53545fa46.rlib
244M   │ ├── libaws_sdk_glue-20c48d486ec80194.rlib
264M   │ ├── libaws_sdk_glue-4b9ecaacda2bf868.rlib
264M   │ ├── libaws_sdk_glue-4f5bb304b621b665.rlib
264M   │ ├── libaws_sdk_glue-c20b1d959b5948bb.rlib
264M   │ ├── libaws_sdk_glue-375afdf7d6546b9c.rlib
265M   │ ├── libaws_sdk_glue-2355e0e0072a444f.rlib
265M   │ ├── libaws_sdk_glue-78c375ccff556a0a.rlib
265M   │ ├── libaws_sdk_glue-201293aff23d2225.rlib
265M   │ ├── libaws_sdk_glue-36d64f38322d7b29.rlib
327M   │ ├── planner_test_runner-6205d542a9d99581
331M   │ ├── planner_test_runner-8b9873a9fb5557e4
 59G   ├─┴ deps
 63G ┌─┴ debug

@xxchan
Copy link
Contributor Author

xxchan commented Oct 29, 2024

What I really want is sth like "clean all earlier builds, but keep recent builds". It seems to be the goal of cargo-sweep, but it doesn't work perfectly (holmgr/cargo-sweep#11).

Therefore, I came up with cargo clean --workspace as a proxy goal.

@epage
Copy link
Contributor

epage commented Oct 29, 2024

We've talked about GC for the target directory which might be able to be tuned for "all but latest". The current layout of the target directory is a bit messy and I'd prefer we don't try to track individual files. We have talked about re-organizing the target directory for #5931 which would also give us GC for the shared intermediate artifavts but we could also consider non-shared ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-clean S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request.
Projects
None yet
Development

No branches or pull requests

3 participants