perf: make performance great again #439

Merged
merged 27 commits into from
Jan 20, 2025

@Chronostasys Chronostasys commented Dec 7, 2024

This is a massive performance update.

With this update, the ray trace program is about 20x faster than before. It even beats the Go implementation on my computer (MacBook Pro M2 13-inch 16GB).

Main changes:

  • Update the escape pass so that it finds more stack-allocatable variables, which reduces GC pressure. It can now rewrite a function's arguments from heap pointers to stack pointers.
  • Update the optimization pipeline (mainly the O3 pipeline). The new pipeline runs the escape pass several times, interleaved with other passes.
  • Rewrite the GC's C API so that we can use LLVM's native thread-local implementation, which has proved to carry much less overhead than Rust's thread locals.
  • Reimplement the GC malloc hot path in LLVM IR, so that most heap allocations have no FFI overhead and the hot path itself can be optimized by LLVM.
  • Rewrite some of the GC logic for better performance.
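
To illustrate what an allocation "hot path" looks like, here is a minimal sketch of a thread-local bump allocator with a cold refill path. It is not the code in this PR: `ThreadLocalBuffer`, `gc_alloc`, `CHUNK_SIZE`, and the offset-based return value are all hypothetical, and it uses Rust's `thread_local!` purely for readability, whereas this PR moves that part to LLVM's native thread locals and LLVM IR.

```rust
use std::cell::RefCell;

/// Size of one allocation chunk (an assumption made for this sketch).
const CHUNK_SIZE: usize = 4096;

/// Hypothetical per-thread allocation buffer; not the project's actual GC type.
struct ThreadLocalBuffer {
    chunk: Vec<u8>,
    cursor: usize,
    slow_path_calls: usize,
}

impl ThreadLocalBuffer {
    fn new() -> Self {
        Self { chunk: vec![0; CHUNK_SIZE], cursor: 0, slow_path_calls: 0 }
    }

    /// Fast path: round the size up to 8 bytes and bump the cursor.
    /// Returns the offset of the allocation inside the current chunk,
    /// or `None` if the request can never fit in a chunk.
    #[inline(always)]
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let size = size.checked_add(7)? & !7;
        if size > CHUNK_SIZE {
            return None; // a real GC would route this to a large-object path
        }
        if self.cursor + size > CHUNK_SIZE {
            self.refill();
        }
        let offset = self.cursor;
        self.cursor += size;
        Some(offset)
    }

    /// Slow path: fetch a fresh chunk. Kept out of line so the fast path stays small.
    /// This sketch simply drops the old chunk; a real GC would keep it alive.
    #[cold]
    #[inline(never)]
    fn refill(&mut self) {
        self.slow_path_calls += 1;
        self.chunk = vec![0; CHUNK_SIZE];
        self.cursor = 0;
    }
}

thread_local! {
    static TLAB: RefCell<ThreadLocalBuffer> = RefCell::new(ThreadLocalBuffer::new());
}

/// Allocates `size` bytes from the calling thread's buffer.
fn gc_alloc(size: usize) -> Option<usize> {
    TLAB.with(|buffer| buffer.borrow_mut().alloc(size))
}

/// Number of times the calling thread has taken the slow path.
fn slow_path_calls() -> usize {
    TLAB.with(|buffer| buffer.borrow().slow_path_calls)
}
```

The point of the split is that the common case touches only thread-local state and a couple of arithmetic operations, so a compiler that can see the fast path can inline it at each allocation site. Only the refill needs to call into the runtime.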

Bugfixes:

  • Fixed a bug in the match statement that could cause it to treat every type as a generic type.
  • Fixed wrong IR being generated for the new if let ... impl ... statement when the type does not satisfy the condition.
  • Fixed some trait-implemented methods having the wrong scope modifier.
  • Fixed a bug that caused a thread stuck in GC to wait more than 100ms before reaching a safepoint.

TODOs:

  • The JIT and REPL do not support native thread locals, so this change actually breaks them. We need to find a workaround.
  • The heap allocation hot path still has room for optimization.
  • We can use chase_lev to balance the GC mark load across threads more evenly.
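
The load-balancing idea in the last TODO can be sketched with work stealing: each marker thread drains its own queue and, when it runs dry, takes work from another thread's queue. The sketch below is only an illustration under simplifying assumptions. It uses a `Mutex<VecDeque>` instead of a lock-free Chase-Lev deque, and `MarkQueue` and `mark_with_stealing` are hypothetical names that do not come from this PR.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-thread mark queue. A real Chase-Lev deque is lock-free;
/// a mutex-protected VecDeque is used here only to keep the sketch short.
struct MarkQueue {
    items: Mutex<VecDeque<u32>>,
}

impl MarkQueue {
    fn new(items: impl IntoIterator<Item = u32>) -> Self {
        Self { items: Mutex::new(items.into_iter().collect()) }
    }

    /// The owning thread pops from the back (most recently pushed work).
    fn pop(&self) -> Option<u32> {
        self.items.lock().unwrap().pop_back()
    }

    /// Other threads steal from the front (oldest work).
    fn steal(&self) -> Option<u32> {
        self.items.lock().unwrap().pop_front()
    }
}

/// Runs one marker thread per queue and returns how many items each thread processed.
/// Marking never pushes new work in this sketch, so a thread can stop as soon as
/// every queue looks empty; a real marker needs proper termination detection.
fn mark_with_stealing(queues: Vec<MarkQueue>) -> Vec<usize> {
    let queues = Arc::new(queues);
    let handles: Vec<_> = (0..queues.len())
        .map(|me| {
            let queues = Arc::clone(&queues);
            thread::spawn(move || {
                let mut processed = 0usize;
                loop {
                    // Prefer local work; otherwise try to steal from the other queues.
                    let work = queues[me].pop().or_else(|| {
                        (0..queues.len())
                            .filter(|&other| other != me)
                            .find_map(|other| queues[other].steal())
                    });
                    match work {
                        Some(_object) => processed += 1, // stand-in for marking the object
                        None => break,
                    }
                }
                processed
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Starting with all the work in one queue, for example `mark_with_stealing(vec![MarkQueue::new(0..1000), MarkQueue::new(0..0)])`, every item is still processed exactly once. How the items end up split between threads depends on scheduling.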

codecov bot commented Dec 7, 2024

Codecov Report

Attention: Patch coverage is 77.40586% with 54 lines in your changes missing coverage. Please review.

Project coverage is 84.87%. Comparing base (2f6044d) to head (d9c863b).
Report is 27 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/ast/compiler.rs | 55.55% | 20 Missing ⚠️ |
| vm/src/lib.rs | 0.00% | 11 Missing ⚠️ |
| vm/src/mutex/mod.rs | 0.00% | 6 Missing ⚠️ |
| src/ast/builder/llvmbuilder.rs | 93.50% | 5 Missing ⚠️ |
| src/ast/node/node_result.rs | 68.75% | 5 Missing ⚠️ |
| src/ast/plmod.rs | 50.00% | 5 Missing ⚠️ |
| src/ast/ctx/cast.rs | 85.71% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #439      +/-   ##
==========================================
- Coverage   85.05%   84.87%   -0.18%     
==========================================
  Files          99       99              
  Lines       26147    26390     +243     
==========================================
+ Hits        22239    22399     +160     
- Misses       3908     3991      +83     

| Files with missing lines | Coverage Δ |
|---|---|
| src/ast/ctx/builtins.rs | 55.88% <100.00%> (+0.65%) ⬆️ |
| src/ast/node/cast.rs | 94.14% <100.00%> (+<0.01%) ⬆️ |
| src/ast/node/control.rs | 92.57% <100.00%> (+0.09%) ⬆️ |
| src/ast/node/function.rs | 91.76% <100.00%> (+0.01%) ⬆️ |
| src/ast/node/implement.rs | 90.76% <100.00%> (+0.11%) ⬆️ |
| src/ast/node/interface.rs | 86.17% <100.00%> (+0.07%) ⬆️ |
| src/ast/pltype.rs | 86.48% <100.00%> (-0.04%) ⬇️ |
| src/ast/test.rs | 95.46% <100.00%> (+7.61%) ⬆️ |
| src/ast/ctx/cast.rs | 84.61% <85.71%> (+0.09%) ⬆️ |
| src/ast/builder/llvmbuilder.rs | 91.75% <93.50%> (-0.02%) ⬇️ |

... and 5 more

... and 8 files with indirect coverage changes

@Chronostasys Chronostasys changed the title from "Perf/gc" to "perf: make performance great again" Dec 7, 2024
@Chronostasys Chronostasys force-pushed the perf/gc branch 16 times, most recently from 8eba8d4 to b214e95 Compare December 19, 2024 17:56
@Chronostasys Chronostasys force-pushed the perf/gc branch 3 times, most recently from 6f27e1b to 6e46f1a Compare December 19, 2024 18:38