Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

steven_blocks: could not compiler under windows? memory allocation of 15728640 bytes failed, STATUS_STACK_BUFFER_OVERRUN #184

Closed
woodgear opened this issue Jun 17, 2019 · 13 comments
Labels
bug Something isn't working

Comments

@woodgear
Copy link

       Fresh mime_guess v2.0.0-alpha.6
       Fresh rsa_public_encrypt_pkcs1 v0.2.0
       Fresh reqwest v0.9.18
memory allocation of 15728640 bytes failed
error: Could not compile `steven_blocks`.

Caused by:
  process didn't exit successfully: `C:\Users\wucong\.cargo\bin\sccache.exe rustc --edition=2018 --crate-name steven_blocks blocks\src\lib.rs --color always --crate-type lib --emit=dep-info,link -C opt-level=1 -C debuginfo=2 -C debug-assertions=on -C metadata=fff764ec044c94f8 -C extra-filename=-fff764ec044c94f8 --out-dir C:\Users\wucong\work\lab\stevenarella\target\debug\deps -C incremental=C:\Users\wucong\work\lab\stevenarella\target\debug\incremental -L dependency=C:\Users\wucong\work\lab\stevenarella\target\debug\deps --extern cgmath=C:\Users\wucong\work\lab\stevenarella\target\debug\deps\libcgmath-df2dce11f9bb8253.rlib --extern collision=C:\Users\wucong\work\lab\stevenarella\target\debug\deps\libcollision-7b0ee92f28b81d67.rlib --extern lazy_static=C:\Users\wucong\work\lab\stevenarella\target\debug\deps\liblazy_static-7fa98efeb9d42cb9.rlib --extern steven_shared=C:\Users\wucong\work\lab\stevenarella\target\debug\deps\libsteven_shared-fe5e6bf3fccae278.rlib` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

what should i do?

@iceiix
Copy link
Owner

iceiix commented Jun 17, 2019

This looks similar to the issue I was hitting in #174, unfortunately steven_blocks requires a lot of memory to compile (I've seen it peak at 3.3 GB) because of the recursive macro expansion.

@woodgear How much memory do you have? I test building on a 32 GB macOS system, and build on SourceHut (Linux) and AppVeyor (Windows), but not sure of their memory limits.

But I agree this excessive memory requirement is kind of ridiculous (steven_blocks also takes the longest time to build by far), would like to fix it but have yet to find how to improve it, may require rewriting the blocks macro system.

@iceiix
Copy link
Owner

iceiix commented Dec 30, 2019

15728640 bytes = 15 megabytes. I'm now hitting a similar problem on the same machine with Rust 1.40 (#253), not a memory allocation failure, but an unreasonably long build time (2.5+ hr, 25+ GB).

This high memory usage in steven_blocks is a longstanding problem, certainly should be addressed. May have to rewrite the whole blocks system to use fewer macros, or is there a way it could be optimized to build reasonably quickly? This will only get worse once more blocks are added, and we are behind. But steven_blocks has became quite complex, I'm not sure what to do here, may need assistance from someone with more expertise with Rust macros…

@iceiix iceiix added the bug Something isn't working label Dec 30, 2019
@iceiix iceiix changed the title could not compiler under windows? memory allocation of 15728640 bytes failed, STATUS_STACK_BUFFER_OVERRUN steven_blocks: could not compiler under windows? memory allocation of 15728640 bytes failed, STATUS_STACK_BUFFER_OVERRUN Dec 30, 2019
@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

Updating to Rust version 1.40 #253 (comment), time cargo build --release takes 4 hours 12 minutes (!).

The bottleneck seems to be in release optimization. Testing to isolate the problem, on nightly with -Z time passes, completes in about 2 minutes. No optimization, stable, also about less than 2 minutes. Returning to nightly and effectively using --release with -Z time-passes: blocks $ cargo clean ; time cargo rustc -- -C 'opt-level=3' -Z time-passes — still building, 25+ minutes, 4.5GB+ and counting…

time: 0.002	codegen passes [4gqa868n29dtlfbb]
time: 0.652	LTO passes
time: 0.075	codegen passes [473etbgnzq2x8af8]
time: 3.476	LTO passes
time: 4.218	LTO passes
time: 4.255	codegen passes [22fyvf1uiha0mrri]
time: 4.234	codegen passes [6ffwyc85r8r5jgh]
time: 63.064	LTO passes
Building [======================================================>  ] 48/49: steven_blocks   

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

https://internals.rust-lang.org/t/rust-staticlibs-and-optimizing-for-size/5746/4 mentions https://bugzilla.mozilla.org/show_bug.cgi?id=1386371
Disable Rust LTO by default in local builds, in Firefox.

Rust LTO is very slow and cause significant build time regressions, but Rust LTO's run-time performance speedup is small. Michael Woerister says disabling Rust LTO "would basically eliminate the build time for gkrust" (the Stylo crate).

Is this still true, or how much would steven_blocks benefit form LTO, is it causing the 125x slowdown or another optimization?

https://doc.rust-lang.org/nightly/rustc/codegen-options/index.html#opt-level

0: no optimizations, also turns on cfg(debug_assertions).
1: basic optimizations
2: some optimizations
3: all optimizations
s: optimize for binary size
z: optimize for binary size, but also turn off loop vectorization.

Worth noting, steven itself has this optimization profile in Config.toml:

[profile.dev]
# Steven runs horrendously slow with no optimizations, and often freezes.
# However, building with full -O3 optimizations takes too long for a debug build.
# Use an -O1 optimization level strikes a good compromise between build and program performance.
opt-level = 1

should steven_blocks have its Cargo.toml specify profile.release opt-level=1 (or 2)?

Still waiting on the clean time cargo rustc -- -C 'opt-level=3' -Z time-passes build, in blocks/, going on 37 minutes and 6.5GB+, if it takes as long as last time expect 4+ hours.

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

Completed in 5 hours 9 minutes this time:

   time: 0.002	codegen passes [4gqa868n29dtlfbb]
    time: 0.652	LTO passes
    time: 0.075	codegen passes [473etbgnzq2x8af8]
    time: 3.476	LTO passes
    time: 4.218	LTO passes
    time: 4.255	codegen passes [22fyvf1uiha0mrri]
    time: 4.234	codegen passes [6ffwyc85r8r5jgh]
    time: 63.064	LTO passes
    time: 15574.417	codegen passes [2w3r8ppd44eh9kz5]
  time: 15707.329	LLVM passes
  time: 0.003	serialize work products
  time: 0.746	linking
time: 15771.815		total
    Finished dev [unoptimized + debuginfo] target(s) in 263m 19s

real	309m45.295s
user	198m44.124s
sys	14m54.290s

15574.417 is spent on "codegen passes [2w3r8ppd44eh9kz5]". Other hits for 2w3r8ppd44eh9kz5:

    time: 3.984	llvm function passes [2w3r8ppd44eh9kz5]
      time: 2.733	persist query result cache
          time: 0.020	getting serialized graph
          time: 0.024	encoding serialized graph
        time: 0.044	encode dep-graph
      time: 0.070	persist dep-graph
    time: 2.803	serialize dep graph
  time: 5.799	codegen
    time: 63.088	llvm module passes [2w3r8ppd44eh9kz5]

15771.815 seconds = 4.38 hours on "codegen passes". This again points to codegen options.

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

cargo clean ; time cargo rustc -- -C 'opt-level=0' -Z time-passes

    Finished dev [unoptimized + debuginfo] target(s) in 1m 41s
cargo clean ; time cargo rustc -- -C 'opt-level=1' -Z time-passes

  time: 29.178	LLVM passes
  time: 0.000	serialize work products
  time: 0.164	linking
time: 97.822		total
    Finished dev [unoptimized + debuginfo] target(s) in 2m 05s
cargo clean ; time cargo rustc -- -C 'opt-level=2' -Z time-passes

waiting 12+ minutes, 1.48 GB+ and growing. Killed. update: rerunning, 3.7+ hours, 20+ GB, before I killed it again. Whatever optimization pass is slow, it occurs during opt-level=2.

Try this: https://users.rust-lang.org/t/improve-compile-time-and-executable-size-by-counting-lines-of-llvm-ir/14203 https://github.com/dtolnay/cargo-llvm-lines

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

blocks $ cargo llvm-lines
warning: file found to be present in multiple build targets: steven/src/main.rs
   Compiling steven_shared v0.0.1 (steven/shared)
   Compiling steven_blocks v0.0.1 (steven/blocks)
    Finished dev [unoptimized + debuginfo] target(s) in 1m 19s
 151828    2  core::ops::function::FnOnce::call_once
  43954  488  core::option::Option<T>::map
  24252  896  core::ptr::real_drop_in_place
  19669    1  steven_blocks::Block::get_model
  15834   39  alloc::raw_vec::RawVec<T,A>::reserve_internal
  14674    1  steven_blocks::Block::get_flat_offset
  12548    1  steven_blocks::Block::get_hierarchical_data
  12362    1  steven_blocks::Block::get_collision_boxes
  11195    1  steven_blocks::Block::get_model_variant
   9678    1  <steven_blocks::Block as core::fmt::Debug>::fmt
   6394   46  <<steven_blocks::VANILLA_ID_MAP as core::ops::deref::Deref>::deref::__static_ref_initialize::CombinationIter<facing> as core::iter::traits::iterator::Iterator>::next
   6305    1  steven_blocks::Block::get_material
   5976    1  steven_blocks::Block::match_multipart
   5642   14  <<steven_blocks::VANILLA_ID_MAP as core::ops::deref::Deref>::deref::__static_ref_initialize::CombinationIter<facing,half,shape,waterlogged> as core::iter::traits::iterator::Iterator>::next
   5075    1  steven_blocks::Block::get_tint
   5004   36  core::slice::<impl [T]>::copy_from_slice
   4990    1  steven_blocks::Block::get_modid
   4736   37  alloc::raw_vec::RawVec<T,A>::allocate_in
   4698    9  <<steven_blocks::VANILLA_ID_MAP as core::ops::deref::Deref>::deref::__static_ref_initialize::CombinationIter<north,south,west,east,waterlogged> as core::iter::traits::iterator::Iterator>::next
   4680   39  core::result::Result<T,E>::map_err
   4679    1  steven_blocks::Block::get_internal_id
   4600   40  core::result::Result<T,E>::map
   3885   37  core::iter::traits::exact_size::ExactSizeIterator::len
   3838   38  core::result::Result<T,E>::unwrap_or_else
   3591    7  <<steven_blocks::VANILLA_ID_MAP as core::ops::deref::Deref>::deref::__static_ref_initialize::CombinationIter<facing,half,hinge,open,powered> as core::iter::traits::iterator::Iterator>::next
...

get_model, get_flat_offset, get_hierarchical_data, get_collision_boxes, get_model_variant are among the largest in terms of IR lines. All these functions match *self, example:

            #[allow(unused_variables)]
            pub fn get_model(&self) -> (String, String) {
                match *self {
                    $(
                        Block::$name {
                            $($fname,)?
                        } => {
                            let parts = $model;
                            (String::from(parts.0), String::from(parts.1))
                        }
                    )+
                }
            }

Can attributes be used to control optimization? Maybe not, only code generation attributes are: inline, cold, no_builtins, target_feature. Leaning towards reducing the optimization level in the release profile. At least opt-level=1 is reasonably fast, only about 24 seconds slower than opt-level=0.

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

steven_blocks builds fast in cd blocks && cargo build --release, but not at the top-level (going on 20+ minutes, 3.12 GB). Why is blocks/Cargo.toml profile.release (and even [dependencies], given #256) not used from the top-level Cargo.toml with [dependencies.steven_gl] path = "./gl"?

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections

Cargo supports custom configuration of how rustc is invoked through profiles at the top level. Any manifest may declare a profile, but only the top level package’s profiles are actually read. All dependencies’ profiles will be overridden. This is done so the top-level package has control over how its dependencies are compiled.

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

rust-lang/cargo#1359 added an option to optimize just the dependencies, through config profiles: https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#config-profiles (.cargo/config), which take precedence over manifest files (Cargo.toml). However it is only on nightly. Looks like it will be stabilized in 1.41: https://github.com/rust-lang/cargo/pull/7591/files

@iceiix
Copy link
Owner

iceiix commented Jan 4, 2020

Added config profile, testing with cargo +nightly build --release -Z config-profile. Works in the blocks subdirectory (~2 min), testing on top-level, going on 30+ min, 4.5+ GB, the profile may not be taking effect...

@iceiix
Copy link
Owner

iceiix commented Jan 5, 2020

https://users.rust-lang.org/t/5-hours-to-compile-macro-what-can-i-do/36508

profile-overrides works, though requires nightly until Rust 1.41, according to https://blog.rust-lang.org/2020/01/03/reducing-support-for-32-bit-apple-targets.html "Rust 1.41.0 (to be released on January 30th, 2020)".

Maybe there is a better way to fix this.

@iceiix iceiix closed this as completed in e04ee87 Jan 5, 2020
iceiix added a commit that referenced this issue Jan 5, 2020
* Remove all blocks

rm -rf target/release/deps/steven_blocks-* ; time cargo build --release

0.188s

* Restore all blocks

* Enable macro expansion tracing

Requires nightly, `rustup default nightly`
The define_blocks! macro expands in about 9 seconds:

note: trace_macro
    --> blocks/src/lib.rs:486:1
     |
486  | / define_blocks! {
487  | |     Air {
488  | |         props {},
489  | |         material material::Material {
...    |
5601 | |     }
5602 | | }
     | |_^
     |
     = note: expanding `define_blocks! { Air
[............]
note: trace_macro
   --> blocks/src/lib.rs:930:17
    |
930 |         variant format!("extended={},facing={}", extended, facing.as_string()),
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: expanding `format! { "extended={},facing={}", extended, facing . as_string () }`
    = note: to `$crate :: fmt :: format
            ($crate :: __export :: format_args !
             ("extended={},facing={}", extended, facing . as_string ()))`

^C  Building [=====================================================> ] 317/320: steven_blocks

real	0m9.936s
user	0m5.460s
sys	0m0.444s

* Remove block dependencies

* Remove only all blocks

* Restore some blocks, finishes in about 2s

Removes 1000 lines from bottom

real	1m43.476s
user	4m14.319s
sys	0m8.194s

* steven_blocks: update lazy_static to 1.4.0

* cargo update steven_blocks

* Revert "Restore some blocks, finishes in about 2s"

This reverts commit 04aef6a.

* Revert "Remove only all blocks", -Z time-passes shows 85s

This reverts commit 5e951f5.

time cargo rustc -- -Z time-passes

  time: 13.535	item-bodies checking
  time: 26.370	metadata encoding and writing
  time: 28.031	MIR borrow checking
  time: 3.550	LLVM passes
time: 85.748		total

* Revert "Remove block dependencies", finishes in ~2m

real	1m52.989s
user	2m18.885s
sys	0m8.025s
blocks $ cargo clean ; time cargo rustc -- -Z time-passes

This reverts commit 32655a3.

* Restore all dependencies with git diff master src

* Revert "Enable macro expansion tracing", 1m49s

This reverts commit 7461615.

* Revert cargo update, but fails: cannot find function `pthread_atfork` in crate `libc`

error[E0425]: cannot find function `pthread_atfork` in crate `libc`
   --> .cargo/registry/src/github.com-1ecc6299db9ec823/rand-0.6.5/src/rngs/adapter/reseeding.rs:320:28
    |
320 |             unsafe { libc::pthread_atfork(None, None, Some(fork_handler)) };
    |                            ^^^^^^^^^^^^^^ not found in `libc`

   Compiling quote v0.6.10
   Compiling syn v0.15.23
error: aborting due to previous error

For more information about this error, try `rustc --explain E0425`.
error: could not compile `rand`.
warning: build failed, waiting for other jobs to finish...
error: build failed

real	0m7.047s
user	0m20.843s
sys	0m2.494s

steven_blocks v0.0.1 (steven/blocks)
├── cgmath v0.17.0
│   ├── approx v0.3.2
│   │   └── num-traits v0.2.6
│   ├── num-traits v0.2.6 (*)
│   ├── rand v0.6.5
│   │   ├── libc v0.2.9

* Revert "Revert cargo update, but fails: cannot find function `pthread_atfork` in crate `libc`"

This reverts commit 326dd2a.

* steven_blocks: reduce optimizations in release, closes #184

* Set top-level opt-level=1, compiles in ~5 min

* Revert "Set top-level opt-level=1, compiles in ~5 min"

This reverts commit 391041e.

* Add config profile for steven_blocks

* Use profile-overrides feature instead of config profile

* Update to nightly Rust until 1.41 (1/30/2020)

* Remove comment

* Update builds.sr.ht to use +nightly

* .build.yml: install nightly

* Update to rustc 1.42.0-nightly (760ce94c6 2020-01-04)

* Replace into_iter() -> iter() for arrays, fixes 1.42-nightly warning

warning: this method call currently resolves to `<&[T; N] as IntoIterator>::into_iter` (due to autoref coercions), but that might change in the future when `IntoIterator` impls for arrays are added.
   --> src/entity/player.rs:363:11
    |
363 |         ].into_iter().enumerate() {
    |           ^^^^^^^^^ help: use `.iter()` instead of `.into_iter()` to avoid ambiguity: `iter`
    |
    = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
    = note: for more information, see issue #66145 <rust-lang/rust#66145>

* Remove deprecated Error description, replaced by Display

warning: use of deprecated item 'std::error::Error::description': use the Display impl or to_string()
   --> src/protocol/mod.rs:981:40
    |
981 |             Error::IOError(ref e) => e.description(),
    |                                        ^^^^^^^^^^^
    |
    = note: `#[warn(deprecated)]` on by default

* Remove cargo-feature because profile-overrides is stable in nightly

warning: the cargo feature `profile-overrides` is now stable and is no longer necessary to be listed in the manifest
iceiix added a commit that referenced this issue Jan 5, 2020
@iceiix
Copy link
Owner

iceiix commented Jan 5, 2020

From https://wiki.alopex.li/TheStateOfGGEZ2020:

https://wiki.alopex.li/WhereRustcSpendsItsTime

iceiix added a commit that referenced this issue Jan 10, 2020
Improves fix for #184, whereas #255 reduced optimizations,
we now address the underlying compiler limitation and split out
the one massive lazy_static! initialization function, into
one function per block in the block_registration_functions module.

Previous build time, with opt-level=1:

% time cargo build --release
   Compiling steven_blocks v0.0.1
    Finished release [optimized] target(s) in 21.24s
cargo build --release  31.80s user 0.71s system 152% cpu 21.276 total

With this change, opt-level=3 and the function splitting fix:

% time cargo build --release
   Compiling steven_blocks v0.0.1
    Finished release [optimized] target(s) in 30.80s
cargo build --release  40.26s user 0.86s system 133% cpu 30.850 total

Full optimizations are expectedly slightly slower, but this is still
much much _much_ faster than before this refactoring, where this crate
would take up to an unbelievable 5 hours (and tens of GB of RAM). Long
story short, we're now back to full optimizations and stable Rust.
iceiix added a commit that referenced this issue Jan 10, 2020
Improves fix for #184, whereas #255 reduced optimizations,
we now address the underlying compiler limitation and split out
the one massive lazy_static! initialization function, into
one function per block in the block_registration_functions module.

Previous build time, with opt-level=1:

% time cargo build --release
   Compiling steven_blocks v0.0.1
    Finished release [optimized] target(s) in 21.24s
cargo build --release  31.80s user 0.71s system 152% cpu 21.276 total

With this change, opt-level=3 and the function splitting fix:

% time cargo build --release
   Compiling steven_blocks v0.0.1
    Finished release [optimized] target(s) in 30.80s
cargo build --release  40.26s user 0.86s system 133% cpu 30.850 total

Full optimizations are expectedly slightly slower, but this is still
much much _much_ faster than before this refactoring, where this crate
would take up to an unbelievable 5 hours (and tens of GB of RAM). Long
story short, we're now back to full optimizations and stable Rust.

Thanks to dtolnay on the Rust programming language forum for suggesting
this technique, https://users.rust-lang.org/t/5-hours-to-compile-macro-what-can-i-do/36508/2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants