Extremely slow performance in debug mode with default backend #297

Closed
edmorley opened this issue Mar 28, 2022 · 4 comments

@edmorley

edmorley commented Mar 28, 2022

Hi!

In a particular project, I use flate2 to decompress a ~50MB gzipped tarfile.

Whilst in production the project will be built in release mode, the integration tests are performed using debug builds, and when iterating locally when developing, I use debug builds too.

In addition, due to the nature of the project (a Cloud Native Buildpack that's targeting x86_64 Linux), these integration tests/any manual testing have to run inside an x86_64 Docker container. After recently obtaining a new MacBook Pro M1 Max (which has to use Docker's qemu emulation for x86_64 Docker images), I was surprised to see the integration tests take considerably longer than they used to on my much older machine.

Investigating, it turns out that when using the default flate2 backend of miniz_oxide and the below testcase:

  • debug builds are ~30x slower than release builds, when run on ARM64 natively
  • debug builds are ~60x slower than release builds, when run under Docker's qemu emulation

In contrast, when using the zlib or zlib-ng-compat backends, debug builds are only 2-4x slower than release builds.

Whilst debug builds are expected to be slower than release builds, I was quite surprised that they were 30-60x slower for this crate using the default backend.

I'm presuming there's not much that can be done to improve performance of miniz_oxide for debug builds, however I was wondering if it would be worth mentioning the performance issues in this crate's docs, particularly given that:
(a) switching backends makes such a difference here,
(b) the docs currently suggest that the default backend is mostly "good enough" (and otherwise I would have tried another backend sooner):

There’s various tradeoffs associated with each implementation, but in general you probably won’t have to tweak the defaults.

(from https://docs.rs/flate2/latest/flate2/#implementation)

It was only later that I noticed this section in the readme (that's not on docs.rs), that seemed to imply the zlib-ng backend was actually faster:
https://github.com/rust-lang/flate2-rs#backends

Testcase:

use flate2::read::GzDecoder;
use std::fs::File;

fn main() -> Result<(), std::io::Error> {
    // Archive is from:
    // https://heroku-buildpack-python.s3.amazonaws.com/heroku-20/runtimes/python-3.10.3.tar.gz
    let archive = File::open("python-3.10.3.tar.gz")?;
    let mut destination = tempfile::tempfile()?;
    let mut decoder = GzDecoder::new(archive);
    std::io::copy(&mut decoder, &mut destination)?;

    Ok(())
}

Cargo.toml:

[package]
name = "testcase-flate2-debug"
version = "0.1.0"
edition = "2021"

[dependencies]
# For default backend
flate2 = "1.0.22"
# For alternate backends
# flate2 = { version = "1.0.22", features = ["zlib-ng-compat"], default-features = false }
# flate2 = { version = "1.0.22", features = ["zlib"], default-features = false }
tempfile = "3.3.0"

Results:

| Backend | Architecture | Wall time (release build) | Wall time (debug build) | Debug slowdown |
| --- | --- | --- | --- | --- |
| miniz_oxide (default) | Native ARM64 | 0.69s | 21.55s | 31x |
| miniz_oxide (default) | AMD64 under qemu | 3.41s | 207s | 60x |
| zlib | Native ARM64 | 0.65s | 1.26s | 1.9x |
| zlib | AMD64 under qemu | 2.19s | 9.22s | 4.2x |
| zlib-ng-compat | Native ARM64 | 0.55s | 1.43s | 2.6x |
| zlib-ng-compat | AMD64 under qemu | ??? | ??? | ??? |

(The missing timings for zlib-ng-compat under qemu are due to cross-compilation of zlib-ng currently failing: rust-lang/libz-sys#93)
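
For readers who want to check the "Debug slowdown" column, it is simply the debug wall time divided by the release wall time. The following is a small sketch that recomputes it from the timings in the table above; the function name `slowdown` is made up for illustration and is not part of flate2.

```rust
/// Debug-build slowdown: debug wall time divided by release wall time.
/// (Hypothetical helper, named for this example only.)
fn slowdown(release_secs: f64, debug_secs: f64) -> f64 {
    debug_secs / release_secs
}

fn main() {
    // (backend, architecture, release seconds, debug seconds),
    // copied from the results table in the issue.
    let timings = [
        ("miniz_oxide", "Native ARM64", 0.69, 21.55),
        ("miniz_oxide", "AMD64 under qemu", 3.41, 207.0),
        ("zlib", "Native ARM64", 0.65, 1.26),
        ("zlib", "AMD64 under qemu", 2.19, 9.22),
        ("zlib-ng-compat", "Native ARM64", 0.55, 1.43),
    ];

    for (backend, arch, release, debug) in timings.iter() {
        println!(
            "{:<15} {:<17} {:.1}x",
            backend,
            arch,
            slowdown(*release, *debug)
        );
    }
}
```

The zlib-ng-compat row under qemu is left out because no timings were recorded for it.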

@oyvindln
Contributor

Yeah, Rust in debug mode is going to be much much much slower than anything written in C due to the nature of the languages (and I'm not sure whether system zlib will even be used in debug/no optimization mode).

Turning on the first level of optimizations in debug mode may help a fair bit. There may be some other workarounds to avoid compiling all deps in debug mode, or to use different opts for the main project and deps, but I'm not sure.

@edmorley
Author

edmorley commented Mar 28, 2022

I wasn't able to get perf working inside a QEMU'd Docker container (due to PERF_FLAG_FD_CLOEXEC not implemented errors), so wasn't able to profile the 207s chronic case unfortunately.

However, this is a flamegraph for a native ARM64 debug build (the 21.55s entry in the table above):
(It has to be downloaded for the interactivity to work; hosted on GitHub that is disabled)

flamegraph-debug-native-arm64

As can be seen, 77% of the profile is in Adler32::compute():
https://github.com/jonas-schievink/adler/blob/a94f525f62698d699d1fb3cc9112db8c35662b16/src/algo.rs#L5-L107

With 60% of the total profile within the implementation of AddAssign<Self> for U32X4 (used from Adler32::compute()):
https://github.com/jonas-schievink/adler/blob/a94f525f62698d699d1fb3cc9112db8c35662b16/src/algo.rs#L124-L130
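
For context, Adler-32 is the checksum defined by the zlib format (RFC 1950): two running sums modulo 65521 over every byte. The sketch below is a plain byte-at-a-time version written for illustration only; it is not the `adler` crate's code, which processes the input in chunks using the `U32X4` helper type linked above. It does show why the loop is hot: there is arithmetic on every byte, and in an unoptimized build that arithmetic is typically compiled with overflow checks and without inlining small helpers such as operator-trait methods.

```rust
/// Largest prime below 2^16, the modulus used by Adler-32.
const MOD_ADLER: u32 = 65_521;

/// Scalar Adler-32 over a byte slice.
/// Illustrative only: the `adler` crate uses a chunked implementation.
fn adler32(data: &[u8]) -> u32 {
    let mut a: u32 = 1; // running sum of bytes, starts at 1
    let mut b: u32 = 0; // running sum of the `a` values

    for &byte in data {
        // Both sums stay below MOD_ADLER, so these additions cannot overflow u32.
        a = (a + u32::from(byte)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }

    // The checksum packs `b` into the high 16 bits and `a` into the low 16 bits.
    (b << 16) | a
}

fn main() {
    // Well-known test vector: the ASCII string "Wikipedia".
    println!("{:#010x}", adler32(b"Wikipedia"));
}
```

For a ~50MB archive that expands to considerably more data, a per-byte cost like this is paid for every decompressed byte, which fits the profile being dominated by the checksum rather than by inflate itself.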

@messense

messense commented May 3, 2022

You can override opt-level for certain crates in debug mode, see https://doc.rust-lang.org/cargo/reference/profiles.html#overrides. Adding the following to Cargo.toml should make it faster:

[profile.dev.package.miniz_oxide]
opt-level = 3
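
Since the flamegraph earlier in this thread attributes most of the debug-build time to the separate `adler` crate rather than to `miniz_oxide` itself, a per-package override may need to cover both. The fragment below is a sketch under that assumption and was not benchmarked in this thread; the package names must match what is actually in your dependency graph (check `cargo tree`), and Cargo only warns if an override matches no package.

```toml
# Cargo.toml at the workspace root (profile settings in dependencies are ignored).

# Optimize the inflate implementation even in dev (debug) builds.
[profile.dev.package.miniz_oxide]
opt-level = 3

# Assumption: the checksum crate in your graph is named `adler`;
# other miniz_oxide versions may depend on a differently named crate.
[profile.dev.package.adler]
opt-level = 3

# Broader alternative: optimize every dependency but not your own crate.
# [profile.dev.package."*"]
# opt-level = 3
```

Per-package overrides keep your own crate at the default dev settings, so incremental rebuilds of the code you are editing are unaffected; the overridden dependencies are compiled once with optimizations and then cached.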

@JohnTitor
Member

Closing as this is more of a Rust issue than a flate2-specific one.

edmorley added a commit to heroku/libcnb.rs that referenced this issue Jul 12, 2023
The `flate2` crate supports multiple backends for performing
decompression:
https://github.com/rust-lang/flate2-rs#backends

The default `miniz_oxide` flate2 backend has poor performance in debug
builds/under QEMU:
rust-lang/flate2-rs#297

Ideally we'd use the fastest `zlib-ng` backend, however it fails to
cross-compile:
rust-lang/libz-sys#93

As such we have to use the next best alternate backend, which is the
`zlib` backend.

(This is the backend that the Python CNB already uses.)

GUS-W-13745779.
edmorley added a commit to heroku/buildpacks-ruby that referenced this issue Mar 1, 2024
The default `miniz_oxide` flate2 backend has poor performance in debug/under QEMU:
rust-lang/flate2-rs#297

Ideally we'd use the fastest `zlib-ng` backend, however it fails to cross-compile:
rust-lang/libz-sys#93

As such we have to use the next best alternate backend, which is `zlib`.

This makes the `flate2` usage in this repo consistent with `libherokubuildpack`,
Python CNB, PHP CNB etc.
edmorley added a commit to heroku/buildpacks-jvm that referenced this issue Mar 1, 2024
The default `miniz_oxide` flate2 backend has poor performance in debug/under QEMU:
rust-lang/flate2-rs#297

Ideally we'd use the fastest `zlib-ng` backend, however it fails to cross-compile:
rust-lang/libz-sys#93

As such we have to use the next best alternate backend, which is `zlib`.

This makes the `flate2` usage in this repo consistent with `libherokubuildpack`,
Python CNB, PHP CNB etc.