
Profile-Guided Optimization (PGO) results #228

Closed
zamazan4ik opened this issue Aug 15, 2023 · 2 comments

@zamazan4ik

Writing this for the record. Maybe these results will be interesting to someone who is trying to achieve better performance with xml-rs.

I test Profile-Guided Optimization (PGO) on different kinds of software; the current results are collected here (along with a lot of other PGO-related information). That's why I tried to optimize xml-rs with PGO too.

Test setup

My test setup is:

  • Macbook M1 Pro
  • macOS Ventura 13.4
  • Rustc version: rustc 1.73.0-nightly (180dffba1 2023-08-14)
  • xml-rs version: c6331c97ab9f487c9d0bce52c06364116f5e80d2 commit from the master branch

Benchmarks

As benchmarks, I used the ones built into the xml-rs crate. For the PGO optimization I used cargo-pgo.
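For reference, the cargo-pgo workflow looks roughly like this (a sketch based on cargo-pgo's documented subcommands; exact flags may differ between versions):

```shell
# Install the cargo-pgo subcommand; it relies on the
# llvm-tools-preview rustup component for profile merging
cargo install cargo-pgo
rustup component add llvm-tools-preview

# 1. Build an instrumented binary that records execution profiles
cargo pgo build

# 2. Exercise the instrumented build on a representative workload --
#    here, the crate's built-in benchmarks
cargo pgo bench

# 3. Rebuild with the collected profiles applied
cargo pgo optimize build
```

Note that the instrumented build (step 2) is expected to run noticeably slower than a normal release build; the slowdown is visible in the "Instrumented" numbers below.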

Results

Release:

test read            ... bench:      22,293 ns/iter (+/- 516)
test read_lots_attrs ... bench:     222,601 ns/iter (+/- 7,186)
test write           ... bench:       5,073 ns/iter (+/- 91)

Release + PGO:

test read            ... bench:      18,589 ns/iter (+/- 231)
test read_lots_attrs ... bench:     166,501 ns/iter (+/- 3,857)
test write           ... bench:       4,439 ns/iter (+/- 48)

Instrumented:

test read            ... bench:      32,521 ns/iter (+/- 771)
test read_lots_attrs ... bench:     299,387 ns/iter (+/- 11,805)
test write           ... bench:       8,157 ns/iter (+/- 243)

As you can see, PGO makes XML parsing measurably faster: roughly 17% on `read`, 25% on `read_lots_attrs`, and 12% on `write` compared to the plain release build.

@kornelski
Collaborator

I'm not surprised it helps a lot; the parser is architected in a way that spans multiple function calls and enum matches per byte.

But as a library author I can't do anything with this information. It's something that end users need to enable.

@kornelski closed this as not planned on Aug 19, 2023
@zamazan4ik
Author

But as a library author I can't do anything with this information. It's something that end users need to enable.

Actually, you can add a note about this kind of performance improvement somewhere in the library's documentation (even a note in the README is completely fine). That way, users will know how to improve the library's performance with this optimization technique, with actual numbers to back it up.
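For example, a short README section along these lines could work (a sketch only; the percentages are the ones measured above, and the cargo-pgo commands are from that tool's documentation):

```markdown
## Profile-Guided Optimization

xml-rs can benefit noticeably from PGO: in one benchmark run
(Apple M1 Pro, rustc nightly), `read` improved by ~17% and
`read_lots_attrs` by ~25% over a plain release build.

To try it in your own application, use
[cargo-pgo](https://github.com/Kobzol/cargo-pgo):

    cargo pgo build            # instrumented build
    cargo pgo run              # run on a representative workload
    cargo pgo optimize build   # rebuild with the profiles applied
```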
