Further tune translate_cds code #83

Closed
holtgrewe opened this issue Apr 3, 2023 · 3 comments · Fixed by #85

Comments

@holtgrewe
Contributor

holtgrewe commented Apr 3, 2023

A lot of time is apparently spent accessing the lazy_static data structures through the lock.
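
To illustrate the kind of overhead meant here, the following is a minimal sketch that contrasts a lazily built table behind a lock with a plain immutable static array. It is not the project's actual code: the complement table and the names `locked_table`, `complement_locked`, `build_complement`, and `complement_static` are hypothetical, and it uses only the standard library (`OnceLock` and `Mutex`) rather than lazy_static.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical example data: nucleotide complement pairs.
const PAIRS: [(u8, u8); 4] = [(b'A', b'T'), (b'C', b'G'), (b'G', b'C'), (b'T', b'A')];

// Variant 1: a lazily initialized map guarded by a Mutex.
// Every lookup has to acquire and release the lock and hash the key.
fn locked_table() -> &'static Mutex<HashMap<u8, u8>> {
    static TABLE: OnceLock<Mutex<HashMap<u8, u8>>> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(PAIRS.iter().copied().collect()))
}

fn complement_locked(base: u8) -> Option<u8> {
    let table = locked_table().lock().unwrap();
    table.get(&base).copied()
}

// Variant 2: the same data as an immutable array built at compile time.
// A lookup is a single indexed load; no lock and no lazy initialization.
const fn build_complement() -> [u8; 256] {
    let mut table = [0u8; 256]; // 0 marks "no entry"
    let mut i = 0;
    while i < PAIRS.len() {
        table[PAIRS[i].0 as usize] = PAIRS[i].1;
        i += 1;
    }
    table
}

static COMPLEMENT: [u8; 256] = build_complement();

fn complement_static(base: u8) -> Option<u8> {
    match COMPLEMENT[base as usize] {
        0 => None,
        c => Some(c),
    }
}
```

Both functions return the same results, for example `Some(b'T')` for `b'A'` and `None` for `b'N'`. The second variant is only an option when the table is read-only and can be computed up front.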

@holtgrewe
Contributor Author

This reduces running time by another ~2/3.

image

@holtgrewe
Contributor Author

The flamegraph tells us that most of the time is now spent in the table lookup. The annotated source code from perf report (see below) shows nothing that stands out. The red 5.54% corresponds to the u8 as usize conversion, which cannot really be helped, I guess.

[image: annotated source code from perf report]
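
For readers unfamiliar with this kind of code, here is a rough sketch of what a codon table lookup with `u8 as usize` indexing can look like. It is an assumption for illustration, not the actual translate_cds implementation: the names `AMINO_ACIDS`, `BASE_CODES`, `translate_codon`, and `translate` are hypothetical, only the standard genetic code is covered, and ambiguity codes such as N are rejected rather than resolved.

```rust
// Standard genetic code; codons are ordered T, C, A, G at each position.
const AMINO_ACIDS: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Marker for bytes that are not one of T, C, A, G.
const INVALID: u8 = 255;

// Maps an ASCII byte to its 2-bit base code (T=0, C=1, A=2, G=3).
const fn build_base_codes() -> [u8; 256] {
    let mut table = [INVALID; 256];
    table[b'T' as usize] = 0;
    table[b'C' as usize] = 1;
    table[b'A' as usize] = 2;
    table[b'G' as usize] = 3;
    table
}

static BASE_CODES: [u8; 256] = build_base_codes();

// Translates one codon; each step is an array index via `u8 as usize`.
fn translate_codon(codon: &[u8]) -> Option<u8> {
    if codon.len() != 3 {
        return None;
    }
    let a = BASE_CODES[codon[0] as usize];
    let b = BASE_CODES[codon[1] as usize];
    let c = BASE_CODES[codon[2] as usize];
    if a == INVALID || b == INVALID || c == INVALID {
        return None;
    }
    Some(AMINO_ACIDS[(a as usize) * 16 + (b as usize) * 4 + (c as usize)])
}

// Translates a sequence codon by codon; a trailing partial codon is ignored.
fn translate(seq: &[u8]) -> Option<String> {
    seq.chunks_exact(3)
        .map(|codon| translate_codon(codon).map(char::from))
        .collect()
}
```

With this sketch, `translate(b"ATGTGGTAA")` yields `Some("MW*")`. Since Rust arrays are indexed by `usize`, the widening cast from `u8` is part of every lookup, which matches the observation that this cost is hard to avoid.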

For the record, here is how to use pprof-rs ad hoc for generating flamegraphs. This is the most reliable way of getting flamegraphs (cargo flamegraph somehow did not work very well), and it also allows creating flamegraphs for selected parts of the program only.

diff --git a/Cargo.toml b/Cargo.toml
index 0035412..26196fa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -39,7 +39,11 @@ pretty_assertions = "1.3.0"
 rstest = "0.17.0"
 test-log = "0.2.11"
 criterion = "0.3"
+pprof = { version = "0.11.1", features = ["flamegraph", "cpp"] }
 
 [[bench]]
 name = "translate_cds"
 harness = false
+
+[profile.release]
+debug = true
diff --git a/benches/translate_cds.rs b/benches/translate_cds.rs
index b1c9175..8899687 100644
--- a/benches/translate_cds.rs
+++ b/benches/translate_cds.rs
@@ -18,9 +18,14 @@ lazy_static::lazy_static! {
 }
 
 fn criterion_benchmark(c: &mut Criterion) {
+    let guard = pprof::ProfilerGuardBuilder::default().frequency(1000).blocklist(&["libc", "libgcc", "pthread", "vdso"]).build().unwrap();
     c.bench_function("translate_cds TTN", |b| {
         b.iter(|| translate_cds(&SEQ_TTN, true, "*", TranslationTable::Standard).unwrap())
     });
+    if let Ok(report) = guard.report().build() {
+        let file = std::fs::File::create("flamegraph.svg").unwrap();
+        report.flamegraph(file).unwrap();
+    };
 }
 
 criterion_group!(benches, criterion_benchmark);

@holtgrewe
Contributor Author

Annotating 100k variants in mehari goes down from 24.15s to 15.5s (end-to-end running time).
