Further tune translate_cds code #83

holtgrewe · 2023-04-03T11:27:10Z

A lot of time is apparently spent accessing the lazy_static data structures through the lock.
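One common way to cut the per-access overhead of a lazily-initialized table is to fetch the reference once, outside the hot loop, instead of going through the lazy accessor for every codon. The thread does not show what the eventual fix changed, so this is only a hypothetical sketch. It uses `std::sync::OnceLock` (Rust 1.70+) as a stand-in for `lazy_static!`, and `codon_table`, `translate_per_codon` and `translate_hoisted` are illustrative names, not functions from this repository.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical lazily-initialized codon table (stand-in for a `lazy_static!` map).
fn codon_table() -> &'static HashMap<[u8; 3], u8> {
    static TABLE: OnceLock<HashMap<[u8; 3], u8>> = OnceLock::new();
    TABLE.get_or_init(|| {
        const BASES: [u8; 4] = [b'A', b'C', b'G', b'T'];
        // Standard genetic code, codons enumerated in A/C/G/T order.
        const AAS: &[u8; 64] =
            b"KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF";
        let mut map = HashMap::with_capacity(64);
        for (i, &aa) in AAS.iter().enumerate() {
            let codon = [BASES[i / 16], BASES[(i / 4) % 4], BASES[i % 4]];
            map.insert(codon, aa);
        }
        map
    })
}

/// Goes through the lazy accessor (and its initialization check) once per codon.
pub fn translate_per_codon(seq: &[u8]) -> Vec<u8> {
    seq.chunks_exact(3)
        .map(|c| *codon_table().get(c).unwrap_or(&b'X'))
        .collect()
}

/// Fetches the table reference once, then only does plain map lookups in the loop.
pub fn translate_hoisted(seq: &[u8]) -> Vec<u8> {
    let table = codon_table();
    seq.chunks_exact(3)
        .map(|c| *table.get(c).unwrap_or(&b'X'))
        .collect()
}
```

Both functions return the same result; whether hoisting alone gives a measurable gain depends on what the real accessor does and on whether the optimizer already hoists the check.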


holtgrewe · 2023-04-04T06:54:43Z

This reduces running time by another ~2/3.

holtgrewe · 2023-04-04T07:08:20Z

The flamegraph shows that most of the time is now spent in the table lookup. Looking at the annotated source code in `perf report` shows that nothing stands out. The line highlighted in red (5.54%) corresponds to the `u8 as usize` conversion, which cannot really be helped, I guess.
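For illustration, a table lookup of this kind can be done with plain arrays fixed at compile time, so no lazy initialization or lock is involved. One 256-entry table maps each ASCII byte to a 2-bit base code, and a 64-entry table maps the combined codon index to an amino acid. This is a hypothetical sketch, not the code from the linked pull request: `BASE_CODE`, `CODON_TO_AA` and `translate_direct` are illustrative names, and only the standard genetic code is handled.

```rust
/// Marker for bytes that are not A/C/G/T (either case).
const INVALID: u8 = 0xFF;

/// Builds a 256-entry byte -> base-code table at compile time.
const fn build_base_table() -> [u8; 256] {
    let mut t = [INVALID; 256];
    t[b'A' as usize] = 0;
    t[b'a' as usize] = 0;
    t[b'C' as usize] = 1;
    t[b'c' as usize] = 1;
    t[b'G' as usize] = 2;
    t[b'g' as usize] = 2;
    t[b'T' as usize] = 3;
    t[b't' as usize] = 3;
    t
}

static BASE_CODE: [u8; 256] = build_base_table();

/// Standard genetic code indexed by 16 * first + 4 * second + third (A=0, C=1, G=2, T=3).
static CODON_TO_AA: &[u8; 64] =
    b"KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF";

/// Translates complete codons; a trailing partial codon is ignored and any
/// codon containing a non-ACGT byte becomes `X`.
pub fn translate_direct(seq: &[u8]) -> Vec<u8> {
    seq.chunks_exact(3)
        .map(|c| {
            // Each byte is widened with `as usize` (a zero-extension) to index the table.
            let a = BASE_CODE[c[0] as usize];
            let b = BASE_CODE[c[1] as usize];
            let d = BASE_CODE[c[2] as usize];
            if (a | b | d) > 3 {
                b'X'
            } else {
                CODON_TO_AA[((a as usize) << 4) | ((b as usize) << 2) | (d as usize)]
            }
        })
        .collect()
}
```

The `as usize` widening is inherent to indexing an array by a byte, which fits the observation that this conversion shows up in the profile but cannot really be avoided.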

For the record, here is how to use pprof-rs ad hoc to generate flamegraphs. This is the most reliable way of getting flamegraphs (cargo flamegraph somehow did not work very well), and it also allows creating flamegraphs for selected parts of the program only.

```diff
diff --git a/Cargo.toml b/Cargo.toml
index 0035412..26196fa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -39,7 +39,11 @@ pretty_assertions = "1.3.0"
 rstest = "0.17.0"
 test-log = "0.2.11"
 criterion = "0.3"
+pprof = { version = "0.11.1", features = ["flamegraph", "cpp"] }
 
 [[bench]]
 name = "translate_cds"
 harness = false
+
+[profile.release]
+debug = true
diff --git a/benches/translate_cds.rs b/benches/translate_cds.rs
index b1c9175..8899687 100644
--- a/benches/translate_cds.rs
+++ b/benches/translate_cds.rs
@@ -18,9 +18,14 @@ lazy_static::lazy_static! {
 }
 
 fn criterion_benchmark(c: &mut Criterion) {
+    let guard = pprof::ProfilerGuardBuilder::default().frequency(1000).blocklist(&["libc", "libgcc", "pthread", "vdso"]).build().unwrap();
     c.bench_function("translate_cds TTN", |b| {
         b.iter(|| translate_cds(&SEQ_TTN, true, "*", TranslationTable::Standard).unwrap())
     });
+    if let Ok(report) = guard.report().build() {
+        let file = std::fs::File::create("flamegraph.svg").unwrap();
+        report.flamegraph(file).unwrap();
+    };
 }
 
 criterion_group!(benches, criterion_benchmark);
```

holtgrewe · 2023-04-04T07:12:18Z

Annotating 100k variants in mehari goes down from 24.15s to 15.5s (end-to-end running time).
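For scale, these end-to-end timings correspond to roughly a 1.56× speedup, or about 36% less wall-clock time:

$$
\frac{24.15\,\text{s}}{15.5\,\text{s}} \approx 1.56\times,
\qquad
\frac{24.15\,\text{s} - 15.5\,\text{s}}{24.15\,\text{s}} = \frac{8.65}{24.15} \approx 35.8\,\%
$$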

holtgrewe added a commit that referenced this issue Apr 4, 2023

perf: further speeding up translate_cds code (#83)

fda8e1b

holtgrewe linked a pull request Apr 4, 2023 that will close this issue

perf: further speeding up translate_cds code (#83) #85

Merged

holtgrewe closed this as completed in #85 Apr 4, 2023

holtgrewe added a commit that referenced this issue Apr 4, 2023

perf: further speeding up translate_cds code (#83) (#85)

60b071d

github-actions bot mentioned this issue Apr 4, 2023

chore(main): release 0.5.2 #86

Merged
