Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rsdict simd feature fails to build with Rust nightly 1.78 #41

Closed
KonradHoeffner opened this issue Feb 20, 2024 · 16 comments
Closed

rsdict simd feature fails to build with Rust nightly 1.78 #41

KonradHoeffner opened this issue Feb 20, 2024 · 16 comments
Assignees
Labels
blocked waiting for external factors to be resolved, such as an updated dependency optimize reduce memory or CPU usage
Milestone

Comments

@KonradHoeffner
Copy link
Owner

KonradHoeffner commented Feb 20, 2024

The rsdict dependency "simd" feature depends on packed_simd, which is no longer available in nightly 1.78.
See sujayakar/rsdict#9 and rust-lang/packed_simd#359.
I tried to refactor rsdict but wasn't successful yet, see sujayakar/rsdict#10.

Options

  1. Disable the "simd" feature for now but that would decrease the speed.
  2. Warn users to stay at nightly 1.77 or lower until this is fixed at rsdict but is inconvenient and can cause errors for users who don't notice it.
  3. Add a "simd" feature to hdt and pass that through to rsdict.

Update: Option 1 was chosen but this issue is kept open in case of a future rsdict update that allows reenabling it.

@KonradHoeffner KonradHoeffner added the bug Something isn't working label Feb 20, 2024
@KonradHoeffner KonradHoeffner self-assigned this Feb 20, 2024
@KonradHoeffner
Copy link
Owner Author

KonradHoeffner commented Feb 20, 2024

Removing the "simd" feature doesn't hurt performance that much, so disable it for now:

without rsdict "simd" feature

$ cargo bench --bench iai
    Finished bench [optimized] target(s) in 0.03s
     Running benches/iai.rs (target/release/deps/iai-14f5bcf861e0aa04)
load
  Instructions:            22425353 (-0.003193%)
  L1 Accesses:             28365288 (-0.364770%)
  L2 Accesses:                49159 (-0.366842%)
  RAM Accesses:               14752 (-0.027108%)
  Estimated Cycles:        29127403 (-0.358822%)

query_all
  Instructions:           100324003 (+0.032336%)
  L1 Accesses:            126508543 (-0.050547%)
  L2 Accesses:                75061 (+0.399936%)
  RAM Accesses:               14703 (-0.392927%)
  Estimated Cycles:       127398453 (-0.050614%)

query_all_sophia
  Instructions:           112690609 (+0.001438%)
  L1 Accesses:            144199404 (-0.079763%)
  L2 Accesses:               109364 (+14.81423%)
  RAM Accesses:               16972 (No change)
  Estimated Cycles:       145340244 (-0.030646%)

query_po
  Instructions:            26805066 (-0.000530%)
  L1 Accesses:             33790638 (-0.343344%)
  L2 Accesses:                50843 (+0.917012%)
  RAM Accesses:               14690 (-0.608931%)
  Estimated Cycles:        34559003 (-0.338151%)

query_po_sophia
  Instructions:            26961723 (-0.002533%)
  L1 Accesses:             34034329 (-0.342070%)
  L2 Accesses:                50251 (-1.889924%)
  RAM Accesses:               14821 (+0.850572%)
  Estimated Cycles:        34804319 (-0.335855%)

query_o
  Instructions:            27469497 (+0.007493%)
  L1 Accesses:             34687868 (-0.327179%)
  L2 Accesses:                50256 (-0.039780%)
  RAM Accesses:               14775 (-0.027065%)
  Estimated Cycles:        35456273 (-0.320784%)

query_o_sophia
  Instructions:            30578845 (+0.005282%)
  L1 Accesses:             38782526 (-0.294905%)
  L2 Accesses:                51323 (+2.247236%)
  RAM Accesses:               14684 (-0.864164%)
  Estimated Cycles:        39553081 (-0.286261%)

with simd feature

$ cargo bench -q --bench criterion "triple IDs"  -- --quick
??? (all)/0.1 all triple IDs
                        time:   [1.4357 s 1.4364 s 1.4366 s]
                        change: [-1.5371% -0.7365% +0.0767%] (p = 0.65 > 0.05)
                        No change in performance detected.

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [1.4638 µs 1.4642 µs 1.4643 µs]
                        change: [-5.4005% -3.4320% -1.3807%] (p = 0.08 > 0.05)
                        No change in performance detected.

?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [335.56 ms 335.58 ms 335.67 ms]
                        change: [-0.2186% -0.1766% -0.1345%] (p = 0.07 > 0.05)
                        No change in performance detected.

??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [143.02 ms 143.21 ms 144.00 ms]
                        change: [-3.9287% -2.9968% -2.0532%] (p = 0.07 > 0.05)
                        No change in performance detected.

?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [116.37 ms 116.39 ms 116.48 ms]
                        change: [-4.1821% -2.4701% -0.6975%] (p = 0.08 > 0.05)
                        No change in performance detected.

without simd feature

hdt$ cargo bench -q --bench criterion "triple IDs"  -- --quick
??? (all)/0.1 all triple IDs
                        time:   [1.4435 s 1.4457 s 1.4463 s]
                        change: [+0.4779% +0.6080% +0.7381%] (p = 0.08 > 0.05)
                        No change in performance detected.

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [1.4537 µs 1.4547 µs 1.4584 µs]
                        change: [-0.7223% -0.5441% -0.3659%] (p = 0.05 < 0.05)
                        Change within noise threshold.

?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [336.54 ms 336.79 ms 336.86 ms]
                        change: [+0.2587% +0.3235% +0.3883%] (p = 0.10 > 0.05)
                        No change in performance detected.

??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [144.21 ms 144.31 ms 144.71 ms]
                        change: [+0.1414% +0.6598% +1.1818%] (p = 0.15 > 0.05)
                        No change in performance detected.

?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [118.51 ms 118.57 ms 118.81 ms]
                        change: [+1.7408% +1.9189% +2.0971%] (p = 0.10 > 0.05)
                        No change in performance detected.

@donpellegrino
Copy link

Thanks for the quick analysis and fix! It looks like Option 1, "Disable the SIMD feature for now, but that would decrease the speed," is the way to go to get things compiling again. That would close this issue.

Would adding a separate new feature issue around SIMD to be addressed later be reasonable? Would some future performance improvement be expected at a larger scale of data, use of GPUs, or other SIMD accelerators? Or is the sense that SIMD does not do much algorithmically for HDT?

@KonradHoeffner
Copy link
Owner Author

KonradHoeffner commented Feb 21, 2024

@donpellegrino thanks for the input! I disabled the SIMD feature in commit 225ff27 for now but I will keep this issue open because I suspect that rsdict will become updated in the near future as I already got it to compile with the new standard library SIMD feature and I expect that someone more knowledgeable about SIMD than me will be able to fix the errors with reasonable effort.

As for the potential impact of a general SIMD feature, that is hard for me to say. The benchmarks above lead me to believe that it does not do much and is best handled by using optimized external libraries like rsdict for rank and select, but I'm open to be positively surprised by a pull request by someone with SIMD expertise :-)

@KonradHoeffner KonradHoeffner added blocked waiting for external factors to be resolved, such as an updated dependency and removed bug Something isn't working labels Feb 21, 2024
@KonradHoeffner KonradHoeffner added this to the next milestone Feb 21, 2024
@KonradHoeffner
Copy link
Owner Author

KonradHoeffner commented Apr 9, 2024

Update: @Cydhra was kind enough to fix the issues in the refactoring, so once @sujayakar merges sujayakar/rsdict#10, we could reenable the SIMD feature of rsdict.
If someone else had a pressing need for this they could use the rsdict fork of the pull request as a Git dependency in their Cargo.toml but at least for HDT the performance difference is so minuscule that it is not worth it.
Also it would be great to have the rsdict maintainer check this PR. While the tests are successful, I don't know if there are edge cases that aren't covered, so I don't want to be responsible for any bugs that may cause.

@Cydhra
Copy link

Cydhra commented Apr 11, 2024

I mean, since performance is a non-issue here, this likely won't matter, but there are several alternatives to rsdict which beat its performance even with simd enabled. The best options are likely sucds, succinct if you only require rank, and vers.
So if it ends up abandoned, those are the options.

I think sucds also offers a compressed bitmap (though I haven't tested its space requirements), so if you rely on the compression of rsdict that might work. If your vectors aren't sparse (so compression is ineffective anyway), vers probably offers the smallest memory footprint overall.

For concrete micro-benchmarks, refer to https://github.com/Cydhra/vers

@KonradHoeffner KonradHoeffner added the optimize reduce memory or CPU usage label Apr 11, 2024
@KonradHoeffner
Copy link
Owner Author

We already use sucds so if we could refactor to that this would reduce dependency count by one and if it faster then even better.
However last time I checked, sucds didn't seem to have an equivalent to https://docs.rs/rsdict/latest/rsdict/struct.RsDict.html#method.from_blocks, I will check again.

@KonradHoeffner
Copy link
Owner Author

I got rid of the rsdict dependency by switching to sucds following one of the options suggested by @Cydhra.
However it is in a temporary "sucds" branch until I've made more performance measurements.

With the iai benchmark (10k triples very small dataset):

load
  Instructions:            23788468 (+8.065720%)
  L1 Accesses:             32290918 (+15.70495%)
  L2 Accesses:                47687 (-3.586664%)
  RAM Accesses:               14803 (-0.074254%)
  Estimated Cycles:        33047458 (+15.25324%)

query_all
  Instructions:            85571508 (-14.12827%)
  L1 Accesses:            113483823 (-9.751437%)
  L2 Accesses:                69219 (-6.529019%)
  RAM Accesses:               14812 (-0.537201%)
  Estimated Cycles:       114348338 (-9.704091%)

query_all_sophia
  Instructions:            98162338 (-12.57198%)
  L1 Accesses:            131583903 (-8.550159%)
  L2 Accesses:                73267 (-13.50334%)
  RAM Accesses:               17086 (-0.227737%)
  Estimated Cycles:       132548248 (-8.530212%)

query_po
  Instructions:            27698979 (+5.002010%)
  L1 Accesses:             37528315 (+12.65199%)
  L2 Accesses:                48375 (-3.654650%)
  RAM Accesses:               14841 (-0.127860%)
  Estimated Cycles:        38289625 (+12.33688%)

query_po_sophia
  Instructions:            27869964 (+4.985088%)
  L1 Accesses:             37797603 (+12.56838%)
  L2 Accesses:                48595 (-3.289683%)
  RAM Accesses:               14887 (-0.414743%)
  Estimated Cycles:        38561623 (+12.25466%)

query_o
  Instructions:            28356069 (+4.880660%)
  L1 Accesses:             38418762 (+12.32590%)
  L2 Accesses:                48525 (-3.282707%)
  RAM Accesses:               14788 (-0.844844%)
  Estimated Cycles:        39178967 (+12.01738%)

query_o_sophia
  Instructions:            31452544 (+4.356829%)
  L1 Accesses:             42495065 (+10.99454%)
  L2 Accesses:                48468 (-3.667044%)
  RAM Accesses:               14879 (-0.073875%)
  Estimated Cycles:        43258170 (+10.75242%)

My suspicion is that the performance decrease in some of the tests is due to the cumbersome loading process like this:

    pub fn new(data: Vec<u64>) -> Self {
        let mut v = BitVector::new();                                                                               
        for d in data {let _ = v.push_bits(d as usize,64);}
        let dict = Rank9Sel::new(v).select1_hints();   
        Bitmap { dict }
    }

The loop calling push bits, which further applies some masking because it supports different numbers of bits, is probably much less efficient then just copying the usize blocks as was possible with rsdict.

@KonradHoeffner
Copy link
Owner Author

KonradHoeffner commented Apr 11, 2024

Criterion benchmarks

The benchmarks show drastic performance gains, I will recheck it with https://github.com/KonradHoeffner/hdt_benchmark to make sure.

cargo bench --bench criterion

Queries persondata_en.hdt: 86MB, ~10M triples.

rsdict

hdt$ cargo bench --bench criterion           
    Updating crates.io index
   Compiling hdt v0.1.8 (/home/konrad/projekte/rust/hdt)
    Finished `bench` profile [optimized] target(s) in 3.56s
     Running benches/criterion.rs (target/release/deps/criterion-2ca20a72aa74cccc)
Benchmarking ??? (all)/0.1 all triple IDs: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 14.9s.
??? (all)/0.1 all triple IDs
                        time:   [1.4515 s 1.4601 s 1.4691 s]
                        change: [+441.80% +451.86% +462.47%] (p = 0.01 < 0.05)
                        Performance has regressed.
Benchmarking ??? (all)/0.2 all str triples: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 44.8s.
??? (all)/0.2 all str triples
                        time:   [4.3857 s 4.4032 s 4.4248 s]
                        change: [+31.960% +33.091% +34.267%] (p = 0.01 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking ??? (all)/0.3 all Sophia triples: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 51.2s.
??? (all)/0.3 all Sophia triples
                        time:   [5.1053 s 5.1108 s 5.1159 s]
                        change: [+26.964% +27.099% +27.222%] (p = 0.01 < 0.05)
                        Performance has regressed.

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [1.4701 µs 1.4711 µs 1.4722 µs]
                        change: [+673.36% +674.25% +675.14%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
S??/1.2 (vincent, ?, ?) str triples
                        time:   [3.3508 µs 3.3527 µs 3.3548 µs]
                        change: [+56.702% +56.925% +57.148%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
S??/1.3 (vincent, ?, ?) Sophia triples
                        time:   [3.7053 µs 3.7077 µs 3.7101 µs]
                        change: [+50.744% +50.985% +51.248%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [334.24 ms 334.37 ms 334.52 ms]
                        change: [+135.63% +135.73% +135.86%] (p = 0.01 < 0.05)
                        Performance has regressed.
Benchmarking ?P? 1429618 triples/2.2 (?, type, ?) str triples: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.7s.
?P? 1429618 triples/2.2 (?, type, ?) str triples
                        time:   [573.43 ms 573.77 ms 574.10 ms]
                        change: [+54.162% +54.340% +54.517%] (p = 0.01 < 0.05)
                        Performance has regressed.
Benchmarking ?P? 1429618 triples/2.3 (?, type, ?) Sophia triples: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.4s.
?P? 1429618 triples/2.3 (?, type, ?) Sophia triples
                        time:   [644.39 ms 644.61 ms 644.82 ms]
                        change: [+40.394% +40.750% +41.109%] (p = 0.01 < 0.05)
                        Performance has regressed.

Benchmarking ??O 1414214 triples/3.1 (?, ?, person) triple IDs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.3s, or reduce sample count to 30.
??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [143.18 ms 143.22 ms 143.26 ms]
                        change: [+332.33% +332.85% +333.39%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking ??O 1414214 triples/3.2 (?, ?, person) str triples: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 37.8s, or reduce sample count to 10.
??O 1414214 triples/3.2 (?, ?, person) str triples
                        time:   [381.55 ms 383.15 ms 384.86 ms]
                        change: [+48.805% +49.427% +50.223%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild
Benchmarking ??O 1414214 triples/3.3 (?, ?, person) Sophia triples: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 54.4s, or reduce sample count to 10.
??O 1414214 triples/3.3 (?, ?, person) Sophia triples
                        time:   [538.96 ms 540.77 ms 542.66 ms]
                        change: [+31.487% +32.126% +32.751%] (p = 0.00 < 0.05)
                        Performance has regressed.

Benchmarking ?PO 1414214 triples/4.1 (?, type, person) triple IDs: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.7s or enable flat sampling.
?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [117.15 ms 117.34 ms 117.69 ms]
                        change: [+1613.6% +1640.1% +1668.3%] (p = 0.01 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
?PO 1414214 triples/4.2 (?, type, person) str subjects
                        time:   [321.38 ms 324.29 ms 327.42 ms]
                        change: [+67.818% +71.746% +75.719%] (p = 0.01 < 0.05)
                        Performance has regressed.
?PO 1414214 triples/4.3 (?, type, person) str triples
                        time:   [341.45 ms 344.35 ms 347.60 ms]
                        change: [+63.109% +64.445% +65.892%] (p = 0.00 < 0.05)
                        Performance has regressed.
?PO 1414214 triples/4.4 (?, type, person) Sophia triples
                        time:   [341.47 ms 344.28 ms 347.53 ms]
                        change: [+52.392% +53.589% +54.991%] (p = 0.01 < 0.05)
                        Performance has regressed.

sucds

hdt$ cargo bench --bench criterion                                                                                                                      sucds
   Compiling hdt v0.1.8 (/home/konrad/projekte/rust/hdt)
    Finished `bench` profile [optimized] target(s) in 1.48s
     Running benches/criterion.rs (target/release/deps/criterion-fd17cbde06dbe57c)
??? (all)/0.1 all triple IDs
                        time:   [260.05 ms 260.55 ms 261.19 ms]
                        change: [-82.274% -82.156% -82.043%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking ??? (all)/0.2 all str triples: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 32.6s.
??? (all)/0.2 all str triples
                        time:   [3.2486 s 3.2712 s 3.3039 s]
                        change: [-26.349% -25.709% -24.941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Benchmarking ??? (all)/0.3 all Sophia triples: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 40.6s.
??? (all)/0.3 all Sophia triples
                        time:   [4.0177 s 4.0434 s 4.0729 s]
                        change: [-21.383% -20.885% -20.285%] (p = 0.00 < 0.05)
                        Performance has improved.

S??/1.1 (vincent, ?, ?) triple IDs
                        time:   [188.85 ns 189.34 ns 189.87 ns]
                        change: [-87.181% -87.156% -87.131%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
S??/1.2 (vincent, ?, ?) str triples
                        time:   [2.1806 µs 2.1861 µs 2.1914 µs]
                        change: [-35.051% -34.935% -34.809%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
S??/1.3 (vincent, ?, ?) Sophia triples
                        time:   [2.4561 µs 2.4595 µs 2.4631 µs]
                        change: [-33.836% -33.719% -33.604%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Benchmarking ?P? 1429618 triples/2.1 (?, type, ?) triple IDs: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.8s or enable flat sampling.
?P? 1429618 triples/2.1 (?, type, ?) triple IDs
                        time:   [141.50 ms 142.14 ms 143.02 ms]
                        change: [-57.648% -57.514% -57.313%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
?P? 1429618 triples/2.2 (?, type, ?) str triples
                        time:   [371.99 ms 372.23 ms 372.48 ms]
                        change: [-35.182% -35.126% -35.066%] (p = 0.00 < 0.05)
                        Performance has improved.
?P? 1429618 triples/2.3 (?, type, ?) Sophia triples
                        time:   [437.91 ms 438.16 ms 438.53 ms]
                        change: [-32.076% -32.026% -31.962%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

??O 1414214 triples/3.1 (?, ?, person) triple IDs
                        time:   [33.168 ms 33.187 ms 33.207 ms]
                        change: [-76.843% -76.828% -76.812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Benchmarking ??O 1414214 triples/3.2 (?, ?, person) str triples: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 25.6s, or reduce sample count to 10.
??O 1414214 triples/3.2 (?, ?, person) str triples
                        time:   [254.84 ms 254.94 ms 255.04 ms]
                        change: [-33.761% -33.463% -33.181%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking ??O 1414214 triples/3.3 (?, ?, person) Sophia triples: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 40.5s, or reduce sample count to 10.
??O 1414214 triples/3.3 (?, ?, person) Sophia triples
                        time:   [403.83 ms 404.04 ms 404.26 ms]
                        change: [-25.550% -25.285% -25.028%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

?PO 1414214 triples/4.1 (?, type, person) triple IDs
                        time:   [6.7001 ms 6.7055 ms 6.7112 ms]
                        change: [-94.355% -94.304% -94.261%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
?PO 1414214 triples/4.2 (?, type, person) str subjects
                        time:   [187.13 ms 187.77 ms 188.38 ms]
                        change: [-42.686% -42.100% -41.537%] (p = 0.00 < 0.05)
                        Performance has improved.
?PO 1414214 triples/4.3 (?, type, person) str triples
                        time:   [211.85 ms 212.37 ms 212.98 ms]
                        change: [-38.921% -38.327% -37.777%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
?PO 1414214 triples/4.4 (?, type, person) Sophia triples
                        time:   [229.08 ms 229.35 ms 229.65 ms]
                        change: [-34.005% -33.385% -32.825%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

@Cydhra
Copy link

Cydhra commented Apr 11, 2024

Oh wow, how is that even possible?

Rank9Sel has about the same overhead as RsDict, so this can't be cache effects. But removing the simd feature had almost no effect, so why does the increased performance suddenly make such a drastic change?

Would you mind pushing that branch, I am now personally interested how Vers would perform :)

Edit: The benchmarks don't come with their resource :(

@KonradHoeffner
Copy link
Owner Author

KonradHoeffner commented Apr 11, 2024

Yeah I'm always skeptical with such large changes, so I will try to run it again using the hdt_benchmark suite.
Feel free to test it with vers :-)
The benchmark test resources can be generated either with this Makefile or manually:

wget http://downloads.dbpedia.org/2016-10/core-i18n/en/persondata_en.ttl.bz2
bunzip2 persondata_en.ttl.bz2
docker run -v ${PWD}:/data rdfhdt/hdt-cpp rdf2hdt /data/persondata_en.ttl /data/persondata_en.hdt

The reason they are not in the repo is that I didn't want to waste disk space for the people behind crates.io, maybe I find a better solution in the future.

P.S.: Tried to attach it here but GitHub allows 25 MB max.
P.P.S.: You can also download it directly at https://github.com/KonradHoeffner/hdt/releases/download/benchmarkdata/persondata_en.hdt.bz2, see the readme.

@Cydhra
Copy link

Cydhra commented Apr 11, 2024

I have similar results with vers like with sucds. A few percent gained in some benchmarks, a few lost in others. I also tried a method that just moves the data into the bitvec instead of copying the bits in a loop, but that didn't seem to make a difference, so I wouldn't worry about it.

@KonradHoeffner
Copy link
Owner Author

@Cydhra: Thanks for investigating this! Did you try the iai or the criterion benchmark? I'm not sure why they show so much difference, maybe I should change the IAI benchmark to use a larger file as well, as this is what I'm optimizing for.

@KonradHoeffner
Copy link
Owner Author

KonradHoeffner commented Apr 12, 2024

@Cydhra I created some comparative benchmark curves at https://github.com/KonradHoeffner/rdf_benchmark/blob/master/hdt_rs_current_vs_previous_versions.ipynb.
I am now convinced that it is indeed measurably faster both in loading and querying, thank you for all the help!
Oh and on top you can now use hdt_rs on stable Rust :-)

query
query2
parse

@Cydhra
Copy link

Cydhra commented Apr 12, 2024

Did you try the iai or the criterion benchmark?

Criterion.

@sujayakar
Copy link

sorry for not responding earlier -- I've been busy with work and haven't had much time for side projects.

nice work benchmarking btw, and sucds is indeed a fantastic library! we're actually using it at my startup for its compressed integer vectors and elias fano implementation: https://github.com/get-convex/convex-backend/blob/addaff23baa77b886d3736252615b149895d2ff4/crates/search/src/memory_index/term_list.rs#L121

@KonradHoeffner
Copy link
Owner Author

Thank you! And no problem, I fully understand :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked waiting for external factors to be resolved, such as an updated dependency optimize reduce memory or CPU usage
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

4 participants