Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use unique Rc<str> for SongEntries #66

Closed
fsktom opened this issue Aug 6, 2023 · 7 comments
Closed

Use unique Rc<str> for SongEntries #66

fsktom opened this issue Aug 6, 2023 · 7 comments
Assignees
Labels
library Regarding `endsong` crate refactor

Comments

@fsktom
Copy link
Owner

fsktom commented Aug 6, 2023

Building upon #62

I've been testing some of this Rc cloning stuff and using Rc::strong_count and one thought occurred to me.

Currently, in parse an Rc<str> is built for every single artist, album, track name. Wouldn't it be more efficient if there was only a single Rc<str> for each unique artist, album, track name? I.e. instead of 7000x Rc<str> of "Sabaton", wouldn't 7000x references to a single Rc<str> of "Sabaton" be better?

You could use three HashSets (one for artist names, one for album names and one for track names). This way same album names from different artists (or track names) could be summed up under one Rc<str>
I wonder if this would improve performance or be too much overhead (with the HashSets) to bring benefits...

@fsktom fsktom added refactor library Regarding `endsong` crate labels Aug 6, 2023
@fsktom fsktom self-assigned this Aug 6, 2023
@fsktom
Copy link
Owner Author

fsktom commented Aug 6, 2023

Or maybe rather a HashMap where the key is the name String and the value is the Rc<str>

@fsktom
Copy link
Owner Author

fsktom commented Aug 7, 2023

Ughh I don't think I can use a HashMap because that would mean cloning the String anyways...

@fsktom
Copy link
Owner Author

fsktom commented Aug 7, 2023

rust-lang/rust#60896 (comment)
ughh yeah :((
I wanted to use .get_or_insert on the HashSet but it isn't implemented yet
seems like I really have to clone one more... :(

fsktom added a commit that referenced this issue Aug 7, 2023
@fsktom
Copy link
Owner Author

fsktom commented Aug 7, 2023

Base benchmarks (parse just does SongEntries::new(&paths[..=1]).unwrap() while songsdoes gather::songs(&entries, true))

parse                   time:   [34.568 ms 34.613 ms 34.661 ms]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

songs                   time:   [6.0376 ms 6.0842 ms 6.1486 ms]
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe

with the new stuff (confirmed that e.g.Rc<str> of Sabaton name has thousands of strong counts)
(didn't include the change becuase I ran quite a few benchmarks inbetween)

parse                   time:   [36.162 ms 36.254 ms 36.353 ms]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

songs                   time:   [5.9001 ms 5.9408 ms 5.9915 ms]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) high mild
  14 (14.00%) high severe

@fsktom
Copy link
Owner Author

fsktom commented Aug 7, 2023

I'm gonna test the gather stuff with 9 files

and I should probably check memory usage - this definitely should be far lower...

@fsktom
Copy link
Owner Author

fsktom commented Aug 7, 2023

With 10 endsong files
from

gather artists          time:   [7.3690 ms 7.3875 ms 7.4117 ms]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

gather albums           time:   [10.495 ms 10.538 ms 10.589 ms]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

gather songs            time:   [26.400 ms 26.540 ms 26.688 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

to

gather artists          time:   [7.4624 ms 7.4743 ms 7.4866 ms]
                        change: [+0.8057% +1.1752% +1.4863%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

gather albums           time:   [10.162 ms 10.187 ms 10.213 ms]
                        change: [-3.8450% -3.3370% -2.8611%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

gather songs            time:   [25.405 ms 25.684 ms 26.049 ms]
                        change: [-4.3261% -3.2261% -1.7678%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

not really significant

@fsktom
Copy link
Owner Author

fsktom commented Aug 7, 2023

ooops
I may have created three HashMaps for each endsong file instead of sharing them across :P
Because I noticed the Rc::strong_count of "Sabaton" was the same even if I added files xd

here is the one for three HashMaps for all endsong files

gather artists          time:   [7.2315 ms 7.2452 ms 7.2593 ms]
                        change: [-3.2817% -3.0653% -2.8152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

gather albums           time:   [9.8499 ms 9.8822 ms 9.9191 ms]
                        change: [-3.3740% -2.9891% -2.5371%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

gather songs            time:   [24.536 ms 24.593 ms 24.648 ms]
                        change: [-5.5949% -4.2454% -3.1616%] (p = 0.00 < 0.05)
                        Performance has improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
library Regarding `endsong` crate refactor
Projects
None yet
Development

No branches or pull requests

1 participant