Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include lowercase name fields in Artist,Album,Song #79

Closed
fsktom opened this issue Aug 21, 2024 · 9 comments
Closed

Include lowercase name fields in Artist,Album,Song #79

fsktom opened this issue Aug 21, 2024 · 9 comments
Assignees
Labels
cli Regarding `endsong_ui` crate library Regarding `endsong` crate refactor

Comments

@fsktom
Copy link
Owner

fsktom commented Aug 21, 2024

Noticed it doing #33 that I'm calling .to_lowercase() a lot (bc of summing across albums and ignoring capitalization). Maybe it'd be nice if each SongEntry and song/album/artist contained an Rc<str> with its name but in lowercase. Would take up more space but I think it could speed up the computations.

@fsktom fsktom added refactor library Regarding `endsong` crate cli Regarding `endsong_ui` crate labels Aug 21, 2024
@fsktom fsktom self-assigned this Aug 21, 2024
@fsktom
Copy link
Owner Author

fsktom commented Aug 21, 2024

Some benchmarks @ a4f0bf2 with only endsong_0.json before doing anything

parsing:

parse, sum and filter   time:   [24.457 ms 24.565 ms 24.719 ms]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

gathering songs summed across albums:

gather songs summed across albums
                        time:   [16.655 ms 16.965 ms 17.356 ms]
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe

@fsktom
Copy link
Owner Author

fsktom commented Aug 21, 2024

hmm
parsing (and even summing different capitalizations after making it use the intrinsic _lowercase fields) is 25% slower

parse, sum and filter   time:   [30.462 ms 30.833 ms 31.556 ms]
                        change: [+23.504% +25.515% +29.583%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

@fsktom
Copy link
Owner Author

fsktom commented Aug 21, 2024

huh

gather songs summed across albums
                        time:   [16.378 ms 16.578 ms 16.952 ms]
                        change: [-5.0224% -2.2867% +0.8282%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe

didn't really make a difference
XD

@fsktom
Copy link
Owner Author

fsktom commented Aug 21, 2024

Hmm maybe I should think about using a crate called unicase for no allocation string case-insensitive comparisons?

https://www.reddit.com/r/rust/comments/6wbru2/performance_issue_can_i_avoid_of_using_the_slow/dm6u0as/
https://crates.io/crates/unicase

But how would I do sum_different_capitalization or gather::songs_summed_across_albums with it?

@fsktom
Copy link
Owner Author

fsktom commented Aug 22, 2024

I've done quite a bit of rewriting, benchmarking and profiling. Using UniCase in sum_different_capitalization doesn't make a difference really compared to .to_lowercase()

@fsktom
Copy link
Owner Author

fsktom commented Aug 22, 2024

hey but at least find is 10x faster now :)

find v1                 time:   [1.0716 ms 1.0754 ms 1.0802 ms]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

find v2                 time:   [11.730 ms 11.774 ms 11.831 ms]
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  11 (11.00%) high severe

@fsktom
Copy link
Owner Author

fsktom commented Aug 22, 2024

XDDD find song ignore album went from 7ms to 200μs XD

find song               time:   [6.9057 ms 7.0559 ms 7.2482 ms]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

to

find song               time:   [207.80 µs 209.54 µs 211.58 µs]
                        change: [-97.110% -97.023% -96.946%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
 pub fn song(entries: &[SongEntry], song_name: &str, artist_name: &str) -> Option<Vec<Song>> {
-    let (song_name, artist_name) = (song_name.to_lowercase(), artist_name.to_lowercase());
+    let song_bogus_album = Song::new(song_name, "", artist_name);

     let song_versions = entries
         .iter()
-        .filter(|entry| {
-            entry.track.to_lowercase() == song_name && entry.artist.to_lowercase() == artist_name
-        })
+        .filter(|entry| song_bogus_album.is_entry_lowercase_ignore_album(entry))
         .map(Song::from)
         .unique()
         .collect_vec();

and

pub fn is_entry_lowercase_ignore_album(&self, entry: &SongEntry) -> bool {
    self.album.artist.is_entry(entry) && UniCase::new(&entry.track) == UniCase::new(&self.name)
}

@fsktom
Copy link
Owner Author

fsktom commented Aug 22, 2024

So all in all
after MANY hours of trying things out and testing
decided not to include new fields
but instead use UniCase for case-insensitive comparisons

@fsktom
Copy link
Owner Author

fsktom commented Aug 25, 2024

gather songs summed across albums
time: [16.378 ms 16.578 ms 16.952 ms]
change: [-5.0224% -2.2867% +0.8282%] (p = 0.14 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) low mild
4 (4.00%) high mild
8 (8.00%) high severe

improved performance of this in fefc4f9
from

gather songs summed across albums
                        time:   [16.684 ms 17.729 ms 19.667 ms]
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

in b764ae0

to

gather songs summed across albums
                        time:   [1.8376 ms 1.8443 ms 1.8536 ms]
                        change: [-90.573% -89.523% -88.853%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

because I'm doing it far smarter now and even fixed a bug heh...

fsktom referenced this issue Aug 25, 2024
Improve performance bc no longer stupid
Thanks GitHub Copilot Chat for .then_with
Also fixed bug with these functions where
if you didn't sum_different_capitalization
it wasn't deterministic
If there were multiple capitalizations of a song
in the same album
But now it will always pick the correct version
with the most plays!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli Regarding `endsong_ui` crate library Regarding `endsong` crate refactor
Projects
None yet
Development

No branches or pull requests

1 participant