Refactor/encode decode via lut #40

mattiZed · 2024-04-29T08:47:27Z

Okay, here we go. This should hopefully finally fix #39 and #37 as well as #35.

I have taken some inspiration from flexpolyline, I think the performance improvement on the encoder side comes mainly from computing the scaled longitude/latidude only once for each new coordinate pair - compare the loop in encode_coordinates(). When we subtract previous from current, previous is already scaled.

EDIT: Seeing the performance gains made in #42 I'd rather attribute the improved encoder performance to removing the string concatenation and directly pusing to the result buffer.

I have changed the code to use the LUT implementation when it was faster on my local machine (M2 Pro). I choose to leave the respective alternative in the code. I encourage you to test on your system as well, maybe you report differently.

The main improvement regarding #39 and #37 comes from explicitly computing and comparing against the maximum shift that would still provide a valid longitude/latitude coordinate.

Timings compared to latest main:

     Running benches/benchmarks.rs (target/release/deps/benchmarks-f896454355081916)
bench encode: 1000 coordinates
                        time:   [10.713 µs 10.744 µs 10.777 µs]
                        change: [-83.109% -83.025% -82.947%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

bench decode: 21502 coordinates
                        time:   [277.77 µs 280.55 µs 283.09 µs]
                        change: [+51.144% +52.512% +53.845%] (p = 0.00 < 0.05)
                        Performance has regressed.

bench polyline6 decoding
                        time:   [54.661 ns 54.814 ns 54.986 ns]
                        change: [+35.941% +36.468% +36.938%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

bench HUGE polyline6 decoding
                        time:   [58.936 µs 59.531 µs 60.308 µs]
                        change: [+96.872% +99.934% +103.05%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

I don't quite know what to say about decoder performance. The conditionals sure are responsible for the performance hit, but it's still rather quick and not as abysmal as other solutions - I assume we'd have to sacrifice a bit of performance for robustness here?

refactor: decode uses LUT, improve delta computation

feat: provide naive and LUT-based algorithms for both encode and decode

michaelkirk · 2024-05-01T15:54:38Z

This is really interesting and I think worth pursuing.

My understanding is that you've done at least 2 big things to make things faster:

avoided allocating a bunch of strings (like I redundantly did in speed up encode #42)
implemented a lookup table

My #42 does only the first part, and isn't quite as fast. But it's still a big boost: -70%-75%, and seemingly without any of the related slowdowns, so it feels not at all controversial.

How would you feel about starting by merging #42, or you (@mattiZed) could pull the string allocation part out of this PR into a separate PR, I'd be happy to have your name on the commits if you'd prefer that - especially since you wrote it first. 🙃

My motivation is that, it might be illuminating to evaluate what exactly we're getting from the LUT without muddying the waters, since seemingly most of the speedup is from avoiding all the string allocations.

mattiZed · 2024-05-01T17:10:12Z

I agree 100%. i will remove string allocation improvement from this pr so it can be merged first. then merging your pr after should be straight forward. will do it tomorrow.

michaelkirk · 2024-05-01T19:57:21Z

I was thinking let's merge the string allocation improvement first, since it's strictly an improvement.

mattiZed · 2024-05-02T05:10:40Z

Either is fine for me, will update this PR now.

… traceability of performance gains

mattiZed · 2024-05-02T05:24:31Z

And here are the new timings:

Most of the encoding gain is/was from pushing to the same buffer all around, and will be retrieved again from #42.

     Running benches/benchmarks.rs (target/release/deps/benchmarks-f896454355081916)
bench encode: 1000 coordinates
                        time:   [55.520 µs 55.782 µs 56.073 µs]
                        change: [-12.412% -12.012% -11.609%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

bench decode: 21502 coordinates
                        time:   [280.85 µs 282.70 µs 284.43 µs]
                        change: [+56.166% +57.259% +58.311%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

bench polyline6 decoding
                        time:   [53.460 ns 53.707 ns 53.999 ns]
                        change: [+34.154% +34.893% +35.661%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

bench HUGE polyline6 decoding
                        time:   [65.463 µs 66.191 µs 67.074 µs]
                        change: [+135.64% +138.94% +142.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

michaelkirk · 2024-05-06T23:30:36Z

Would you mind rebasing and running against the new benchmarks now that #42 is merged? I think/hope they'll be a little more stable.

mattiZed · 2024-05-07T16:24:41Z

Hi Mike, im on vacation for this and next week and will continue work on this after. However, you could also take over if you'd like/it's urgent in any sense.

Best,
Stefan

mattiZed and others added 4 commits April 28, 2024 20:08

refactor: encode uses LUT, also work on string buffer reference directly

5578f18

refactor: decode uses LUT, improve delta computation

fix: differentiate character and overflow errors

10ef755

feat: provide naive and LUT-based algorithms for both encode and decode

chore: update changelog

66c8af5

style: also use binary literal in decode_lut

0948108

michaelkirk mentioned this pull request May 1, 2024

speed up encode #42

Merged

revert: remove string allocation/concatenation improvement for better…

e11dd90

… traceability of performance gains

This was referenced May 7, 2024

encode performance tweaks #43

Merged

Validate output of decoding #45

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor/encode decode via lut #40

Refactor/encode decode via lut #40

mattiZed commented Apr 29, 2024 •

edited

Loading

michaelkirk commented May 1, 2024

mattiZed commented May 1, 2024

michaelkirk commented May 1, 2024

mattiZed commented May 2, 2024

mattiZed commented May 2, 2024

michaelkirk commented May 6, 2024

mattiZed commented May 7, 2024

Refactor/encode decode via lut #40

Are you sure you want to change the base?

Refactor/encode decode via lut #40

Conversation

mattiZed commented Apr 29, 2024 • edited Loading

michaelkirk commented May 1, 2024

mattiZed commented May 1, 2024

michaelkirk commented May 1, 2024

mattiZed commented May 2, 2024

mattiZed commented May 2, 2024

michaelkirk commented May 6, 2024

mattiZed commented May 7, 2024

mattiZed commented Apr 29, 2024 •

edited

Loading