Optimised the `Decode::decode` for `[T; N]` #299
Conversation
Thanks a lot for this optimization!
Some benchmarks would be really nice, including a comparison between before and after this PR.
LGTM
Though I'd still love to see some benchmark comparisons!
It would be cool to have some benchmark tests. I tried to build some heavy contracts in ink!. If you want, I can create a PR in the ink! repository to check the output.
src/codec.rs (Outdated)
```rust
let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
let typed: [T; N] = unsafe { ptr::read(ref_typed) };
Ok(typed)
```
Suggested change:

```diff
-let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
-let typed: [T; N] = unsafe { ptr::read(ref_typed) };
-Ok(typed)
+Ok(unsafe { mem::transmute(array.into_inner()) })
```
Why can we not do this?
Because we can't do that conversion due to rust-lang/rust#61956.
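To illustrate the limitation: a direct `mem::transmute` on a const-generic array is rejected because the compiler cannot prove the two types have the same size, so the conversion has to go through a pointer cast plus `ptr::read`. This is a hedged sketch (the function name `assume_init_array` and the `MaybeUninit` framing are assumptions for illustration, not the PR's exact code):

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

// The direct version fails to compile due to rust-lang/rust#61956:
//
// fn assume_init_array<T, const N: usize>(a: [MaybeUninit<T>; N]) -> [T; N] {
//     unsafe { std::mem::transmute(a) } // error[E0512]: dependently-sized types
// }
//
// The workaround: suppress the source's destructor, then read the same
// bytes back out through a cast pointer.
fn assume_init_array<T, const N: usize>(a: [MaybeUninit<T>; N]) -> [T; N] {
    let a = ManuallyDrop::new(a);
    // Safety: the caller guarantees every element is initialized, and
    // `[MaybeUninit<T>; N]` has the same layout as `[T; N]`.
    unsafe { ptr::read(a.as_ptr() as *const [T; N]) }
}

fn main() {
    let mut buf: [MaybeUninit<u8>; 4] = unsafe { MaybeUninit::uninit().assume_init() };
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u8);
    }
    let arr: [u8; 4] = assume_init_array(buf);
    assert_eq!(arr, [0, 1, 2, 3]);
}
```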
Ahh okay. Why don't we need to drop `array`? I assume this will then be done when dropping `typed`?
`array` and `typed` use the same memory. We return `typed` from the function, so the compiler will not call `drop` on that variable, but it would call `drop` on `array`. So, to avoid dropping the shared memory twice, we must not drop `array`. =)
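The invariant described above can be demonstrated with a heap-owning type: after `ptr::read`, two values alias the same allocation, and exactly one of them must run its destructor. A minimal sketch (the `Payload` type and `move_out` function are hypothetical stand-ins, not the PR's code):

```rust
use std::mem::ManuallyDrop;
use std::ptr;

// Hypothetical heap-owning type standing in for a decoded element.
struct Payload(String);

fn move_out() -> Payload {
    // `array` owns the heap allocation; ManuallyDrop suppresses its drop.
    let array = ManuallyDrop::new(Payload(String::from("shared")));
    // Bitwise copy: `typed` now aliases the same heap memory as `array`.
    let typed: Payload = unsafe { ptr::read(&*array) };
    // `typed` is moved to the caller, which will drop it once.
    // `array` is never dropped, so the shared allocation is freed exactly once.
    typed
}

fn main() {
    assert_eq!(move_out().0, "shared");
}
```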
Maybe you could add some comments explaining this to people looking at this code in the future.
> Maybe you could add some comments explaining this to people looking at this code in the future.
Sounds like a good idea :)
```rust
macro_rules! decode {
    ( u8 ) => {{
        let mut array: ManuallyDrop<[u8; N]> = ManuallyDrop::new([0; N]);
        input.read(&mut array[..])?;
```
This is a memory leak if `read` returns an error.
Good catch! =) @Robbepop I brought back the usage of `forget`.
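The ordering issue can be made concrete with a drop counter: if the destructor is suppressed *before* the fallible read, an early `?` return leaks; suppressing it with `mem::forget` *after* success does not. A sketch under assumed names (`Tracked`, `leaky`, `safe` are illustrative, not the PR's code):

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

thread_local! {
    // Counts how many times Tracked's destructor actually runs.
    static DROPS: Cell<u32> = Cell::new(0);
}

struct Tracked;
impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

// Leaky ordering: destructor suppressed before the fallible step.
fn leaky(fail: bool) -> Result<(), ()> {
    let _buf = ManuallyDrop::new(Tracked);
    if fail {
        return Err(()); // Tracked's destructor never runs: leak
    }
    Ok(())
}

// Leak-safe ordering: fallible step first, forget only on success.
fn safe(fail: bool) -> Result<(), ()> {
    let buf = Tracked;
    if fail {
        return Err(()); // `buf` is dropped normally here
    }
    std::mem::forget(buf); // suppress drop only once we have succeeded
    Ok(())
}

fn main() {
    let _ = leaky(true);
    assert_eq!(DROPS.with(|d| d.get()), 0); // leaked on the error path
    let _ = safe(true);
    assert_eq!(DROPS.with(|d| d.get()), 1); // dropped on the error path
}
```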
```rust
    ( i8 ) => {{
        let mut array: ManuallyDrop<[i8; N]> = ManuallyDrop::new([0; N]);
        let bytes = unsafe { mem::transmute::<&mut [i8], &mut [u8]>(&mut array[..]) };
        input.read(bytes)?;
```
Same
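For the `i8` arm, the same leak-free ordering applies; the slice transmute itself is sound because `i8` and `u8` have identical size and alignment. A hedged sketch with a plain slice reader (`read_i8_array` and the error handling are assumptions, not the crate's actual `Input` API):

```rust
fn read_i8_array<const N: usize>(input: &mut &[u8]) -> Result<[i8; N], &'static str> {
    let mut array = [0i8; N];
    // i8 and u8 have the same layout, so viewing the buffer as bytes
    // for the read is sound.
    let bytes: &mut [u8] =
        unsafe { std::mem::transmute::<&mut [i8], &mut [u8]>(&mut array[..]) };
    if input.len() < bytes.len() {
        // `array` is a plain local here, so it is dropped normally: no leak.
        return Err("not enough data");
    }
    bytes.copy_from_slice(&input[..N]);
    *input = &input[N..];
    Ok(array)
}

fn main() {
    let data = [1u8, 2, 255, 4, 9];
    let mut cursor = &data[..];
    let arr: [i8; 4] = read_i8_array(&mut cursor).unwrap();
    assert_eq!(arr, [1, 2, -1, 4]); // 255u8 reinterprets as -1i8
    assert_eq!(cursor, &[9]);       // cursor advanced past the read bytes
}
```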
I'd like to see some benchmarks for this approach before we merge this. Some code I hacked together that should not be cargo-culted into this PR:

```rust
fn encode_decode_array<T: Codec, const N: usize>(c: &mut Criterion) {
    c.bench_function(
        &format!("array_encode_{}", type_name::<[T; N]>()),
        |bencher| {
            bencher.iter_batched_ref(
                || [0x00; N],
                |array| {
                    let _ = black_box(array.encode());
                },
                BatchSize::SmallInput,
            )
        },
    );
    c.bench_function(
        &format!("array_decode_{}", type_name::<[T; N]>()),
        |bencher| {
            let input = vec![0x00; N * core::mem::size_of::<T>()];
            bencher.iter_batched_ref(
                || &input[..],
                |input| {
                    let _ = black_box(<[T; N]>::decode(input));
                },
                BatchSize::SmallInput,
            )
        },
    );
}

criterion_group! {
    name = benches;
    config = Criterion::default().warm_up_time(Duration::from_millis(500)).without_plots();
    targets =
        encode_decode_array::<u8, 8192>,
        encode_decode_array::<u16, 8192>,
        encode_decode_array::<u32, 8192>,
        encode_decode_array::<u64, 8192>,
        encode_decode_array::<i8, 8192>,
        encode_decode_array::<i16, 8192>,
        encode_decode_array::<i32, 8192>,
        encode_decode_array::<i64, 8192>,
}
criterion_main!(benches);
```
@xgreenx if you fix the merge conflicts, we can merge this.
# Conflicts:
#   tests/max_encoded_len_ui/crate_str.stderr
#   tests/max_encoded_len_ui/incomplete_attr.stderr
#   tests/max_encoded_len_ui/missing_crate_specifier.stderr
#   tests/max_encoded_len_ui/not_encode.stderr
I turned off all my heavy programs. After that, I ran the bench for the old version and then for this PR:

Gnuplot not found, using plotters backend
array_encode_[u8; 8192] time: [2.0362 us 2.0728 us 2.1075 us]
change: [-2.5540% +3.7079% +11.535%] (p = 0.29 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
array_decode_[u8; 8192] time: [4.9179 us 4.9317 us 4.9457 us]
change: [-89.768% -89.711% -89.652%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
array_encode_[u16; 8192]
time: [2.0247 us 2.0612 us 2.0911 us]
change: [-3.3047% +2.5467% +8.6157%] (p = 0.40 > 0.05)
No change in performance detected.
array_decode_[u16; 8192]
time: [6.3075 us 6.3515 us 6.4100 us]
change: [-90.291% -90.178% -90.071%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
array_encode_[u32; 8192]
time: [2.0151 us 2.0496 us 2.0765 us]
change: [-6.2318% -0.2958% +6.1413%] (p = 0.92 > 0.05)
No change in performance detected.
array_decode_[u32; 8192]
time: [9.4854 us 9.5448 us 9.6006 us]
change: [-86.855% -86.700% -86.573%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) low severe
13 (13.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
array_encode_[u64; 8192]
time: [2.0011 us 2.0363 us 2.0634 us]
change: [-7.4003% -1.2854% +4.8868%] (p = 0.70 > 0.05)
No change in performance detected.
array_decode_[u64; 8192]
time: [26.624 us 26.735 us 26.877 us]
change: [-61.303% -61.145% -60.962%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
array_encode_[u128; 8192]
time: [2.0079 us 2.0423 us 2.0688 us]
change: [-6.2668% -0.4866% +5.6703%] (p = 0.87 > 0.05)
No change in performance detected.
array_decode_[u128; 8192]
time: [55.277 us 55.359 us 55.448 us]
change: [-44.229% -43.987% -43.754%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
array_encode_[i8; 8192] time: [1.9977 us 2.0339 us 2.0622 us]
change: [-8.6504% -2.4910% +4.5759%] (p = 0.46 > 0.05)
No change in performance detected.
array_decode_[i8; 8192] time: [4.9660 us 4.9858 us 5.0075 us]
change: [-88.336% -88.153% -87.990%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
array_encode_[i16; 8192]
time: [1.9990 us 2.0354 us 2.0635 us]
change: [-6.6755% -0.7210% +5.7634%] (p = 0.82 > 0.05)
No change in performance detected.
array_decode_[i16; 8192]
time: [6.3097 us 6.4007 us 6.5550 us]
change: [-90.154% -90.079% -89.972%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
array_encode_[i32; 8192]
time: [2.0158 us 2.0499 us 2.0766 us]
change: [-5.8196% +0.3042% +7.0575%] (p = 0.93 > 0.05)
No change in performance detected.
array_decode_[i32; 8192]
time: [9.3346 us 9.4024 us 9.4697 us]
change: [-86.940% -86.847% -86.754%] (p = 0.00 < 0.05)
Performance has improved.
array_encode_[i64; 8192]
time: [2.0141 us 2.0470 us 2.0723 us]
change: [-7.0847% -1.0581% +5.5361%] (p = 0.74 > 0.05)
No change in performance detected.
array_decode_[i64; 8192]
time: [26.394 us 26.441 us 26.504 us]
change: [-62.515% -62.179% -61.891%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
array_encode_[i128; 8192]
time: [2.0118 us 2.0505 us 2.0828 us]
change: [-6.4327% -0.4668% +6.0002%] (p = 0.88 > 0.05)
No change in performance detected.
array_decode_[i128; 8192]
time: [55.349 us 55.422 us 55.492 us]
change: [-44.835% -44.371% -43.987%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe

My system is little-endian. =)
Great! Thanks for the benchmark results!
And so is WebAssembly. ;)
/tip medium
We will fix the tip bot to work in this repo, but in the meantime @xgreenx, can you please update your first message in this PR to include:
So then we can send a tip to you when this PR is merged.
/tip medium
Please fix the following problems before calling the tip bot again:
/tip medium
A medium tip was successfully submitted for xgreenx (1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe on polkadot).
Added more comments. Implemented traits for `[T; N]` arrays. It makes heavy use of `unsafe`, but I made the same change in this PR: paritytech/parity-scale-codec#299. This change appears in ink! after that PR.
The initial `decode` is complex and takes a lot of space in a binary file. The `general_array_decode` in this PR reduces the size of the function by 2 times, but the native implementation for integers reduces it even more (the same idea is used in `encode_slice_no_len`). It saves 518 bytes in the `Erc20` example =)

polkadot address: 1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe