Optimised the Decode::decode for [T; N] #299

Merged

Conversation

@xgreenx (Contributor) commented Nov 18, 2021

This change appears after this PR in ink!.

The initial decode is complex and takes a lot of space in the binary file. general_array_decode in this PR halves the size of the function. But the native implementation for integers reduces it even more (the same idea is used in encode_slice_no_len). It saves 518 bytes in the Erc20 example=)
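
As a rough illustration of the integer fast path (a minimal sketch, not the exact code in this PR; decode_u32_array is a made-up name, while Input and Error are the crate's existing types): SCALE stores fixed-width integers as little-endian bytes, so the whole array can be filled by one bulk read into its raw byte view instead of decoding every element separately.

use core::mem;
use parity_scale_codec::{Error, Input};

// Hypothetical helper sketching the integer specialization: one bulk `read`
// into the array's byte representation instead of per-element decoding.
fn decode_u32_array<I: Input, const N: usize>(input: &mut I) -> Result<[u32; N], Error> {
	let mut array = [0u32; N];
	// View the [u32; N] as a &mut [u8] of length N * 4 and fill it directly.
	let bytes: &mut [u8] = unsafe {
		core::slice::from_raw_parts_mut(array.as_mut_ptr() as *mut u8, N * mem::size_of::<u32>())
	};
	input.read(bytes)?;
	// SCALE integers are little-endian; a big-endian target would still need
	// to byte-swap each element here.
	Ok(array)
}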

polkadot address: 1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe

@Robbepop (Contributor) left a comment

Thanks a lot for this optimization!
Some benchmarks comparing before and after this PR would be really nice.

@Robbepop (Contributor) left a comment

LGTM

Though I'd still love to see some benchmark comparisons!

@xgreenx (Contributor, Author) commented Nov 18, 2021

> LGTM
>
> Though I'd still love to see some benchmark comparisons!

It would be cool to have some benchmark tests in parity-scale-codec to check the size in WASM, but I think that is not part of this PR=D

I tried to build some heavy contracts in ink!:
delegator: Original wasm size: 32.5K, Optimized: 10.0K (10043) -> Original wasm size: 30.8K, Optimized: 9.0K (9044) - saved 999 bytes
multisig: Original wasm size: 106.8K, Optimized: 47.6K (47597) -> Original wasm size: 106.0K, Optimized: 47.0K (47019) - saved 578 bytes
erc20: Original wasm size: 34.3K, Optimized: 10.6K (10632) -> Original wasm size: 33.5K, Optimized: 10.1K (10114) - saved 518 bytes
erc721: Original wasm size: 85.2K, Optimized: 37.0K (36977) -> Original wasm size: 84.5K, Optimized: 36.5K (36491) - saved 486 bytes
erc1155: Original wasm size: 88.8K, Optimized: 48.3K (48263) -> Original wasm size: 88.4K, Optimized: 48.2K (48203) - saved 60 bytes

If you want, I can create a PR in the ink! repository to check the output of ink-waterfall=)

@Robbepop (Contributor) commented Nov 19, 2021

> It would be cool to have some benchmark tests in parity-scale-codec to check the size in WASM, but I think that is not part of this PR=D

I mean cargo bench runtime benchmarks. ;)
The parity_scale_codec crate does not care too much about ink! smart contract sizes in general.
So by adding some runtime benchmarks we can make sure that the optimization implemented in this PR also improves runtime efficiency.

src/codec.rs (outdated diff), comment on lines 684 to 686:
let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
let typed: [T; N] = unsafe { ptr::read(ref_typed) };
Ok(typed)
Member:

Suggested change:
-let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
-let typed: [T; N] = unsafe { ptr::read(ref_typed) };
-Ok(typed)
+Ok(unsafe { mem::transmute(array.into_inner()) })

Why can we not do this?

Contributor (Author):

Because we can't do that conversion due to this issue: rust-lang/rust#61956 (transmute is rejected when the type's size depends on a const generic parameter).

Member:

Ahh okay. Why don't we need to drop array? I assume this will then be done when dropping typed?

Contributor (Author):

array and typed use the same memory. We return typed from the function, so the compiler will not call drop on that variable, but it would drop array. So, to avoid dropping the shared memory twice, we have to make sure array is never dropped=)
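
A minimal sketch of that reasoning (assume_init_array is a made-up name here, and the contract that every element is already initialized is an assumption of the sketch):

use core::mem::{self, MaybeUninit};
use core::ptr;

// Hypothetical helper showing the reference-transmute + ptr::read trick.
// Safety (assumed): every element of `array` has been initialized.
unsafe fn assume_init_array<T, const N: usize>(array: [MaybeUninit<T>; N]) -> [T; N] {
	// A direct transmute of the value is rejected for const-generic arrays
	// (rust-lang/rust#61956), so transmute a reference and copy through it.
	let ref_typed: &[T; N] = mem::transmute(&array);
	let typed: [T; N] = ptr::read(ref_typed);
	// `typed` now owns the elements; `array` aliases the same data, so it must
	// not be dropped as well. MaybeUninit has no drop glue anyway, but
	// forgetting it makes the ownership transfer explicit.
	mem::forget(array);
	typed
}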

Contributor:

Maybe you could add some comments explaining this to people looking at this code in the future.

Member:

> Maybe you could add some comments explaining this to people looking at this code in the future.

Sounds like a good idea :)

macro_rules! decode {
	( u8 ) => {{
		let mut array: ManuallyDrop<[u8; N]> = ManuallyDrop::new([0; N]);
		input.read(&mut array[..])?;
Member:

This is a memory leak if read returns an error.

Contributor (Author):

Good catch!=) @Robbepop I brought back the usage of forget.
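
To spell the concern out in general terms (a minimal sketch with a made-up decode_array helper, not this PR's exact code; plain integer arrays have no drop glue, so nothing can actually leak there, but the same shape with a generic element type can): dropping has to stay possible until every fallible step has succeeded, otherwise an early return leaks whatever was already decoded.

use core::mem::{self, MaybeUninit};
use core::ptr;
use parity_scale_codec::{Decode, Error, Input};

// Hypothetical generic array decode with leak-free error handling.
fn decode_array<T: Decode, I: Input, const N: usize>(input: &mut I) -> Result<[T; N], Error> {
	// An array of MaybeUninit needs no initialization.
	let mut array: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
	for i in 0..N {
		match T::decode(input) {
			Ok(value) => array[i] = MaybeUninit::new(value),
			Err(err) => {
				// Drop the elements decoded before the failure, otherwise they leak.
				for elem in array[..i].iter_mut() {
					unsafe { ptr::drop_in_place(elem.as_mut_ptr()) };
				}
				return Err(err);
			}
		}
	}
	// All elements are initialized: move them out and forget the buffer,
	// as in the reference-transmute sketch above.
	let typed: [T; N] = unsafe { ptr::read(array.as_ptr() as *const [T; N]) };
	mem::forget(array);
	Ok(typed)
}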

	( i8 ) => {{
		let mut array: ManuallyDrop<[i8; N]> = ManuallyDrop::new([0; N]);
		let bytes = unsafe { mem::transmute::<&mut [i8], &mut [u8]>(&mut array[..]) };
		input.read(bytes)?;
Member:

Same

@Robbepop (Contributor) commented:

I'd like to see some benchmarks for this approach before we merge this.
Those benchmarks should ideally show a runtime improvement, or at least show that this space optimization does not result in a runtime performance decrease.

Some code I hacked together that should not be cargo-culted into this PR:

use std::any::type_name;
use std::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, BatchSize, Criterion};
use parity_scale_codec::{Codec, Decode, Encode};

// `Default + Copy` pins the setup array to `[T; N]`; with a bare `[0x00; N]`
// the element type would silently fall back to `i32` regardless of `T`.
fn encode_decode_array<T: Codec + Default + Copy, const N: usize>(c: &mut Criterion) {
	c.bench_function(
		&format!("array_encode_{}", type_name::<[T; N]>()),
		|bencher| {
		bencher.iter_batched_ref(
			|| [T::default(); N],
			|array| {
				let _ = black_box(array.encode());
			},
			BatchSize::SmallInput,
		)
	});
	c.bench_function(
		&format!("array_decode_{}", type_name::<[T; N]>()),
		|bencher| {
		// SCALE encodes [T; N] as N fixed-width little-endian integers, so a
		// zero-filled buffer of N * size_of::<T>() bytes is a valid encoding.
		let input = vec![0x00; N * core::mem::size_of::<T>()];
		bencher.iter_batched_ref(
			|| &input[..],
			|input| {
				let _ = black_box(<[T; N]>::decode(input));
			},
			BatchSize::SmallInput,
		)
	});
}

criterion_group!{
	name = benches;
	config = Criterion::default().warm_up_time(Duration::from_millis(500)).without_plots();
	// The u128/i128 numbers further down were presumably produced by extending
	// this list in the same way.
	targets =
			encode_decode_array::<u8, 8192>,
			encode_decode_array::<u16, 8192>,
			encode_decode_array::<u32, 8192>,
			encode_decode_array::<u64, 8192>,
			encode_decode_array::<i8, 8192>,
			encode_decode_array::<i16, 8192>,
			encode_decode_array::<i32, 8192>,
			encode_decode_array::<i64, 8192>,
}
criterion_main!(benches);
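
(To actually run the snippet above: it would live as a file under benches/, with a matching [[bench]] entry in Cargo.toml that sets harness = false so criterion can provide the harness, and cargo bench then executes it.)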

@bkchr (Member) commented Nov 21, 2021

@xgreenx if you fix the merge conflicts, we can merge this.

@xgreenx (Contributor, Author) commented Nov 21, 2021

I turned off all my heavy programs. Then I ran the bench for the old version of Decode::decode (twice, to be sure the result is stable). After that I ran the bench for the new version. The results are below:

Gnuplot not found, using plotters backend
array_encode_[u8; 8192] time:   [2.0362 us 2.0728 us 2.1075 us]                                     
                        change: [-2.5540% +3.7079% +11.535%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

array_decode_[u8; 8192] time:   [4.9179 us 4.9317 us 4.9457 us]                                     
                        change: [-89.768% -89.711% -89.652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

array_encode_[u16; 8192]                                                                             
                        time:   [2.0247 us 2.0612 us 2.0911 us]
                        change: [-3.3047% +2.5467% +8.6157%] (p = 0.40 > 0.05)
                        No change in performance detected.

array_decode_[u16; 8192]                                                                             
                        time:   [6.3075 us 6.3515 us 6.4100 us]
                        change: [-90.291% -90.178% -90.071%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

array_encode_[u32; 8192]                                                                             
                        time:   [2.0151 us 2.0496 us 2.0765 us]
                        change: [-6.2318% -0.2958% +6.1413%] (p = 0.92 > 0.05)
                        No change in performance detected.

array_decode_[u32; 8192]                                                                             
                        time:   [9.4854 us 9.5448 us 9.6006 us]
                        change: [-86.855% -86.700% -86.573%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) low severe
  13 (13.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

array_encode_[u64; 8192]                                                                             
                        time:   [2.0011 us 2.0363 us 2.0634 us]
                        change: [-7.4003% -1.2854% +4.8868%] (p = 0.70 > 0.05)
                        No change in performance detected.

array_decode_[u64; 8192]                                                                             
                        time:   [26.624 us 26.735 us 26.877 us]
                        change: [-61.303% -61.145% -60.962%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

array_encode_[u128; 8192]                                                                             
                        time:   [2.0079 us 2.0423 us 2.0688 us]
                        change: [-6.2668% -0.4866% +5.6703%] (p = 0.87 > 0.05)
                        No change in performance detected.

array_decode_[u128; 8192]                                                                            
                        time:   [55.277 us 55.359 us 55.448 us]
                        change: [-44.229% -43.987% -43.754%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

array_encode_[i8; 8192] time:   [1.9977 us 2.0339 us 2.0622 us]                                     
                        change: [-8.6504% -2.4910% +4.5759%] (p = 0.46 > 0.05)
                        No change in performance detected.

array_decode_[i8; 8192] time:   [4.9660 us 4.9858 us 5.0075 us]                                     
                        change: [-88.336% -88.153% -87.990%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

array_encode_[i16; 8192]                                                                             
                        time:   [1.9990 us 2.0354 us 2.0635 us]
                        change: [-6.6755% -0.7210% +5.7634%] (p = 0.82 > 0.05)
                        No change in performance detected.

array_decode_[i16; 8192]                                                                             
                        time:   [6.3097 us 6.4007 us 6.5550 us]
                        change: [-90.154% -90.079% -89.972%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

array_encode_[i32; 8192]                                                                             
                        time:   [2.0158 us 2.0499 us 2.0766 us]
                        change: [-5.8196% +0.3042% +7.0575%] (p = 0.93 > 0.05)
                        No change in performance detected.

array_decode_[i32; 8192]                                                                             
                        time:   [9.3346 us 9.4024 us 9.4697 us]
                        change: [-86.940% -86.847% -86.754%] (p = 0.00 < 0.05)
                        Performance has improved.

array_encode_[i64; 8192]                                                                             
                        time:   [2.0141 us 2.0470 us 2.0723 us]
                        change: [-7.0847% -1.0581% +5.5361%] (p = 0.74 > 0.05)
                        No change in performance detected.

array_decode_[i64; 8192]                                                                             
                        time:   [26.394 us 26.441 us 26.504 us]
                        change: [-62.515% -62.179% -61.891%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

array_encode_[i128; 8192]                                                                             
                        time:   [2.0118 us 2.0505 us 2.0828 us]
                        change: [-6.4327% -0.4668% +6.0002%] (p = 0.88 > 0.05)
                        No change in performance detected.

array_decode_[i128; 8192]                                                                            
                        time:   [55.349 us 55.422 us 55.492 us]
                        change: [-44.835% -44.371% -43.987%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

My system is little-endian=)

@Robbepop (Contributor) commented Nov 21, 2021

Great! Thanks for the benchmark results!
So this resulted in roughly a 5-10x performance improvement for the optimized array decode cases on your machine.

> My system is little-endian=)

And so is WebAssembly. ;)

@bkchr (Member) commented Nov 21, 2021

/tip medium

@shawntabrizi (Member) commented:

We will fix the tip bot to work in this repo, but in the meantime @xgreenx can you please update your first message in this PR to include:

polkadot address: <address>

So then we can send a tip to you when this PR is merged.

@bkchr (Member) commented Nov 23, 2021

/tip medium

@substrate-tip-bot commented:

Please fix the following problems before calling the tip bot again:

  • Contributor did not properly post their Polkadot or Kusama address. Make sure the pull request has: "{network} address: {address}".

@shawntabrizi (Member) commented:

/tip medium

@substrate-tip-bot commented:

A medium tip was successfully submitted for xgreenx (1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe on polkadot).

https://polkadot.js.org/apps/#/treasury/tips

@bkchr merged commit baa5863 into paritytech:master on Nov 23, 2021