Optimised the Decode::decode for [T; N] #299

Merged

Conversation

@xgreenx (Contributor) commented Nov 18, 2021

This change appears after this PR in ink!.

The initial decode is complex and takes a lot of space in the binary file. general_array_decode in this PR halves the size of the function. But the native implementation for integers reduces it even more (the same idea is used in encode_slice_no_len). It saves 518 bytes in the Erc20 example=)
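
As a rough illustration of the integer fast path (a minimal sketch, not the exact code in this PR; decode_u32_array is a made-up name, while Input and Error are the crate's existing types): SCALE stores fixed-width integers as little-endian bytes, so the whole array can be filled by one bulk read into its raw byte view instead of decoding every element separately.

use core::mem;
use parity_scale_codec::{Error, Input};

// Hypothetical helper sketching the integer specialization: one bulk `read`
// into the array's byte representation instead of per-element decoding.
fn decode_u32_array<I: Input, const N: usize>(input: &mut I) -> Result<[u32; N], Error> {
	let mut array = [0u32; N];
	// View the [u32; N] as a &mut [u8] of length N * 4 and fill it directly.
	let bytes: &mut [u8] = unsafe {
		core::slice::from_raw_parts_mut(array.as_mut_ptr() as *mut u8, N * mem::size_of::<u32>())
	};
	input.read(bytes)?;
	// SCALE integers are little-endian; a big-endian target would still need
	// to byte-swap each element here.
	Ok(array)
}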

polkadot address: 1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe

@Robbepop (Contributor) left a comment

Thanks a lot for this optimization!
Some benchmarks comparing before and after this PR would be really nice.

@Robbepop (Contributor) left a comment

LGTM

Though I'd still love to see some benchmark comparisons!

@xgreenx (Contributor, Author) commented Nov 18, 2021

> LGTM
>
> Though I'd still love to see some benchmark comparisons!

It would be cool to have some benchmark tests in parity-scale-codec to check the size in WASM, but I think that is not part of this PR=D

I tried to build some heavy contracts in ink!:
delegator: Original wasm size: 32.5K, Optimized: 10.0K (10043) -> Original wasm size: 30.8K, Optimized: 9.0K (9044) - saved 999 bytes
multisig: Original wasm size: 106.8K, Optimized: 47.6K (47597) -> Original wasm size: 106.0K, Optimized: 47.0K (47019) - saved 578 bytes
erc20: Original wasm size: 34.3K, Optimized: 10.6K (10632) -> Original wasm size: 33.5K, Optimized: 10.1K (10114) - saved 518 bytes
erc721: Original wasm size: 85.2K, Optimized: 37.0K (36977) -> Original wasm size: 84.5K, Optimized: 36.5K (36491) - saved 486 bytes
erc1155: Original wasm size: 88.8K, Optimized: 48.3K (48263) -> Original wasm size: 88.4K, Optimized: 48.2K (48203) - saved 60 bytes

If you want, I can create a PR in the ink! repository to check the output of ink-waterfall=)

@Robbepop (Contributor) commented Nov 19, 2021

> It would be cool to have some benchmark tests in parity-scale-codec to check the size in WASM, but I think that is not part of this PR=D

I mean cargo bench runtime benchmarks. ;)
The parity_scale_codec crate does not care too much about ink! smart contract sizes in general.
So by adding some runtime benchmarks we can make sure that the optimization implemented in this PR also improves runtime efficiency.

src/codec.rs (outdated diff), comment on lines 684 to 686:
let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
let typed: [T; N] = unsafe { ptr::read(ref_typed) };
Ok(typed)
Member:

Suggested change:
-let ref_typed: &[T; N] = unsafe { mem::transmute(&array) };
-let typed: [T; N] = unsafe { ptr::read(ref_typed) };
-Ok(typed)
+Ok(unsafe { mem::transmute(array.into_inner()) })

Why can we not do this?

Contributor (Author):

Because we can't do that conversion due to this issue: rust-lang/rust#61956 (transmute is rejected when the type's size depends on a const generic parameter).

Member:

Ahh okay. Why don't we need to drop array? I assume this will then be done when dropping typed?

Contributor (Author):

array and typed use the same memory. We return typed from the function, so the compiler will not call drop on that variable, but it would drop array. So, to avoid dropping the shared memory twice, we have to make sure array is never dropped=)
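
A minimal sketch of that reasoning (assume_init_array is a made-up name here, and the contract that every element is already initialized is an assumption of the sketch):

use core::mem::{self, MaybeUninit};
use core::ptr;

// Hypothetical helper showing the reference-transmute + ptr::read trick.
// Safety (assumed): every element of `array` has been initialized.
unsafe fn assume_init_array<T, const N: usize>(array: [MaybeUninit<T>; N]) -> [T; N] {
	// A direct transmute of the value is rejected for const-generic arrays
	// (rust-lang/rust#61956), so transmute a reference and copy through it.
	let ref_typed: &[T; N] = mem::transmute(&array);
	let typed: [T; N] = ptr::read(ref_typed);
	// `typed` now owns the elements; `array` aliases the same data, so it must
	// not be dropped as well. MaybeUninit has no drop glue anyway, but
	// forgetting it makes the ownership transfer explicit.
	mem::forget(array);
	typed
}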

Contributor:

Maybe you could add some comments explaining this to people looking at this code in the future.

Member:

> Maybe you could add some comments explaining this to people looking at this code in the future.

Sounds like a good idea :)

macro_rules! decode {
	( u8 ) => {{
		let mut array: ManuallyDrop<[u8; N]> = ManuallyDrop::new([0; N]);
		input.read(&mut array[..])?;
Member:

This is a memory leak if read returns an error.

Contributor (Author):

Good catch!=) @Robbepop I brought back the usage of forget.
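
To spell the concern out in general terms (a minimal sketch with a made-up decode_array helper, not this PR's exact code; plain integer arrays have no drop glue, so nothing can actually leak there, but the same shape with a generic element type can): dropping has to stay possible until every fallible step has succeeded, otherwise an early return leaks whatever was already decoded.

use core::mem::{self, MaybeUninit};
use core::ptr;
use parity_scale_codec::{Decode, Error, Input};

// Hypothetical generic array decode with leak-free error handling.
fn decode_array<T: Decode, I: Input, const N: usize>(input: &mut I) -> Result<[T; N], Error> {
	// An array of MaybeUninit needs no initialization.
	let mut array: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
	for i in 0..N {
		match T::decode(input) {
			Ok(value) => array[i] = MaybeUninit::new(value),
			Err(err) => {
				// Drop the elements decoded before the failure, otherwise they leak.
				for elem in array[..i].iter_mut() {
					unsafe { ptr::drop_in_place(elem.as_mut_ptr()) };
				}
				return Err(err);
			}
		}
	}
	// All elements are initialized: move them out and forget the buffer,
	// as in the reference-transmute sketch above.
	let typed: [T; N] = unsafe { ptr::read(array.as_ptr() as *const [T; N]) };
	mem::forget(array);
	Ok(typed)
}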

	( i8 ) => {{
		let mut array: ManuallyDrop<[i8; N]> = ManuallyDrop::new([0; N]);
		let bytes = unsafe { mem::transmute::<&mut [i8], &mut [u8]>(&mut array[..]) };
		input.read(bytes)?;
Member:

Same

@Robbepop (Contributor) commented:

I'd like to see some benchmarks for this approach before we merge this.
Those benchmarks should ideally show a runtime improvement, or at least show that this space optimization does not result in a runtime performance decrease.

Some code I hacked together that should not be cargo-culted into this PR:

use std::any::type_name;
use std::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, BatchSize, Criterion};
use parity_scale_codec::{Codec, Decode, Encode};

// `Default + Copy` pins the setup array to `[T; N]`; with a bare `[0x00; N]`
// the element type would silently fall back to `i32` regardless of `T`.
fn encode_decode_array<T: Codec + Default + Copy, const N: usize>(c: &mut Criterion) {
	c.bench_function(
		&format!("array_encode_{}", type_name::<[T; N]>()),
		|bencher| {
		bencher.iter_batched_ref(
			|| [T::default(); N],
			|array| {
				let _ = black_box(array.encode());
			},
			BatchSize::SmallInput,
		)
	});
	c.bench_function(
		&format!("array_decode_{}", type_name::<[T; N]>()),
		|bencher| {
		// SCALE encodes [T; N] as N fixed-width little-endian integers, so a
		// zero-filled buffer of N * size_of::<T>() bytes is a valid encoding.
		let input = vec![0x00; N * core::mem::size_of::<T>()];
		bencher.iter_batched_ref(
			|| &input[..],
			|input| {
				let _ = black_box(<[T; N]>::decode(input));
			},
			BatchSize::SmallInput,
		)
	});
}

criterion_group!{
	name = benches;
	config = Criterion::default().warm_up_time(Duration::from_millis(500)).without_plots();
	// The u128/i128 numbers further down were presumably produced by extending
	// this list in the same way.
	targets =
			encode_decode_array::<u8, 8192>,
			encode_decode_array::<u16, 8192>,
			encode_decode_array::<u32, 8192>,
			encode_decode_array::<u64, 8192>,
			encode_decode_array::<i8, 8192>,
			encode_decode_array::<i16, 8192>,
			encode_decode_array::<i32, 8192>,
			encode_decode_array::<i64, 8192>,
}
criterion_main!(benches);
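
(To actually run the snippet above: it would live as a file under benches/, with a matching [[bench]] entry in Cargo.toml that sets harness = false so criterion can provide the harness, and cargo bench then executes it.)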

@bkchr (Member) commented Nov 21, 2021

@xgreenx if you fix the merge conflicts, we can merge this.

@xgreenx (Contributor, Author) commented Nov 21, 2021

I turned off all my heavy programs. Then I ran the bench for the old version of Decode::decode (twice, to be sure the result is stable). After that I ran the bench for the new version. The results are below:

Gnuplot not found, using plotters backend
array_encode_[u8; 8192] time:   [2.0362 us 2.0728 us 2.1075 us]                                     
                        change: [-2.5540% +3.7079% +11.535%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

array_decode_[u8; 8192] time:   [4.9179 us 4.9317 us 4.9457 us]                                     
                        change: [-89.768% -89.711% -89.652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

array_encode_[u16; 8192]                                                                             
                        time:   [2.0247 us 2.0612 us 2.0911 us]
                        change: [-3.3047% +2.5467% +8.6157%] (p = 0.40 > 0.05)
                        No change in performance detected.

array_decode_[u16; 8192]                                                                             
                        time:   [6.3075 us 6.3515 us 6.4100 us]
                        change: [-90.291% -90.178% -90.071%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

array_encode_[u32; 8192]                                                                             
                        time:   [2.0151 us 2.0496 us 2.0765 us]
                        change: [-6.2318% -0.2958% +6.1413%] (p = 0.92 > 0.05)
                        No change in performance detected.

array_decode_[u32; 8192]                                                                             
                        time:   [9.4854 us 9.5448 us 9.6006 us]
                        change: [-86.855% -86.700% -86.573%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) low severe
  13 (13.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

array_encode_[u64; 8192]                                                                             
                        time:   [2.0011 us 2.0363 us 2.0634 us]
                        change: [-7.4003% -1.2854% +4.8868%] (p = 0.70 > 0.05)
                        No change in performance detected.

array_decode_[u64; 8192]                                                                             
                        time:   [26.624 us 26.735 us 26.877 us]
                        change: [-61.303% -61.145% -60.962%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

array_encode_[u128; 8192]                                                                             
                        time:   [2.0079 us 2.0423 us 2.0688 us]
                        change: [-6.2668% -0.4866% +5.6703%] (p = 0.87 > 0.05)
                        No change in performance detected.

array_decode_[u128; 8192]                                                                            
                        time:   [55.277 us 55.359 us 55.448 us]
                        change: [-44.229% -43.987% -43.754%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

array_encode_[i8; 8192] time:   [1.9977 us 2.0339 us 2.0622 us]                                     
                        change: [-8.6504% -2.4910% +4.5759%] (p = 0.46 > 0.05)
                        No change in performance detected.

array_decode_[i8; 8192] time:   [4.9660 us 4.9858 us 5.0075 us]                                     
                        change: [-88.336% -88.153% -87.990%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

array_encode_[i16; 8192]                                                                             
                        time:   [1.9990 us 2.0354 us 2.0635 us]
                        change: [-6.6755% -0.7210% +5.7634%] (p = 0.82 > 0.05)
                        No change in performance detected.

array_decode_[i16; 8192]                                                                             
                        time:   [6.3097 us 6.4007 us 6.5550 us]
                        change: [-90.154% -90.079% -89.972%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

array_encode_[i32; 8192]                                                                             
                        time:   [2.0158 us 2.0499 us 2.0766 us]
                        change: [-5.8196% +0.3042% +7.0575%] (p = 0.93 > 0.05)
                        No change in performance detected.

array_decode_[i32; 8192]                                                                             
                        time:   [9.3346 us 9.4024 us 9.4697 us]
                        change: [-86.940% -86.847% -86.754%] (p = 0.00 < 0.05)
                        Performance has improved.

array_encode_[i64; 8192]                                                                             
                        time:   [2.0141 us 2.0470 us 2.0723 us]
                        change: [-7.0847% -1.0581% +5.5361%] (p = 0.74 > 0.05)
                        No change in performance detected.

array_decode_[i64; 8192]                                                                             
                        time:   [26.394 us 26.441 us 26.504 us]
                        change: [-62.515% -62.179% -61.891%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

array_encode_[i128; 8192]                                                                             
                        time:   [2.0118 us 2.0505 us 2.0828 us]
                        change: [-6.4327% -0.4668% +6.0002%] (p = 0.88 > 0.05)
                        No change in performance detected.

array_decode_[i128; 8192]                                                                            
                        time:   [55.349 us 55.422 us 55.492 us]
                        change: [-44.835% -44.371% -43.987%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

My system is little-endian=)

@Robbepop (Contributor) commented Nov 21, 2021

Great! Thanks for the benchmark results!
So this resulted in roughly a 5-10x performance improvement for the optimized array decode cases on your machine.

> My system is little-endian=)

And so is WebAssembly. ;)

@bkchr (Member) commented Nov 21, 2021

/tip medium

@shawntabrizi (Member) commented:

We will fix the tip bot to work in this repo, but in the meantime @xgreenx can you please update your first message in this PR to include:

polkadot address: <address>

So then we can send a tip to you when this PR is merged.

@bkchr (Member) commented Nov 23, 2021

/tip medium

@substrate-tip-bot commented:

Please fix the following problems before calling the tip bot again:

  • Contributor did not properly post their Polkadot or Kusama address. Make sure the pull request has: "{network} address: {address}".

@shawntabrizi (Member) commented:

/tip medium

@substrate-tip-bot commented:

A medium tip was successfully submitted for xgreenx (1nNaTpU9GHFvF7ZrSMu2CudQjXftR8Aqx58oMDgcuoH8dKe on polkadot).

https://polkadot.js.org/apps/#/treasury/tips

@bkchr merged commit baa5863 into paritytech:master on Nov 23, 2021