Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: showcase recursion case for transfer #7271

Closed
wants to merge 5 commits into from
Closed

Conversation

LHerskind
Copy link
Contributor

@LHerskind LHerskind commented Jul 2, 2024

Fixes #7142 and #7362.

Instead of doing direct recursion on the transfer I have altered the balances_map slightly to include a new sub function that allow a reduction of notes WITHOUT inserting the change note. We can then recurse over this new sub function to spend notes until we get what we need and then insert a change note after that.

Notice that the transfer is consuming at most 2 notes, but that the _accumulate might consume up to 8!

The reason for this is fairly simple, the transfer would be inserting notes that are instantly consumed but spend plenty of juice on constraining the encryption first. Instead we can just spend a lot of notes with _accumulate without creating any new notes, and then return a final change value for the transfer to add for the from.

Since there are less "overhead" and we are just consuming notes in _accumulate we can spend the same number of constraints just consuming notes as we usually spend on consuming, encrypting and updating state.


During the testing figured something interesting, that might be an issue with compiler optimisation.

// Doing this 
let call = Token::at(context.this_address())._accumulate(from, missing.to_field());
if !missing.eq(U128::zero()) {
    change = call.call(&mut context);
}

// Is usually MUCH (many thousand constraints) cheaper than doing this
if !missing.eq(U128::zero()) {
    change = Token::at(context.this_address())._accumulate(from, missing.to_field()).call(&mut context);
}

  • Also removes the check in the note_getter that asserts the length of the bounded vec is non-zero. Since that allow us for more meaningful error messages.
  • Adds support for foreign call errors such that we catch nested calls failing.

Copy link
Contributor Author

LHerskind commented Jul 2, 2024

@AztecBot
Copy link
Collaborator

AztecBot commented Jul 2, 2024

Benchmark results

Metrics with a significant change:

  • app_circuit_witness_generation_time_in_ms (Token:transfer): 1,126 (-31%)
  • avm_simulation_time_ms (Token:mint_public): 47.4 (-39%)
  • app_circuit_proving_time_in_ms (SchnorrAccount:entrypoint): 6,384 (+34%)
  • app_circuit_proving_time_in_ms (Token:transfer): 6,141 (-47%)
  • app_circuit_size_in_gates (Token:transfer): 262,144 (-50%)
  • protocol_circuit_witness_generation_time_in_ms (private-kernel-inner): 844 (+20%)
Detailed results

All benchmarks are run on txs on the Benchmarking contract on the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and public storage read and write.

This benchmark source data is available in JSON format on S3 here.

Proof generation

Each column represents the number of threads used in proof generation.

Metric 1 threads 4 threads 16 threads 32 threads 64 threads
proof_construction_time_sha256_ms 5,719 (-1%) 1,550 (-1%) 708 762 774
proof_construction_time_sha256_30_ms 11,933 (+1%) 3,151 (-1%) 1,408 1,427 (+1%) 1,470
proof_construction_time_sha256_100_ms 43,889 11,794 5,459 (+1%) 5,429 5,359
proof_construction_time_poseidon_hash_ms 78.0 (-1%) 34.0 34.0 58.0 88.0 (+1%)
proof_construction_time_poseidon_hash_30_ms 1,517 (-1%) 416 201 (+1%) 231 (+1%) 268 (+1%)
proof_construction_time_poseidon_hash_100_ms 5,736 (-1%) 1,563 (-1%) 718 776 (+2%) 788 (-1%)

L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

Metric 4 txs 8 txs 16 txs
l1_rollup_calldata_size_in_bytes 1,412 1,412 1,412
l1_rollup_calldata_gas 9,476 9,466 9,476
l1_rollup_execution_gas 611,215 611,356 611,517
l2_block_processing_time_in_ms 749 (-1%) 1,405 2,714 (-1%)
l2_block_building_time_in_ms 18,979 37,707 (+1%) 74,011
l2_block_rollup_simulation_time_in_ms 18,979 37,707 (+1%) 74,011
l2_block_public_tx_process_time_in_ms 16,191 34,750 (+1%) 71,067

L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 8 txs.

Metric 3 blocks 5 blocks
node_history_sync_time_in_ms 7,003 9,849
node_database_size_in_bytes 12,423,248 16,195,664
pxe_database_size_in_bytes 16,254 26,813

Circuits stats

Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks.

Circuit simulation_time_in_ms witness_generation_time_in_ms proving_time_in_ms input_size_in_bytes output_size_in_bytes proof_size_in_bytes num_public_inputs size_in_gates
private-kernel-init 110 (+1%) 399 (-3%) 13,607 (-5%) 19,990 55,022 74,432 2,259 524,288
private-kernel-inner 221 ⚠️ 844 (+20%) 30,955 (+2%) 82,090 55,022 74,432 2,259 1,048,576
private-kernel-tail 1,058 2,502 (-1%) 54,392 (+1%) 62,345 62,057 14,912 399 2,097,152
base-parity 6.16 1,319 (+1%) 2,639 128 64.0 2,208 2.00 131,072
root-parity 48.6 69.2 (+3%) 38,839 (-2%) 27,100 64.0 2,720 18.0 2,097,152
base-rollup 5,951 (+1%) 4,781 (-2%) 92,021 170,330 728 3,648 47.0 4,194,304
root-rollup 110 81.8 (+1%) 23,055 (-4%) 25,253 620 3,456 41.0 1,048,576
public-kernel-setup 433 (-1%) 2,399 42,257 102,121 80,278 106,912 3,274 2,097,152
public-kernel-app-logic 397 (+1%) 3,386 (+1%) 41,388 (-6%) 102,121 80,278 106,912 3,274 2,097,152
public-kernel-tail 1,136 26,861 (-1%) 180,281 399,014 10,014 14,912 399 8,388,608
private-kernel-reset-small 291 1,159 (-13%) 30,106 (+14%) 79,209 55,022 74,432 2,259 1,048,576
public-kernel-teardown 379 (-1%) 3,386 (+1%) 43,395 (-1%) 102,121 80,278 106,912 3,274 2,097,152
merge-rollup 29.1 (-1%) N/A N/A 16,486 728 N/A N/A N/A
private-kernel-tail-to-public N/A 8,533 58,809 (-3%) N/A N/A 106,912 3,274 2,097,152

Stats on running time collected for app circuits

Function input_size_in_bytes output_size_in_bytes witness_generation_time_in_ms proof_size_in_bytes proving_time_in_ms size_in_gates num_public_inputs
ContractClassRegisterer:register 1,344 9,364 396 (+1%) N/A N/A N/A N/A
ContractInstanceDeployer:deploy 1,408 9,364 24.6 N/A N/A N/A N/A
MultiCallEntrypoint:entrypoint 1,920 9,364 648 (-1%) N/A N/A N/A N/A
GasToken:deploy 1,376 9,364 549 N/A N/A N/A N/A
SchnorrAccount:constructor 1,312 9,364 478 (-2%) N/A N/A N/A N/A
SchnorrAccount:entrypoint 2,304 9,364 790 (-4%) 16,000 ⚠️ 6,384 (+34%) 131,072 433
Token:privately_mint_private_note 1,280 9,364 530 (-1%) N/A N/A N/A N/A
FPC:fee_entrypoint_public 1,344 9,364 94.0 (+1%) 16,000 1,750 (+5%) 65,536 433
Token:transfer 1,312 9,364 ⚠️ 1,126 (-31%) 16,000 ⚠️ 6,141 (-47%) ⚠️ 262,144 (-50%) 433
AuthRegistry:set_authorized (avm) 19,226 N/A N/A 91,264 1,353 (-3%) N/A N/A
FPC:prepare_fee (avm) 26,668 N/A N/A 91,328 2,811 (-3%) N/A N/A
Token:transfer_public (avm) 42,918 N/A N/A 91,328 3,895 N/A N/A
AuthRegistry:consume (avm) 33,104 N/A N/A 91,264 2,777 (-5%) N/A N/A
FPC:pay_refund (avm) 36,833 N/A N/A 91,296 23,537 (+1%) N/A N/A
Benchmarking:create_note 1,344 9,364 464 N/A N/A N/A N/A
SchnorrAccount:verify_private_authwit 1,280 9,364 41.3 N/A N/A N/A N/A
Token:unshield 1,376 9,364 1,311 N/A N/A N/A N/A
FPC:fee_entrypoint_private 1,376 9,364 1,705 (-6%) N/A N/A N/A N/A

AVM Simulation

Time to simulate various public functions in the AVM.

Function time_ms bytecode_size_in_bytes
GasToken:_increase_public_balance 67.5 (-8%) 13,790
GasToken:set_portal 14.0 (-6%) 3,339
Token:constructor 93.3 (+2%) 23,692
FPC:constructor 61.5 13,592
GasToken:mint_public 48.7 (-3%) 10,158
Token:mint_public ⚠️ 47.4 (-39%) 19,034
Token:assert_minter_and_mint 130 (-12%) 12,925
AuthRegistry:set_authorized 27.4 (-1%) 7,812
FPC:prepare_fee 105 (-3%) 15,062
Token:transfer_public 35.5 (+23%) 31,218
FPC:pay_refund 135 (+5%) 25,260
Benchmarking:increment_balance 1,985 (+1%) 15,267
Token:_increase_public_balance 14.9 15,006
FPC:pay_refund_with_shielded_rebate 154 (+2%) 26,347

Public DB Access

Time to access various public DBs.

Function time_ms
get-nullifier-index 0.154

Tree insertion stats

The duration to insert a fixed batch of leaves into each tree type.

Metric 1 leaves 16 leaves 64 leaves 128 leaves 256 leaves 512 leaves 1024 leaves
batch_insert_into_append_only_tree_16_depth_ms 10.4 16.7 N/A N/A N/A N/A N/A
batch_insert_into_append_only_tree_16_depth_hash_count 16.8 31.7 N/A N/A N/A N/A N/A
batch_insert_into_append_only_tree_16_depth_hash_ms 0.601 0.513 N/A N/A N/A N/A N/A
batch_insert_into_append_only_tree_32_depth_ms N/A N/A 48.0 75.4 130 (-2%) 244 469 (-1%)
batch_insert_into_append_only_tree_32_depth_hash_count N/A N/A 95.9 159 287 543 1,055
batch_insert_into_append_only_tree_32_depth_hash_ms N/A N/A 0.491 0.464 0.448 (-1%) 0.442 0.438 (-1%)
batch_insert_into_indexed_tree_20_depth_ms N/A N/A 59.4 111 (-1%) 181 (-2%) 352 691 (-1%)
batch_insert_into_indexed_tree_20_depth_hash_count N/A N/A 109 207 355 691 1,363
batch_insert_into_indexed_tree_20_depth_hash_ms N/A N/A 0.502 0.498 0.481 (-1%) 0.477 0.474 (-1%)
batch_insert_into_indexed_tree_40_depth_ms N/A N/A 72.5 N/A N/A N/A N/A
batch_insert_into_indexed_tree_40_depth_hash_count N/A N/A 133 N/A N/A N/A N/A
batch_insert_into_indexed_tree_40_depth_hash_ms N/A N/A 0.516 N/A N/A N/A N/A

Miscellaneous

Transaction sizes based on how many contract classes are registered in the tx.

Metric 0 registered classes 1 registered classes
tx_size_in_bytes 74,057 667,841

Transaction size based on fee payment method

| Metric | |
| - | |

subtrahend: U128,
limit: u32
) -> (U128, U128) where T: NoteInterface<T_SERIALIZED_LEN, T_SERIALIZED_BYTES_LEN> + OwnedNote {
// docs:start:get_notes
Copy link
Contributor

@rahul-kothari rahul-kothari Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dangling doc comments here and below btw

@LHerskind LHerskind changed the base branch from master to lh/private-call-stack-item-hash July 2, 2024 16:46
@LHerskind LHerskind force-pushed the lh/private-call-stack-item-hash branch 2 times, most recently from 752252f to da132ed Compare July 3, 2024 19:33
@LHerskind LHerskind force-pushed the lh/private-call-stack-item-hash branch from da132ed to 9ed5785 Compare July 3, 2024 22:44
@LHerskind LHerskind changed the base branch from lh/private-call-stack-item-hash to lh/public-call-stack-item-hash July 3, 2024 22:45
@LHerskind LHerskind force-pushed the lh/public-call-stack-item-hash branch from 411f896 to b790ce8 Compare July 4, 2024 14:39
@LHerskind LHerskind force-pushed the lh/public-call-stack-item-hash branch from 02f62f0 to a77b618 Compare July 5, 2024 09:39
Base automatically changed from lh/public-call-stack-item-hash to master July 5, 2024 10:24
@LHerskind LHerskind marked this pull request as ready for review July 5, 2024 15:39
Copy link
Contributor

@Thunkar Thunkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but still a few notes:

  • Can you please add a TXE test closer to the e2e?
  • Now that we get meaningful error messages on nested calls, can we have a task to change the current negative tests to have meaningful error messages?
  • Don't hate me but...odd_sub? 🤣

Copy link
Contributor

@benesjan benesjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -113,6 +113,39 @@ impl<T> BalancesMap<T, &mut PrivateContext> {

self.add(owner, minuend - subtrahend)
}

pub fn odd_sub<T_SERIALIZED_LEN, T_SERIALIZED_BYTES_LEN>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn odd_sub<T_SERIALIZED_LEN, T_SERIALIZED_BYTES_LEN>(
/// Performs a subtraction operation on the owner's balance, allowing for partial subtraction.
///
/// This function attempts to subtract a given amount (subtrahend) from the owner's balance,
/// but can handle cases where the balance is insufficient. It returns either the remaining
/// balance after subtraction or the deficit amount.
///
/// # Arguments
/// * `self` - The current instance of the contract.
/// * `owner` - The AztecAddress of the account owner.
/// * `subtrahend` - The amount to be subtracted.
/// * `limit` - The maximum number of notes to consider when calculating the balance.
///
/// # Returns
/// A tuple `(U128, U128)` where:
/// - The first element is the remaining balance if subtraction was successful, or zero if not.
/// - The second element is zero if subtraction was successful, or the deficit amount if not.
///
/// # Type Parameters
/// * `T_SERIALIZED_LEN` - The serialized length of the note.
/// * `T_SERIALIZED_BYTES_LEN` - The serialized bytes length of the note.
/// * `T` - The type implementing NoteInterface and OwnedNote traits.
///
/// # Panics
/// Panics if no notes are found for the given owner (i.e., if the balance is zero).
pub fn odd_sub<T_SERIALIZED_LEN, T_SERIALIZED_BYTES_LEN>(

Claude ai generated nice docs here.

BTW agree with Grego that odd_sub is not that great of a name. Maybe sub_with_deficit would be better?

}

if minuend >= subtrahend {
(minuend - subtrahend, U128::zero())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(minuend - subtrahend, U128::zero())
// If balance is sufficient, return remaining balance and zero deficit
(minuend - subtrahend, U128::zero())

if minuend >= subtrahend {
(minuend - subtrahend, U128::zero())
} else {
(U128::zero(), subtrahend - minuend)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(U128::zero(), subtrahend - minuend)
// If balance is insufficient, return zero remaining and the deficit amount
(U128::zero(), subtrahend - minuend)

@@ -335,13 +336,38 @@ contract Token {
let to_ivpk = header.get_ivpk_m(&mut context, to);

let amount = U128::from_integer(amount);
storage.balances.sub(from, amount).emit(encode_and_encrypt_note_with_keys_unconstrained(&mut context, from_ovpk, from_ivpk));
let mut (change, missing) = storage.balances.odd_sub(from, amount, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain here why we try to fetch only 2 notes? It would be very non-obvious to outsiders reading the code unless they are very knowledgeable about circuit optimization and we want the repo contracts to be an example.

change = call.call(&mut context);
}

storage.balances.add(from, change).emit(encode_and_encrypt_note_with_keys_unconstrained(&mut context, from_ovpk, from_ivpk));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
storage.balances.add(from, change).emit(encode_and_encrypt_note_with_keys_unconstrained(&mut context, from_ovpk, from_ivpk));
// We don't constrain encryption of the note log in `transfer` (unlike in `transfer_from`) because transfer
// function is only designed to be used in situations where it's not necessary (e.g. payment to another person
// where the payment is considered to be successful when the other party successfully decrypts a note).
storage.balances.add(from, change).emit(encode_and_encrypt_note_with_keys_unconstrained(&mut context, from_ovpk, from_ivpk));

@nventuro nventuro self-assigned this Jul 19, 2024
@nventuro
Copy link
Contributor

I'm splitting this into a couple smaller PRs since there's quite a few things going on at the same time. We''ll still do this, but I want to keep the unrelated changes separate if at all possible. Created #7621 for zero note retrieval (which by itself has a bunch of repercussions).

@nventuro nventuro added the S-do-not-merge Status: Do not merge this PR label Jul 25, 2024
@nventuro
Copy link
Contributor

nventuro commented Aug 1, 2024

Superceded by #7730

@nventuro nventuro closed this Aug 1, 2024
@nventuro nventuro deleted the lh/fiddle branch August 1, 2024 19:34
nventuro added a commit that referenced this pull request Aug 6, 2024
Replaces #7271
Closes #7142
Closes #7362

This is quite similar to the implementation in #7271: `transfer`
consumes two notes, and if their amount is insufficient for the desired
transfer it calls a second internal function which recursively consumes
8 notes per iteration until either the amount is reached, or no more
notes are found. If the total amount consumed exceeds the transfer
amount, a change note is created.

This results in a much smaller transfer function for the scenario in
which two notes suffice, since we're using a `limit` value of 2 and
therefore only doing two iterations of note constraining (the expensive
part). Two notes is a good amount as it provides a way for change notes
to be consumed in follow-up calls.

The recursive call has a much lower gate count, since it doesn't need to
fetch keys, create notes, emit events, etc., and so we can afford to
read more notes here while still resulting in a relatively small
circuit.

I created a new e2e test in which the number of notes and their amounts
are quite controlled in order to properly test this. The tests are
unfortunately currently interdependent, but given the relative
simplicity of the test suite it should be hopefully simple to maintain
and expand, and maybe eventually fix.
AztecBot pushed a commit to AztecProtocol/aztec-nr that referenced this pull request Aug 7, 2024
Replaces AztecProtocol/aztec-packages#7271
Closes AztecProtocol/aztec-packages#7142
Closes AztecProtocol/aztec-packages#7362

This is quite similar to the implementation in #7271: `transfer`
consumes two notes, and if their amount is insufficient for the desired
transfer it calls a second internal function which recursively consumes
8 notes per iteration until either the amount is reached, or no more
notes are found. If the total amount consumed exceeds the transfer
amount, a change note is created.

This results in a much smaller transfer function for the scenario in
which two notes suffice, since we're using a `limit` value of 2 and
therefore only doing two iterations of note constraining (the expensive
part). Two notes is a good amount as it provides a way for change notes
to be consumed in follow-up calls.

The recursive call has a much lower gate count, since it doesn't need to
fetch keys, create notes, emit events, etc., and so we can afford to
read more notes here while still resulting in a relatively small
circuit.

I created a new e2e test in which the number of notes and their amounts
are quite controlled in order to properly test this. The tests are
unfortunately currently interdependent, but given the relative
simplicity of the test suite it should be hopefully simple to maintain
and expand, and maybe eventually fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-do-not-merge Status: Do not merge this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider refactoring Token's transfer function with recursion
6 participants