Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add view tags to outputs to reduce wallet scanning time #8061

Merged
merged 1 commit into from
Apr 20, 2022

Conversation

j-berman
Copy link
Collaborator

@j-berman j-berman commented Nov 15, 2021

Overview

Implements view tags as proposed by @UkoeHB in in MRL issue monero-project/research-lab#73

At tx construction, the sender adds a 1-byte view tag to each output. The view tag is derived from the sender-receiver shared secret (view tag = first byte of hash(shared secret) . When scanning for outputs, the receiver can check the view tag for a match in order to reduce scanning time. When the view tag matches (expected match 1/256 outputs), the wallet proceeds to derive the output public key to check if the output belongs to the receiver. When the view tag does not match, the wallet avoids the more expensive EC operations when deriving the output public key.1 In tests on my machine, I saw reductions in scanning time upwards of 40% (and a minimum around 30%).

Update: switched to using cn_fast_hash to hash the shared secret (edit: see discussion below for more on this choice)
As suggested by @UkoeHB, the shared secret is hashed using SipHash 2-4 (the default), a keyed hash function that is specifically designed to protect a secret key and yield collisions, and is speedier than cn_fast_hash.

In order to maintain transaction uniformity, view tags are not supported until the fork height. After the fork height, all outputs must have a view tag. There is a grace period that allows both types at the fork height which allows the pool to clear without kicking tx's people create right at the fork height.

1 It's expected the skippable EC operations would be skipped for approximately 99.6% of outputs (expected false positive rate = 1/28 = 1/256 = 0.4%, therefore expected negative rate = 100% - 0.4% = 99.6%). This is why 1 byte is the optimal sweet spot for max performance gains without needing more bytes. Skipping just 0.4% more outputs does not yield noticeable gains.

Choices I made worth pointing out

For starters, I thought through various approaches deeply, and settled on what I feel is the sanest one. However, I'm absolutely prepared to refactor if necessary. I understand the various sections of the code that touch this fairly well enough at this point, would be happy to refactor if there is a stronger approach.

Added a new tx_out_to_tagged_key boost variant type, to replace tx_out_to_key at the next fork height

This was likely the most significant choice I made. It seemed to be the most sensible way to fit view tags into the code without introducing overly cumbersome structural changes across the code. I think the biggest issue with this approach is in JSON parsing. Example: on the chain today (with no view tags), the tx.vout target serializes to JSON as follows:

{ 
  "target": {
    "key": "ea3f..."
  }
}

In this PR, at the fork height when view tags are set to be supported, the tx.vout target starts serializing to JSON as follows:

{ 
  "target": {
    "tagged_key": {
      "key": "ea3f...",
      "view_tag": "a1"
    }
  }
}

This will be a breaking change for anyone who was relying on target.key, which I figure isn't ideal. I figure ideally it would instead serialize to the following:

{ 
  "target": {
    "key": "ea3f...",
    "view_tag": "a1"
  }
}

Since view tags aren't strictly necessary when reading for the key, it seemed ideal to maintain the prior JSON structure if possible, and not force people to change how they consume the JSON object downstream if they don't need to. But I couldn't get this ideal approach working in a clean way. Perhaps I'm missing something simple.

Generally very much so open to thoughts on this tx_out_to_tagged_key approach.

2 pre-rct tests aren't working

There are 2 pre-rct tests meant to test view tags on pre-rct outputs after the fork height that aren't working, because the tests aren't set up to support pre-rct outputs past v12, and I felt I spent enough time trying to get them to work. I tested it manually by mining pre-rct outputs, and calling sweep_unmixable after the fork height and scanning for them in a wallet making sure they used the view tags as expected, and they did. I figure priority-wise makes sense to move back over to binning/decoy selection work over getting these 2 minor tests working at this point. I added 15 working tests in total, not including those 2 pre-rct ones.

Still to-do

EDIT: clarified choice to use siphash + description of view tags

@UkoeHB
Copy link
Contributor

UkoeHB commented Nov 15, 2021

There are 2 pre-rct tests meant to test view tags on pre-rct outputs after the fork height that aren't working, because the tests aren't set up to support pre-rct outputs past v12, and I felt I spent enough time trying to get them to work.

Does this imply a new transaction format for all currently-active transaction types, which includes coinbase, the main RCTTypeCLSAG type, the pre-RCT-to-RCT transition type, and the denomination-to-pre-RCT transition type? How is versioning handled around this?

@j-berman
Copy link
Collaborator Author

j-berman commented Nov 15, 2021

There are 2 pre-rct tests meant to test view tags on pre-rct outputs after the fork height that aren't working, because the tests aren't set up to support pre-rct outputs past v12, and I felt I spent enough time trying to get them to work.

Does this imply a new transaction format for all currently-active transaction types, which includes coinbase, the main RCTTypeCLSAG type, the pre-RCT-to-RCT transition type, and the denomination-to-pre-RCT transition type? How is versioning handled around this?

Yes, all currently used types. All newly created outputs that could previously be of type tx_out_to_key, transition to type tx_out_to_tagged_key at the fork height. Unless I missed a spot in the code.

On versioning: at the fork height, the wallet (and miners) will start to construct outs of type tx_out_to_tagged_key instead of tx_out_to_key. At consensus, tx_out_to_key outputs are rejected at the fork height. When reading transaction data (and scanning for outputs) in the wallet, there is a helper function that first gets the boost variant type of the output, then gets the output public key.

In the code before this PR, when reading the output for a public key, there were these all over the code base: boost::get<txout_to_key>(tx.vout[n].target).key. I consolidated the boost getter into a helper function get_output_public_key that checks the boost variant type, then uses the correct getter to return the output public key.

EDIT: outputs don't have to be tx_out_to_tagged_key at the fork height, but tx_out_to_key types become invalid. (incorrect, ignore)

@sboulden
Copy link

Hi, early this year I (intentionally) sent myself a very substantial amount of Monero with a 1000000 block lock time. There is still ~3 years remaining on this transaction being unlocked.

I realize this locked_transaction feature is getting removed. I was ensured that my transaction would still be unaffected.
Can you confirm that this hardfork, which mandates a change to all transaction types, will not affect my transaction (which is a completed transaction, with thousands of confirmations, but is still pending unlock).

Thank you

@j-berman
Copy link
Collaborator Author

@sboulden your transaction will still be unaffected. This PR doesn't have any impact on locked outputs, or the locked_transfer feature. This PR would only impact newly created outputs at the fork height. I also sanity checked it locally just to be absolutely certain, your Monero will be just fine :)

The proposal to deprecate the locked_transfer feature is over here: monero-project/research-lab#78. No code has been written to move forward on that. And even if deprecation does move forward in this fork or in some future fork, to my knowledge it was never considered to render outputs un-spendable (or to mess with existing locked outputs at all). No one has proposed altering existing locked outputs in any way to my knowledge.

@sboulden
Copy link

I appreciate you confirming! Sometimes my mind gets worried that the impact on locked_transfers could be easily overlooked, so I just like to shout it out there.

Thanks again for your work on this pull, looking forward to it

src/crypto/crypto.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@UkoeHB UkoeHB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the test vectors for siphash? They can be found in my seraphis_perf branch.

src/cryptonote_core/blockchain.cpp Outdated Show resolved Hide resolved
src/cryptonote_core/cryptonote_tx_utils.cpp Outdated Show resolved Hide resolved
src/cryptonote_core/cryptonote_tx_utils.cpp Outdated Show resolved Hide resolved
@@ -628,7 +634,8 @@ namespace cryptonote
}
}

bool r = construct_tx_with_tx_key(sender_account_keys, subaddresses, sources, destinations, change_addr, extra, tx, unlock_time, tx_key, additional_tx_keys, rct, rct_config, msout);
bool shuffle_outs = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because the default value can't be used with an explicit use_view_tags. These function signatures are disturbing, but that's not your fault.

src/wallet/wallet2.cpp Outdated Show resolved Hide resolved
@selsta
Copy link
Collaborator

selsta commented Nov 16, 2021

That's what I get with performance tests. Are these the 2 pre-rct tests you mentioned above?

test_out_can_be_to_acc<false, true> (1000 calls) - OK: 113 µs/call
test_out_can_be_to_acc<true, false> - FAILED
test_out_can_be_to_acc<true, true> - FAILED

@tevador
Copy link
Contributor

tevador commented Nov 16, 2021

I'd like to point out that there may be a reason to have view tags slightly larger than 8 bits (possibly as a future upgrade).

src/crypto/crypto.cpp Outdated Show resolved Hide resolved
src/cryptonote_basic/cryptonote_format_utils.cpp Outdated Show resolved Hide resolved
@j-berman
Copy link
Collaborator Author

That's what I get with performance tests. Are these the 2 pre-rct tests you mentioned above?

test_out_can_be_to_acc<false, true> (1000 calls) - OK: 113 µs/call
test_out_can_be_to_acc<true, false> - FAILED
test_out_can_be_to_acc<true, true> - FAILED

@selsta these failing perf tests should be working now :)

For reference, the pre-rct tests I was referencing in the description are:

  • gen_rct_tx_pre_rct_has_no_view_tag_from_hf_view_tags
  • gen_rct_tx_pre_rct_has_view_tag_from_hf_view_tags

They're commented out and don't run. Slapped a TODO on them

@jtgrassie
Copy link
Contributor

the shared secret is hashed using SipHash

I'm interested to know why you are not using cn_fast_hash, like practically everything else in the code does? I'd expected to see something like t=H(salt|D)[0] (with H being cn_fast_hash) in your derive_view_tag.

@UkoeHB
Copy link
Contributor

UkoeHB commented Nov 16, 2021

I'm interested to know why you are not using cn_fast_hash, like practically everything else in the code does?

In testing for view tags, I found using siphash led to about 1.5% faster scanning than cn_fast_hash.

@jtgrassie
Copy link
Contributor

In testing for view tags, I found using siphash led to about 1.5% faster scanning than cn_fast_hash.

Sure, but given the overall scan time improvement, is introducing another hash function implementation worth this extra 1.5% speed? The other implication that comes to mind of using siphash here is that key security is being reduced to 2^128. cn_fast_hash just seemed the obvious candidate given we're already using it everywhere else.

@UkoeHB
Copy link
Contributor

UkoeHB commented Nov 16, 2021

Sure, but given the overall scan time improvement, is introducing another hash function implementation worth this extra 1.5% speed?

This PR is likely to be permanent and final, so yes it is worth it to get as much speed as possible while we still can (setting aside assembly-level optimizations).

The other implication that comes to mind of using siphash here is that key security is being reduced to 2^128. cn_fast_hash just seemed the obvious candidate given we're already using it everywhere else.

Ed25519 security is 2^128, there is no fundamental reduction here. Also note that view tags are only 1 byte, so no more than 8 bits can be leaked even in the worst case. (unless @tevador's idea is adopted)

@jtgrassie
Copy link
Contributor

This PR is likely to be permanent and final

? I wouldn't be so quick to use absolute statements like this. There's very little in this project that remains "permanent and final".

Ed25519 security is 2^128

Ed25519 security is at least 2^128. siphash is at most 2^128. Whilst I accept your point that only 1 byte can ever be leaked, that 1 byte is enough to fingerprint, and it's this hashing of the shared secret that is preventing the fingerprint.

@tevador
Copy link
Contributor

tevador commented Nov 16, 2021

Using siphash to calculate the view tags perfectly matches the intended use case of the hash function: hashtables. View tags are essentially transforming the list of outputs into many hashtables with the hash keys being shared between the sender and the receiver.

key security is being reduced to 2^128

Any hash (even keccak) truncated to 8 bits will have a security of at most 8 bits against preimages and 4 bits against collisions. But even if the whole output was used, siphash doesn't leak any information about he secret key. Even a reduced siphash-1-2 variant requires at least 2^98 operations to recover the secret key.

Copy link
Contributor

@UkoeHB UkoeHB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

memwipe(siphash_key, 16);

// only need a slice of view_tag_full to realize optimal perf/space efficiency
static_assert(sizeof(crypto::view_tag) <= sizeof(view_tag_full));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message parameter is only optional since c++17 and the codebase required minimum is c++14.

@vtnerd
Copy link
Contributor

vtnerd commented Nov 19, 2021

I'm with @jtgrassie on the siphash discussion. IIRC, the intended use case was/is hashtables, not cryptographic usage. Truncating a sha3 hash is safer, and a 1.5% penalty is low.

@vtnerd
Copy link
Contributor

vtnerd commented Nov 19, 2021

@tevador to follow up on that argument - cryptographic hash functions have a requirement against inversion (pre-image resistance). This is a (lesser?) requirement for siphash - the preimage resistance might be ~2^64. This is critical because it relates to privacy.

I don't see a good argument for siphash, unless there is some evidence that pre-image resistance is higher.

@UkoeHB
Copy link
Contributor

UkoeHB commented Nov 19, 2021

@tevador to follow up on that argument - cryptographic hash functions have a requirement against inversion (pre-image resistance). This is a (lesser?) requirement for siphash - the preimage resistance might be ~2^64. This is critical because it relates to privacy.

I don't see a good argument for siphash, unless there is some evidence that pre-image resistance is higher.

Pre-image resistance is irrelevant here, because the hash message is public knowledge. Siphash is a keyed hash function, and the key in our case is 16 bytes of the sender-receiver derivation.

}
else if (hf_version < HF_VERSION_VIEW_TAGS)
{
// require outputs to be of type txout_to_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check that all historical tx outputs are of type txout_to_key? You can check by syncing the node running this code from scratch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep + I synced from scratch running this code. Going to try it again to be safe.

@SChernykh
Copy link
Contributor

SChernykh commented Mar 29, 2022

@j-berman please read https://www.reddit.com/r/Monero/comments/jdh5to/psa_a_bug_has_caused_some_nodes_to_get_stuck_on/g98nuwl/?utm_source=reddit&utm_medium=web2x&context=3
I couldn't find where you limit mempool transactions to view tags only starting from HF_VERSION_VIEW_TAGS. check_output_types still allows old transactions in the mempool during the grace period and we can end up with the same fiasco we had with CLSAG/MLSAG hardfork.

@j-berman
Copy link
Collaborator Author

@SChernykh

#7169 in combination with this PR's code ejects txout_to_key tx's from the pool once hf_version > HF_VERSION_VIEW_TAGS (via m_tx_pool.validate -> add_tx (after take_tx) -> check_tx_outputs -> check_output_types). During the grace period when hf_version == HF_VERSION_VIEW_TAGS, both output types are explicitly allowed into the pool and can be mined. It's not until hf_version > HF_VERSION_VIEW_TAGS that the txout_to_tagged_key becomes the only allowed output type.

@SChernykh
Copy link
Contributor

Nice, so this issue was fixed properly shortly after the CLSAG fork. That means you don't have to address it in this PR at all.

@j-berman
Copy link
Collaborator Author

j-berman commented Apr 4, 2022

Rebased to master. Resolved conflicts in:

tests/crypto/main.cpp
tests/crypto/tests.txt

Implements view tags as proposed by @UkoeHB in MRL issue
monero-project/research-lab#73

At tx construction, the sender adds a 1-byte view tag to each
output. The view tag is derived from the sender-receiver
shared secret. When scanning for outputs, the receiver can
check the view tag for a match, in order to reduce scanning
time. When the view tag does not match, the wallet avoids the
more expensive EC operations when deriving the output public
key using the shared secret.
@j-berman
Copy link
Collaborator Author

@tevador I'd like to include you in the bounty payout for providing useful input on this PR. I've been able to contact all other reviewers; not sure how best to reach you.

@j-berman j-berman mentioned this pull request Jun 29, 2022
vtnerd pushed a commit to vtnerd/monero-lws that referenced this pull request Jul 25, 2022
- monero-project/monero#8061
- also update import location for epee::misc_utils::get_gmt_time
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.