Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tpu-v2): fix tpu-v2 wait for payment spend and extract secret #2261

Open
wants to merge 26 commits into
base: dev
Choose a base branch
from

Conversation

laruh
Copy link
Member

@laruh laruh commented Nov 1, 2024

  • In TakerCoinSwapOpsV2 trait wait_for_taker_payment_spend function was renamed to find_taker_payment_spend_tx (to avoid misunderstanding with wait_for_confirmation logic)

    In EthCoin function find_taker_payment_spend_tx_impl was provided for trait find_taker_payment_spend_tx method

  • Added new extract_secret_v2 method in TakerCoinSwapOpsV2 trait. implemented for UTXO and EthCoin.
    Did it as EthCoin legacy extract_secret doesnt fit TPU V2, so its better to require extract_secret_v2 for coins.

  • in lp_connect_start_bob and in lp_connected_alice provided fallback to legacy swap protocol.

    Legacy fallback happens when:
    user sets ctx.use_trading_proto_v2() as false, or other trading side uses legacy swap protocol, then even if user set ctx.use_trading_proto_v2() as true, they have to use legacy swap as well.
    Other reason of starting legacy swap, even if ctx.use_trading_proto_v2() is true, is the trading coin which doesnt support TPU V2

  • provide swap protocol version feat(utxo-swap): add utxo burn output for non-kmd taker fee #2112 (review)

  • handle require confirmations before changing taker/maker swap state as Completed
    Note: you can find the implementation above this line in taker_swap_v2.rs and in maker_swap_v2.rs

Self::change_state(Completed::new(), state_machine).await

@laruh laruh added the in progress Changes will be made from the author label Nov 1, 2024
@laruh laruh force-pushed the fix-tpu-v2-wait-for-payment-spend branch 2 times, most recently from a6052b3 to 6f0d1a6 Compare November 4, 2024 05:27
@laruh
Copy link
Member Author

laruh commented Nov 5, 2024

ps: will resolve conflicts or cherrypick commits in new up to date branch after eth-maker-tpu-v2 merge #2211

Base automatically changed from eth-maker-tpu-v2 to dev November 8, 2024 14:10
@laruh laruh force-pushed the fix-tpu-v2-wait-for-payment-spend branch from 35ed147 to c700b59 Compare November 8, 2024 14:59
@laruh laruh marked this pull request as ready for review November 8, 2024 15:00
@shamardy shamardy added P2 under review and removed in progress Changes will be made from the author labels Nov 11, 2024
Copy link
Collaborator

@mariocynicys mariocynicys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, first iteration

}

async fn extract_secret_v2(&self, secret_hash: &[u8], spend_tx: &Self::Tx) -> Result<Vec<u8>, String> {
self.extract_secret_v2_impl(secret_hash, spend_tx).await
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not pass secret_hash along if the impl doesn't need it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thanks for catching, done 8d5ed46


// get all logged TakerPaymentSpent events from `from_block` till current block
let events = match self
.events_from_block(taker_swap_v2_contract, "TakerPaymentSpent", from_block, current_block)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we advance from_block if the event wasn't found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we advance from_block if the event wasn't found?

Shouldn't change blocks range, It introduces risks to skip necessary block.

Receiving empty events list does not necessarily indicate that there are no events, network latency can cause delays in the propagation and indexing of event logs even after a transaction is mined.
After a transaction is mined, the logs related to it need to be extracted and made available for querying. This process is not instantaneous.

Also we dont know all the nuances and differences of all blockchains. It is much safer to keep block range starting from swap start block.

Copy link
Member Author

@laruh laruh Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPD: l suggest to add if events.is_empty check to continue loop without moving forward
UPD2: added is empty check 8d5ed46

Comment on lines 526 to 528
info!("Tx {} not found yet", tx_hash);
Timer::sleep(check_every).await;
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we have the tx_hash, we shouldn't keep looping right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yes

            if let Some(event) = found_event {
                if let Some(tx_hash) = event.transaction_hash {
                    let transaction = match self.transaction(TransactionId::Hash(tx_hash)).await {
                        Ok(Some(t)) => t,
                        Ok(None) => {
                            info!("Tx {} not found yet", tx_hash);
                            Timer::sleep(check_every).await;
                            continue;
                        },
                        Err(e) => {
                            error!("Get tx {} error: {}", tx_hash, e);
                            Timer::sleep(check_every).await;
                            continue;
                        },
                    };
                    let result = signed_tx_from_web3_tx(transaction).map_err(FindPaymentSpendError::Internal)?;
                    return Ok(result);
                }
            }

            Timer::sleep(5.).await;
        }

once we got Some(tx_hash) it means event we found had necessary swap id.
But if we got None or error in next step after Some(tx_hash), we dont have to repeat the whole process from the beginning.
Next loop cycle we can start right from requesting transaction from tx_hash. I will work on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 6c8b2c1

Copy link
Member

@borngraced borngraced Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should abort if we got error as this can lead to a potential infinite loop. self.transaction return error only if the tx data can't be deserialize which means bad tx data so we should break

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should abort if we got error as this can lead to a potential infinite loop.

There is no infinite loop, we have if now > wait_until check

        loop {
            let now = now_sec();
            if now > wait_until {
                return MmError::err(FindPaymentSpendError::Timeout { wait_until, now });
            }

self.transaction return error only if the tx data can't be deserialize which means bad tx data so we should break

no, it has try_rpc_send function, it could return web3::Error
image

Copy link
Member

@borngraced borngraced Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.. then you should still handle the error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.. then you should still handle the error

I think you confused transaction and signed_tx_from_web3_tx functions

// Proceed to check transaction if we have a tx_hash
if let Some(tx_hash) = tx_hash {
match self.transaction(TransactionId::Hash(tx_hash)).await {
Ok(Some(t)) => {
let transaction = signed_tx_from_web3_tx(t).map_err(FindPaymentSpendError::Internal)?;
return Ok(transaction);
},
Ok(None) => info!("spendTakerPayment transaction {} not found yet", tx_hash),
Err(e) => error!("Get spendTakerPayment transaction {} error: {}", tx_hash, e),
};
Timer::sleep(check_every).await;
continue;
}
}

transaction uses web3 calls.
signed_tx_from_web3_tx is one which doesnt have web3 requests and tries to build ethcore transaction from web3 transaction. If it fails then it is useless to repeat it, we can return error instantly and we do it

let transaction = signed_tx_from_web3_tx(t).map_err(FindPaymentSpendError::Internal)?;

let found_event = events.into_iter().find(|event| &event.data.0[..32] == id.as_slice());

if let Some(event) = found_event {
if let Some(tx_hash) = event.transaction_hash {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this ever be None? will this be a recoverable state then? otherwise we can terminate early.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this ever be None?

are you talking about event.transaction_hash? This type is from a dependency, we should handle it as it is. The transaction_hash could be None if the log is emitted by a transaction in a pending state. Once the transaction is included in a mined block, the value should be Some.

will this be a recoverable state then?

For TPU V2 we aim to have automatic recover process, if find_taker_payment_spend_tx return error then refund process will be started

let taker_payment_spend = match state_machine
.taker_coin
.find_taker_payment_spend_tx(
&self.taker_payment,
self.taker_coin_start_block,
state_machine.taker_payment_locktime(),
)
.await
{
Ok(tx) => tx,
Err(e) => {
let next_state = TakerPaymentRefundRequired {
taker_payment: self.taker_payment,
negotiation_data: self.negotiation_data,
reason: TakerPaymentRefundReason::MakerDidNotSpendInTime(format!("{}", e)),
};
return Self::change_state(next_state, state_machine).await;
},
};

otherwise we can terminate early

Could clarify what do you mean by termination? You want to return error and break loop?
We should try to find transaction in the loop until time is out

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction_hash could be None if the log is emitted by a transaction in a pending state. Once the transaction is included in a mined block, the value should be Some.

aren't we getting events till the current mined block? so this tx shouldn't be pending?

For TPU V2 we aim to have automatic recover process

I just meant we can not do nothing about the fact that tx hash is none.
nothing to do with swap recovery.

how I was thinking (which might be wrong) is that some event types don't have a tx hash which means we supplied a bad event id from the beginning meaning that we can't proceed further. this might not be the case though, gotta read more about this, excuse my eth illiteracy 😂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we getting events till the current mined block? so this tx shouldn't be pending?

yeah, you are right as we are using from and to block filters for eth_getLogs API, tx should be confirmed

    async fn events_from_block(
        &self,
        swap_contract_address: Address,
        event_name: &str,
        from_block: u64,
        to_block: u64,
    ) -> MmResult<Vec<Log>, FindPaymentSpendError> {
        let contract_event = TAKER_SWAP_V2.event(event_name)?;
        let filter = FilterBuilder::default()
            .topics(Some(vec![contract_event.signature()]), None, None, None)
            .from_block(BlockNumber::Number(from_block.into()))
            .to_block(BlockNumber::Number(to_block.into()))
            .address(vec![swap_contract_address])
            .build();
        let events_logs = self
            .logs(filter)
            .await
            .map_err(|e| FindPaymentSpendError::Transport(e.to_string()))?;
        Ok(events_logs)
    }

some event types don't have a tx hash

I think you are confused a bit. Events/logs themselves don't necessarily contain the transaction hash. Instead, the transaction hash is associated with the transaction that emitted the event. So Log type from web3 just contains info about tx which emitted this log.

Note about event and log words. event in Solidity is a way for a contract to emit logs during the execution of a function.
So they are close words, just events refer to the Solidity construct used in the smart contract to emit logs, while logs refer to the actual data that is recorded in the blockchain when events are emitted.

So using from to block range we are looking for Log which was emitted by spend taker payment transaction.

As for empty tx hash I would like to refer to previous comment #2261 (comment) empty event list or none transaction_hash are not 100% that there is no tx which we are looking for, it could be just blockchain indexation delays or other reasons.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will read further regarding the truncated (not necessarily empty? right?) event list issue, any resourced regarding the indexaction delay issues and such would be so helpful!

regarding an event not having a transaction_hash thought, how would we successfully get the event which has None for the transaction_hash and then try again and all of a sudden we get a Some transaction_hash! is that possible? are these event logs mutable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will read further regarding the truncated (not necessarily empty? right?) event list issue, any resourced regarding the indexaction delay issues and such would be so helpful!

I would say that chain reorganization, network latency, node synchronization lag issues could cause missing information issues. These problems usually temporarily, as I understand. You should wait for more confirmed blocks also try to fetch the info from different nodes.

regarding an event not having a transaction_hash thought, how would we successfully get the event which has None for the transaction_hash and then try again and all of a sudden we get a Some transaction_hash! is that possible? are these event logs mutable?

Logs are not mutable. Also they are tied to transactions. When a transaction calls a smart contract function that emits an event, this event generates log, which is permanently recorded in the blockchain.

But there’s a nuance. Lest check doc. According to the documentation https://www.chainnodes.org/docs/ethereum/eth_getLogs, a log from a pending transaction might lack a transaction hash, but when the transaction is confirmed, the log should include it.

Therefore, ideally, when we request a list of logs using the events_from_block function with from_block and to_block filters, it should return only logs from confirmed blocks, which means confirmed transactions. In this case, event.transaction_hash should ideally always be Some.

if let Some(event) = found_event {
if let Some(tx_hash) = event.transaction_hash {

We dont need to change from_block. However, if Log has transaction_hash:None, it doesn't mean Log doesn't have transaction (actually its not correct by itself, as logs are not owners of txs), it means smth went wrong as eth_getLogs API with fromBlock and toBlock filters will use confirmed blocks.

Copy link
Member Author

@laruh laruh Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional info https://docs.alchemy.com/docs/deep-dive-into-eth_getlogs#what-are-logs-or-events

Logs and events are used synonymously—smart contracts generate logs by firing off events, so logs provide insights into events that occur within the smart contract. Logs can be found on transaction receipts.

Anytime a transaction is mined, we can see event logs for that transaction by making a request to eth_getLogs and then take actions based off those results. For example, if a purchase is being made using crypto payments, we can use eth_getLogs to see if the sender successfully made the payment before providing the item purchased.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if Log has transaction_hash:None, it doesn't mean Log doesn't have transaction, it means smth went wrong as eth_getLogs API with fromBlock and toBlock filters will use confirmed blocks.

im missing u between the lines here so let me repeat that to you and see if i got it correctly. transaction_hash MUST always be Some eventually, if it was None this is a temporary reporting/mining/etc... thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if Log has transaction_hash:None, it doesn't mean Log doesn't have transaction, it means smth went wrong as eth_getLogs API with fromBlock and toBlock filters will use confirmed blocks.

im missing u between the lines here so let me repeat that to you and see if i got it correctly. transaction_hash MUST always be Some eventually, if it was None this is a temporary reporting/mining/etc... thing.

Yes, in the context of not using "pending" tag in "eth_getLogs" API, we expect transaction_hash always be Some.
The best we can do is to repeat loop cycle until time is out, if None occurred.

Comment on lines 1175 to 1176
#[serde(skip_serializing_if = "Option::is_none")]
pub swap_version: Option<u32>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about having this field a u32 defaulting to 1 if not present?
less Somes to unwrap

Copy link
Member Author

@laruh laruh Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to u32 type 707946a

serialisation should also be skipped if 1 value was provided, as its better not to risk sending p2p message with additional swap version 1 field to old seed nodes.

@@ -1399,6 +1410,13 @@ impl<'a> TakerOrderBuilder<'a> {
self
}

pub fn with_tpu_version(mut self, use_trading_proto_v2: bool) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function name feels like we gonna pass a version but then we pass a bool flag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use_trading_proto_v2 8d5ed46

Later in a far far future if we have more protocol versions, then we can change name to with_tpu_version or set_tpu_version

taker_amount: &taker_amount,
locktime: &lock_time,
};
start_maker_legacy_swap(&ctx, maker_order, alice, secret, params).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we start legacy swap in this case and also in the fall back else case of the condition below for v2 swaps, we could merge these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done f817f5c

mm2src/mm2_main/src/lp_ordermatch.rs Outdated Show resolved Hide resolved
@onur-ozkan
Copy link
Member

Could you add some PR description to describe the work and help us on review?

@laruh
Copy link
Member Author

laruh commented Nov 12, 2024

Could you add some PR description to describe the work and help us on review?

Updated PR description

…nts.is_empty() check, rename with_tpu_version function.
@@ -1167,6 +1171,9 @@ pub struct TakerRequest {
#[serde(default)]
#[serde(skip_serializing_if = "Option::is_none")]
pub rel_protocol_info: Option<Vec<u8>>,
#[serde(default)]
#[serde(skip_serializing_if = "Option::is_none")]
pub swap_version: Option<u32>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this swap_version is needed to differentiate between legacy and TPU swaps.
I'd suggest making it a enum (Legacy/TPU) instead of numeric.
I believe a numeric 'version' is what we can change regularly (like improving protocol between nodes).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest making it a enum (Legacy/TPU) instead of numeric.

TPU is already a numeric, we call ongoing update as Trading Protocol Upgrade V2 also we have swap v2 naming in repo.

I believe a numeric 'version' is what we can change regularly (like improving protocol between nodes).

I think it doesn't matter is it regular or rare, every swap protocol update which breaks backward compatibility and forces all seed nodes to be updated requires the version bump.

I suggest to keep versioning simple, where we treat legacy as v1 or none.
I think I will make field non optional and provide necessary functions for default and skip_serializing_if annotations, to not do None and Some(1) checks for legacy


where we treat legacy as v1

Also, its logically to think about legacy as v1 version as our tpu is starting from 2 not 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified swap_version field 707946a

also please join the discussion if you think you have smth to add
#2261 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think legacy and TPU are not versions in fact, but two different algorithms.
(I am concerned, because I added a version field in another PR which will be increased when we add/remove features to the swap protocol. I think it may be confusing to have several version fields so adding a scope to it, where it possible, would help to better understand code)

Copy link
Collaborator

@mariocynicys mariocynicys Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think legacy and TPU are not versions in fact, but two different algorithms.

Ummm, we can consider them as backward incompatible versions with each other. Tbh i am all for it for having just one version field that controls minor features and swap protocol used, this is simpler than using multiple versions.

edit: also what about an enum encabsulating the minor version/feature:

enum SwapVersion {
    #[default]
    Legacy(u32),
    TPU(u32),
    ...
}

Copy link
Collaborator

@dimxy dimxy Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So its about messages versioning within some protocol, not protocol version itself? Correct me if I understood it wrong.

well, in daemons code they call it 'protocol version' and change each time whenever any, even slight-less, change in p2p messages occurs, this number just grows steadily.
If we want it to reflect that this is legacy or tpu we need to make it structured to provide space for numbering change (but this is maybe not needed as both message flows do not mix)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or what protocol are you providing inside legacy swap protocol?

No problem with msg, I named it 'protocol' like in daemons code.

Copy link
Collaborator

@dimxy dimxy Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be in the swap negotiation layer or better in the order matching layer

I implemented this for legacy at the negotiation level in #2112
I guess for order matching we would need this as well (if we may add new swap types or features)
But I think for order matching versoning will change more rarely

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for order matching we would need this as well (if we may add new swap types or features)
But I think for order matching versoning will change more rarely

I meant it would be better off having such negotiation params agreed upon before matching the order to the taker. This doesn't really shine in the burn case since we consider it as optional feature (if the other peer doesn't support it we will just proceed without burn), but if we wanna mark the feature/fix as required, then such a swap will fail after starting, in which case not starting it was the efficient choice to avoid locking funds etc...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for ordermatching, this pr introduces swap_version in Taker and Maker orders

Copy link
Member

@borngraced borngraced left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well-done! couple notes

mm2src/coins/eth/eth_swap_v2/eth_taker_swap_v2.rs Outdated Show resolved Hide resolved
mm2src/coins/eth/eth_swap_v2/eth_taker_swap_v2.rs Outdated Show resolved Hide resolved
Comment on lines +538 to +539
Timer::sleep(check_every).await;
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 2 lines just could be deleted btw

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest leaving them. If someone in the future adds logic after the if let Some(tx_hash) = tx_hash block, they might forget to add the continue again, or they might decide that not having the continue and sleep time is what we actually want. It's better to avoid confusion.image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think it is better to add a comment with a reminder about this instead of leaving redundant code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolving anyway

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest leaving them. If someone in the future adds logic after the if let Some(tx_hash) = tx_hash block, they might forget to add the continue again, or they might decide that not having the continue and sleep time is what we actually want. It's better to avoid confusion.

no, whoever's adding the code should understand the how the current flow executes and how the new code will execute along.
leaving this would be confusing tbh, i would see this and question if it really does something important and think i misunderstand something (i would prolly double check what continue does in rust, that's the level of confusion i mean).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoever's adding the code should understand the how the current flow executes and how the new code will execute along.

should, but the human factor must be taken into account

if it really does something important

It does. If the transaction wasn't found by tx_hash, the next loop cycle should be started. Its better to use continue; explicitly in such places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove time sleep from the end of loop 2056f9b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants