
Local Tx Monitor #3404

Merged: 20 commits merged into master on Dec 17, 2021

Conversation

@KtorZ commented Oct 2, 2021

  • 📍 Resurrect LocalTxMonitor protocol, reworking protocol's types
    I've added an extra step to acquire a particular mempool snapshot, to better match the MemPool API from the consensus. This is a similar pattern to the Local State Query protocol, although here only the latest mempool snapshot can be acquired. I also added a query to check for a particular transaction given its id, which may be faster than traversing the entire sequence (although this highly depends on how the mempool implements it behind the scenes).

  • 📍 Write client interface for the 'LocalTxMonitor' mini-protocol.

  • 📍 Write server interface for 'LocalTxMonitor' mini-protocol.

  • 📍 Add 'MsgRelease' to allow protocol to loop back to 'StIdle'
    And, similarly, allow the client to send 'Done' even after acquiring.

  • 📍 Add 'ASCII-like' diagram illustrating the protocol state-machine.

  • 📍 Add codec for LocalTxMonitor protocol.

  • 📍 Implement local tx-monitor server using kernel's mempool.

  • 📍 Wire-up the new mini-protocol with the other Node-To-Client protocols.

  • 📍 Add property tests to cover LocalTxMonitor codecs
    Mostly following what's done for the other mini-protocols.

    Test suite test: RUNNING...
    ouroboros-network
      Ouroboros.Network.Protocol
        LocalTxMonitor
          codecM:                    OK
            +++ OK, passed 100 tests.
          codec 2-splits:            OK
            +++ OK, passed 100 tests.
          codec 3-splits:            OK
            +++ OK, passed 100 tests.
          codec cborM:               OK
            +++ OK, passed 100 tests.
          codec valid cbor encoding: OK
            +++ OK, passed 100 tests.
    
  • 📍 Add extra query to get mempool's size and capacity.

  • 📍 Adjust code style to team's conventions 💅

  • 📍 Add direct execution & property with example LocalTxMonitor client and server.

    Test suite test: RUNNING...
    ouroboros-network
      Ouroboros.Network.Protocol
        LocalTxMonitor
          direct:                    OK
            +++ OK, passed 100 tests.
    
  • 📍 Rename StBusyKind & TokBusyKind

    • StBusyNext -> NextTx
    • StBusyHas -> HasTx
    • StBusySizes -> GetSizes

    This reads better in type signatures when composed with other types of the protocol.

  • 📍 Add cddl specification (and tests) for LocalTxMonitor codecs.

  • 📍 Increment NodeToClient version and integrate LocalTxMonitor as client mini-protocol.

  • 📍 Add property tests for LocalTxMonitor via connected channels in IO and IO-sim

  • 📍 Change semantics of (and rename) 'MsgReAcquire' to be blocking on updates
    'MsgAwaitAcquire' now replaces 'MsgReAcquire': it only replies once there has been a change in the content of the mempool. This allows clients to efficiently consume a snapshot and then wait for the next one, without having to poll for updates at arbitrary intervals.

    Also, I took the opportunity to update the module's header and fix the state-machine diagram rendering in Haddock.

  • 📍 Add CHANGELOG entry about the LocalTxMonitor protocol.

  • 📍 Fix imports for moved module / code.

  • 📍 Add missing golden serialisation tests for 'GenTxId' & 'SlotNo' in NodeToClient.
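Not part of the PR itself, but the snapshot semantics described in the bullets above can be sketched as a tiny pure Haskell model. All names here (`Mempool`, `Snapshot`, `acquire`, `nextTx`, `hasTx`, `getSizes`) are hypothetical simplifications, not the actual API from the `Ouroboros.Network.Protocol.LocalTxMonitor` modules:

```haskell
-- A toy model (hypothetical, simplified) of the LocalTxMonitor snapshot
-- semantics; not the actual ouroboros-network API.
module Main where

type TxId = Int
data Tx = Tx { txId :: TxId, txSize :: Int } deriving (Eq, Show)

-- The mempool as seen by the server, plus its configured capacity.
data Mempool = Mempool { mempoolTxs :: [Tx], mempoolCapacity :: Int }

-- An acquired snapshot: an immutable copy of the mempool content,
-- with a cursor for 'MsgNextTx'-style traversal.
data Snapshot = Snapshot { snapshotTxs :: [Tx], cursor :: Int }

-- 'MsgAcquire': grab the latest snapshot.
acquire :: Mempool -> Snapshot
acquire mp = Snapshot (mempoolTxs mp) 0

-- 'MsgNextTx': step through the snapshot one transaction at a time.
nextTx :: Snapshot -> (Maybe Tx, Snapshot)
nextTx s@(Snapshot txs i)
  | i < length txs = (Just (txs !! i), Snapshot txs (i + 1))
  | otherwise      = (Nothing, s)

-- 'MsgHasTx': membership check by id, possibly faster than traversing.
hasTx :: Snapshot -> TxId -> Bool
hasTx s i = any ((== i) . txId) (snapshotTxs s)

-- 'MsgGetSizes': the mempool's capacity, current size in bytes, and
-- number of transactions.
getSizes :: Mempool -> (Int, Int, Int)
getSizes mp =
  ( mempoolCapacity mp
  , sum (map txSize (mempoolTxs mp))
  , length (mempoolTxs mp)
  )

main :: IO ()
main = do
  let mp = Mempool [Tx 1 100, Tx 2 250] 4096
      s  = acquire mp
  print (hasTx s 2)    -- True
  print (getSizes mp)  -- (4096,350,2)
```

Because the snapshot is an immutable copy, all queries between an acquire and a release observe a consistent mempool state, mirroring the Local State Query pattern mentioned above.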

@KtorZ KtorZ self-assigned this Oct 2, 2021
@KtorZ KtorZ force-pushed the KtorZ/local-tx-monitor branch from 70e37ad to 3b847a0 Compare October 2, 2021 15:11
@KtorZ KtorZ force-pushed the KtorZ/local-tx-monitor branch 3 times, most recently from 2a7cbd7 to f4e4f0b Compare October 5, 2021 16:36
@KtorZ KtorZ marked this pull request as ready for review October 6, 2021 16:06
@KtorZ KtorZ requested review from coot, karknu and nfrisby as code owners October 6, 2021 16:06
@nfrisby left a comment

Thank you for such a nicely done PR. I pointed out a few style things -- nothing serious.

My only high-level thought is whether a (perhaps buggy) client could cause us problems here. But I'm inferring that the server can only ever hold onto one mempool snapshot at a time (per client), so at least that bounds the memory usage. And mempool snapshots are not that large, relatively speaking. (We should double-check this. A node could be custom-configured to allow a very large mempool, but I still see no incentive to do that to a degree that would risk resource exhaustion from a LocalTxMonitor client.)

Can you attend the Consensus Planning Meeting next Tuesday (the 12th) at 15:00 Paris Time to discuss this? I'll add you to that event now; feel free to decline.

@coot left a comment

Very nice; a few comments follow.

@dcoutts left a comment

Are you sure this is the style of protocol that you want?

If I've read and understood correctly, what you have here is a snapshot style: you can take a mempool snapshot at any time, but you cannot wait for changes to the mempool. So it's a polling interface. How frequently do you plan to poll?

The sketch in the placeholder protocol was not a polling style: it let you ask for the next tx and have it block until a new tx arrived. Yes, that version did not give consistent snapshots.

Note that either way, polling or blocking, you cannot guarantee to see every tx that flows through the mempool. It's always possible to be too slow and something can have been added and removed before you notice. In the polling snapshot design you'll miss things that are added and removed between snapshots. In the "next tx" design you'll miss things that are added to the end of the mempool and removed again before you get to that tx.

If you basically like the snapshot style (and sure, consistent snapshots are nice) can I suggest that you recover a blocking / non-polling style by changing MsgReAcquire so that it is a blocking operation: that it only replies when there has been a change in the content of the mempool.

@KtorZ commented Oct 8, 2021

@dcoutts

Are you sure this is the style of protocol that you want?

No. This is why I am asking for feedback here 😊

what you have here is a snapshot style [..] but you cannot wait for changes to the mempool.

Correct.

So it's a polling interface. How frequently do you plan to poll?

That's totally up to clients. The main use-case I see for this protocol is to track the state of transactions that a client itself inserts into the mempool (not so much those received from peers). From the discussions I've had with various folks, they either want to (a) know how empty the mempool is, in order to batch a few transactions (although I believe it should be sufficient to wait on local-tx-submission when the mempool is full, which automatically applies back-pressure), or (b) be more efficient w.r.t. transaction re-submission: if a transaction hasn't been seen on-chain and is no longer in the mempool, it should be re-submitted. Some are also just interested in the mempool for monitoring/observability, to better understand how the system behaves behind the scenes.

So I imagine a typical poll frequency would be once every block.
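Use-case (b) above boils down to a simple predicate; the mempool side of the question is exactly what the snapshot's 'hasTx' query answers. A hypothetical sketch (these names are illustrative, not from the PR):

```haskell
-- Hypothetical sketch of the re-submission rule for use-case (b):
-- a transaction should be re-submitted iff it has neither been seen
-- on-chain nor is still pending in the mempool.
module Main where

type TxId = Int

shouldResubmit
  :: [TxId]  -- ^ tx ids seen on-chain so far
  -> [TxId]  -- ^ tx ids in the latest mempool snapshot
  -> TxId    -- ^ the tx we previously submitted
  -> Bool
shouldResubmit onChain inMempool tx =
  tx `notElem` onChain && tx `notElem` inMempool

main :: IO ()
main = print (shouldResubmit [1] [2] 3)  -- True: neither on-chain nor pending
```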

Yes that version did not give consistent snapshots.

Yes, and it also didn't give you any means to know whether a transaction that you had seen in the past was still present in the mempool, which felt quite limiting.

Note that either way, polling or blocking, you cannot guarantee to see every tx that flows through the mempool.

Indeed. I don't think there's any easy way around that, because transactions may come from two sources anyway (local or peer submission). Either way, I don't think it's a problem not to have this guarantee; what's actually more interesting are the transactions which are still in the mempool at the end.

If you basically like the snapshot style

I like the snapshot style, I find it pretty useful on the LSQ protocol to be able to reliably make a few queries on a consistent state. It feels like STM over the wire 😊

I suggest that you recover a blocking / non-polling style by changing MsgReAcquire so that it is a blocking operation: that it only replies when there has been a change in the content of the mempool.

Ho! That is a nice suggestion indeed! And quite easy to implement on the server side as well.
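The server side of such a blocking re-acquire can be sketched with STM's `retry`: block until the mempool content differs from the snapshot the client currently holds. A minimal sketch, assuming a plain `TVar` holding the mempool content (names are hypothetical, not the PR's actual mempool API):

```haskell
-- Sketch (hypothetical names) of a blocking re-acquire: reply only once
-- the mempool content differs from the client's current snapshot.
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

type Tx = Int

-- Pure comparison: has the mempool changed w.r.t. the held snapshot?
changedSince :: [Tx] -> [Tx] -> Bool
changedSince current txs = txs /= current

-- Block until the mempool differs from the client's snapshot, then hand
-- back the new content. 'retry' suspends the transaction and re-runs it
-- whenever 'mempoolVar' is written to.
awaitAcquire :: TVar [Tx] -> [Tx] -> STM [Tx]
awaitAcquire mempoolVar current = do
  txs <- readTVar mempoolVar
  if changedSince current txs then pure txs else retry

main :: IO ()
main = do
  mempoolVar <- newTVarIO [1, 2]
  let snapshot = [1, 2]
  -- Simulate a new transaction arriving shortly after the client blocks.
  _ <- forkIO $ threadDelay 10000 >> atomically (modifyTVar' mempoolVar (++ [3]))
  next <- atomically (awaitAcquire mempoolVar snapshot)
  print next  -- [1,2,3]
```

Comparing against the held snapshot (rather than waiting for the next write) also gives the semantics dcoutts suggests below: if the mempool already changed before the message arrived, the reply is immediate.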

@KtorZ KtorZ force-pushed the KtorZ/local-tx-monitor branch 3 times, most recently from eda2e0e to 999773e Compare October 8, 2021 18:57
@dcoutts commented Oct 8, 2021

If you like the idea then it might be good to rename MsgReAcquire to indicate that it's about waiting for changes. I'd suggest the semantics be that it's a change vs the current snapshot rather than waiting for the next change since the point in time the message is received.

@KtorZ KtorZ force-pushed the KtorZ/local-tx-monitor branch from bde8158 to a284026 Compare October 10, 2021 11:40
@coot left a comment

LGTM!

@nfrisby commented Oct 12, 2021

Please also add an entry atop the https://github.com/input-output-hk/ouroboros-network/blob/master/ouroboros-consensus/docs/interface-CHANGELOG.md file. Thanks!

@nfrisby commented Oct 25, 2021

@KtorZ Would you post a status update here?

  • I recall some broader discussion of whether this is the appropriate solution for the underlying need. Something like: Were SPOs only hoping to run this monitor against their local node? IIRC, we decided this is suitable for the desired use-case.

  • There was also a question of how to test it.

Thanks!

KtorZ added 19 commits December 17, 2021 18:13
@KtorZ KtorZ force-pushed the KtorZ/local-tx-monitor branch from a419c70 to feff0a6 Compare December 17, 2021 18:09
@nfrisby commented Dec 17, 2021

Alrighty. I think the only obstacle before you can send the bors r+ or bors merge command is to satisfy the network team's style checker. You can see the diff in the CI failure. I think you can run it locally via nix-shell --run scripts/buildkite/check-stylish-network.sh -- but I don't have that file on my current checkout... so maybe this is a new addition?

@KtorZ commented Dec 17, 2021

I am currently running stylish-haskell through the nix-shell indeed... I kicked it off about 20 minutes ago, but Nix decided to re-compile the whole world despite the trusted keys 🤷 ... I'll push as soon as it's ready and trigger bors 👍

@KtorZ commented Dec 17, 2021

bors merge

@iohk-bors bot commented Dec 17, 2021

@iohk-bors iohk-bors bot merged commit 32af916 into master Dec 17, 2021
@iohk-bors iohk-bors bot deleted the KtorZ/local-tx-monitor branch December 17, 2021 19:12
@nfrisby commented Dec 18, 2021

'grats @KtorZ! Thank you for your time and determination here.
