Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Aleth waits for 2 seconds after sending disconnect packet #5707

Merged
merged 4 commits into from
Aug 16, 2019

Conversation

twinstar26
Copy link
Contributor

Fixes #5650
Old PR #5658

@codecov-io
Copy link

codecov-io commented Jul 30, 2019

Codecov Report

Merging #5707 into master will increase coverage by 0.07%.
The diff coverage is 73.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5707      +/-   ##
==========================================
+ Coverage   63.06%   63.14%   +0.07%     
==========================================
  Files         353      353              
  Lines       30196    30193       -3     
  Branches     3386     3386              
==========================================
+ Hits        19044    19064      +20     
+ Misses       9918     9896      -22     
+ Partials     1234     1233       -1

@halfalicious
Copy link
Contributor

@twinstar26 Is this ready for review? If so, can you please request a review from myself / gumb0 / chfast?

@halfalicious halfalicious requested review from gumb0, chfast and halfalicious and removed request for gumb0 and chfast July 30, 2019 16:21
CHANGELOG.md Outdated
@@ -15,6 +15,7 @@
- Added: [#5634](https://github.com/ethereum/aleth/pull/5634) Bootnodes for Rinkeby and Goerli.
- Added: [#5640](https://github.com/ethereum/aleth/pull/5640) Istanbul support: EIP-1702 Generalized Account Versioning Scheme.
- Added: [#5690](https://github.com/ethereum/aleth/issues/5690) Istanbul support: EIP-2028 transaction data gas cost reduction.
- Added: [#5650](https://github.com/ethereum/aleth/issues/5650) Aleth should wait for 2 seconds to close socket after sending disconnect to peer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog should describe the behavior with your fix applied (e.g. "Aleth now waits for 2 seconds after sending disconnect to peer before closing socket")

{
RLPStream s;
prep(s, DisconnectPacket, 1) << (int)_reason;
sealAndSend(s);
auto disconnectTimer = m_server->createTimer();
Copy link
Contributor

@halfalicious halfalicious Jul 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Session::disconnect will still close the socket immediately (Session::drop closes the socket and you're calling it in both your timer handler and outside the handler if it hasn't already been called i.e. !m_dropped). Since what we want to do is delay the socket closure for 2 seconds, you should move the code which closes the socket out of Session::drop and into the timer handler and also not call drop inside of the timer handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halfalicious
Thanks for the reviews!
Okay so I have to copy required logic of drop into disconnect and never call drop from inside of disconnect and its handler. Am I right??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should move the socket closure code out of drop and into the handler. Don’t call drop inside of the handler but still call it in your else statement in disconnect (since we want to prevent further session operations after we have sent the disconnect, you can see that we check for m_drop in various places in session and return early)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halfalicious
So pushed changes according to your suggested changes. But was curious what will happen to code thats calling drop() (apart from disconnect) separately??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't look right:

  1. Changing drop() now breaks all the cases where it is called from other methods of Session. I guess those are reasonable cases where we wanted to just disconnect immediately without sending Disconnect packet (with current changes we won't disconnect)

  2. Calling disconnect now doesn't ever set m_dropped to true. @halfalicious Maybe you meant calling current form of drop from inside if (!m_dropped) clause?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 You're right we need to also call drop() inside of the !m_dropped clause.

I don't think this new form of drop breaks anything though...even if drop doesn't close the socket all shared_ptr Session instances will still go out of scope and therefore Session will be garbage collected by Host and the socket will be closed when the dtor is invoked. Note that functions in which we pass new shared_ptr instances (shared_from_this) to handlers check if (m_dropped) and then prematurely terminate before creating new handlers (e.g. doRead). In fact, after taking a closer look at the Session code, I don't think we need to explicitly close the socket at all in any place other than the dtor (unless there's some benefit to closing the socket sooner rather than later that I'm not aware of)? If we just call drop() without closing the socket the Session instance will be destroyed once all queued handlers which contain a shared_ptr instance are executed and the Host garbage collects the Session.

So, to summarize:

  • I think we should call drop() in both the if (!m_dropped) and else cases...in fact, we can simplify this by removing the "else" case and always calling drop() after the if (!m_dropped) code block since drop returns early if m_dropped is true.
  • I think we can remove all cases where we close the socket except for in the dtor.

@gumb0 : Thoughts?

Copy link
Member

@gumb0 gumb0 Aug 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to explicitly close the socket at all in any place other than the dtor (unless there's some benefit to closing the socket sooner rather than later that I'm not aware of)?

@halfalicious The benefit I see is that we immediately stop answering any further requests on this connection, not waiting for all handlers to terminate, so maybe it's more efficient...

Also, what if some malicious peer keeps DOSing us with some invalid requests on this connection - it will be kept in the handler queue (because the new data keeps coming) and never disconnected, it seems so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halfalicious pointed out to me:

I think that we will still stop reading packets from a peer if we call drop without closing the socket since we check for m_dropped at the beginning of Session::doRead before proceeding with the read:

aleth/libp2p/Session.cpp

Lines 318 to 322 in d8f4498

void Session::doRead()
{
// ignore packets received while waiting to disconnect.
if (m_dropped)
return;

I think I agree with this reasoning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One corner case to check: spec requires that we disconnet immediately upon receiving Disconnect packet ourselves. My experiment confirms that this looks to be working correctly with current changes in this PR:

TRACE 08-08 11:54:24 p2p  p2pcap Disconnect from (##a8768590…@51.141.78.53:30303)
DEBUG 08-08 11:54:24 p2p  p2pcap Disconnect (reason: Peer had too many connections.) from (##a8768590…@51.141.78.53:30303)
DEBUG 08-08 11:54:24 p2p  net    Closing peer session with (##a8768590…@51.141.78.53:30303)

Copy link
Contributor

@halfalicious halfalicious Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One corner case to check: spec requires that we disconnet immediately upon receiving Disconnect packet ourselves. My experiment confirms that this looks to be working correctly with current changes in this PR:

TRACE 08-08 11:54:24 p2p  p2pcap Disconnect from (##a8768590…@51.141.78.53:30303)
DEBUG 08-08 11:54:24 p2p  p2pcap Disconnect (reason: Peer had too many connections.) from (##a8768590…@51.141.78.53:30303)
DEBUG 08-08 11:54:24 p2p  net    Closing peer session with (##a8768590…@51.141.78.53:30303)

@gumb0 Confirmed by looking at the code - when we receive a disconnect packet we call drop immediately in Session::interpretP2pPacket:

aleth/libp2p/Session.cpp

Lines 142 to 154 in 78355a2

case DisconnectPacket:
{
string reason = "Unspecified";
auto r = (DisconnectReason)_r[0].toInt<int>();
if (!_r[0].isInt())
drop(BadProtocol);
else
{
reason = reasonOf(r);
LOG(m_capLogger) << "Disconnect (reason: " << reason << ") from";
drop(DisconnectRequested);
}
break;

RLPStream s;
prep(s, DisconnectPacket, 1) << (int)_reason;
sealAndSend(s);
auto disconnectTimer = m_server->createTimer();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this timer will be destroyed right after this function ends, and outstanding wait operation will be cancelled. (Probably should be made a member?)

Copy link
Contributor

@halfalicious halfalicious Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 Agreed, in my prototype I had captured the timer in the handler but making it a member works as well (and also enables us to make it unique_ptr rather than shared_ptr as I see you've done) 😄

@gumb0
Copy link
Member

gumb0 commented Aug 8, 2019

I moved the calls to timer's member inside createTimer() function to hide these lower-level details from Host's users and for better symmetry between postWork and createTimer

@gumb0
Copy link
Member

gumb0 commented Aug 8, 2019

Rebased.

m_disconnectTimer = m_server->createTimer(
std::chrono::seconds(2), [self, this, _reason](boost::system::error_code) {
bi::tcp::socket& socket = m_socket->ref();
if (socket.is_open())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I still don't like here is that we have the same code (closing the socket) here in the timer handler and then in destructor. And destructor is executed anyway, so perhaps we could avoid this duplication, maybe just posting empty handler? @halfalicious what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 Yes I agree with this, in fact I suggested this earlier in this PR 😄

@halfalicious
Copy link
Contributor

Confirmed that we wait for 2 seconds now:

TRACE 08-12 21:33:11 p2p  p2pcap Disconnecting (our reason: Peer had no use for this node.) from (##03957ecc…@51.75.127.66:30303)
TRACE 08-12 21:33:11 p2p  net    Disconnect to (##03957ecc…@51.75.127.66:30303)
...
DEBUG 08-12 21:33:13 p2p  net    Closing peer session with (##03957ecc…@51.75.127.66:30303)

@halfalicious
Copy link
Contributor

@gumb0 I think this is ready to be merged, please give it a quick look when you get a chance

libp2p/Host.h Outdated
@@ -266,6 +266,11 @@ class Host: public Worker

std::shared_ptr<CapabilityHostFace> capabilityHost() const { return m_capabilityHost; }

/// Execute work on the network thread after an @_expiryDelay delay.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was doxygen syntax to refer to the parameter inside description
(http://www.doxygen.nl/manual/commands.html#cmda)

Suggested change
/// Execute work on the network thread after an @_expiryDelay delay.
/// Execute work on the network thread after an @a _expiryDelay delay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 That's what I suspected but I saw different syntax used in other parts of the project e.g.:

aleth/libethereum/Account.h

Lines 150 to 152 in 78355a2

/// @returns account's original storage value corresponding to the @_key
/// not taking into account overlayed modifications
u256 originalStorageValue(u256 const& _key, OverlayDB const& _db) const;

Closing the socket in the handler is unnecessary since it will be closed
when the dtor executes.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aleth should wait 2 seconds to close socket after sending disconnect to peer
4 participants