-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#1563 - Delete connections optionally on deletion of tcpListener/tcpConnector #1570
#1563 - Delete connections optionally on deletion of tcpListener/tcpConnector #1570
Conversation
src/adaptors/tcp/tcp_adaptor.c
Outdated
* | ||
* @param connections Reference to the list of connections | ||
*/ | ||
static void terminate_connections(qd_tcp_connection_list_t *connections) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a function called close_connection_XSIDE_IO()
which can be used to terminate tcp connections. It take care of freeing qd_tcp_connection_t
objects and well as its corresponding qdr_connection_t
objects
Like so -
static void terminate_listener_connections(qd_tcp_listener_t *listener) {
//
// Deliberately call qd_tcp_listener_incref() to make sure that freeing the connections, don't
// free the listener. Then we call qd_tcp_listener_free() to forcefully free the listener without checking the listener->ref_count
//
qd_tcp_listener_incref(listener);
qd_tcp_connection_t *conn = DEQ_HEAD(listener->connections);
while (conn) {
qd_tcp_connection_t *next_conn = DEQ_NEXT(conn);
close_connection_XSIDE_IO(conn);
conn = next_conn;
}
qd_tcp_listener_free(listener);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that close_connection_XSIDE_IO()
cannot be called directly in a loop because it makes proton calls on the connection. Another proactor thread may process the same connection at the same time which is prohibited. I maybe misunderstood this paragraph from threading-info.txt
:
The Proton subsystem limits concurrency to a single connection. That |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you did not misunderstand the paragraph. What I suggested - calling close_connection_XSIDE_IO() - from a management thread - is wrong. I will get back to you shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might try calling qdr_core_close_connection(conn->core_conn)
like here - https://github.com/skupperproject/skupper-router/blob/main/src/adaptors/http2/http2_adaptor.c#L3403
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the pointer! If I call that function then I do not need to change the type of the closed
flag from bool
to sys_atomic_t
in qdr_connection_t
. Is there any other reason why the current PR is not a proper solution? I'm just asking because I would like to understand the router code better :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! I have updated to PR to use qdr_core_close_connection() for initiating the connection termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things
- You don't need to call
pn_raw_connection_wake()
since it is already called as part of the call toqdr_core_close_connection()
- You also don't need to lock on the activaton_lock since you are not going to call
pn_raw_connection_wake()
anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure that I understand correctly the code, would the call sequence be roughly be as follows:
qdr_core_close_connection()
will activate the core thread- qdr_core_close_connection_CT() will set
conn->closed=true
and activate the connection - the TCP adaptor activation callback
CORE_activate()
will callpn_raw_connection_wake()
I think the check for conn->raw_opened
can also be removed from terminate_connections()
because the same check is done in CORE_activate()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qdr_core_close_connection(qdr_connection_t *conn)
will add a core thread action with callbackqdr_core_close_connection_CT()
and wake up the core thread- The
qdr_core_close_connection_CT()
is executed by the core thread and will set theconn->closed=true
and activate the connection to be deleted via a call toqdr_connection_activate_CT(core, conn);
qdr_connection_process()
in connections.c is executed on an I/O thread and an adaptor callback function is called (line 359 of connections.c) -conn->protocol_adaptor->conn_close_handler(conn->protocol_adaptor->user_context, conn, conn->error);
- The
conn_close_handler
is set here - https://github.com/skupperproject/skupper-router/blob/main/src/adaptors/tcp/tcp_adaptor.c#L2548 which isstatic void CORE_connection_close(void *context, qdr_connection_t *conn, qdr_error_t *error)
where there is a call topn_raw_connection_close(tcp_conn->raw_conn);
which will generate a PN_RAW_CONNECTION_DISCONNECTED proton event and the handler for that event ultimately callsclose_connection_XSIDE_IO(conn)
where theqd_tcp_connection_t
object and its contents are ultimately freed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Also, this PR needs to include a system test using vanflow - |
cd1050a
to
7a7fb49
Compare
I'm working on this system test now (so I converted the PR to draft for now). Could you point me to an existing system test which could be a good starting point for me for this new test? |
Please take a look at #1566 and see if you understand how the vanflow_snooper.py works and let me know if you have any questions |
@gabordozsa Just wanted to mention that if you are working on a github issue, please make sure you assign the issue to yourself before you start working on it. That will let other developers know who is responsible for this issue, so they will not work on that issue. Also if the issue is already assigned to someone, and you want to work on it, please contact that person to see if they are not already working on it, if not, you can that it be assigned to you. Thanks. |
Sure, I will do! Thanks for the reminder and for assigning 1563 to me! |
7a7fb49
to
cfede6b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1570 +/- ##
=======================================
- Coverage 63.0% 62.5% -0.5%
=======================================
Files 214 214
Lines 51857 52308 +451
Branches 5986 6146 +160
=======================================
+ Hits 32681 32727 +46
- Misses 16966 17281 +315
- Partials 2210 2300 +90
Flags with carried forward coverage won't be shown. Click here to find out more.
|
src/adaptors/tcp/tcp_adaptor.c
Outdated
// | ||
if (connector->adaptor_config->terminate_conns) { | ||
qd_tcp_connection_t *conn; | ||
conn = DEQ_HEAD(listener->connections); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be conn = DEQ_HEAD(connector->connections)' not
conn = DEQ_HEAD(listener->connections)`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ops, copy-paste issue.
cfede6b
to
7a8ffc7
Compare
7a8ffc7
to
d9ec438
Compare
@ganeshmurthy I have added a system test. I think the PR is ready for more reviews. |
|
||
# vflow Ids are used to associate flows with tcpListeners and tcpConnectors | ||
vflow_ids = {} | ||
self.assertTrue(self.router_1_vflow_id is not None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use self.assertIsNotNone
instead of self.assertTrue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
'LISTENER', address) | ||
vflow_ids['listener_2'] = self.get_tcp_entity_vflow_id(self.router_2_vflow_id, | ||
'LISTENER', address) | ||
self.assertTrue(vflow_ids['listener_1'] is not None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, use self.assertIsNotNone
instead of self.assertTrue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This test must also include another scenario where a TLS enabled |
To test the TLS case, you can use the |
pass | ||
return None | ||
|
||
def run_skmanage(self, cmd, router): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a class in system_test.py
called QdManager
that runs the skmanage
command for you. You can use that class instead of directly running skmanage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased and removed the run_skmanage() method.
806511d
to
16d5cb7
Compare
@gabordozsa I introduced a commit - f1f61ee - where I renamed the |
16d5cb7
to
9fea144
Compare
@gabordozsa This looks good to me, great work ! |
…y on deletion of tcpListener and tcpConnector Signed-off-by: Gabor Dozsa <gabor.dozsa@ibm.com>
Signed-off-by: Gabor Dozsa <gabor.dozsa@ibm.com>
…_REMOVE(conn), repectively
7122289
to
35d4d8b
Compare
@gabordozsa there seem to be some CI failures in this PR like this one - https://github.com/skupperproject/skupper-router/actions/runs/10354198908/job/28658907680?pr=1570 I have kicked off the main branch CI just to make sure that it is clean, let's see what happens on main branch CI. |
@gabordozsa The main CI seems fine I see this in the PR CI - https://github.com/skupperproject/skupper-router/actions/runs/10354198908/job/28658907680?pr=1570#step:39:600 |
@@ -66,7 +66,8 @@ qd_error_t qd_load_adaptor_config(qdr_core_t *core, qd_adaptor_config_t *config, | |||
config->site_id = qd_entity_opt_string(entity, "siteId", 0); CHECK(); | |||
config->ssl_profile_name = qd_entity_opt_string(entity, "sslProfile", 0); CHECK(); | |||
config->authenticate_peer = qd_entity_opt_bool(entity, "authenticatePeer", false); CHECK(); | |||
config->verify_host_name = qd_entity_opt_bool(entity, "verifyHostname", false); CHECK(); | |||
config->verify_host_name = qd_entity_opt_bool(entity, "verifyHostname", false); | |||
CHECK(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is clang format causing this diff to appear ? Can you simply restore this file to its original version and not run the clang formatter ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was done by clang format. I plan to squash all commits when the PR gets stabilised and restore this file then to its original.
Leak of |
@ganeshmurthy In the same error log I can see (https://github.com/skupperproject/skupper-router/actions/runs/10355435503/job/28663084819?pr=1570#step:34:4777):
I'm wondering if the problem could be in |
Good catch @gabordozsa . It looks like the on_accept() was called and the connection was created there but before we could get a PN_RAW_CONNECTION_CONNECTED event from proton, the router seems to be shutting down which leaves the newly created connection (which is not yet in the list) to leak. If you ask me, this should not really be a problem. The router is anyway shutting down, who cares about the leak ? But unfortunately, this is how we do things in the router. |
234585b
to
83c4988
Compare
Closing this as code got merged via #1601 |
Fixes #1563: Delete corresponding connections optionally on deletion of tcpListener and tcpConnector
New config parameters added to tcpListener and tcpConnector entities in order to make deletion of connections optional.