-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vhost selection by sticky sessions #1266
Conversation
tempesta_fw/http.c
Outdated
@@ -3353,6 +3353,20 @@ tfw_http_req_process(TfwConn *conn, const TfwFsmData *data) | |||
TFW_INC_STAT_BH(clnt.msgs_otherr); | |||
} | |||
|
|||
if (req->sess) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
req
must not be accessed after tfw_http_send_resp()|tfw_cache_process()
. Request can be sent out there or even destroyed.
Saving vhost
value should take place right after vhost is determined for the first time (same function, line 3219):
- if (!req->vhost)
+ if (!req->vhost) {
req->vhost = tfw_http_tbl_vhost((TfwMsg *)req, &block);
+ tfw_http_sess_pin_vhost(req);
+ }
if (req->vhost)
req->location = tfw_location_match(req->vhost,
&req->uri_path);
And I would like to move session-specific operations to `http_sess.c_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved code to http_sess.c
. Since tfw_http_sess_pin_vhost()
is now part of API for sessions and can be used by some other code, it now also checks whenever sess->vhost
was non-NULL to track references to vhost
more accurately. Also removed outdated comments that I forgot to remove in the initial submission.
acc6c36
to
7b5c049
Compare
Changed code to address review comments. However there is still functional tests failure — investigating that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I am agree with the PR, but I have a little concerns about grace shutdown mode. If there is no such issue, the PR can be merged.
Here in the possible bad pointer dereference scenario:
- Vhost and related server groups (main and backup) are removed from the configuration, grace shutdown is enabled, sticky session failovering is enabled.
- All servers from primary (main) server group has finished grace shutdown routine, i.e. servers and server group structures was expired. But some servers from backup group still has active sessions.
- Suddenly one of the backup servers closes all connections. Since the server on the grace shutdown, we don't reconnect to it.
- A new request is received in a session that is pinned to that server.
- We try to find a new matching connection:
tempesta/tempesta_fw/http_sess.c
Line 1141 in 105a766
srv_conn = tfw_vhost_get_srv_conn(msg);
vhost stored in the request - is vhost from the previous configuration. But main server group was already destroyed, thusvhost->main_sg
becomes invalid.
Probable solution: expire session if the server was removed from the configuration even if the session failovering is enabled.
tempesta/tempesta_fw/http_sess.c
Lines 1124 to 1128 in 105a766
if (unlikely(test_bit(TFW_CFG_B_DEL, &srv->flags))) { | |
TFW_LOG("sticky sched: server %s" | |
" was removed, set session expired\n", | |
addr_str); | |
tfw_http_sess_set_expired(sess); |
f5ccca0
to
ff8d374
Compare
There was a bug in the previous version of the patchset: reference to a vhost was stored in a session, which through indirection held references to server groups. After configuration reload that references were keeping server group structures in zombie state — not dead, but also not functional, since schedulers were removed from them. That caused The issue was resolved by introducing a flag into TfwVhost. It's set during reconfiguration (in |
@ikoveshnikov, What may happen is zombie |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except one minor notice.
tempesta_fw/vhost.h
Outdated
@@ -128,6 +128,10 @@ enum { | |||
TFW_D_CACHE_PURGE_INVALIDATE, | |||
}; | |||
|
|||
enum { | |||
TFW_VHOST_B_INVALID = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would request another name here. Something marked as 'invalid' usually means that something is wrong with the internal state object. 'Invalid' objects cant pass configuration phase and exist in running configuration. Name like TFW_VHOST_B_REMOVED
sound more accurate and it's very clear why this instance exist and why it's not usable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to TFW_VHOST_B_REMOVED
. It does look better, indeed.
tempesta_fw/http_sess.c
Outdated
write_lock(&sess->lock); | ||
sess->vhost = NULL; | ||
write_unlock(&sess->lock); | ||
tfw_vhost_put(cached_vhost); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like suspicious optimisation. Same HTTP session can appear in multiple requests that can be processed on multiple CPUs at the same time. Imagine the situation: two threads are taking the read lock and find out that the vhost is removed; then both decides to drop the reference; since the reference is cached during read lock the reference would be dropped twice, causing use-after-free errors. I suggest following change:
- TfwVhost *cached_vhost = sess->vhost;
read_unlock(&sess->lock);
write_lock(&sess->lock);
- sess->vhost = NULL;
+ if (sess->vhost) {
+ tfw_vhost_put(sess->vhost);
+ sess->vhost = NULL;
+ }
write_unlock(&sess->lock);
- tfw_vhost_put(cached_vhost);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed code according to your suggestion. Thanks!
|
||
tfw_vhost_get(sess->vhost); | ||
req->vhost = sess->vhost; | ||
read_unlock(&sess->lock); | ||
goto found; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something seemed strange to me here, I was thinking about possible configuration updates like scheduler changes. vhost name changes and other changes that can spoil moving session from old vhost instance to a new one. I took me some time to recheck this, but everything looks good here.
TfwStickyConn lost its members over the time, and now there are only two: a pointer to TfwSrvConn and a lock to protect that pointer.
Caching vhost in the session helps to skip vhost selection procedure which may be quite costly.
ff8d374
to
7450d7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
The patchset adds
vhost
caching in HTTP sessions. We now savevhost
after the first request in a session is processed, and then reuse it for other requests in the session, thus avoiding the selection procedure which may be quite costly.(fixes #1043)