Storing client data in tdb #1145
Conversation
Fix #903: add logging for all error responses.
fix wrk command
Fix releasing of requests and responses on client disconnects
Resolve issues from static analyzer report.
…pair of the freed request
Update changelog
Test control
Fix #978: mark started flag as false after failed start on reload.
Return 1 for failed unit tests
tfw_http_msg_alloc_resp_light() is an inlined function, so overriding its return code is not possible. Set a stap probe on the __tfw_http_msg_alloc() function and check its arguments in order to override the return value only for tfw_http_msg_alloc_resp_light().
Message parsing in deproxy was improved, and garbage after the end of headers is no longer treated as body. Manually append garbage to the string representation of the message.
Fix some issues in functional tests
Parse "pragma" field in response, honor "pragma" field in request
reorder enum a bit in http parser
Fix #1034: Counting of client objects to avoid use-after-free case.
…limit relax Content-Length limit from UINT_MAX to ULONG_MAX
Fix #1019: Unload Tempesta modules after failed start.
Fix bugs affecting load balancers
fix a typo in the http_parser comment
Fix various misspells
Force-pushed from 22eb2f7 to c002692 (compare)
The PR is not fully done and contains race conditions; it can't be merged in its current state.
lib/common.h
Outdated
/**
 * Tempesta kernel library
 *
 * Copyright (C) 2015-2018 Tempesta Technologies, Inc.
Bad copy-paste. The file is just created, so the year 2019 must be stated, not the range 2015-2018.
tempesta_fw/http.h
Outdated
@@ -29,6 +29,7 @@
#include "server.h"
#include "str.h"
#include "vhost.h"
#include "lib/common.h"
http.h is a big and messy header, as for me. Please include the lib/common.h header only in the translation units where it's really required: http.c, cache.c, client.c.
etc/tempesta_fw.conf
Outdated
# TAG: client_lifetime
#
# Client life time in seconds. This is the time during which the clients with
Not a very accurate description: 'Lifetime of client accounting data after the last client connection was closed. The accounting data is used for Frang limits.'
UPD: Ah, I see in tfw_client_obtain() that you actually mean client_lifetime to be the lifetime of the TfwClient structure from its creation. This is incorrect and buggy behaviour, please see my comment in tfw_client_obtain().
# client_lifetime 3600;
#
# Default:
# client_lifetime 0;
I know that it was copied from sess_lifetime, which also has 0 as the default value. But both seem to be dangerous defaults and must be redefined in real-world installations. Any thoughts?
We need to choose some reasonable value.
tempesta_fw/client.c
Outdated
ent = (TfwClientEntry *)iter.rec->data;
cli = &ent->cli;
if (!memcmp_fast(&cli->addr, &addr, sizeof(cli->addr)) &&
    curr_time < ent->expires) {
ent->expires is initialized at TfwClient entry initialisation, and you skip expired entries here. This behaviour can be correct, but it's not secure:
- client_lifetime is configured to, say, 5 seconds.
- bot opens a connection
- a new TfwClient entry is created during tfw_client_obtain()
- bot waits 6 seconds
- bot opens a new connection
- now the bot has two TfwClient entries associated, so Frang limits are two times higher for the bot.
- the bot can open as many connections as it wants and pass all the Frang limits.

Setting a bigger client_lifetime limit doesn't fix the problem. It just requires more time to exploit the vulnerability.

We shouldn't treat a TfwClient as expired while it has live connections (TfwClient->conn_users != 0). The expiration time should be evaluated only after all the connections are closed. Thus we need to rearm the expiration timestamp on the last tfw_client_put() operation.
spin_lock_bh(&cli->conn_lock);

if (list_empty(&cli->conn_list))
	TFW_INC_STAT_BH(clnt.online);
We discussed it in the chat yesterday, but we missed a race condition here. A connection is added into conn_list only when the socket comes to the established state, while a new TfwClient instance can be created during Frang callbacks. When a client spawns multiple connections, a race may happen and clnt.online can be increased more than once.
tempesta_fw/client.c
Outdated
return NULL;
}

len = sizeof(*cli);
rec = tdb_entry_alloc(ip_client_db, key, &len);
if (!rec) {
Unnecessary curly brace.
return NULL;
}

len = sizeof(*cli);
rec = tdb_entry_alloc(ip_client_db, key, &len);
Hm, on the previous review I missed a race condition here. Here is the scenario:
- The client spawns more than one connection simultaneously.
- tfw_client_obtain() is called simultaneously for conn1 and conn2.
- Lookup fails for both conn1 and conn2.
- Two different TfwClient instances are created in the same TDB bucket.

Probably we need a new interface for TDB which finds an existing entry or creates a new one in a single action (without releasing intermediate TDB locks). Any thoughts?
On the previous review there were locks. You said to remove them.
Those were the old locks used when TfwClients were stored in a hash table. Those locks (a hashtable of spinlocks) weren't synchronised with TDB, and it looked like a workaround standing aside: every time a new entry was created in TDB, a spinlock from the hashtable was picked and locked, then a TDB search was performed, which also uses some locking.
I would like to see a more elegant solution. Unlike cache.c, in client.c we still need to insert a new entry if the TDB lookup fails. Having a TDB interface capable of doing this in one action seems to be a cleaner solution.
@krizhanovsky had more ideas on future TDB evolution, let's involve him in this discussion.
I agree with @ikoveshnikov: the hash of locks cli_hash looks fishy and we don't need it, because TDB provides internal synchronization for parallel selectors & updaters and/or synchronization patterns, so please remove cli_hash.
The race occurs because we do 2 operations, lookup and insert if the lookup fails, while we need the operations coupled and atomic. This is a frequent pattern, e.g. in #500 we need to lookup and insert an in-progress cache entry. Actually tdb_htrie_insert() was written with this assumption and it already has some code for lookup and insert on failure.
For now please just add a new function tdb_rec_get_alloc() which calls tdb_rec_get() and tdb_entry_alloc() if the former fails, under one global spinlock (it could even be system wide, not per table). I've added a requirement to implement the function properly in #515; for now we need the lock just not to crash on simple scenarios.
TfwClient *c;
CliHashBucket *hb = &cli_hash[i];
int r = 0;
/*TfwClient *c;
Dead code is commented out. Without the function, the PR can't be merged into master, since closing active client connections won't be performed on shutdown.
Yes, and IIRC we have discussed that this logic, the traversal of all entries, should be implemented in this PR on the TDB side. However, tfw_client_for_each() is called only for destruction of all clients on Tempesta FW shutdown, and this is not what we actually want now: Tempesta may shut down for many reasons, e.g. a server maintenance, while clients have expiry timeouts and can live independently of system restarts. I.e. if we have a new client with an expiry time of 5 minutes, then why should we delete the client record on a system reset if we just created the record? So I believe we should just remove the whole client deletion logic now and leave the deletion for garbage collection in #515.
This code is not associated with deleting clients. It is associated with closing all connections of all clients. Without going through all the records and closing all connections, shutdown will crash. We need tdb_for_each_rec. @krizhanovsky said on December 26th in Slack that he would implement the function himself in #515.
tempesta_fw/client.c
Outdated
int i;
if (tfw_runstate_is_reconfig())
	return;
if (ip_client_db) {
Unnecessary curly brace.
if (init)
	init(cli);

atomic64_inc(&act_cli_n);
The variable was required for tfw_cli_wait_release()
.
Force-pushed from c002692 to 5520cb2 (compare)
DONE
Force-pushed from 5520cb2 to d93076c (compare)
The pull request is incomplete and significant adjustments are required.
lib/common.h
Outdated
 * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */
#ifndef __COMMON_H__
#define __COMMON_H__
I'm not sure that we need a new file for just this 3-line function; usually we just copy and paste such small code. However, I don't mind too much, and if @ikoveshnikov and @aleksostapenko are fine with the file, I'm fine too.
If you leave the file, then just use __LIB_COMMON_H__ - we use the LIB prefix for /lib/ files.
I guess the new file may be left - for similar common stuff in the future.
	HLIST_HEAD_INIT,
	}
};
CliHashBucket cli_hash[CLI_HASH_SZ];
Please get rid of all the hash rudiments.
typedef struct {
	time_t expires;
	TfwClient cli;
} TfwClientEntry;
I see that cache_cfg
members are unaligned, but in recent code we align structure members with tabs.
@@ -58,26 +67,18 @@ static atomic64_t act_cli_n = ATOMIC64_INIT(0);
void
Please add TODO #515: employ an eviction strategy for the table, so that we won't forget about this place. We'll need to move the expires member of TfwClientEntry to the TDB record to allow TDB to perform eviction internally.
hlist_del(&cli->hentry);

spin_unlock(cli->hb_lock);
ent = (TfwClientEntry *)((void *)cli - offsetof(TfwClientEntry, cli));
I'd say that TfwClientEntry inherits from TfwClient, so cli should be the first member of TfwClientEntry; then we don't need the offsetof expression here now and won't need to update the code when we move expires to TDB-internal entry descriptors.
if (!(cli = kmem_cache_alloc(cli_cache, GFP_ATOMIC | __GFP_ZERO))) {
	spin_unlock(&hb->lock);
	if (unlikely(!ss_active())) {
		TFW_DBG("reject allocation of new client after shutdown\n");
		return NULL;
	}
Requirements from #1115:
Next, the hash key for searching a client must be calculated from User-Agent plus IP address; otherwise, if there is no User-Agent, from the IP address only.
Currently we use the network IP address as the address of a client. However, if a client works through a forward proxy, then the proxy can pass its IP address as the first item in X-Forwarded-For, so if the header is present, then we shall reinsert the TfwClient in TDB with a different key. The reinsert operation must be implemented on the TDB layer as a new routine tdb_entry_reinsert() accepting the current and the new keys. The function must call tdb_htrie_insert(), copying the data from the previous location, with a new empty tdb_htrie_delete() left as a TODO for #515.
These aren't implemented.
}

static int
tfw_client_lifetime(TfwCfgSpec *cs, TfwCfgEntry *ce)
We use the tfw_cfgop_ prefix for configuration handlers.
TFW_DBG("new client: cli=%p\n", cli);
TFW_DBG_ADDR("client address", &cli->addr, TFW_NO_PORT);

found:
if (!atomic_read(&cli->conn_users)) {
	atomic64_inc(&act_cli_n);
	TFW_INC_STAT_BH(clnt.online);
Why do we need to do this after found:, and not before? It seems that before found: there is no need for the additional check of cli->conn_users.
The client now appears online when it is allocated or fetched from TDB and conn_users is zero.
Yeah, missed that.
TFW_DBG("new client: cli=%p\n", cli);
TFW_DBG_ADDR("client address", &cli->addr, TFW_NO_PORT);

found:
if (!atomic_read(&cli->conn_users)) {
	atomic64_inc(&act_cli_n);
The act_cli_n counter (as well as the whole tfw_cli_wait_release() function) had been added to resolve issue #1034 (an explanatory comment is in #1118 (comment), p. 3). Since the client deletion logic is removed in this PR and client instances are persistent entries in TDB for now, the functionality for controlling client release became redundant.
Besides, it seems that in case of eviction of client instances from TDB (in the context of the future #515 implementation) that functionality is not needed either: we evict only expired clients, which in turn become expired only after their reference counts reach zero.
So act_cli_n and tfw_cli_wait_release() can be safely removed in this PR.
Force-pushed from d93076c to 941dbdb (compare)
Requires #1115
Changes:
- tfw_current_timestamp() moved to the library
- clients are stored in TDB (client_db and client_tbl_size options)
- client accounting data is kept after the last client connection is closed and used for some time (client_lifetime option)
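For illustration, the options discussed in this PR might appear in etc/tempesta_fw.conf roughly as follows; the db path and the values here are made up for the example, not defaults taken from the PR:

```
# Hypothetical example values, not recommended defaults.
client_db /opt/tempesta/db/client.tdb;
client_tbl_size 16777216;
client_lifetime 3600;
```

Note the review discussion above argues that the default client_lifetime of 0 is a dangerous value for real installations.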