Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #731: HTTP tables introduction. #1017

Merged
merged 11 commits into from
Jun 9, 2018
Merged

Fix #731: HTTP tables introduction. #1017

merged 11 commits into from
Jun 9, 2018

Conversation

aleksostapenko
Copy link
Contributor

No description provided.

TFW_INC_STAT_BH(clnt.msgs_otherr);
tfw_client_drop(req, 500, "cannot find vhost for request");
}
return -EINVAL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say, the function return value meaning is pass or filter the request, so it should return TFW_BLOCK or TFW_PASS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

return -EINVAL;

if (!check_identifier(name_src, name_len))
if (!rule) {
e->name = e->ftoken;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e->name is already set to e->ftoken if !rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e->ftoken is temporally assigned to local name variable (not to e->name) for uniform checking of identifier.

@@ -193,6 +202,7 @@ entry_reset(TfwCfgEntry *e)
BUG_ON(!e);

kfree(e->name);
Copy link
Contributor

@vankoven vankoven May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that e->name may point to TFW_CFG_RULE_NAME constant string so it mustn't be freed. Ok here, never happens actually.

* ...
* mark != 1 -> storage;
* }
* http_chain {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration has changed, please, update etc/tempesta_fw.conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return match_fn(req, rule);
matched = match_fn(req, rule);
if (rule->inv)
matched = !matched;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as matched = match_fn(req, rule) ^ rule->inv, but may be your variant is more easy to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, bit operations are always preferred to conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

*/
static TfwVhost *
tfw_sched_http_sched_vhost(TfwMsg *msg)
tfw_sched_http_sched_vhost(TfwMsg *msg, bool *block)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me no target vhost found means that request must be filtered. In current http_chain definition block is defined as default action. So rule block must be used if no target vhost found and variable vhost must be set to true. That means, that block variable duplicates return value. But the function works different: it doesn't set variable block to true, if no vhost found and no explicit block action encountered.

I see, why you've introduced block variable. Request without target vhost can be filtered in two ways: blocking of entire client connection or dropping a single request. I believe, HTTP scheduler is wrong place to make a that decision, it must be done in http.c. But let's look to the problem. Some clients may use proxies to reach TempestaFW, one client connection can multiplicate a lot of end clients. Some of them malicious, but most of them - not. We don't want to block good clients for others deeds, so dropping of single requests seems more reasonable for me. We can still block connections with lot of malicious requests using http_resp_code_block frang limit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a request with not found vhost is dropped in tfw_http_req_set_context() with cannot find vhost for request and I think it has sense: a vhost isn't configured for the request - maybe it's just an error, not an attack. Anyway we do not pass the request (which also could be malicious) to a backend, so we're good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: connection dropped in case of block action; request dropped in case of vhost miss.

@@ -2596,12 +2600,23 @@ tfw_http_req_add_seq_queue(TfwHttpReq *req)
static int
tfw_http_req_set_context(TfwHttpReq *req)
{
if (!(req->vhost = tfw_vhost_match((TfwMsg *)req)))
return -EINVAL;
bool block = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment for tfw_sched_http_sched_vhost()

@@ -60,7 +60,7 @@ tfw_sched_get_vhost(TfwMsg *msg)
list_for_each_entry_rcu(sched, &sched_list, list) {
if (!sched->sched_vhost)
break;
if ((vhost = sched->sched_vhost(msg))) {
if ((vhost = sched->sched_vhost(msg, block))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was said in one of the last reviews by @krizhanovsky : http (vhost) scheduler isn't one of possible schedulers and it's API is far away of hash and rr schedulers. Foreach cycle must be replaced with direct http (vhost) scheduler call. This wasn't done in this PR, please add issue for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no need for the loop, even probably the whole tfw_sched_get_vhost() should be just replaced by direct http tables call, especially given that tempesta_fw/sched/tfw_sched_http.c should be moved & renamed to tempesta_fw/http_tbl.c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -5,7 +5,8 @@
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't just a scheduler any more, so please move it to tempesta_fw directory and rename to http_tbl.c, it should be built into tempesta_fw.ko. Also please update tempesta.sh to remove the module management. Preferably, move tfw_sched_ratio.c and tfw_sched_hash.c to tempesta_fw as http_sched_ratio.c and http_sched_hash.c correspondingly as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -60,7 +60,7 @@ tfw_sched_get_vhost(TfwMsg *msg)
list_for_each_entry_rcu(sched, &sched_list, list) {
if (!sched->sched_vhost)
break;
if ((vhost = sched->sched_vhost(msg))) {
if ((vhost = sched->sched_vhost(msg, block))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no need for the loop, even probably the whole tfw_sched_get_vhost() should be just replaced by direct http tables call, especially given that tempesta_fw/sched/tfw_sched_http.c should be moved & renamed to tempesta_fw/http_tbl.c

* etc...
* Each such match_*() function takes TfwHttpReq and TfwHttpMatchRule and
* returns true if the given request matches to the given rule.
* Such approach allows to keep the code structured and eases adding new
* @field/@op combinations.
* @field types.
* Currently that is implemented with a multi-dimensional array of pointers
* (the match_fn_tbl). However the code is critical for performance, so perhaps
* this may be optimized to a kind of jump table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update copyrights year at the below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

BUG_ON(rule->op <= 0 || rule->op >= _TFW_HTTP_MATCH_O_COUNT);
BUG_ON(rule->act.type <= 0 || rule->act.type >= _TFW_HTTP_MATCH_ACT_COUNT);
BUG_ON(rule->arg.type <= 0 || rule->arg.type >= _TFW_HTTP_MATCH_A_COUNT);
BUG_ON(rule->arg.len < 0 || rule->arg.len >= TFW_HTTP_MATCH_MAX_ARG_LEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too long lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

return match_fn(req, rule);
matched = match_fn(req, rule);
if (rule->inv)
matched = !matched;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, bit operations are always preferred to conditions.

tfw_http_match_op_t op; /* Comparison operator. */
TfwHttpAction act; /* Rule action. */
TfwHttpMatchArg arg; /* A value to be compared with the field.
note: the @arg has variable length. */
} TfwHttpMatchRule;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please arrange members of this structure and the nested structures to minimize memory gaps (e.g. integer enum types should be adjacent where possible, probably it has sense to replace the bool fields with integer flags).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

*/
static TfwVhost *
tfw_sched_http_sched_vhost(TfwMsg *msg)
tfw_sched_http_sched_vhost(TfwMsg *msg, bool *block)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a request with not found vhost is dropped in tfw_http_req_set_context() with cannot find vhost for request and I think it has sense: a vhost isn't configured for the request - maybe it's just an error, not an attack. Anyway we do not pass the request (which also could be malicious) to a backend, so we're good.

return r;
} else {
BUG();
// TODO: parsing rules of other types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use C++-style for persistent comments.

Why don't you parse the other types?

Copy link
Contributor Author

@aleksostapenko aleksostapenko Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected: processing for method type has been added.
Addr type is not used in TempestaFW and not implemented at all.

@@ -37,6 +57,7 @@ typedef enum {
TFW_HTTP_MATCH_F_HOST,
TFW_HTTP_MATCH_F_METHOD,
TFW_HTTP_MATCH_F_URI,
TFW_HTTP_MATCH_F_MARK,
_TFW_HTTP_MATCH_F_COUNT
} tfw_http_match_fld_t;
Copy link
Contributor

@krizhanovsky krizhanovsky May 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from #731

Currently not all special headers are handled in tfw_http_match_fld_t, e.g. it doesn't handle X-Forwarded-For, User-Agent or Cookie.

Isn't addressed. Please also update the Wiki after implementing handling of the headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

smp_processor_id(), __func__,
skb, skb->data_len, skb->len);
/* Propagate mark of message head skb.*/
skb->mark = mark;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend #883 by a requirement to test the mark iteroperability between Tempesta FW and iptables/nftables in both the directions (or just write the test on your own). Also add an example with iptables/nftables to https://github.com/tempesta-tech/tempesta/wiki/HTTP-tables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wiki has been updated with iptables example.

@@ -259,8 +259,18 @@
# The value of the "Host" header field.
# - hdr_conn
# The value of the "Connection" header field.
# - hdr_ctype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the issue of this PR, but it seems that all that hdr_ctype, hdr_agent and Co is not very user friendly. In the same time we have hdr_raw. It would better to provide single directive for all headers:

http_chain NAME {
	[ hdr "HDR_NAME" == (!=) ARG ] -> ACTION [ = VAL];
	...
}

Like it's done for hdr_raw now. And give a notice that comparing of the Content-Type, User-Agent, .... headers is accelerated, while processing of other headers is slow.

We have tfw_http_msg_req_spec_hid() function which can convert header name to it's internal representation in spec headers. So i believe that the change would be easy to implement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created issue #1033

Copy link
Contributor

@vankoven vankoven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good after minor fixes

TFW_INC_STAT_BH(clnt.msgs_filtout);
tfw_client_block(req, 403, "parsed request"
" has been filtered out");
tfw_client_block(req, 500, "request has been filtered"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer 403 Forbidden here. It's not server or Tempesta fault that client has requested forbidden location. Think of it from http_resp_code_block directive perspective. It worth to block clients that cause many 403 errors, while legitimate users have margin for mistake,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

if ((req->vhost = tfw_http_tbl_vhost((TfwMsg *)req, &block))) {
req->location = tfw_location_match(req->vhost,
&req->uri_path);
r = tfw_gfsm_move(&conn->state, TFW_HTTP_FSM_REQ_MSG,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hook is not called if vhost is not found. That is not right. Need to run the hook even if vhost is not found, default vhost and default location will be used as context.

Currently only frang is behind that hook, it still may want to block the client even if no vhost found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catch - this must be fixed before the merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.


*op_out = TFW_HTTP_MATCH_O_EQ;
if (len > 1 && arg[len - 1] == '*' && arg[len - 2] != '\\')
*op_out = TFW_HTTP_MATCH_O_PREFIX;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something happened with identations at this line and line 564.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

@@ -259,8 +259,18 @@
# The value of the "Host" header field.
# - hdr_conn
# The value of the "Connection" header field.
# - hdr_ctype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created issue #1033

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to merge after small fixes and cleanups.

if ((req->vhost = tfw_http_tbl_vhost((TfwMsg *)req, &block))) {
req->location = tfw_location_match(req->vhost,
&req->uri_path);
r = tfw_gfsm_move(&conn->state, TFW_HTTP_FSM_REQ_MSG,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catch - this must be fixed before the merge.

@aleksostapenko aleksostapenko merged commit 333ec1c into master Jun 9, 2018
@aleksostapenko aleksostapenko deleted the ao-731 branch June 9, 2018 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants