Skip to content

Conversation

@zizhong
Copy link
Member

@zizhong zizhong commented Aug 5, 2017

This is a rework on PR#351.

Why do we want this feature? (statements below copied from the old PR)

We need a way to examine the request body without making a downstream request, this feature has many use cases including:
Ability to buffer the body and ensure a full post is received before committing downstream resources.
Ability to choose an origin based on request body
Ability to do request content filtering such as a WAF might provide before the origin is involved.

The main difference between the new PR and the old one is the way to achieve reading request body.
The old PR calls do_io_read() directly and duplicates a chunked handler, which is kind of implementing read part of HttpTunnel from scratch.
The new PR reuses HttpTunnel. This requires the HttpTunnel to support no consumer mode and correctly resume from it.

This PR includes ATS core changes, a C API and a C plugin.

I will add C++ API and C++ plugin example. I'll write some other plugins which make a better demonstration.
Locally I wrote several Python test cases and I'll try to convert it to autests.
And I'll try my best to keep each commits small and clear.

In summary, my current todo list:

  1. start an API review.
  2. add docs.
  3. add tests.
  4. add C++ API and C++ plugins.
  5. write another plugin to a better demonstration.

@bryancall @zwoop @jpeach @SolidWallOfCode @maskit @scw00 @oknet @bgaff Please review this if you'd like to. Any ideas would be appreciated.

@zizhong
Copy link
Member Author

zizhong commented Aug 5, 2017

TS_CONFIG_HTTP_REQUEST_BUFFER_ENABLED is added to ts_lua and InkAPITest. The build issue is gone.

@zizhong zizhong force-pushed the post_buffer branch 2 times, most recently from 98d73a4 to 5fc5c68 Compare August 7, 2017 20:13
@zizhong
Copy link
Member Author

zizhong commented Aug 7, 2017

@scw00 thanks for reminding me. Regression tests now should be okay.

@bryancall
Copy link
Contributor

Did you take a look at plugins/experimental/buffer_upload ? There is a plugins handles this.

In TSHttpTxnGetClientRequestBody you are doubling the memory foot print creating a copy of the buffer. This looks like a potential DOS.

@zwoop
Copy link
Contributor

zwoop commented Aug 9, 2017

I'm pretty concerned about this concept in general, it seems like as implemented, it'd be near trivial to kill any server that accepts a POST with this buffering enabled. You will consume 2x the memory that the user POSTed ( (for some amount of time), right?

Also, how is this different / better than the existing buffer_upload plugin?

@zizhong
Copy link
Member Author

zizhong commented Aug 18, 2017

I did some investigation on the buffer_upload plugin and which is better between it and this PR.


How does upload_buffer work?

upload_buffer calls TSHttpTxnIntercept() on TS_HTTP_PRE_REMAP_HOOK, which creates a PluginVC(PVC) connecting to a plugin created continuation contp. Then the HttpSM goes to setup_plugin_request_intercept() where we set up a fake server. Connecting to the fake server, contp is invoked on TS_EVENT_NET_ACCEPT. The buffer_upload plugin calls a do_io_read on the PVC. The plugin will get all the data from the client through PVC.
After the plugin gets all the data, it will call TSHttpConnect() to create a special PVC, named net_vc, which will inject a new HttpSM. Also, the plugin creates VIOs to both net_vc and p_vc. The net_vc will connect to the server and receive data back to the plugin. Then the internal SM is done.
Then p_vc drives our original SM to state_read_server_response_header, then the SM continues and ends. The plugin does cleanups.


How does this PR work?

At the beginning of HttpTransact::HandleRequest, ATS core injects a new state called SM_ACTION_WAIT_FOR_FULL_BODY. In this state, by having a producer-only HttpTunnel, ATS core gets all the data. And it will invoke any plugins on TS_HTTP_REQUEST_BUFFER_READ_COMPLETE_HOOK so they can read all the request body. After that, the SM continues with an HTTP tunnel with data already there.


The upload_buffer plugin is super cool. Its advantage is that it's a plugin so no ATS core changes are required.
The thing I don't like about it:
1. the logic is relatively complicated and challenging to maintain.
2. It involves an additional HttpSM and doubles VIOs, which can be harmful to performance and resources.
3. still need to change if other plugins don't want to read the request body from a file.

This PR tries to solve the issue that how to enable a plugin read the request body before opening connections to the server. The way to implement it is a bit "twisted" but I think upload_buffer is at least double the twistedness.

@bryancall @zwoop What do you guys think?

@oknet
Copy link
Member

oknet commented Aug 24, 2017

Currently the HttpTunnel can not decode the Chunked data into TransformVC.
The Request Transform Plugin still need to decode the chunked data by itself.

@zizhong Do you have any idea to handle the Chunked data in HttpTunnel ? Can we implement the decoding support depending on this PR ?

@oknet
Copy link
Member

oknet commented Aug 24, 2017

@zizhong There is a timeout problem.

client ----[high speed link]---> ATS -------[low speed link]------> OServer

  1. ATS received the full data from client within 1 second.
  2. then ATS send the full data to OServer more than 120 seconds
  3. client side timeout.
  4. Does the ATS should close the server session ?

@zizhong
Copy link
Member Author

zizhong commented Aug 24, 2017

@oknet Thanks for your comments. Let me look into them.

@zizhong
Copy link
Member Author

zizhong commented Aug 28, 2017

@bryancall @zwoop @oknet Before jumping into fixing you guys' comments, I want to make sure that we're on the same page that this RB is more acceptable than the solution of using upload_buffer to address the body buffering issue and slow-post issue. Otherwise, all the following works on this can be in vain.
I posted my analysis in this thread 10+ days ago. Please share your ideas with me on this. Thanks!
cc @sidhuagarwal

@oknet
Copy link
Member

oknet commented Aug 30, 2017

@zizhong

Do u have read the below code ?

source: proxy/http/HttpTunnel.cc
 964   // YTS Team, yamsat Plugin
 965   // Allocate and copy partial POST data to buffers. Check for the various parameters
 966   // including the maximum configured post data size
 967   if (p->alive && sm->t_state.method == HTTP_WKSIDX_POST && sm->enable_redirection && (p->vc_type == HT_HTTP_CLIENT)) {
 968     Debug("http_redirect", "[HttpTunnel::producer_run] client post: %" PRId64 " max size: %" PRId64 "",
 969           p->buffer_start->read_avail(), HttpConfig::m_master.post_copy_size);
 970 
 971     // (note that since we are not dechunking POST, this is the chunked size if chunked)
 972     if (p->buffer_start->read_avail() > HttpConfig::m_master.post_copy_size) {
 973       Debug("http_redirect", "[HttpTunnel::producer_handler] post exceeds buffer limit, buffer_avail=%" PRId64 " limit=%" PRId6     4 "",
 974             p->buffer_start->read_avail(), HttpConfig::m_master.post_copy_size);
 975       sm->enable_redirection = false;
 976     } else {
 977       // allocate post buffers with a new reader. The reader will be freed when p->read_buffer is freed
 978       allocate_redirect_postdata_buffers(p->read_buffer->clone_reader(p->buffer_start));
 979       copy_partial_post_data();
 980     }
 981   } // end of added logic for partial POST

These code implement a feature:

  • if the method is POST , ATS will buffer the POST content but no more than post_copy_size
  • if ATS got HTTP 301/302 from OServer, ATS will try to follow the redirect url and replay the buffered POST content to new url.

It is very similar to this PR.

@zizhong
Copy link
Member Author

zizhong commented Sep 1, 2017

@oknet Yep, you're right. They both have the post buffer functionality.
The main difference is this PR buffers the post body before opening the connection but the yamsat code buffers the post body along the first transmission.
So, using one way to get another seems not straightforward to me. I can see with this PR, the yamsat code could be clean up. But I can't figure out the way to implement this PR by reusing the yamsat code. We would still need a consume-only-mode tunnel, which makes reusing yamsat code doesn't benefit much.

Please feel free to share your idea about this with me. Thanks!

@zizhong
Copy link
Member Author

zizhong commented Sep 5, 2017

ping on this one! @bryancall @zwoop @oknet Your opinions matter a lot to this project!

@bryancall @zwoop @oknet Before jumping into fixing you guys' comments, I want to make sure that we're on the same page that this RB is more acceptable than the solution of using upload_buffer to address the body buffering issue and slow-post issue. Otherwise, all the following works on this can be in vain.
I posted my analysis in this thread 10+ days ago. Please share your ideas with me on this. Thanks!

@zizhong
Copy link
Member Author

zizhong commented Sep 12, 2017

Slow-post attack is a type of Slowloris from #1237

alloc_index = DEFAULT_REQUEST_BUFFER_SIZE_INDEX;
}
} else {
alloc_index = buffer_size_to_index(t_state.hdr_info.request_content_length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will block the iocore when a flood attack with a larger content-length.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is also true for the original SM?

}
MIOBuffer *post_buffer = new_MIOBuffer(alloc_index);
IOBufferReader *buf_start = post_buffer->alloc_reader();
int64_t post_bytes = chunked ? INT64_MAX : t_state.hdr_info.request_content_length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A chunked request may include the content-length header.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.4.4.p.5

Messages must not include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length must be ignored.

@zizhong
Copy link
Member Author

zizhong commented Oct 5, 2017

client ----[high speed link]---> ATS -------[low speed link]------> OServer
ATS received the full data from client within 1 second.
then ATS send the full data to OServer more than 120 seconds
client side timeout.
Does the ATS should close the server session ?

In that case, the request buffer is probably better not to be enabled. This functionality is intended to solve the scenario as below.

client ----[low speed link]---> ATS -------[high speed link]------> OServer

@zizhong
Copy link
Member Author

zizhong commented Oct 6, 2017

@bryancall @zwoop @oknet @SolidWallOfCode @maskit @scw00 I added test about how ATS handles slow post attacks. This is the main reason we're trying to push this functionality out. It's a huge security flaw now that we don't have it as a protection.

@oknet
Copy link
Member

oknet commented Nov 6, 2017

The postbuf is moved from HttpTunnel to HttpSM by #2766 .

So the next step may be:

  1. plugin can hook on a POST request.
  2. expose an API to modify the post_copy_size
  3. a new Hook when the limitation of post_copy_size is reached.
  • HttpTxnReenable(ERROR) to deny the request.
  • HttpTxnReenable(CONTINUE) to accept the request.

@zizhong
Copy link
Member Author

zizhong commented Nov 8, 2017

Thanks @oknet for your valuable comments! I'll look into them asap.
Meanwhile, @bryancall , what do you think about this approach? I don't want to continue to contribute unless we reach a conclusion that we need this.

@bryancall
Copy link
Contributor

@oknet Are you saying that the postbuf in the HttpSM could be reused for buffering before connecting to the origin?

@oknet
Copy link
Member

oknet commented Nov 15, 2017

@bryancall yes, because the postbuf is managed by HttpSM now.

@bryancall
Copy link
Contributor

@zizhong I would prefer it if you can use the postbuf in the HttpSM. For new APIs we need to have docs in the PR and an email will need to be set to de dev@ mailing list with this information:
https://cwiki.apache.org/confluence/display/TS/API+Review+Process

@zizhong
Copy link
Member Author

zizhong commented Nov 15, 2017

@bryancall @oknet Thanks! Will apply the postbuf.

@bryancall
Copy link
Contributor

[approve ci]

Copy link
Contributor

@zwoop zwoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not 100% sold on this, but I guess someone can just choose not to use it. The memory usage for this seems potentially very high.

proxy/InkAPI.cc Outdated
}

tsapi char *
TSHttpTxnGetClientRequestBody(TSHttpTxn txnp, int *len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be TSHttpTxnClientReqBodyGet() . 1) Get() at the end, and 2) bcall pointed out that we have a TSHttpTxnClientReqGet().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, zwoop.

@zizhong
Copy link
Member Author

zizhong commented Feb 23, 2018

updated.

  1. clang format fixed,
  2. naming issue fixed,
  3. autest issue fixed.

@zwoop @bryancall @scw00 @oknet @maskit please take a look again!
About the memory consuming, our use case is to read the body and redirect based on it. The cost is inevitable.

@bryancall
Copy link
Contributor

[approve ci]

proxy/InkAPI.cc Outdated
tsapi char *
TSHttpTxnGetClientRequestBody(TSHttpTxn txnp, int *len)
{
char *ret = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use nullptr instead of NULL

proxy/InkAPI.cc Outdated
}

tsapi char *
TSHttpTxnGetClientRequestBody(TSHttpTxn txnp, int *len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to use the API? You are read the post body, but what are you going to be doing with it, since you can't modify it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two use cases for us currently.

  1. preventing slow post attack, which doesn't require this API;
  2. redirecting based on post body, which uses this API to get the body.

I can see modifying the body is useful. If you think the functionally is necessary for this PR, I can work on it and add it here.

zwoop
zwoop previously approved these changes Mar 1, 2018
proxy/InkAPI.cc Outdated
}

tsapi char *
TSHttpTxnClientReqBodyGet(TSHttpTxn txnp, int *len)
Copy link
Contributor

@bryancall bryancall Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we it would be more flexible to return a TSIOBufferReader and do the copying of the bytes in the plugin instead of doing it here.

Using the external APIs in internal code means that either we should be exposing more of the underlying implementation or using the internal APIs directly.

The API could be redone to be something like:

TSIOBufferReader TSIOBufferReaderPostGet() {
  sdk_assert(sdk_sanity_check_txn(txnp) == TS_SUCCESS);
  HttpSM *sm                         = (HttpSM *)txnp;
  return (TSIOBufferReader)sm->get_postbuf_clone_reader();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Sounds reasonable. Will change it.

@zizhong
Copy link
Member Author

zizhong commented Mar 2, 2018

@bryancall updated as you suggested

@bryancall
Copy link
Contributor

@zizhong Thanks for the update. Looks good to me. Can you shoot an email to dev@ on the updated API? If no one objects, I will merge it.

@zizhong
Copy link
Member Author

zizhong commented Mar 5, 2018

@bryancall Will do. Thanks!

bryancall
bryancall previously approved these changes Mar 6, 2018
@bryancall
Copy link
Contributor

[approve ci]

@bryancall
Copy link
Contributor

Please rename the API to TSHttpTxnPostBufferReaderGet as discussed on the mailing list.

@zizhong
Copy link
Member Author

zizhong commented Mar 8, 2018

@bryancall Thanks! updated.

@bryancall bryancall merged commit 88c74e4 into apache:master Mar 8, 2018
zizhong pushed a commit to zizhong/trafficserver that referenced this pull request Mar 14, 2018
This is bug introduced with PR apache#2335. The `if` is to avoid unnecessary
works when buffering the post body, but since redirects also need to
execute the logic flow and the request info may be changed during the
process of redirects. We need to extract the request info every time redirects
happen.
zizhong pushed a commit to zizhong/trafficserver that referenced this pull request Mar 14, 2018
This is a bug introduced with PR apache#2335. The `if` is to avoid unnecessary
works when buffering the post body, but since redirects also need to
execute the logic flow and the request info may be changed during the
process of redirects. We need to extract the request info every time redirects
happen.
@zizhong zizhong mentioned this pull request Mar 14, 2018
zwoop pushed a commit that referenced this pull request Mar 14, 2018
This is a bug introduced with PR #2335. The `if` is to avoid unnecessary
works when buffering the post body, but since redirects also need to
execute the logic flow and the request info may be changed during the
process of redirects. We need to extract the request info every time redirects
happen.
@zwoop zwoop added this to the Old milestone Jan 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants