Merged · Changes from all commits
16 changes: 16 additions & 0 deletions doc/admin-guide/files/records.config.en.rst
@@ -409,6 +409,22 @@ Network
handled. This should be tuned according to your memory size, and expected
work load. If this is set to 0, the throttling logic is disabled.

.. ts:cv:: CONFIG proxy.config.net.max_connections_in INT 30000

   The total number of client connections that :program:`traffic_server`
   can handle simultaneously. This should be tuned according to your memory size
   and expected workload (network, CPU, etc.). This limit covers both keep-alive
   and active client connections at any given instant.

.. ts:cv:: CONFIG proxy.config.net.max_active_connections_in INT 10000

   The total number of active client connections that |TS| can handle
   simultaneously. This should be tuned according to your memory size
   and expected workload (network, CPU, etc.). If this is set to 0, active
   connection tracking is disabled; active connections then have no separate
   limit, and total connections are governed by :ts:cv:`proxy.config.net.connections_throttle`.

.. ts:cv:: CONFIG proxy.config.net.default_inactivity_timeout INT 86400
:reloadable:

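Taken together, the two new settings plus the overall throttle might be set in records.config like this (a sketch; 30000/10000 are just the defaults shown above, and the connections_throttle value is illustrative):

    CONFIG proxy.config.net.connections_throttle INT 30000
    CONFIG proxy.config.net.max_connections_in INT 30000
    CONFIG proxy.config.net.max_active_connections_in INT 10000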
11 changes: 10 additions & 1 deletion doc/admin-guide/monitoring/statistics/core/network-io.en.rst
@@ -60,8 +60,17 @@ Network I/O
.. ts:stat:: global proxy.process.net.connections_currently_open integer
:type: counter

.. ts:stat:: global proxy.process.net.connections_throttled_in integer
:type: counter

.. ts:stat:: global proxy.process.net.connections_throttled_out integer
:type: counter

.. ts:stat:: global proxy.process.net.max.active.connections_throttled_in integer
:type: counter

.. ts:stat:: global proxy.process.net.default_inactivity_timeout_applied integer
.. ts:stat:: global proxy.process.net.dynamic_keep_alive_timeout_in_count integer
.. ts:stat:: global proxy.process.net.default_inactivity_timeout_count integer
.. ts:stat:: global proxy.process.net.dynamic_keep_alive_timeout_in_total integer
.. ts:stat:: global proxy.process.net.inactivity_cop_lock_acquire_failure integer
.. ts:stat:: global proxy.process.net.net_handler_run integer
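On a running instance, the new counters should be readable with traffic_ctl (a sketch; the metric names are taken from the registrations in Net.cc below):

    traffic_ctl metric get proxy.process.net.connections_throttled_in
    traffic_ctl metric get proxy.process.net.connections_throttled_out
    traffic_ctl metric get proxy.process.net.max.active.connections_throttled_in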
2 changes: 2 additions & 0 deletions iocore/net/Net.cc
@@ -140,6 +140,8 @@ register_net_stats()
(int)net_connections_throttled_in_stat, RecRawStatSyncSum);
RecRegisterRawStat(net_rsb, RECT_PROCESS, "proxy.process.net.connections_throttled_out", RECD_INT, RECP_PERSISTENT,
(int)net_connections_throttled_out_stat, RecRawStatSyncSum);
RecRegisterRawStat(net_rsb, RECT_PROCESS, "proxy.process.net.max.active.connections_throttled_in", RECD_INT, RECP_PERSISTENT,
(int)net_connections_max_active_throttled_in_stat, RecRawStatSyncSum);
}

void
1 change: 1 addition & 0 deletions iocore/net/P_Net.h
@@ -57,6 +57,7 @@ enum Net_Stats {
net_tcp_accept_stat,
net_connections_throttled_in_stat,
net_connections_throttled_out_stat,
net_connections_max_active_throttled_in_stat,
Net_Stat_Count
};

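For orientation, the new metric follows the usual three-step raw-stat pattern across these files (a recap assembled from the hunks in this PR, not additional code):

    // 1. P_Net.h: add an entry to the Net_Stats enum.
    net_connections_max_active_throttled_in_stat,

    // 2. Net.cc: register the metric name in register_net_stats().
    RecRegisterRawStat(net_rsb, RECT_PROCESS, "proxy.process.net.max.active.connections_throttled_in", RECD_INT,
                       RECP_PERSISTENT, (int)net_connections_max_active_throttled_in_stat, RecRawStatSyncSum);

    // 3. UnixNet.cc: bump the counter at the throttle point.
    NET_SUM_DYN_STAT(net_connections_max_active_throttled_in_stat, 1);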
15 changes: 13 additions & 2 deletions iocore/net/UnixNet.cc
@@ -562,6 +562,11 @@ NetHandler::manage_active_queue(bool ignore_queue_size = false)
max_connections_per_thread_in, max_connections_active_per_thread_in, total_connections_in, active_queue_size,
keep_alive_queue_size);

if (!max_connections_active_per_thread_in) {
// active queue has no max
return true;
}

if (ignore_queue_size == false && max_connections_active_per_thread_in > active_queue_size) {
return true;
}
@@ -721,16 +726,22 @@ NetHandler::add_to_active_queue(NetEvent *ne)
max_connections_per_thread_in, active_queue_size, keep_alive_queue_size);
ink_assert(mutex->thread_holding == this_ethread());

+bool active_queue_full = false;
+
// if active queue is over size then close inactive connections
if (manage_active_queue() == false) {
-  // there is no room left in the queue
-  return false;
+  active_queue_full = true;
}

if (active_queue.in(ne)) {
  // already in the active queue, move the head
  active_queue.remove(ne);
} else {
+  if (active_queue_full) {
+    // there is no room left in the queue
+    NET_SUM_DYN_STAT(net_connections_max_active_throttled_in_stat, 1);
+    return false;
+  }
  // in the keep-alive queue or no queue, new to this queue
  remove_from_keep_alive_queue(ne);
  ++active_queue_size;
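Note the ordering in the hunk above: a session already in the active queue is always re-queued, even when the queue is full; only a genuinely new entrant is rejected and counted in the new throttle metric. The HTTP/2 review discussion below relies on exactly this property.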
2 changes: 1 addition & 1 deletion proxy/PluginVC.cc
@@ -926,7 +926,7 @@ bool
PluginVC::add_to_active_queue()
{
// do nothing
-  return false;
+  return true;
}

SOCKET
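This one-line flip matters because callers now treat a false return from add_to_active_queue() as "throttled, close the connection" (see Http1ClientSession below); a PluginVC is an in-process connection with no socket to queue, so it must report success here.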
7 changes: 6 additions & 1 deletion proxy/http/Http1ClientSession.cc
@@ -465,6 +465,12 @@ Http1ClientSession::new_transaction()
return;
}

if (!client_vc->add_to_active_queue()) {
// no room in the active queue; close the connection
this->do_io_close();
return;
}

// Defensive programming, make sure nothing persists across
// connection re-use
half_close = false;
@@ -474,7 +480,6 @@ Http1ClientSession::new_transaction()
trans.set_proxy_ssn(this);
transact_count++;

-  client_vc->add_to_active_queue();
trans.new_transaction(read_from_early_data > 0 ? true : false);
}

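Net effect for HTTP/1.x: the active-queue check now happens before the transaction is created, and a connection that cannot be admitted is closed outright instead of being added to the queue unconditionally.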
8 changes: 7 additions & 1 deletion proxy/http2/Http2ConnectionState.cc
@@ -1153,6 +1153,13 @@ Http2ConnectionState::state_closed(int event, void *edata)
Http2Stream *
Http2ConnectionState::create_stream(Http2StreamId new_id, Http2Error &error)
{
// first check if we've hit the active connection limit
if (!ua_session->get_netvc()->add_to_active_queue()) {
Review discussion:

Member: Originally I thought this would be a problem: with HTTP/2 multiplexing, the session may already be on the active queue. But looking at add_to_active_queue, it already deals with that case.

Member: Of course, in the case of HTTP/2 it may be better if add_to_active_queue does not call do_io_close, and instead lets the error handling in the HTTP/2 logic send a useful error and close the connection.

Member: And looking again, that is exactly what the logic does, so this looks good.

Contributor (author): @shinrich Yeah, I tweaked add_to_active_queue() to handle the case where the session is already inside the active queue, just for the HTTP/2 scenario.

One thing I'm wondering about is whether the throttling should be done with the HTTP/2 error HTTP2_ERROR_REFUSED_STREAM instead of a generic "internal error". HTTP2_ERROR_REFUSED_STREAM would allow the client to retry the request safely. Thoughts?

Member: HTTP2_ERROR_REFUSED_STREAM (with HTTP2_ERROR_CLASS_STREAM) would make more sense. It might cause rapid retries, but in that case the session would eventually be closed with ENHANCE_YOUR_CALM because of an excessive error rate.

If we don't want retries, I'd use NO_ERROR instead, because this is a managed situation, not something unexpected (e.g. a memory allocation failure).

Contributor (author): Yeah, that's exactly my thought as well. If one or a few boxes in a cluster are overloaded, a retry from the client is desirable (as it may then land on a new box). On the other hand, if the entire cluster is overloaded, we don't want to create a retry storm and make matters worse by allowing a fast retry. I'll change it to NO_ERROR, as this is an intended/designed failure.

Member: We cannot use HTTP2_ERROR_CLASS_NONE; it's something like null. The valid options for closing a connection/stream are HTTP2_ERROR_CLASS_CONNECTION or HTTP2_ERROR_CLASS_STREAM.

For this case, these two are the possible options, IMO:
HTTP2_ERROR_CLASS_CONNECTION + NO_ERROR: just close the connection.
HTTP2_ERROR_CLASS_STREAM + REFUSED_STREAM: close the stream but allow the client to retry on the same connection.

NO_ERROR does not ensure that clients won't retry. If we just close a connection with HTTP2_ERROR_CLASS_CONNECTION + NO_ERROR, the client may reconnect to the server because it doesn't know why the connection was closed. It's basically the same as the HTTP/1.1 behavior.

The reasons I suggested REFUSED_STREAM are: 1) it's relatively lighter than a retry on a new connection, and 2) if retries seem to be making things worse, it would end up with ENHANCE_YOUR_CALM, which signals that the behavior is generating excessive load. There is still no guarantee, but I hope the error code stops the retries.

Another option is returning a 503 (and then HTTP2_ERROR_CLASS_CONNECTION + NO_ERROR). Most clients would not retry soon after receiving it.

Contributor (author): That's not correct. Most clients will retry as soon as they see a 503, including automatically at the network layer. In fact, 502/503 are considered the safest status codes for clients to retry automatically.

I'll just change it back to return (connection class + no error). I don't think this is a big deal, and the only way to prevent a retry would be a custom protocol between server and client.

Member: It's a bummer if clients ignore the Retry-After header sent with a 503.
error = Http2Error(Http2ErrorClass::HTTP2_ERROR_CLASS_CONNECTION, Http2ErrorCode::HTTP2_ERROR_NO_ERROR,
"refused to create new stream, maxed out active connections");
return nullptr;
}

// In half_close state, TS doesn't create new stream. Because GOAWAY frame is sent to client
if (ua_session->get_half_close_local_flag()) {
error = Http2Error(Http2ErrorClass::HTTP2_ERROR_CLASS_STREAM, Http2ErrorCode::HTTP2_ERROR_REFUSED_STREAM,
@@ -1226,7 +1233,6 @@ Http2ConnectionState::create_stream(Http2StreamId new_id, Http2Error &error)
new_stream->mutex = new_ProxyMutex();
new_stream->is_first_transaction_flag = get_stream_requests() == 0;
increment_stream_requests();
-  ua_session->get_netvc()->add_to_active_queue();

return new_stream;
}
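For reference, the stream-class alternative debated in the review would have looked roughly like this inside create_stream() (a sketch reusing the enums that appear in this file; the merged change uses the connection-class NO_ERROR form above instead):

    // Refuse only the stream, leaving the connection open so the client
    // may safely retry the request on the same connection.
    error = Http2Error(Http2ErrorClass::HTTP2_ERROR_CLASS_STREAM, Http2ErrorCode::HTTP2_ERROR_REFUSED_STREAM,
                       "refused to create new stream, maxed out active connections");
    return nullptr;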