Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions proxy/http/HttpTransact.cc
Original file line number Diff line number Diff line change
Expand Up @@ -3641,12 +3641,12 @@ HttpTransact::handle_response_from_parent(State *s)
return;
}

if (s->current.retry_attempts.get() < s->txn_conf->parent_connect_attempts) {
if (s->current.retry_attempts.get() < (s->txn_conf->parent_connect_attempts - 1)) {
HTTP_INCREMENT_DYN_STAT(http_total_parent_retries_stat);
s->current.retry_attempts.increment(s->configured_connect_attempts_max_retries());
s->current.retry_attempts.increment();

// Are we done with this particular parent?
if ((s->current.retry_attempts.get() - 1) % s->txn_conf->per_parent_connect_attempts != 0) {
if (s->current.retry_attempts.get() % s->txn_conf->per_parent_connect_attempts != 0) {
// No we are not done with this parent so retry
HTTP_INCREMENT_DYN_STAT(http_total_parent_switches_stat);
s->next_action = how_to_open_connection(s);
Expand Down Expand Up @@ -3864,7 +3864,7 @@ HttpTransact::retry_server_connection_not_open(State *s, ServerState_t conn_stat
// disable keep-alive for request and retry //
//////////////////////////////////////////////
s->current.server->keep_alive = HTTP_NO_KEEPALIVE;
s->current.retry_attempts.increment(s->configured_connect_attempts_max_retries());
s->current.retry_attempts.increment();

TxnDebug("http_trans", "retry attempts now: %d, max: %d", s->current.retry_attempts.get(), max_retries);

Expand Down
15 changes: 12 additions & 3 deletions proxy/http/HttpTransact.h
Original file line number Diff line number Diff line change
Expand Up @@ -605,10 +605,9 @@ class HttpTransact
}

void
increment(MgmtInt configured_connect_attempts_max_retries)
increment()
{
++_v;
ink_assert(_v <= configured_connect_attempts_max_retries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wouldn't this assert apply now with the new changes? seems like increment doesn't have to change now that you check attempts - 1 before you increment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question. The check before the increment ensures that attempts - 1 < the global parent_max_retries. The problem with this assert is that the argument passed in configured_connect_attempts_max_retries may not be the global parent max retry, but the max retry count for the current parent. Currently, the argument is set to the the output of s->configured_connect_attempts_max_retries(), which is returning:

  • if connecting to the OS, returns the txn_conf->connect_attempts_max_retries
  • if connecting to a parent proxy, returns the max retries for the current parent(bounded by the global max retry for all parents)
    In the second case, the assert can fail as after the increment, it went over the max for the current parent(which should be allowed as we switch to the next parent).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but doesn't this line auto cur_parent_max_attempts = ((cur_tries + ppca - 1) / ppca) * ppca; take into account the max retries for the parent but also in relation to the total? even when we hit the max retries for the current parent, don't your new changes in configured_connect_attempts_max_retries() account for the next parent. each increment should be protected. im trying to find the edge case when this assert fails, could you provide an example

Copy link
Contributor

@ywkaras ywkaras May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert can be triggered if parent_connect_attempts > (connect_attempts_max_retries + 1). If you can find checking logic in the config code that enforces this, it would be safe to restore the assert.

This was the original version of increment():

      void
      increment()
      {
        ++_v;
        ink_assert(_v <= configured_connect_attempts_max_retries());
      }

This was CurrentInfo::configured_connect_attempts_max_retries():

inline MgmtInt
HttpTransact::CurrentInfo::Attempts::configured_connect_attempts_max_retries() const
{
  return reinterpret_cast<const HttpTransact::State *>(reinterpret_cast<const char *>(this) -
                                                       offsetof(HttpTransact::State, current.attempts))
    ->configured_connect_attempts_max_retries();
}

Even though this code was verified in Yahoo prod, and this sort of pointer arithmetic is how inheritance is implemented in C++, this code was rejected in review. I never would have put in the assert if I thought it would require adding a parameter to increment().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even when we hit the max retries for the current parent, don't your new changes in configured_connect_attempts_max_retries() account for the next parent?

configured_connect_attempts_max_retries() normally would be increment to the next multiple of ppca(accounts for the next parent), except for the case where the attempt/(retry + 1) is already a multiple of ppca(current parent hits max retries), in which case the calculation basically returns the same attempt/retry count as before.

im trying to find the edge case when this assert fails, could you provide an example

The above logic make sense when we call maximize() with the configured_connect_attempts_max_retries() to disable further retries, since we already hit the max retries for the current parent, we are not setting it to anything larger.

However, it will fail the assert in increment() with the configured_connect_attempts_max_retries(), since n = configured_connect_attempts_max_retries(n).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I misunderstood when the actual attempts_max_retries was being called

}

unsigned
Expand Down Expand Up @@ -944,7 +943,17 @@ class HttpTransact
MgmtInt
configured_connect_attempts_max_retries() const
{
return txn_conf->connect_attempts_max_retries;
if (dns_info.looking_up != ResolveInfo::PARENT_PROXY) {
return txn_conf->connect_attempts_max_retries;
}
// For parent proxy, return the maximum retry count for the current parent
// intead of the max retries for the whole parent group. The max retry
// count for the current parent is derived from rounding the current
// attempt up to next multiple of ppca.
auto ppca = txn_conf->per_parent_connect_attempts;
auto cur_tries = current.retry_attempts.get() + 1;
auto cur_parent_max_attempts = ((cur_tries + ppca - 1) / ppca) * ppca;
return std::min(cur_parent_max_attempts, txn_conf->parent_connect_attempts) - 1;
}

private:
Expand Down