Merged
1 change: 1 addition & 0 deletions configs/body_factory/default/Makefile.am
@@ -28,6 +28,7 @@ dist_bodyfactory_DATA = \
connect\#dns_failed \
connect\#failed_connect \
connect\#hangup \
connect\#all_dead \
default \
interception\#no_host \
README \
17 changes: 17 additions & 0 deletions configs/body_factory/default/connect#all_dead
@@ -0,0 +1,17 @@
<HTML>
<HEAD>
<TITLE>No Valid Host</TITLE>
</HEAD>

<BODY BGCOLOR="white" FGCOLOR="black">
<H1>No Valid Host</H1>
<HR>

<FONT FACE="Helvetica,Arial"><B>
Description: Unable to find a valid target host.

The server was found, but all of its addresses are marked dead, so there is
no valid target address to connect to. Please try again after a few minutes.
</B></FONT>
<HR>
</BODY>
3 changes: 3 additions & 0 deletions doc/developer-guide/core-architecture/HostDB-Data-Layout.svg
191 changes: 191 additions & 0 deletions doc/developer-guide/core-architecture/hostdb.en.rst
@@ -0,0 +1,191 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

.. include:: ../../common.defs

.. highlight:: cpp
.. default-domain:: cpp

.. _developer-doc-hostdb:

HostDB
******

HostDB is a cache of DNS results. It is used to increase performance by aggregating address
resolution across transactions. HostDB also stores state information for specific IP addresses.

Operation
=========

The primary operation for HostDB is to resolve a fully qualified domain name ("FQDN"). Each FQDN is
associated with a single record, and each record has an array of items, one per address. When a
resolution request is made, the database is checked to see if the record is already present. If so,
it is served. Otherwise a DNS request is made. When the nameserver replies, a record is created,
added to the database, and then returned to the requestor.
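The lookup flow above can be sketched as a map from FQDN to a shared record that is consulted before falling back to a resolver. This is a minimal, synchronous sketch under stated assumptions: ``HostCache``, ``Record``, and the ``Resolver`` callback are hypothetical names, and the callback stands in for the DNS query, which HostDB performs asynchronously rather than as a blocking call.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for HostDBRecord: one record per FQDN, one entry per address.
struct Record {
  std::vector<std::string> addrs;
};

// Hypothetical resolver callback standing in for an actual DNS query.
using Resolver = std::function<std::vector<std::string>(const std::string &)>;

class HostCache {
public:
  explicit HostCache(Resolver resolve) : resolve_(std::move(resolve)) {}

  // Serve the cached record if present; otherwise resolve, store, and return a new record.
  std::shared_ptr<Record> lookup(const std::string &fqdn)
  {
    if (auto it = records_.find(fqdn); it != records_.end()) {
      return it->second; // cache hit, no DNS query
    }
    auto record   = std::make_shared<Record>();
    record->addrs = resolve_(fqdn); // cache miss: ask the "nameserver"
    records_.emplace(fqdn, record);
    return record;
  }

private:
  Resolver resolve_;
  std::unordered_map<std::string, std::shared_ptr<Record>> records_;
};
```

Returning a ``std::shared_ptr`` lets a caller keep using a record even after the table drops or replaces it, which matches the reference counting described for ``HostDBRecord``.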

Each item, referred to as an "info", tracks status values for its corresponding upstream. These are

* HTTP version
* Last failure time

The HTTP version is tracked from responses and provides a mechanism for making informed guesses
about which protocol to use with the upstream.

The last failure time tracks when the last connection failure to the info occurred and doubles as
a flag, where a value of ``TS_TIME_ZERO`` indicates a live target and any other value indicates a
dead info.

If an info is marked dead (has a non-zero last failure time), there is a "fail window" during which
no connections are permitted. After the fail window has passed, the info is considered a "zombie".
If all infos for a record are dead, a specific error message is generated (body factory tag
"connect#all_dead"). Otherwise, if the selected info is a zombie, a request is permitted but the
zombie is immediately marked dead again, preventing any additional requests until either the fail
window has passed again or that single connection succeeds. A successful connection clears the last
failure time and the info becomes alive.
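The alive / dead / zombie rules can be sketched for a single info as follows. This assumes the last failure time is held in an atomic ``std::chrono`` time point whose default (zero) value plays the role of ``TS_TIME_ZERO``. ``InfoState``, ``select()``, ``mark_down()``, and ``mark_up()`` are hypothetical names, and the compare-exchange is one way to let exactly one transaction claim the probe connection to a zombie, not necessarily how HostDB implements it.

```cpp
#include <atomic>
#include <chrono>

using Clock     = std::chrono::system_clock;
using TimePoint = Clock::time_point;

// Stand-in for TS_TIME_ZERO: a default constructed time point marks a live target.
constexpr TimePoint TIME_ZERO{};

enum class State { ALIVE, DEAD, ZOMBIE };

// Hypothetical per-address status, not the actual HostDBInfo layout.
struct InfoState {
  std::atomic<TimePoint> last_failure{TIME_ZERO};

  State state(TimePoint now, Clock::duration fail_window) const
  {
    TimePoint t = last_failure.load();
    if (t == TIME_ZERO) {
      return State::ALIVE;
    }
    return (now - t) < fail_window ? State::DEAD : State::ZOMBIE;
  }

  // Returns true if a connection to this address is permitted now.
  bool select(TimePoint now, Clock::duration fail_window)
  {
    TimePoint t = last_failure.load();
    if (t == TIME_ZERO) {
      return true; // alive
    }
    if ((now - t) < fail_window) {
      return false; // dead, still inside the fail window
    }
    // Zombie: re-mark it dead as of now. Only one caller can win this exchange,
    // so only one probe connection is let through per fail window.
    return last_failure.compare_exchange_strong(t, now);
  }

  void mark_down(TimePoint now) { last_failure.store(now); } // connection failed
  void mark_up() { last_failure.store(TIME_ZERO); }          // connection succeeded
};
```

In this sketch, a caller that finds ``select()`` returning false for every info in a record would be in the situation where the "connect#all_dead" page is sent.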

Runtime Structure
=================

DNS results are stored in a global hash table as instances of ``HostDBRecord``. Each record stores
the results of a single query. These records are not updated with new DNS results - instead a new
record instance is created and replaces the previous instance in the table. The records are
reference counted so such a replacement doesn't invalidate the old record if the latter is still
being accessed. Some specific dynamic data is migrated from the old record to the new one, such as
the failure status of the upstreams in the record.
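Replacing a record while carrying over failure status can be sketched with ``std::shared_ptr`` doing the reference counting. ``RecordTable``, ``Record``, and ``Info`` are hypothetical stand-ins, the mutex is an assumption made for the sketch, and addresses are matched by string comparison purely for illustration.

```cpp
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using TimePoint = std::chrono::system_clock::time_point;

// Hypothetical stand-ins for HostDBInfo and HostDBRecord.
struct Info {
  std::string addr;
  TimePoint last_failure{}; // zero means alive
};

struct Record {
  std::vector<Info> infos;
};

class RecordTable {
public:
  std::shared_ptr<Record> find(const std::string &fqdn)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = records_.find(fqdn);
    if (it == records_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Install a fresh record, carrying over failure status for addresses seen before.
  // The old record is not modified; anyone still holding it keeps a valid object.
  void replace(const std::string &fqdn, std::shared_ptr<Record> fresh)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = records_.find(fqdn);
    if (it != records_.end()) {
      for (auto &info : fresh->infos) {
        for (const auto &old : it->second->infos) {
          if (old.addr == info.addr) {
            info.last_failure = old.last_failure;
          }
        }
      }
    }
    records_[fqdn] = std::move(fresh);
  }

private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<Record>> records_;
};
```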

Each record contains a variable length array of items, instances of ``HostDBInfo``, one for each
IP address in the record. This is called the "round robin" data for historical reasons. For SRV
records there is an additional storage area in the record that is used to store the SRV names.

.. figure:: HostDB-Data-Layout.svg

The round robin data is accessed by using an offset and a count stored in the base record. For SRV
records, each ``HostDBInfo`` item also has an offset, relative to that item, to its own name in the
name storage area.
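A single-allocation layout with an offset and count in the base record, and per-item name offsets for SRV data, can be sketched as below. The real HostDB field names, widths, and alignment handling are not reproduced here: ``Header``, ``Item``, ``SrvTarget``, ``build()``, and ``destroy()`` are hypothetical, and an address is reduced to a plain 32-bit integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>
#include <vector>

// One round robin item. For SRV data, name_offset is relative to this item.
struct Item {
  std::uint32_t ip;
  std::uint32_t name_offset;
  const char *name() const { return reinterpret_cast<const char *>(this) + name_offset; }
};

// Base record: offset and count locate the round robin items in the same allocation.
struct Header {
  std::uint32_t rr_offset;
  std::uint32_t rr_count;
  Item *item(std::size_t i)
  {
    char *p = reinterpret_cast<char *>(this) + rr_offset + i * sizeof(Item);
    return std::launder(reinterpret_cast<Item *>(p));
  }
};

struct SrvTarget {
  std::uint32_t ip;
  std::string name;
};

// Lay out [Header][Item x n][names...] in a single block of memory.
Header *build(const std::vector<SrvTarget> &targets)
{
  const std::size_t items_off = sizeof(Header);
  const std::size_t names_off = items_off + targets.size() * sizeof(Item);
  std::size_t total           = names_off;
  for (const auto &t : targets) {
    total += t.name.size() + 1; // include the terminating NUL
  }
  char *mem   = static_cast<char *>(::operator new(total));
  Header *hdr = new (mem) Header{static_cast<std::uint32_t>(items_off), static_cast<std::uint32_t>(targets.size())};
  std::size_t name_pos = names_off;
  for (std::size_t i = 0; i < targets.size(); ++i) {
    const std::size_t item_pos = items_off + i * sizeof(Item);
    new (mem + item_pos) Item{targets[i].ip, static_cast<std::uint32_t>(name_pos - item_pos)};
    std::memcpy(mem + name_pos, targets[i].name.c_str(), targets[i].name.size() + 1);
    name_pos += targets[i].name.size() + 1;
  }
  return hdr;
}

void destroy(Header *hdr) { ::operator delete(hdr); }
```

Storing offsets rather than pointers keeps the record position independent, so the whole block can be allocated, copied, or released as one unit.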

State information for the outbound connection has been moved to a refurbished ``DNSInfo`` class
named ``ResolveInfo``. As much as possible, relevant state information has been moved from the
``HttpSM`` to this structure. This is intended for future work where the state machine deals only
with upstream transactions and not sessions.

``ResolveInfo`` may contain a reference to a HostDB record, which preserves the record even if it is
replaced due to DNS queries in other transactions. The record is not required as the resolution
information can be supplied directly without DNS or HostDB, e.g. a plugin sets the upstream address
explicitly. The ``resolved_p`` flag indicates if the current information is valid and ready to be
used or not. As a result, there is no longer a dedicated holder for API provided addresses: the
interface now puts the address in the ``ResolveInfo`` and marks it as resolved. This prevents
further DNS / HostDB lookups and the address is used as is.

The upstream port is a bit tricky and should be cleaned up. Currently the value in ``srv_port``
determines the port if it is set. If not, the port in ``addr`` is used.
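The ``ResolveInfo`` behavior described in the last two paragraphs can be sketched as a small struct: an optional record reference, a ``resolved_p`` flag that suppresses further lookups, and the ``srv_port`` / ``addr`` port rule. ``ResolveSketch``, ``set_upstream_address()``, ``set_from_record()``, ``needs_lookup()``, ``addr_port``, and the use of ``0`` to mean "``srv_port`` not set" are all assumptions for illustration; the actual class holds socket addresses and more state.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a HostDB record.
struct Record {
  std::vector<std::string> addrs;
};

// Hypothetical, heavily simplified shape of the outbound resolution state.
struct ResolveSketch {
  std::shared_ptr<Record> record; // optional; keeps the record alive if HostDB replaces it
  std::string addr;               // selected upstream address
  std::uint16_t addr_port = 0;    // port carried with the address
  std::uint16_t srv_port  = 0;    // port from an SRV lookup, 0 if unset
  bool resolved_p         = false;

  // Address supplied directly (e.g. by a plugin): no record, no further lookups.
  void set_upstream_address(std::string a, std::uint16_t port)
  {
    record.reset();
    addr       = std::move(a);
    addr_port  = port;
    resolved_p = true;
  }

  // Result from HostDB: hold a reference to the record and take one of its addresses.
  void set_from_record(std::shared_ptr<Record> r, std::size_t index, std::uint16_t port)
  {
    record     = std::move(r);
    addr       = record->addrs.at(index);
    addr_port  = port;
    resolved_p = true;
  }

  // Only an unresolved state should trigger a DNS / HostDB lookup.
  bool needs_lookup() const { return !resolved_p; }

  // srv_port wins if set, otherwise the port stored with the address is used.
  std::uint16_t port() const { return srv_port != 0 ? srv_port : addr_port; }
};
```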

Resolution Style
----------------

.. cpp:enum:: OS_Addr

Metadata about the source of the resolved address.

.. cpp:enumerator:: TRY_DEFAULT

Use default resolution. This is the initial state.

.. cpp:enumerator:: TRY_HOSTDB

Use HostDB to resolve the target key.

.. cpp:enumerator:: TRY_CLIENT

Use the client supplied target address. This is used for transparent connections - the upstream
address is obtained from the inbound connection. May fail over to HostDB.

.. cpp:enumerator:: USE_HOSTDB

Use HostDB to resolve the target key.

.. cpp:enumerator:: USE_CLIENT

Use the client supplied target address.

.. cpp:enumerator:: USE_API

Use the address provided via the plugin API.

The parallel values for HostDB and the client target address exist to control fail over on
connection failure. The ``TRY_`` values can fail over to another style, but the ``USE_`` values
cannot. This prevents cycles of style changes: any ``TRY_`` value fails over to a ``USE_`` value, at
which point the style can no longer change. Note there is no ``TRY_API``; if a plugin sets the
upstream address, that address is locked in.
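The no-cycle property can be sketched as a fail over function in which every ``TRY_`` value maps to a ``USE_`` value and every ``USE_`` value maps to itself. Only ``TRY_CLIENT`` to ``USE_HOSTDB`` is stated in this document; the ``TRY_HOSTDB`` and ``TRY_DEFAULT`` mappings below are assumptions chosen for illustration, and ``fail_over()`` and ``is_locked()`` are hypothetical helpers.

```cpp
// Mirrors the documented enumerators of OS_Addr.
enum class OS_Addr { TRY_DEFAULT, TRY_HOSTDB, TRY_CLIENT, USE_HOSTDB, USE_CLIENT, USE_API };

// True if the style can no longer change on connection failure.
constexpr bool is_locked(OS_Addr style)
{
  return style == OS_Addr::USE_HOSTDB || style == OS_Addr::USE_CLIENT || style == OS_Addr::USE_API;
}

// Style to use after a connection failure.
constexpr OS_Addr fail_over(OS_Addr style)
{
  switch (style) {
  case OS_Addr::TRY_CLIENT:
    return OS_Addr::USE_HOSTDB; // documented: the client address may fail over to HostDB
  case OS_Addr::TRY_HOSTDB:
    return OS_Addr::USE_CLIENT; // assumption for illustration
  case OS_Addr::TRY_DEFAULT:
    return OS_Addr::USE_HOSTDB; // assumption for illustration
  default:
    return style; // USE_ values never change, so no cycle is possible
  }
}
```

Because every result of ``fail_over()`` is locked, applying it a second time changes nothing, which is the property the enumerator split is meant to guarantee.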

Issues
======

Previously, if an upstream was marked down, connections were still permitted; the only change was
the number of retries. This caused operational problems where dead systems were flooded with
requests which, despite the timeouts, accumulated in ATS until ATS ran out of memory (there were
instances of over 800K pending transactions). It also made it hard to bring the upstreams back
online. With these changes, requests to dead upstreams are strongly rate limited and other
transactions are immediately terminated with a 502 response, protecting both the upstream and ATS.

Future
======

There is still some work to be done in future PRs.

* The fail window and the zombie window should be separate values. It is quite reasonable to want
to configure a very short fail window (possibly 0) with a moderately long zombie window so that
probing connections can immediately start going upstream at a low rate.

* Failing an upstream should be more loosely coupled to transactions. Currently there is a one
to one relationship where failure is defined as the failure of a specific transaction to connect.
There are situations where the number of connection attempts required to mark a failure should be
larger than the number of retries for a single transaction. For transiently busy upstreams and
low latency requests it can be reasonable to set a low per transaction timeout with no retries,
but this then risks marking down upstreams that were merely a bit slow at a given moment.

* Parallel DNS requests should be supported. This is for both cross family requests and for split
DNS.

* It would be nice to be able to do the probing connections to an upstream using synthetic requests
instead of burning actual user requests. What would be needed is a handoff from ATS to the probe
to indicate a particular upstream is considered down, at which point active health checks are done
until the upstream is once again alive, at which point this is handed off back to ATS.

History
=======

This version has several major architectural changes from the previous version.

* The data is split into records and info, not handled as a variant of a single data type. This
provides a noticeable simplification of the code.

* Single and multiple address results are treated identically: a singleton is simply a multiple
of size 1. This yields a major simplification of the implementation.

* Connections to dead upstreams are throttled, allowing only a single connection attempt per fail
window until a connection succeeds.

* Timing information is stored in ``std::chrono`` data types instead of proprietary types.

* State information has been promoted to atomics and updates are immediate rather than scheduled.
This also means the data in the state machine is a reference to a shared object, not a local copy.
The promotion was necessary to coordinate zombie connections to dead upstreams across transactions.

* The "resolve key" is now a separate data object from the HTTP request. This is a subtle but
major change. The effect is requests can be routed to different upstreams without changing
the request. Parent selection can be greatly simplified as it becomes merely a matter of setting
the resolve key, rather than having a completely different code path.
1 change: 1 addition & 0 deletions doc/developer-guide/core-architecture/index.en.rst
@@ -26,5 +26,6 @@ Core Architecture
:maxdepth: 1

heap.en
hostdb.en
rpc.en
url_rewrite_architecture.en.rst
24 changes: 24 additions & 0 deletions doc/uml/host-resolve.plantuml
@@ -0,0 +1,24 @@
' SPDX-License-Identifier: Apache-2.0
' Licensed under the Apache License, Version 2.0 (the "License");
' you may not use this file except in compliance with the License.
' You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
' Unless required by applicable law or agreed to in writing, software distributed under the License is distributed
' on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
' See the License for the specific language governing permissions and limitations under the License.

@startuml

hide empty description

state HttpSM {
state do_http_server_open {
}
}

state HandleRequest #cyan
state CallOSDNSLookup #cyan

CallOSDNSLookup -> OSDNSLookup

@enduml

12 changes: 5 additions & 7 deletions example/plugins/c-api/protocol/TxnSM.c
@@ -477,8 +477,6 @@ int
state_dns_lookup(TSCont contp, TSEvent event, TSHostLookupResult host_info)
{
TxnSM *txn_sm = (TxnSM *)TSContDataGet(contp);
-struct sockaddr const *q_server_addr;
-struct sockaddr_in ip_addr;

TSDebug(PLUGIN_NAME, "enter state_dns_lookup");

@@ -489,16 +487,16 @@ state_dns_lookup(TSCont contp, TSEvent event, TSHostLookupResult host_info)
txn_sm->q_pending_action = NULL;

/* Get the server IP from data structure TSHostLookupResult. */
-q_server_addr = TSHostLookupResultAddrGet(host_info);
+struct sockaddr const *sa = TSHostLookupResultAddrGet(host_info);

/* Connect to the server using its IP. */
set_handler(txn_sm->q_current_handler, (TxnSMHandler)&state_connect_to_server);
TSAssert(txn_sm->q_pending_action == NULL);
-TSAssert(q_server_addr->sa_family == AF_INET); /* NO IPv6 in this plugin */
+TSAssert(sa->sa_family == AF_INET); /* NO IPv6 in this plugin */
+struct sockaddr_in *addr = (struct sockaddr_in *)(sa);

-memcpy(&ip_addr, q_server_addr, sizeof(ip_addr));
-ip_addr.sin_port = txn_sm->q_server_port;
-txn_sm->q_pending_action = TSNetConnect(contp, (struct sockaddr const *)&ip_addr);
+addr->sin_port = txn_sm->q_server_port;
+txn_sm->q_pending_action = TSNetConnect(contp, sa);

return TS_SUCCESS;
}
6 changes: 6 additions & 0 deletions include/ts/ts.h
@@ -1950,7 +1950,13 @@ tsapi TSReturnCode TSPortDescriptorAccept(TSPortDescriptor, TSCont);
/* --------------------------------------------------------------------------
DNS Lookups */
tsapi TSAction TSHostLookup(TSCont contp, const char *hostname, size_t namelen);
/** Retrieve an address from the host lookup.
*
* @param lookup_result Result handle passed to event callback.
* @return A @c sockaddr with the address if successful, a @c nullptr if not.
*/
tsapi struct sockaddr const *TSHostLookupResultAddrGet(TSHostLookupResult lookup_result);

/* TODO: Eventually, we might want something like this as well, but it requires
support for building the HostDBInfo struct:
tsapi void TSHostLookupResultSet(TSHttpTxn txnp, TSHostLookupResult result);
4 changes: 2 additions & 2 deletions include/tscore/BufferWriter.h
@@ -854,10 +854,10 @@ std::string &
bwprintv(std::string &s, ts::TextView fmt, std::tuple<Args...> const &args)
{
auto len = s.size(); // remember initial size
-size_t n = ts::FixedBufferWriter(const_cast<char *>(s.data()), s.size()).printv(fmt, std::move(args)).extent();
+size_t n = ts::FixedBufferWriter(const_cast<char *>(s.data()), s.size()).printv(fmt, args).extent();
s.resize(n); // always need to resize - if shorter, must clip pre-existing text.
if (n > len) { // dropped data, try again.
-ts::FixedBufferWriter(const_cast<char *>(s.data()), s.size()).printv(fmt, std::move(args));
+ts::FixedBufferWriter(const_cast<char *>(s.data()), s.size()).printv(fmt, args);
}
return s;
}
7 changes: 7 additions & 0 deletions include/tscore/BufferWriterForward.h
@@ -148,4 +148,11 @@ class BWFormat;

class BufferWriter;

/// Storage for debug messages.
/// If @c bwprint is used with this, the storage is reused, which minimizes allocations.
/// E.g.
/// @code
///   Debug("tag", "%s", ts::bwprint(ts::bw_dbg, "Value {}", value).c_str());
/// @endcode

inline thread_local std::string bw_dbg;

} // namespace ts
11 changes: 11 additions & 0 deletions include/tscore/Diags.h
@@ -188,6 +188,17 @@ is_dbg_ctl_enabled(DbgCtl const &ctl)
} \
} while (false)

// A BufferWriter version of Debug().
#define Debug_bw(tag__, fmt, ...) \
do { \
if (unlikely(diags()->on())) { \
static DbgCtl Debug_bw_ctl(tag__); \
if (Debug_bw_ctl.ptr()->on) { \
DbgPrint(Debug_bw_ctl, "%s", ts::bwprint(ts::bw_dbg, fmt, __VA_ARGS__).c_str()); \
} \
} \
} while (false)

// printf-like debug output. First parameter must be tag (C-string literal, or otherwise
// a constexpr returning char const pointer to null-terminated C-string).
//
16 changes: 15 additions & 1 deletion include/tscore/bwf_std_format.h
@@ -26,6 +26,7 @@
#include <atomic>
#include <array>
#include <string_view>
#include <chrono>
#include "tscpp/util/TextView.h"
#include "tscore/BufferWriterForward.h"

@@ -38,6 +39,20 @@ bwformat(ts::BufferWriter &w, ts::BWFSpec const &spec, atomic<T> const &v)
return ts::bwformat(w, spec, v.load());
}

template <typename Rep, typename Period>
ts::BufferWriter &
bwformat(ts::BufferWriter &w, ts::BWFSpec const &spec, chrono::duration<Rep, Period> const &d)
{
return bwformat(w, spec, d.count());
}

template <typename Clock, typename Duration>
ts::BufferWriter &
bwformat(ts::BufferWriter &w, ts::BWFSpec const &spec, chrono::time_point<Clock, Duration> const &t)
{
return bwformat(w, spec, t.time_since_epoch());
}

} // end namespace std

namespace ts
@@ -130,5 +145,4 @@ namespace bwf
BufferWriter &bwformat(BufferWriter &w, BWFSpec const &spec, bwf::Errno const &e);
BufferWriter &bwformat(BufferWriter &w, BWFSpec const &spec, bwf::Date const &date);
BufferWriter &bwformat(BufferWriter &w, BWFSpec const &spec, bwf::OptionalAffix const &opts);

} // namespace ts
1 change: 1 addition & 0 deletions include/tscore/ink_time.h
@@ -53,6 +53,7 @@ using ts_hr_time = ts_hr_clock::time_point;

using ts_seconds = std::chrono::seconds;
using ts_milliseconds = std::chrono::milliseconds;
using ts_nanoseconds = std::chrono::nanoseconds;

/// Equivalent of 0 for @c ts_time. This should be used as the default initializer.
static constexpr ts_time TS_TIME_ZERO;
7 changes: 7 additions & 0 deletions include/tscore/ts_file.h
@@ -329,5 +329,12 @@ namespace file

/* ------------------------------------------------------------------- */
} // namespace file

inline BufferWriter &
bwformat(BufferWriter &w, BWFSpec const &spec, file::path const &path)
{
return bwformat(w, spec, path.string());
}

} // namespace ts
/* ------------------------------------------------------------------- */