stats: Add fake symbol table as an intermediate state to move to SymbolTable API without taking locks. (envoyproxy#5414)

Adds an abstract interface for SymbolTable and an alternate implementation, FakeSymbolTableImpl, which doesn't take locks. Once all stat tokens are symbolized at construction time, this FakeSymbolTable implementation can be deleted and real symbol tables can be used, thereby reducing memory and improving stat construction time per envoyproxy#3585 and envoyproxy#4980. Note that it is not necessary to pre-allocate all elaborated stat names, because multiple StatNames can be joined together without taking locks, even in SymbolTableImpl.

This implementation simply stores the characters directly in the uint8_t[] that backs each StatName, so there is no sharing or memory savings, but also no state associated with the SymbolTable, and thus no locks needed.
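
The idea of storing the characters directly in the backing `uint8_t[]` can be illustrated with a minimal standalone sketch. It is not the code from this commit: the 2-byte little-endian length prefix and the helper names `storeName`, `viewName`, and `joinNames` are assumptions made for illustration only. Because there is no shared table state, none of these helpers needs a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for SymbolTable::StoragePtr.
using StoragePtr = std::unique_ptr<uint8_t[]>;

// Assumed layout for this sketch: 2-byte little-endian length, then the raw characters.
constexpr size_t kLengthBytes = 2;

// Copies the characters of `name` into a freshly allocated byte array.
StoragePtr storeName(std::string_view name) {
  assert(name.size() <= 0xffff); // The length must fit in the 2-byte prefix.
  auto bytes = std::make_unique<uint8_t[]>(kLengthBytes + name.size());
  bytes[0] = static_cast<uint8_t>(name.size() & 0xff);
  bytes[1] = static_cast<uint8_t>(name.size() >> 8);
  if (!name.empty()) {
    std::memcpy(bytes.get() + kLengthBytes, name.data(), name.size());
  }
  return bytes;
}

// Reinterprets stored bytes as a string view without copying.
std::string_view viewName(const uint8_t* bytes) {
  const size_t size = bytes[0] | (static_cast<size_t>(bytes[1]) << 8);
  return {reinterpret_cast<const char*>(bytes + kLengthBytes), size};
}

// Joins stored names with "." and skips empty ones; no table state is touched.
StoragePtr joinNames(const std::vector<const uint8_t*>& names) {
  std::string joined;
  for (const uint8_t* name : names) {
    const std::string_view str = viewName(name);
    if (str.empty()) {
      continue;
    }
    if (!joined.empty()) {
      joined += '.';
    }
    joined.append(str.data(), str.size());
  }
  return storeName(joined);
}
```

With these helpers, storing "a.b" and "c.d" and joining them yields storage whose view reads "a.b.c.d". Each name carries its own copy of the characters, which is why this approach saves no memory.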

Risk Level: low
Testing: //test/common/stats/...

Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz authored and danzh1989 committed Jan 31, 2019
1 parent 985ef23 commit adb04ac
Showing 8 changed files with 674 additions and 288 deletions.
159 changes: 156 additions & 3 deletions include/envoy/stats/symbol_table.h
@@ -1,13 +1,166 @@
#pragma once

#include <cstdint>
#include <memory>
#include <string>
#include <vector>

#include "envoy/common/pure.h"

#include "absl/strings/string_view.h"

namespace Envoy {
namespace Stats {

// Interface for referencing a stat name.
/**
* Runtime representation of an encoded stat name. This is predeclared only in
* the interface without abstract methods, because (a) the underlying class
* representation is common to both implementations of SymbolTable, and (b)
* we do not want or need the overhead of a vptr per StatName. The common
* declaration for StatName is in source/common/stats/symbol_table_impl.h
*/
class StatName;

// Interface for managing symbol tables.
class SymbolTable;
/**
* Intermediate representation for a stat-name. This helps store multiple names
* in a single packed allocation. First we encode each desired name, then sum
* their sizes for the single packed allocation. This is used to store
* MetricImpl's tags and tagExtractedName. Like StatName, we don't want to pay
* a vptr overhead per object, and the representation is shared between the
* SymbolTable implementations, so this is just a pre-declare.
*/
class SymbolEncoding;

/**
* SymbolTable manages a namespace optimized for stat names, exploiting their
* typical composition from "."-separated tokens, with a significant overlap
* between the tokens. The interface is designed to balance optimal storage
* at scale with hiding details from users. We seek to provide the most abstract
* interface possible that avoids adding per-stat overhead or taking locks in
* the hot path.
*/
class SymbolTable {
public:
/**
* Efficient byte-encoded storage of an array of tokens. The most common
* tokens are typically < 128, and are represented directly. Tokens >= 128
* spill into the next byte, allowing for tokens of arbitrary numeric value to
* be stored. As long as the most common tokens are low-valued, the
* representation is space-efficient. This scheme is similar to UTF-8. The
* token ordering is dependent on the order in which stat-names are encoded
* into the SymbolTable, which will not be optimal, but in practice appears
* to be pretty good.
*
* This is exposed in the interface for the benefit of join(), which is
* used in the hot-path to append two stat-names into a temp without taking
* locks. This is used then in thread-local cache lookup, so that once warm,
* no locks are taken when looking up stats.
*/
using Storage = uint8_t[];
using StoragePtr = std::unique_ptr<Storage>;

virtual ~SymbolTable() = default;

/**
* Encodes a stat name using the symbol table, returning a SymbolEncoding. The
* SymbolEncoding is not intended for long-term storage, but is used to help
* allocate a StatName with the correct amount of storage.
*
* When a name is encoded, it bumps reference counts held in the table for
* each symbol. The caller is responsible for creating a StatName using this
* SymbolEncoding and ultimately disposing of it by calling
* SymbolTable::free(). Users are protected from leaking symbols into the pool
* by ASSERTions in the SymbolTable destructor.
*
* @param name The name to encode.
* @return SymbolEncoding the encoded symbols.
*/
virtual SymbolEncoding encode(absl::string_view name) PURE;

/**
* @return uint64_t the number of symbols in the symbol table.
*/
virtual uint64_t numSymbols() const PURE;

/**
* Decodes a StatName back into its period-delimited string form. If
* decoding fails on any part of the stat_name, we release_assert and crash,
* since this should never happen, and we don't want to continue running
* with a corrupt stats set.
*
* @param stat_name the stat name.
* @return std::string stringified stat_name.
*/
virtual std::string toString(const StatName& stat_name) const PURE;

/**
* Determines whether one StatName lexically precedes another. Note that
* the lexical order may not exactly match the lexical order of the
* elaborated strings. For example, stat-name of "-.-" would lexically
* sort after "---" but when encoded as a StatName would come lexically
* earlier. In practice this is unlikely to matter as those are not
* reasonable names for Envoy stats.
*
* Note that this operation has to be performed with the context of the
* SymbolTable so that the individual Symbol objects can be converted
* into strings for lexical comparison.
*
* @param a the first stat name
* @param b the second stat name
* @return bool true if a lexically precedes b.
*/
virtual bool lessThan(const StatName& a, const StatName& b) const PURE;

/**
* Joins two or more StatNames. For example if we have StatNames for {"a.b",
* "c.d", "e.f"} then the joined stat-name matches "a.b.c.d.e.f". The
* advantage of using this representation is that it avoids having to
* decode/encode into the elaborated form, and does not require locking the
* SymbolTable.
*
* The caveat is that this representation does not bump reference counts on
* the referenced Symbols in the SymbolTable, so it's only valid for the
* lifetime of the joined StatNames.
*
* This is intended for use doing cached name lookups of scoped stats, where
* the scope prefix and the names to combine it with are already in StatName
* form. Using this method, they can be combined without accessing the
* SymbolTable or, in particular, taking its lock.
*
* @param stat_names the names to join.
* @return Storage allocated for the joined name.
*/
virtual StoragePtr join(const std::vector<StatName>& stat_names) const PURE;

#ifndef ENVOY_CONFIG_COVERAGE
virtual void debugPrint() const PURE;
#endif

private:
friend class StatNameStorage;
friend class StatNameList;

/**
* Since SymbolTable does manual reference counting, a client of SymbolTable
* must manually call free(stat_name) when it is freeing the backing store
* for a StatName. This way, the symbol table will grow and shrink
* dynamically, instead of being write-only.
*
* @param stat_name the stat name.
*/
virtual void free(const StatName& stat_name) PURE;

/**
* StatName backing-store can be managed by callers in a variety of ways
* to minimize overhead. But any persistent reference to a StatName needs
* to hold onto its own reference-counts for all symbols. This method
* helps callers ensure the symbol-storage is maintained for the lifetime
* of a reference.
*
* @param stat_name the stat name.
*/
virtual void incRefCount(const StatName& stat_name) PURE;
};

using SharedSymbolTable = std::shared_ptr<SymbolTable>;

} // namespace Stats
} // namespace Envoy
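
The `Storage` comment in the header above describes a byte encoding in which small tokens occupy one byte and larger ones spill into following bytes. The sketch below shows one way such a scheme could work, using a continuation bit in the high bit of each byte. The exact bit layout used by Envoy is not given in this commit, so this layout, the `Symbol` alias, and the helper names `appendSymbol` and `decodeSymbols` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical numeric token type for this sketch.
using Symbol = uint32_t;

// Appends one symbol, 7 bits per byte, low bits first. The high bit of each
// byte is set when more bytes follow (an assumed layout, not Envoy's).
void appendSymbol(Symbol symbol, std::vector<uint8_t>& out) {
  do {
    uint8_t byte = static_cast<uint8_t>(symbol & 0x7f);
    symbol >>= 7;
    if (symbol != 0) {
      byte |= 0x80; // Continuation marker: another byte follows.
    }
    out.push_back(byte);
  } while (symbol != 0);
}

// Decodes a byte sequence produced by appendSymbol back into symbols.
std::vector<Symbol> decodeSymbols(const std::vector<uint8_t>& bytes) {
  std::vector<Symbol> symbols;
  Symbol current = 0;
  int shift = 0;
  for (uint8_t byte : bytes) {
    assert(shift <= 28); // A 32-bit symbol needs at most five bytes.
    current |= static_cast<Symbol>(byte & 0x7f) << shift;
    if ((byte & 0x80) != 0) {
      shift += 7;
    } else {
      symbols.push_back(current);
      current = 0;
      shift = 0;
    }
  }
  assert(shift == 0); // The input must not end in the middle of a symbol.
  return symbols;
}
```

Under this layout, symbols below 128 cost one byte each, so the encoding stays compact as long as frequently used tokens were assigned low values.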
6 changes: 6 additions & 0 deletions source/common/stats/BUILD
@@ -135,6 +135,12 @@ envoy_cc_library(
],
)

envoy_cc_library(
name = "fake_symbol_table_lib",
hdrs = ["fake_symbol_table_impl.h"],
deps = [":symbol_table_lib"],
)

envoy_cc_library(
name = "stats_options_lib",
hdrs = ["stats_options_impl.h"],
98 changes: 98 additions & 0 deletions source/common/stats/fake_symbol_table_impl.h
@@ -0,0 +1,98 @@
#pragma once

#include <algorithm>
#include <cstring>
#include <memory>
#include <stack>
#include <string>
#include <unordered_map>
#include <vector>

#include "envoy/common/exception.h"
#include "envoy/stats/symbol_table.h"

#include "common/common/assert.h"
#include "common/common/hash.h"
#include "common/common/lock_guard.h"
#include "common/common/non_copyable.h"
#include "common/common/thread.h"
#include "common/common/utility.h"
#include "common/stats/symbol_table_impl.h"

#include "absl/strings/str_join.h"
#include "absl/strings/str_split.h"

namespace Envoy {
namespace Stats {

/**
* Implements the SymbolTable interface without taking locks or saving memory.
* This implementation is intended as a transient state for the Envoy codebase
* to allow incremental conversion of Envoy stats call-sites to use the
* SymbolTable interface, pre-allocating symbols during construction time for
* all stats tokens.
*
* Once all stat tokens are symbolized at construction time, this
* FakeSymbolTable implementation can be deleted, and real-symbol tables can be
* used, thereby reducing memory and improving stat construction time.
*
* Note that it is not necessary to pre-allocate all elaborated stat names
* because multiple StatNames can be joined together without taking locks,
* even in SymbolTableImpl.
*
* This implementation simply stores the characters directly in the uint8_t[]
* that backs each StatName, so there is no sharing or memory savings, but also
* no state associated with the SymbolTable, and thus no locks needed.
*
* TODO(jmarantz): delete this class once SymbolTable is fully deployed in the
* Envoy codebase.
*/
class FakeSymbolTableImpl : public SymbolTable {
public:
SymbolEncoding encode(absl::string_view name) override { return encodeHelper(name); }

std::string toString(const StatName& stat_name) const override {
return std::string(toStringView(stat_name));
}
uint64_t numSymbols() const override { return 0; }
bool lessThan(const StatName& a, const StatName& b) const override {
return toStringView(a) < toStringView(b);
}
void free(const StatName&) override {}
void incRefCount(const StatName&) override {}
SymbolTable::StoragePtr join(const std::vector<StatName>& names) const override {
std::vector<absl::string_view> strings;
for (StatName name : names) {
absl::string_view str = toStringView(name);
if (!str.empty()) {
strings.push_back(str);
}
}
return stringToStorage(absl::StrJoin(strings, "."));
}

#ifndef ENVOY_CONFIG_COVERAGE
void debugPrint() const override {}
#endif

private:
SymbolEncoding encodeHelper(absl::string_view name) const {
SymbolEncoding encoding;
encoding.addStringForFakeSymbolTable(name);
return encoding;
}

absl::string_view toStringView(const StatName& stat_name) const {
return {reinterpret_cast<const char*>(stat_name.data()), stat_name.dataSize()};
}

SymbolTable::StoragePtr stringToStorage(absl::string_view name) const {
SymbolEncoding encoding = encodeHelper(name);
auto bytes = std::make_unique<uint8_t[]>(encoding.bytesRequired());
encoding.moveToStorage(bytes.get());
return bytes;
}
};

} // namespace Stats
} // namespace Envoy
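
`FakeSymbolTableImpl::lessThan` above compares the elaborated strings directly, while the interface comment on `lessThan` warns that a symbol-based table may order names differently, citing "-.-" and "---". The following standalone sketch contrasts the two orderings. It assumes a token-by-token comparison as a stand-in for comparing encoded symbols; the helper names `splitTokens`, `lessThanByString`, and `lessThanByToken` are hypothetical and not part of Envoy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits a stat name on '.' into its tokens.
std::vector<std::string_view> splitTokens(std::string_view name) {
  std::vector<std::string_view> tokens;
  size_t start = 0;
  while (true) {
    const size_t dot = name.find('.', start);
    if (dot == std::string_view::npos) {
      tokens.push_back(name.substr(start));
      break;
    }
    tokens.push_back(name.substr(start, dot - start));
    start = dot + 1;
  }
  return tokens;
}

// Ordering on the elaborated string, as in the fake implementation.
bool lessThanByString(std::string_view a, std::string_view b) { return a < b; }

// Ordering token by token, an assumed approximation of comparing symbols.
bool lessThanByToken(std::string_view a, std::string_view b) {
  const std::vector<std::string_view> ta = splitTokens(a);
  const std::vector<std::string_view> tb = splitTokens(b);
  return std::lexicographical_compare(ta.begin(), ta.end(), tb.begin(), tb.end());
}
```

As plain strings, "---" sorts before "-.-" because '-' precedes '.' in ASCII. Token by token, "-.-" becomes {"-", "-"} and its first token "-" is a proper prefix of "---", so "-.-" sorts first.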