Skip to content

Commit

Permalink
sample FS trace events for logging
Browse files Browse the repository at this point in the history
Summary:
The current fb303 counters only report aggregated latency while we want to track Eden performance under different version, os, channel, and configs. So I am setting up a new logging mechanism for this purpose.

This diff introduces the class `FsEventLogger` for sampling and logging. There are 3 configs introduced by this diff. The configs are reloaded every 30 minutes.
1. `telemetry:request-sampling-config-allowlist`
A list of config keys that we want to attach to scuba events.

2. `telemetry:request-samples-per-minute`
Max number of events logged to scuba per minute per mount.

3. `telemetry:request-sampling-group-denominators`
* Each type of operation has a "sampling group" (defaulted to 0, which is dropping all).
* We use this sampling group as index to look up its denominator in this config.
* The denominator is then used for sampling. e.g. `1/x` of the events are send to scuba, if we haven't reached the cap specified by #2.

Example workflow:
1. receive tracing event
2. look up denominator of the sampling group of the operation type
3. sample based on the denominator
4. check that we have not exceeded the logging cap per min
5. create sample and send to scribe

Reviewed By: xavierd

Differential Revision: D30288054

fbshipit-source-id: 8f2b95c11c718550a8162f4d1259a25628f499ff
  • Loading branch information
Zhengchao Liu authored and facebook-github-bot committed Aug 19, 2021
1 parent 025929c commit 432d674
Show file tree
Hide file tree
Showing 10 changed files with 281 additions and 17 deletions.
54 changes: 54 additions & 0 deletions eden/fs/config/EdenConfig.h
Original file line number Diff line number Diff line change
Expand Up @@ -529,6 +529,60 @@ class EdenConfig : private ConfigSettingManager {
std::nullopt,
this};

/**
* Controls sample denominator for each request sampling group.
* We assign request types into sampling groups based on their usage and
* set a sample denominator for each sampling group so that we have the
* flexibility of up/down-sampling different requests but also avoid having to
* set a sampling rate for each of the dozens of request types. For example,
* `mkdir` and `rmdir` can be assigned to a sampling group that have a high
* sampling rate while `getattr` and `getxattr` to another sampling group with
* low sampling rate as they happen very frequently.
*
* Sampling rates are calculated from sampling denominators. A denominator of
* 0 indicates dropping all requests in the group. Group 0's value is ignored
* as it's always considered as having denominator of 0. A positive
* denominator means that the requests in the group are sampled at 1/x (so
* denominator of 1 drops no events).
*
* We use sampling group as indexes into this vector to look
* up their denominators. Thus, the size of this vector should match the
* number of sampling groups defined by the enum `SamplingGroup`. If the
* vector has fewer elements than the number of sampling groups, look-ups will
* fail for the higher sampling groups and we will consider them having
* denominator of 0. For example, if the vector has size of 3, all requests of
* sampling group 4 will be dropped.
* Keeping this vector in ascending order is recommended but not required.
* e.g. {0, 10, 100, 1000, 10000}
*/
ConfigSetting<std::vector<uint32_t>> requestSamplingGroupDenominators{
"telemetry:request-sampling-group-denominators",
std::vector<uint32_t>{0, 0, 0, 0, 0},
this};

/**
* Controls the max number of requests per minute per mount that can be sent
* for logging.
* A request is first sampled based on its sampling group denominators. Then
* if we have not reached this cap, the request is sent for logging.
*/
ConfigSetting<uint32_t> requestSamplesPerMinute{
"telemetry:request-samples-per-minute",
0,
this};

/**
* Controls which configs we want to send with the request logging.
* The elements are full config keys, e.g. "hg:import-batch-size".
* Elements not valid or not present in the config map are silently ignored.
* This is only intended for facilitating A/B testing and should be empty if
* there is no active experiment.
*/
ConfigSetting<std::vector<std::string>> requestSamplingConfigAllowlist{
"telemetry:request-sampling-config-allowlist",
std::vector<std::string>{},
this};

// [experimental]

/**
Expand Down
61 changes: 51 additions & 10 deletions eden/fs/fuse/FuseChannel.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@
#include <folly/logging/xlog.h>
#include <folly/system/ThreadName.h>
#include <signal.h>
#include <chrono>
#include <type_traits>
#include "eden/fs/fuse/DirList.h"
#include "eden/fs/fuse/FuseDispatcher.h"
#include "eden/fs/fuse/FuseRequestContext.h"
#include "eden/fs/telemetry/FsEventLogger.h"
#include "eden/fs/utils/Bug.h"
#include "eden/fs/utils/IDGen.h"
#include "eden/fs/utils/Synchronized.h"
Expand Down Expand Up @@ -234,8 +236,14 @@ struct HandlerEntry {
Handler h,
FuseArgRenderer r,
ChannelThreadStats::StatPtr s,
AccessType at = AccessType::FsChannelOther)
: name{n}, handler{h}, argRenderer{r}, stat{s}, accessType{at} {}
AccessType at = AccessType::FsChannelOther,
SamplingGroup samplingGroup = SamplingGroup::DropAll)
: name{n},
handler{h},
argRenderer{r},
stat{s},
samplingGroup{samplingGroup},
accessType{at} {}

std::string getShortName() const {
if (name.startsWith("FUSE_")) {
Expand All @@ -260,6 +268,7 @@ struct HandlerEntry {
Handler handler = nullptr;
FuseArgRenderer argRenderer = nullptr;
ChannelThreadStats::StatPtr stat = nullptr;
SamplingGroup samplingGroup = SamplingGroup::DropAll;
AccessType accessType = AccessType::FsChannelOther;
};

Expand Down Expand Up @@ -351,13 +360,15 @@ constexpr auto kFuseHandlers = [] {
&FuseChannel::fuseRead,
&argrender::read,
&ChannelThreadStats::read,
Read};
Read,
SamplingGroup::Three};
handlers[FUSE_WRITE] = {
"FUSE_WRITE",
&FuseChannel::fuseWrite,
&argrender::write,
&ChannelThreadStats::write,
Write};
Write,
SamplingGroup::Four};
handlers[FUSE_STATFS] = {
"FUSE_STATFS",
&FuseChannel::fuseStatFs,
Expand Down Expand Up @@ -415,7 +426,8 @@ constexpr auto kFuseHandlers = [] {
&FuseChannel::fuseReadDir,
&argrender::readdir,
&ChannelThreadStats::readdir,
Read};
Read,
SamplingGroup::Two};
handlers[FUSE_RELEASEDIR] = {
"FUSE_RELEASEDIR",
&FuseChannel::fuseReleaseDir,
Expand Down Expand Up @@ -590,6 +602,11 @@ iovec make_iovec(const T& t) {
return iov;
}

SamplingGroup fuseOpcodeSamplingGroup(uint32_t opcode) {
auto* entry = lookupFuseHandlerEntry(opcode);
return entry ? entry->samplingGroup : SamplingGroup::DropAll;
}

} // namespace

StringPiece fuseOpcodeName(uint32_t opcode) {
Expand Down Expand Up @@ -779,6 +796,7 @@ FuseChannel::FuseChannel(
std::unique_ptr<FuseDispatcher> dispatcher,
const folly::Logger* straceLogger,
std::shared_ptr<ProcessNameCache> processNameCache,
std::shared_ptr<FsEventLogger> fsEventLogger,
folly::Duration requestTimeout,
Notifications* notifications,
CaseSensitivity caseSensitive,
Expand All @@ -804,20 +822,43 @@ FuseChannel::FuseChannel(
installSignalHandler();

traceSubscriptionHandles_.push_back(traceBus_->subscribeFunction(
"FuseChannel request tracking", [this](const FuseTraceEvent& event) {
"FuseChannel request tracking",
[this,
fsEventLogger = std::move(fsEventLogger)](const FuseTraceEvent& event) {
switch (event.getType()) {
case FuseTraceEvent::START: {
auto state = telemetryState_.wlock();
auto [iter, inserted] = state->requests.emplace(
event.getUnique(),
OutstandingRequest{event.getUnique(), event.getRequest()});
OutstandingRequest{
event.getUnique(),
event.getRequest(),
event.monotonicTime});
XCHECK(inserted) << "duplicate fuse start event";
break;
}
case FuseTraceEvent::FINISH: {
auto state = telemetryState_.wlock();
auto erased = state->requests.erase(event.getUnique());
XCHECK(erased) << "duplicate fuse finish event";
uint64_t durationNs = 0;
{
auto state = telemetryState_.wlock();
auto it = state->requests.find(event.getUnique());
XCHECK(it != state->requests.end())
<< "duplicate fuse finish event";
durationNs =
std::chrono::duration_cast<std::chrono::nanoseconds>(
event.monotonicTime - it->second.requestStartTime)
.count();
state->requests.erase(it);
}

if (fsEventLogger) {
auto opcode = event.getRequest().opcode;
fsEventLogger->log({
durationNs,
fuseOpcodeSamplingGroup(opcode),
fuseOpcodeName(opcode),
});
}
break;
}
}
Expand Down
3 changes: 3 additions & 0 deletions eden/fs/fuse/FuseChannel.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ struct Unit;
namespace facebook::eden {

class Notifications;
class FsEventLogger;
class FuseRequestContext;

using TraceDetailedArgumentsHandle = std::shared_ptr<void>;
Expand Down Expand Up @@ -205,6 +206,7 @@ class FuseChannel {
struct OutstandingRequest {
uint64_t unique;
FuseTraceEvent::RequestHeader request;
std::chrono::steady_clock::time_point requestStartTime;
};

/**
Expand All @@ -225,6 +227,7 @@ class FuseChannel {
std::unique_ptr<FuseDispatcher> dispatcher,
const folly::Logger* straceLogger,
std::shared_ptr<ProcessNameCache> processNameCache,
std::shared_ptr<FsEventLogger> fsEventLogger,
folly::Duration requestTimeout,
Notifications* FOLLY_NULLABLE notifications,
CaseSensitivity caseSensitive,
Expand Down
7 changes: 4 additions & 3 deletions eden/fs/fuse/fuse_tester/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -133,11 +133,12 @@ int main(int argc, char** argv) {
std::move(dispatcher),
&straceLogger,
std::make_shared<ProcessNameCache>(),
/*fsEventLogger=*/nullptr,
std::chrono::seconds(60),
nullptr,
/*notifications=*/nullptr,
CaseSensitivity::Sensitive,
true,
12 /* the default on Linux */));
/*requireUtf8Path=*/true,
/*maximumBackgroundRequests=*/12 /* the default on Linux */));

XLOG(INFO) << "Starting FUSE...";
auto completionFuture = channel->initialize().get();
Expand Down
7 changes: 4 additions & 3 deletions eden/fs/fuse/test/FuseChannelTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -76,11 +76,12 @@ class FuseChannelTest : public ::testing::Test {
std::move(testDispatcher),
&straceLogger,
std::make_shared<ProcessNameCache>(),
/*fsEventLogger=*/nullptr,
std::chrono::seconds(60),
nullptr,
/*notifications=*/nullptr,
CaseSensitivity::Sensitive,
true,
12));
/*requireUtf8Path=*/true,
/*maximumBackgroundRequests=*/12));
}

FuseChannel::StopFuture performInit(
Expand Down
1 change: 1 addition & 0 deletions eden/fs/inodes/EdenMount.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1381,6 +1381,7 @@ std::unique_ptr<FuseChannel, FuseChannelDeleter> makeFuseChannel(
EdenDispatcherFactory::makeFuseDispatcher(mount),
&mount->getStraceLogger(),
mount->getServerState()->getProcessNameCache(),
mount->getServerState()->getFsEventLogger(),
std::chrono::duration_cast<folly::Duration>(
edenConfig->fuseRequestTimeout.getValue()),
mount->getServerState()->getNotifications(),
Expand Down
15 changes: 14 additions & 1 deletion eden/fs/inodes/ServerState.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

#include "eden/fs/config/EdenConfig.h"
#include "eden/fs/model/git/TopLevelIgnores.h"
#include "eden/fs/telemetry/FsEventLogger.h"
#include "eden/fs/utils/Clock.h"
#include "eden/fs/utils/FaultInjector.h"
#include "eden/fs/utils/UnboundedQueueExecutor.h"
Expand All @@ -22,6 +23,14 @@ DEFINE_bool(
"Block mount attempts via the fault injection framework. "
"Requires --enable_fault_injection.");

namespace {
#if defined(EDEN_HAVE_HIVE_LOGGER)
constexpr auto kHasHiveLogger = true;
#else
constexpr auto kHasHiveLogger = false;
#endif
} // namespace

namespace facebook {
namespace eden {

Expand Down Expand Up @@ -58,7 +67,11 @@ ServerState::ServerState(
systemIgnoreFileMonitor_{CachedParsedFileMonitor<GitIgnoreFileParser>{
edenConfig->systemIgnoreFile.getValue(),
kSystemIgnoreMinPollSeconds}},
notifications_(config_) {
notifications_(config_),
fsEventLogger_{
(kHasHiveLogger && edenConfig->requestSamplesPerMinute.getValue())
? std::make_shared<FsEventLogger>(config_, hiveLogger_)
: nullptr} {
// It would be nice if we eventually built a more generic mechanism for
// defining faults to be configured on start up. (e.g., loading this from the
// EdenConfig).
Expand Down
11 changes: 11 additions & 0 deletions eden/fs/inodes/ServerState.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ class Clock;
class EdenConfig;
class FaultInjector;
class IHiveLogger;
class FsEventLogger;
class ProcessNameCache;
class StructuredLogger;
class TopLevelIgnores;
Expand Down Expand Up @@ -155,6 +156,15 @@ class ServerState {
return hiveLogger_.get();
}

/**
* Returns a pointer to the FsEventLogger for logging FS event samples, if the
* platform supports it. Otherwise, returns nullptr. The caller is responsible
* for null checking.
*/
std::shared_ptr<FsEventLogger> getFsEventLogger() const {
return fsEventLogger_;
}

FaultInjector& getFaultInjector() {
return *faultInjector_;
}
Expand Down Expand Up @@ -182,6 +192,7 @@ class ServerState {
folly::Synchronized<CachedParsedFileMonitor<GitIgnoreFileParser>>
systemIgnoreFileMonitor_;
Notifications notifications_;
std::shared_ptr<FsEventLogger> fsEventLogger_;
};
} // namespace eden
} // namespace facebook
Loading

0 comments on commit 432d674

Please sign in to comment.