Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SsdCache to CachedBufferedInput #911

Closed
wants to merge 1 commit into from

Conversation

oerling
Copy link
Contributor

@oerling oerling commented Jan 21, 2022

Adds wiring to CacheInputStream and CachedBufferedInput for reading
from SsdCache where possible. Adds accounting of file and file group
level access to ScanTracker/FileGroupStats. FileGroupStats is a
placeholder for logic for deciding which data to stage on SSD.

CachedBufferedInput divides enqueued regions between RAM, SSD and
storage. Such a division is made for different access frequency
buckets. Columns that are accessed at the same frequency are assumed
to be correlated. This is a reasonable assumption in uses that do a
conjunction of column filters. Loads are coalesced within these
buckets. Loads of the most densely accessed bucket are scheduled on an
executor for prefetching if the executor is supplied. Other loads are
made on first use, combining all nearby loads in one IO.

When accessing a single addressable unit, e.g. values or nulls of a
column, we look up the entry in the RAM cache, then in the SSD cache
and finally load from storage. An entry can be cached with a different
length than requested if we have loads using a different load
quantum. If the entry that is found in a cache is shorter than the
requested size, the entry is ignored and will be replaced by a fresh
longer entry. If loading an entry rom SSD fails, we log the error and
proceed to load the data from storage, We remove said entry from SSD
so as not to repeatedly hit the same error.

@oerling oerling requested a review from mbasmanova January 21, 2022 05:46
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 21, 2022
@mbasmanova
Copy link
Contributor

@oerling There are test failures in linux-parquet-s3-build and build failures in macos-build.

99% tests passed, 1 tests failed out of 121

Total Test time (real) = 1500.11 sec

The following tests FAILED:
	  3 - velox_cache_test (Timeout)

FAILED: velox/common/caching/CMakeFiles/velox_caching.dir/AsyncDataCache.cpp.o 
ccache /Applications/Xcode-11.7.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBOOST_ALL_NO_LIB -DBOOST_CONTEXT_DYN_LINK -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_REGEX_DYN_LINK -DBOOST_SYSTEM_DYN_LINK -DBOOST_THREAD_DYN_LINK -DCODEGEN_ENABLED=1 -DFMT_LOCALE -DGFLAGS_IS_A_DLL=0 -I/usr/local/opt/icu4c/include -I/Users/distiller/project/. -isystem /Users/distiller/deps/include -isystem /Users/distiller/project/velox -isystem /Users/distiller/project/velox/external -isystem /Users/distiller/project/velox/external/duckdb -isystem /Users/distiller/deps/opt/openssl@1.1/include -mavx2 -mfma -mavx -mf16c -mbmi2 -march=native -D USE_VELOX_COMMON_BASE -D HAS_UNCAUGHT_EXCEPTIONS -Werror -Wall -Wextra -Wno-unused        -Wno-unused-parameter        -Wno-sign-compare        -Wno-ignored-qualifiers        -Wno-range-loop-analysis -Wno-mismatched-tags -g -isysroot /Applications/Xcode-11.7.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -std=gnu++17 -MD -MT velox/common/caching/CMakeFiles/velox_caching.dir/AsyncDataCache.cpp.o -MF velox/common/caching/CMakeFiles/velox_caching.dir/AsyncDataCache.cpp.o.d -o velox/common/caching/CMakeFiles/velox_caching.dir/AsyncDataCache.cpp.o -c /Users/distiller/project/velox/common/caching/AsyncDataCache.cpp
In file included from /Users/distiller/project/velox/common/caching/AsyncDataCache.cpp:17:
In file included from /Users/distiller/project/./velox/common/caching/AsyncDataCache.h:26:
In file included from /Users/distiller/project/./velox/common/caching/FileGroupStats.h:19:
/Users/distiller/project/./velox/common/caching/ScanTracker.h:128:37: error: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Werror,-Wnullability-completeness]
      std::function<void(ScanTracker*)> unregisterer,
                                    ^
/Users/distiller/project/./velox/common/caching/ScanTracker.h:128:37: note: insert '_Nullable' if the pointer may be null
      std::function<void(ScanTracker*)> unregisterer,
                                    ^
                                     _Nullable
/

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@oerling oerling force-pushed the ssd-input-pr branch 7 times, most recently from 8093aee to a7b3ecc Compare January 24, 2022 17:20
@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@oerling
Copy link
Contributor Author

oerling commented Jan 24, 2022 via email

@mbasmanova
Copy link
Contributor

Masha, this should now build on Mac and fbcode with no warnings.

Thank you, Orri. Will take a look later today or tomorrow.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oerling I find these changes hard to read. Some small comments below.

folly::F14FastMap<TrackingId, TrackingData> data_;
TrackingData sum_;
// Maximum size of a read. A to 10MB would count as two references
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos: "A to 10MB"

folly::F14FastMap<TrackingId, TrackingData> data_;
TrackingData sum_;
// Maximum size of a read. A to 10MB would count as two references
// if the quantim were 8MB. At the same time this would count as a
// single 10MB reference for 'fileGroupTracker_'. 0 means the read
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: fileGroupTracker_ -> fileGroupStats_ ?

return std::make_unique<BufferedInput>(input, pool, dataCacheConfig);
}

static BufferedInputFactory* baseFactory();
virtual folly::Executor* FOLLY_NULLABLE executor() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you document executor(), copy() and baseFactory() methods?

if (!ssdPin.empty() && ssdPin.run().size() < entry->size()) {
LOG(INFO) << fmt::format(
"IOERR: Ssd entry for {}:{} {} b shorter than requested {}b",
entry->key().fileNum.id(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to add entry->toString() method?

file.find(cache::RawFileCacheKey{fileNum_, region.offset});
if (!ssdPin.empty() && ssdPin.run().size() < entry->size()) {
LOG(INFO) << fmt::format(
"IOERR: Ssd entry for {}:{} {} b shorter than requested {}b",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message seems off. It says "entry N b shorter than requested Mb". I assume "b" is for bytes. It reads as entry is N bytes shorter than M, but N is not a difference in bytes, rather is it the size of the entry. It would be nice to reword this message for clarity.

Also, why is this an OK condition? Why is logging enough?

entry->size());
ssdPin.clear();
}
if (!ssdPin.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is very long. It would be nice to refactor the logic inside this "if" statement into a helper method.

region.offset - region_.offset,
fileNum_,
fileIds().string(fileNum_));
// Remove the non-loadable entry so that next access goes to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it Ok to ignore this error? It would be nice to describe the error handing strategy in the PR description.

@@ -23,13 +23,19 @@ DEFINE_int32(
80,
"Minimum percentage of actual uses over references to a column for prefetching. No prefetch if > 100");

DEFINE_int32(
ssd_max_coalesce_distance,
(50 << 10),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 50KB? perhaps, add a comment for clarity.

for (auto part : parts) {
if (cache_->exists(part->key)) {
continue;
for (auto readPct : std::vector<int32_t>{80, 50, 20, 0}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to document the constants used here.

@oerling
Copy link
Contributor Author

oerling commented Jan 25, 2022 via email

@mbasmanova mbasmanova requested a review from spershin January 25, 2022 15:09
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oerling Orri, thank you for update the PR description to provide more details. These are very helpful and will be useful in the future as well when folks look up file history in Git. I edited the description to fix a few typos.

The PR is good to go except there is an accidental change to build/deps/github_hashes/facebook/folly-rev.txt file. Please, revert.

Adds wiring to CacheInputStream and CachedBufferedInput for reading
from SsdCache where possible.  Adds accounting of file and file group
level access to ScanTracker/FileGroupStats. FileGroupStats is is a
placeholder for logic for deciding which data to stage on SSD.

CachedBufferedInput divides enqueued regions between RAM, SSD and
storage. Such a division is made for different access frequency
buckets. Columns that are accessed at the same frequency are assumed
to be correlated. This is a reasonable assumption in uses that do a
conjunction of column filters. Loads are coalesced within these
buckets. Loads of the most densely accessed bucket are scheduled on an
executor for prefetching if the executor is supplied. Other loads are
made on first use, combining all nearby loads in one IO.

When accessing a single addressable unit, e.g. values or nulls of a
column, we look up the entry n the RAM cache, then in the SSD cache
and finally load from storage. An entry can be cached with a different
length than requested if we have loadss using a different load
quantum. If the entry that is found in a cache is shorter than the
requested size, the entry is ignored and will be replaced by a fresh
longer entry. If loading an entry rom SSD fails, we log the error and
proceed to load the data from storage, We remove said entry from SSD
so as not to repeatedly hit the same error.
@oerling
Copy link
Contributor Author

oerling commented Jan 25, 2022 via email

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Mar 12, 2022
Summary:
X-link: rsocket/rsocket-cpp#911

X-link: facebookexperimental/rust-shed#32

X-link: facebookincubator/reindeer#3

X-link: https://github.com/fairinternal/AIRStore/pull/36

Pull Request resolved: #1197

X-link: facebook/mvfst#242

X-link: facebook/sapling#116

X-link: facebookincubator/fizz#75

X-link: facebookincubator/katran#157

X-link: facebook/watchman#1011

X-link: facebook/wangle#205

X-link: facebook/proxygen#401

X-link: facebook/openr#129

X-link: facebookarchive/fbzmq#35

X-link: facebook/fb303#26

X-link: facebookarchive/bistro#59

X-link: facebook/folly#1734

X-link: facebook/fboss#113

Adds an environment variable to getdeps to provide `hg` info to avoid calling `hg` directly.

When using `getdeps` inside a containerized environment (which we need to build Research Super Cluster tooling with the correct linker attributes), `getdeps` fails because of unregistered mercurial extensions in the `hgrc`.

This allows `getdeps` to be useable in an environment where the mercurial extensions used in a project aren't installed/available.

Reviewed By: vivekspai

Differential Revision: D34732506

fbshipit-source-id: 6475088fbf3323347095ca5af8a902382cf8f9d0
shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this pull request Aug 18, 2022
Summary:
X-link: rsocket/rsocket-cpp#911

X-link: facebookexperimental/rust-shed#32

X-link: facebookincubator/reindeer#3

X-link: https://github.com/fairinternal/AIRStore/pull/36

Pull Request resolved: facebookincubator#1197

X-link: facebook/mvfst#242

X-link: facebook/sapling#116

X-link: facebookincubator/fizz#75

X-link: facebookincubator/katran#157

X-link: facebook/watchman#1011

X-link: facebook/wangle#205

X-link: facebook/proxygen#401

X-link: facebook/openr#129

X-link: facebookarchive/fbzmq#35

X-link: facebook/fb303#26

X-link: facebookarchive/bistro#59

X-link: facebook/folly#1734

X-link: facebook/fboss#113

Adds an environment variable to getdeps to provide `hg` info to avoid calling `hg` directly.

When using `getdeps` inside a containerized environment (which we need to build Research Super Cluster tooling with the correct linker attributes), `getdeps` fails because of unregistered mercurial extensions in the `hgrc`.

This allows `getdeps` to be useable in an environment where the mercurial extensions used in a project aren't installed/available.

Reviewed By: vivekspai

Differential Revision: D34732506

fbshipit-source-id: 6475088fbf3323347095ca5af8a902382cf8f9d0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants