-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SsdCache to CachedBufferedInput #911
Conversation
@oerling There are test failures in linux-parquet-s3-build and build failures in macos-build.
|
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
8093aee
to
a7b3ecc
Compare
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Masha, this should now build on Mac and fbcode with no warnings.
|
Thank you, Orri. Will take a look later today or tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oerling I find these changes hard to read. Some small comments below.
velox/common/caching/ScanTracker.h
Outdated
folly::F14FastMap<TrackingId, TrackingData> data_; | ||
TrackingData sum_; | ||
// Maximum size of a read. A to 10MB would count as two references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos: "A to 10MB"
velox/common/caching/ScanTracker.h
Outdated
folly::F14FastMap<TrackingId, TrackingData> data_; | ||
TrackingData sum_; | ||
// Maximum size of a read. A to 10MB would count as two references | ||
// if the quantim were 8MB. At the same time this would count as a | ||
// single 10MB reference for 'fileGroupTracker_'. 0 means the read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: fileGroupTracker_ -> fileGroupStats_ ?
return std::make_unique<BufferedInput>(input, pool, dataCacheConfig); | ||
} | ||
|
||
static BufferedInputFactory* baseFactory(); | ||
virtual folly::Executor* FOLLY_NULLABLE executor() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you document executor(), copy() and baseFactory() methods?
if (!ssdPin.empty() && ssdPin.run().size() < entry->size()) { | ||
LOG(INFO) << fmt::format( | ||
"IOERR: Ssd entry for {}:{} {} b shorter than requested {}b", | ||
entry->key().fileNum.id(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to add entry->toString() method?
file.find(cache::RawFileCacheKey{fileNum_, region.offset}); | ||
if (!ssdPin.empty() && ssdPin.run().size() < entry->size()) { | ||
LOG(INFO) << fmt::format( | ||
"IOERR: Ssd entry for {}:{} {} b shorter than requested {}b", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message seems off. It says "entry N b shorter than requested Mb". I assume "b" is for bytes. It reads as entry is N bytes shorter than M, but N is not a difference in bytes, rather is it the size of the entry. It would be nice to reword this message for clarity.
Also, why is this an OK condition? Why is logging enough?
entry->size()); | ||
ssdPin.clear(); | ||
} | ||
if (!ssdPin.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is very long. It would be nice to refactor the logic inside this "if" statement into a helper method.
region.offset - region_.offset, | ||
fileNum_, | ||
fileIds().string(fileNum_)); | ||
// Remove the non-loadable entry so that next access goes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it Ok to ignore this error? It would be nice to describe the error handing strategy in the PR description.
@@ -23,13 +23,19 @@ DEFINE_int32( | |||
80, | |||
"Minimum percentage of actual uses over references to a column for prefetching. No prefetch if > 100"); | |||
|
|||
DEFINE_int32( | |||
ssd_max_coalesce_distance, | |||
(50 << 10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this 50KB? perhaps, add a comment for clarity.
for (auto part : parts) { | ||
if (cache_->exists(part->key)) { | ||
continue; | ||
for (auto readPct : std::vector<int32_t>{80, 50, 20, 0}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to document the constants used here.
Thank you, Masha. I made the requested edits.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oerling Orri, thank you for update the PR description to provide more details. These are very helpful and will be useful in the future as well when folks look up file history in Git. I edited the description to fix a few typos.
The PR is good to go except there is an accidental change to build/deps/github_hashes/facebook/folly-rev.txt file. Please, revert.
Adds wiring to CacheInputStream and CachedBufferedInput for reading from SsdCache where possible. Adds accounting of file and file group level access to ScanTracker/FileGroupStats. FileGroupStats is is a placeholder for logic for deciding which data to stage on SSD. CachedBufferedInput divides enqueued regions between RAM, SSD and storage. Such a division is made for different access frequency buckets. Columns that are accessed at the same frequency are assumed to be correlated. This is a reasonable assumption in uses that do a conjunction of column filters. Loads are coalesced within these buckets. Loads of the most densely accessed bucket are scheduled on an executor for prefetching if the executor is supplied. Other loads are made on first use, combining all nearby loads in one IO. When accessing a single addressable unit, e.g. values or nulls of a column, we look up the entry n the RAM cache, then in the SSD cache and finally load from storage. An entry can be cached with a different length than requested if we have loadss using a different load quantum. If the entry that is found in a cache is shorter than the requested size, the entry is ignored and will be replaced by a fresh longer entry. If loading an entry rom SSD fails, we log the error and proceed to load the data from storage, We remove said entry from SSD so as not to repeatedly hit the same error.
Thank you, Masha. I reexported it based on the new head. Git show does not show the mentioned accidental update, so this should be OK far as I can tell.
,
|
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: X-link: rsocket/rsocket-cpp#911 X-link: facebookexperimental/rust-shed#32 X-link: facebookincubator/reindeer#3 X-link: https://github.com/fairinternal/AIRStore/pull/36 Pull Request resolved: #1197 X-link: facebook/mvfst#242 X-link: facebook/sapling#116 X-link: facebookincubator/fizz#75 X-link: facebookincubator/katran#157 X-link: facebook/watchman#1011 X-link: facebook/wangle#205 X-link: facebook/proxygen#401 X-link: facebook/openr#129 X-link: facebookarchive/fbzmq#35 X-link: facebook/fb303#26 X-link: facebookarchive/bistro#59 X-link: facebook/folly#1734 X-link: facebook/fboss#113 Adds an environment variable to getdeps to provide `hg` info to avoid calling `hg` directly. When using `getdeps` inside a containerized environment (which we need to build Research Super Cluster tooling with the correct linker attributes), `getdeps` fails because of unregistered mercurial extensions in the `hgrc`. This allows `getdeps` to be useable in an environment where the mercurial extensions used in a project aren't installed/available. Reviewed By: vivekspai Differential Revision: D34732506 fbshipit-source-id: 6475088fbf3323347095ca5af8a902382cf8f9d0
Summary: X-link: rsocket/rsocket-cpp#911 X-link: facebookexperimental/rust-shed#32 X-link: facebookincubator/reindeer#3 X-link: https://github.com/fairinternal/AIRStore/pull/36 Pull Request resolved: facebookincubator#1197 X-link: facebook/mvfst#242 X-link: facebook/sapling#116 X-link: facebookincubator/fizz#75 X-link: facebookincubator/katran#157 X-link: facebook/watchman#1011 X-link: facebook/wangle#205 X-link: facebook/proxygen#401 X-link: facebook/openr#129 X-link: facebookarchive/fbzmq#35 X-link: facebook/fb303#26 X-link: facebookarchive/bistro#59 X-link: facebook/folly#1734 X-link: facebook/fboss#113 Adds an environment variable to getdeps to provide `hg` info to avoid calling `hg` directly. When using `getdeps` inside a containerized environment (which we need to build Research Super Cluster tooling with the correct linker attributes), `getdeps` fails because of unregistered mercurial extensions in the `hgrc`. This allows `getdeps` to be useable in an environment where the mercurial extensions used in a project aren't installed/available. Reviewed By: vivekspai Differential Revision: D34732506 fbshipit-source-id: 6475088fbf3323347095ca5af8a902382cf8f9d0
Adds wiring to CacheInputStream and CachedBufferedInput for reading
from SsdCache where possible. Adds accounting of file and file group
level access to ScanTracker/FileGroupStats. FileGroupStats is a
placeholder for logic for deciding which data to stage on SSD.
CachedBufferedInput divides enqueued regions between RAM, SSD and
storage. Such a division is made for different access frequency
buckets. Columns that are accessed at the same frequency are assumed
to be correlated. This is a reasonable assumption in uses that do a
conjunction of column filters. Loads are coalesced within these
buckets. Loads of the most densely accessed bucket are scheduled on an
executor for prefetching if the executor is supplied. Other loads are
made on first use, combining all nearby loads in one IO.
When accessing a single addressable unit, e.g. values or nulls of a
column, we look up the entry in the RAM cache, then in the SSD cache
and finally load from storage. An entry can be cached with a different
length than requested if we have loads using a different load
quantum. If the entry that is found in a cache is shorter than the
requested size, the entry is ignored and will be replaced by a fresh
longer entry. If loading an entry rom SSD fails, we log the error and
proceed to load the data from storage, We remove said entry from SSD
so as not to repeatedly hit the same error.