Skip to content

Commit 06a191b

Browse files
m3bm3bcmumford
authored andcommitted
fix problems in LevelDB's caching code
Background: LevelDB uses a cache (util/cache.h, util/cache.cc) of (key,value) pairs for two purposes: - a cache of (table, file handle) pairs - a cache of blocks The cache places the (key,value) pairs in a reference-counted wrapper. When it returns a value, it returns a reference to this wrapper. When the client has finished using the reference and its enclosed (key,value), it calls Release() to decrement the reference count. Each (key,value) pair has an associated resource usage. The cache maintains the sum of the usages of the elements it holds, and removes values as needed to keep the sum below a capacity threshold. It maintains an LRU list so that it will remove the least-recently used elements first. The max_open_files option to LevelDB sets the size of the cache of (table, file handle) pairs. The option is not used in any other way. The observed behaviour: If LevelDB at any time used more file handles concurrently than the cache size set via max_open_files, it attempted to reduce the number by evicting entries from the table cache. This could happen most easily during compaction, and if max_open_files was low. Because the handles were in use, their reference count did not drop to zero, and so the usage sum in the cache was not modified by the evictions. Subsequent Insert() calls returned valid handles, but their entries were immediately evicted from the cache, which though empty still acted as though full. As a result, there was effectively no caching, and the number of open file handles rose []ly until it hit system-imposed limits and the process died. If one set max_open_files lower, the cache was more likely to exhibit this beahviour, and cause the process to run out of file descriptors. That is, max_open_files acted in almost exactly the opposite manner from what was intended. The problems: 1. The cache kept all elements on its LRU list eligible for capacity eviction---even those with outstanding references from clients. This was ineffective in reducing resource consumption because there was an outstanding reference, guaranteeing that the items remained. A secondary issue was that there is no guarantee that these in-use items will be the last things reached in the LRU chain, which actually recorded "least-recently requested" rather than "least-recently used". 2. The sum of usages was decremented not when a (key,value) was evicted from the cache, but when its reference count went to zero. Thus, when things were removed from the cache, either by garbage collection or via Erase(), the usage sum was not necessarily decreased. This allowed the cache to act as though full when it was in fact not, reducing caching effectiveness, and leading to more resources being consumed---the opposite of what the evictions were intended to achieve. 3. (minor) The cache's clients insert items into it by first looking up the key, and inserting only if no value is found. Although the cache has an internal lock, the clients use no locking to ensure atomicity of the Lookup/Insert pair. (see table/table.cc: block_cache->Insert() and db/table_cache.cc: cache_->Insert()). Thus, if two threads Insert() at about the same time, they can both Lookup(), find nothing, and both Insert(). The second Insert() would evict the first value, leaving each thread with a handle on its own version of the data, and with the second version in the cache. It would be better if both threads ended up with a handle on the same (key,value) pair, which implies it must be the first item inserted. This suggests that Insert() should not replace an existing value. This can be made safe with current usage inside LeveDB itself, but this is not easy to change first because Cache is a public interface, so to change the semantics of an existing call might break things, second because Cache is an abstract virtual class, so adding a new abstract virtual method may break other implementations, and third, the new method "insert without replacing" cannot be implemented in terms of the existing methods, so cannot be implemented with a non-abstract default. But fortunately, the effects of this issue are minor, so this issue is not fixed by this change. The changes: The assumption in the fixes is that it is always better to cache entries unless removal from the cache would lead to deallocation. Cache entries now have an "in_cache" boolean indicating whether the cache has a reference on the entry. The only ways that this can become false without the entry being passed to its "deleter" are via Erase(), via Insert() when an element with a duplicate key is inserted, or on destruction of the cache. The cache now keeps two linked lists instead of one. All items in the cache are in one list or the other, and never both. Items still referenced by clients but erased from the cache are in neither list. The lists are: - in-use: contains the items currently referenced by clients, in no particular order. (This list is used for invariant checking. If we removed the check, elements that would otherwise be on this list could be left as disconnected singleton lists.) - LRU: contains the items not currently referenced by clients, in LRU order A new internal Ref() method increments the reference count. If incrementing from 1 to 2 for an item in the cache, it is moved from the LRU list to the in-use list. The Unref() call now moves things from the in-use list to the LRU list if the reference count falls to 1, and the item is in the cache. It no longer adjusts the usage sum. The usage sum now reflects only what is in the cache, rather than including still-referenced items that have been evicted. The LRU_Append() now takes a "list" parameter so that it can be used to append either to the LRU list or the in-use list. Lookup() is modified to use the new Ref() call, rather than adjusting the reference count and LRU chain directly. Insert() eviction code is also modified to adjust the usage sum and the in_cache boolean of the evicted elements. Some LevelDB tests assume that there will be no caching whatsoever if the cache size is set to zero, so this is handled as a special case. A new private method FinishErase() is factored out with the common code from where items are removed from the cache. Erase() is modified to adjust the usage sum and the in_cache boolean of the erased elements, and to use FinishErase(). Prune() is modified to use FinishErase() also, and to make use of the fact that the lru_ list now contains only items with reference count 1. - EvictionPolicy is modified to test that an entry with an outstanding handle is not evicted. This test fails with the old cache.cc. - A new test case UseExceedsCacheSize verifies that even when the cache is overfull of entries with outstanding handles, none are evicted. This test fails with the old cache.cc, and is the key issue that causes file descriptors to run out when the cache size is set too small. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=123247237
1 parent a7bff69 commit 06a191b

File tree

2 files changed

+112
-35
lines changed

2 files changed

+112
-35
lines changed

util/cache.cc

+84-34
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,23 @@ Cache::~Cache() {
1919
namespace {
2020

2121
// LRU cache implementation
22+
//
23+
// Cache entries have an "in_cache" boolean indicating whether the cache has a
24+
// reference on the entry. The only ways that this can become false without the
25+
// entry being passed to its "deleter" are via Erase(), via Insert() when
26+
// an element with a duplicate key is inserted, or on destruction of the cache.
27+
//
28+
// The cache keeps two linked lists of items in the cache. All items in the
29+
// cache are in one list or the other, and never both. Items still referenced
30+
// by clients but erased from the cache are in neither list. The lists are:
31+
// - in-use: contains the items currently referenced by clients, in no
32+
// particular order. (This list is used for invariant checking. If we
33+
// removed the check, elements that would otherwise be on this list could be
34+
// left as disconnected singleton lists.)
35+
// - LRU: contains the items not currently referenced by clients, in LRU order
36+
// Elements are moved between these lists by the Ref() and Unref() methods,
37+
// when they detect an element in the cache acquiring or losing its only
38+
// external reference.
2239

2340
// An entry is a variable length heap-allocated structure. Entries
2441
// are kept in a circular doubly linked list ordered by access time.
@@ -30,7 +47,8 @@ struct LRUHandle {
3047
LRUHandle* prev;
3148
size_t charge; // TODO(opt): Only allow uint32_t?
3249
size_t key_length;
33-
uint32_t refs;
50+
bool in_cache; // Whether entry is in the cache.
51+
uint32_t refs; // References, including cache reference, if present.
3452
uint32_t hash; // Hash of key(); used for fast sharding and comparisons
3553
char key_data[1]; // Beginning of key
3654

@@ -155,8 +173,10 @@ class LRUCache {
155173

156174
private:
157175
void LRU_Remove(LRUHandle* e);
158-
void LRU_Append(LRUHandle* e);
176+
void LRU_Append(LRUHandle*list, LRUHandle* e);
177+
void Ref(LRUHandle* e);
159178
void Unref(LRUHandle* e);
179+
bool FinishErase(LRUHandle* e);
160180

161181
// Initialized before use.
162182
size_t capacity_;
@@ -167,34 +187,55 @@ class LRUCache {
167187

168188
// Dummy head of LRU list.
169189
// lru.prev is newest entry, lru.next is oldest entry.
190+
// Entries have refs==1 and in_cache==true.
170191
LRUHandle lru_;
171192

193+
// Dummy head of in-use list.
194+
// Entries are in use by clients, and have refs >= 2 and in_cache==true.
195+
LRUHandle in_use_;
196+
172197
HandleTable table_;
173198
};
174199

175200
LRUCache::LRUCache()
176201
: usage_(0) {
177-
// Make empty circular linked list
202+
// Make empty circular linked lists.
178203
lru_.next = &lru_;
179204
lru_.prev = &lru_;
205+
in_use_.next = &in_use_;
206+
in_use_.prev = &in_use_;
180207
}
181208

182209
LRUCache::~LRUCache() {
210+
assert(in_use_.next == &in_use_); // Error if caller has an unreleased handle
183211
for (LRUHandle* e = lru_.next; e != &lru_; ) {
184212
LRUHandle* next = e->next;
185-
assert(e->refs == 1); // Error if caller has an unreleased handle
213+
assert(e->in_cache);
214+
e->in_cache = false;
215+
assert(e->refs == 1); // Invariant of lru_ list.
186216
Unref(e);
187217
e = next;
188218
}
189219
}
190220

221+
void LRUCache::Ref(LRUHandle* e) {
222+
if (e->refs == 1 && e->in_cache) { // If on lru_ list, move to in_use_ list.
223+
LRU_Remove(e);
224+
LRU_Append(&in_use_, e);
225+
}
226+
e->refs++;
227+
}
228+
191229
void LRUCache::Unref(LRUHandle* e) {
192230
assert(e->refs > 0);
193231
e->refs--;
194-
if (e->refs <= 0) {
195-
usage_ -= e->charge;
232+
if (e->refs == 0) { // Deallocate.
233+
assert(!e->in_cache);
196234
(*e->deleter)(e->key(), e->value);
197235
free(e);
236+
} else if (e->in_cache && e->refs == 1) { // No longer in use; move to lru_ list.
237+
LRU_Remove(e);
238+
LRU_Append(&lru_, e);
198239
}
199240
}
200241

@@ -203,10 +244,10 @@ void LRUCache::LRU_Remove(LRUHandle* e) {
203244
e->prev->next = e->next;
204245
}
205246

206-
void LRUCache::LRU_Append(LRUHandle* e) {
207-
// Make "e" newest entry by inserting just before lru_
208-
e->next = &lru_;
209-
e->prev = lru_.prev;
247+
void LRUCache::LRU_Append(LRUHandle* list, LRUHandle* e) {
248+
// Make "e" newest entry by inserting just before *list
249+
e->next = list;
250+
e->prev = list->prev;
210251
e->prev->next = e;
211252
e->next->prev = e;
212253
}
@@ -215,9 +256,7 @@ Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) {
215256
MutexLock l(&mutex_);
216257
LRUHandle* e = table_.Lookup(key, hash);
217258
if (e != NULL) {
218-
e->refs++;
219-
LRU_Remove(e);
220-
LRU_Append(e);
259+
Ref(e);
221260
}
222261
return reinterpret_cast<Cache::Handle*>(e);
223262
}
@@ -239,46 +278,57 @@ Cache::Handle* LRUCache::Insert(
239278
e->charge = charge;
240279
e->key_length = key.size();
241280
e->hash = hash;
242-
e->refs = 2; // One from LRUCache, one for the returned handle
281+
e->in_cache = false;
282+
e->refs = 1; // for the returned handle.
243283
memcpy(e->key_data, key.data(), key.size());
244-
LRU_Append(e);
245-
usage_ += charge;
246284

247-
LRUHandle* old = table_.Insert(e);
248-
if (old != NULL) {
249-
LRU_Remove(old);
250-
Unref(old);
251-
}
285+
if (capacity_ > 0) {
286+
e->refs++; // for the cache's reference.
287+
e->in_cache = true;
288+
LRU_Append(&in_use_, e);
289+
usage_ += charge;
290+
FinishErase(table_.Insert(e));
291+
} // else don't cache. (Tests use capacity_==0 to turn off caching.)
252292

253293
while (usage_ > capacity_ && lru_.next != &lru_) {
254294
LRUHandle* old = lru_.next;
255-
LRU_Remove(old);
256-
table_.Remove(old->key(), old->hash);
257-
Unref(old);
295+
assert(old->refs == 1);
296+
bool erased = FinishErase(table_.Remove(old->key(), old->hash));
297+
if (!erased) { // to avoid unused variable when compiled NDEBUG
298+
assert(erased);
299+
}
258300
}
259301

260302
return reinterpret_cast<Cache::Handle*>(e);
261303
}
262304

263-
void LRUCache::Erase(const Slice& key, uint32_t hash) {
264-
MutexLock l(&mutex_);
265-
LRUHandle* e = table_.Remove(key, hash);
305+
// If e != NULL, finish removing *e from the cache; it has already been removed
306+
// from the hash table. Return whether e != NULL. Requires mutex_ held.
307+
bool LRUCache::FinishErase(LRUHandle* e) {
266308
if (e != NULL) {
309+
assert(e->in_cache);
267310
LRU_Remove(e);
311+
e->in_cache = false;
312+
usage_ -= e->charge;
268313
Unref(e);
269314
}
315+
return e != NULL;
316+
}
317+
318+
void LRUCache::Erase(const Slice& key, uint32_t hash) {
319+
MutexLock l(&mutex_);
320+
FinishErase(table_.Remove(key, hash));
270321
}
271322

272323
void LRUCache::Prune() {
273324
MutexLock l(&mutex_);
274-
for (LRUHandle* e = lru_.next; e != &lru_; ) {
275-
LRUHandle* next = e->next;
276-
if (e->refs == 1) {
277-
table_.Remove(e->key(), e->hash);
278-
LRU_Remove(e);
279-
Unref(e);
325+
while (lru_.next != &lru_) {
326+
LRUHandle* e = lru_.next;
327+
assert(e->refs == 1);
328+
bool erased = FinishErase(table_.Remove(e->key(), e->hash));
329+
if (!erased) { // to avoid unused variable when compiled NDEBUG
330+
assert(erased);
280331
}
281-
e = next;
282332
}
283333
}
284334

util/cache_test.cc

+28-1
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,11 @@ class CacheTest {
5959
&CacheTest::Deleter));
6060
}
6161

62+
Cache::Handle* InsertAndReturnHandle(int key, int value, int charge = 1) {
63+
return cache_->Insert(EncodeKey(key), EncodeValue(value), charge,
64+
&CacheTest::Deleter);
65+
}
66+
6267
void Erase(int key) {
6368
cache_->Erase(EncodeKey(key));
6469
}
@@ -135,15 +140,37 @@ TEST(CacheTest, EntriesArePinned) {
135140
TEST(CacheTest, EvictionPolicy) {
136141
Insert(100, 101);
137142
Insert(200, 201);
143+
Insert(300, 301);
144+
Cache::Handle* h = cache_->Lookup(EncodeKey(300));
138145

139-
// Frequently used entry must be kept around
146+
// Frequently used entry must be kept around,
147+
// as must things that are still in use.
140148
for (int i = 0; i < kCacheSize + 100; i++) {
141149
Insert(1000+i, 2000+i);
142150
ASSERT_EQ(2000+i, Lookup(1000+i));
143151
ASSERT_EQ(101, Lookup(100));
144152
}
145153
ASSERT_EQ(101, Lookup(100));
146154
ASSERT_EQ(-1, Lookup(200));
155+
ASSERT_EQ(301, Lookup(300));
156+
cache_->Release(h);
157+
}
158+
159+
TEST(CacheTest, UseExceedsCacheSize) {
160+
// Overfill the cache, keeping handles on all inserted entries.
161+
std::vector<Cache::Handle*> h;
162+
for (int i = 0; i < kCacheSize + 100; i++) {
163+
h.push_back(InsertAndReturnHandle(1000+i, 2000+i));
164+
}
165+
166+
// Check that all the entries can be found in the cache.
167+
for (int i = 0; i < h.size(); i++) {
168+
ASSERT_EQ(2000+i, Lookup(1000+i));
169+
}
170+
171+
for (int i = 0; i < h.size(); i++) {
172+
cache_->Release(h[i]);
173+
}
147174
}
148175

149176
TEST(CacheTest, HeavyEntries) {

0 commit comments

Comments
 (0)