
Lots of small sst files generated in delete after put and frequent checkpoint workload #9540

Closed
Shenjiaqi opened this issue Feb 10, 2022 · 4 comments
Labels: question, waiting (Waiting for a response from the issue creator.)

Comments

@Shenjiaqi

Actual behavior

If all keys are put and then deleted in ascending order, and checkpoints are triggered frequently, many small SST files are generated.
These SST files appear to have been compacted (trivially moved) from level 0:

  • Most SST files are in level 1.
  • All records in these SST files are kTypeDeletion.
  • Log lines such as "Moving #${sst file id} to level-1 ${several KB} bytes" can be found in the LOG file.

These small SST files never seem to be selected for compaction, and they eventually cause a "Too many open files" error.

Code to reproduce the behavior

//
//  main.cpp
//  reproduce
//
//  Created by shenjiaqi on 2022/2/8.
//

#include <cassert>
#include <iostream>
#include <filesystem>
#include <cstdio>
#include <cstdlib>
#include <string>

#include "rocksdb/utilities/checkpoint.h"
#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;
using ROCKSDB_NAMESPACE::DB;
using ROCKSDB_NAMESPACE::Options;
using ROCKSDB_NAMESPACE::PinnableSlice;
using ROCKSDB_NAMESPACE::ReadOptions;
using ROCKSDB_NAMESPACE::Status;
using ROCKSDB_NAMESPACE::WriteBatch;
using ROCKSDB_NAMESPACE::WriteOptions;

std::string kDBPath = "/Users/shenjiaqi/Workspace/rocksdb/data-test"; // change to a local path before running

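static void createCheckpoint(rocksdb::DB *db, rocksdb::Status &s) {
    std::cout << "create checkpoint" << std::endl;
    std::string chkPath = kDBPath + "-chp";
    // Guard against deleting an unexpected directory. find() returns npos
    // (an unsigned value, never negative) when there is no match.
    assert(chkPath.find("/Users/shenjiaqi/Workspace/rocksdb/data-test") != std::string::npos);
    // CreateCheckpoint() requires that the target directory does not exist yet,
    // so remove the previous checkpoint first. Use with care.
    std::error_code ec;
    std::filesystem::remove_all(chkPath, ec);

    Checkpoint* checkpoint_ptr = nullptr;
    s = Checkpoint::Create(db, &checkpoint_ptr);
    assert(s.ok());

    // With the default log_size_for_flush = 0, this always flushes the memtable first.
    s = checkpoint_ptr->CreateCheckpoint(chkPath);
    assert(s.ok());
    delete checkpoint_ptr; // Checkpoint::Create() allocates; the caller owns it
}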


int main() {
    DB* db;
    Options options;
    options.IncreaseParallelism();
    options.OptimizeLevelStyleCompaction();
    options.create_if_missing = true;
    options.info_log_level = DEBUG_LEVEL;

    // open DB
    Status s = DB::Open(options, kDBPath, &db);
    assert(s.ok());

    for (int i = 0; i < 1000; ++i) {

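        // Keys are written in increasing numeric order; replace std::to_string(i)
        // with std::to_string((int)rand()) to try random keys instead.
        std::string key = "key" + std::to_string(i);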
        std::string value = "value" + std::to_string(i);

        // Put key-value
        s = db->Put(WriteOptions(), key, value);
        assert(s.ok());

        // delete after put
        s = db->Delete(WriteOptions(), key);
        assert(s.ok());

        if (i > 0 && (i % 5) == 0) {
            // Each checkpoint flushes the memtable into a small level-0 SST file
            // that contains only deletion tombstones. These files are then
            // compacted (trivially moved) to level 1.
            createCheckpoint(db, s);
        }
    }
    
    createCheckpoint(db, s);
    delete db; // close the DB before exiting
    return 0;
}
@akankshamahajan15
Contributor

Can you try manual compaction to force compaction of the bottommost level? See https://github.com/facebook/rocksdb/wiki/Compaction-Trivial-Move

@Shenjiaqi
Author

@akankshamahajan15
I maintain a long-running job that uses RocksDB. Should I create a thread that checks the number of SST files and compacts them periodically?

@akankshamahajan15
Contributor

@Shenjiaqi Yes, I think manual compaction should help in your case. Let me know if that doesn't work.

@ajkr ajkr added the question label Feb 15, 2022
@ajkr
Contributor

ajkr commented Feb 15, 2022

You can also try increasing log_size_for_flush (the second argument of Checkpoint::CreateCheckpoint) to avoid flushing small files:

uint64_t log_size_for_flush = 0,

Make sure to read the API doc carefully: it can be dangerous if you set WriteOptions::disableWAL.

Note also that WALs are copied, not hard-linked, so multiple checkpoints containing the same WAL will duplicate data.

@ajkr ajkr added the waiting Waiting for a response from the issue creator. label Feb 15, 2022
@ajkr ajkr closed this as completed Apr 4, 2022