Support to specify a disk quota for intermediate files #446

shuijing198799 · 2020-11-04T11:33:56Z

Feature Request

Is your feature request related to a problem? Please describe:

The lightning need a volume to save intermediate file, It’s hard to predict the size of this disk， So we must prepare as large a disk as possible to install these temporary files, For example , If I need to restore 2T data, we have to prepare a 2T volume for lightning, It is a bad experience to use on the cloud。

Describe the feature you'd like:

We want to specify the volume size and the checkpoint will exceed this size in the lightning process.

Describe alternatives you've considered:

none

Teachability, Documentation, Adoption, Optimization:

kennytm · 2020-11-04T19:09:04Z

by "check point file" you mean those "SST files" in the local backend?

shuijing198799 · 2020-11-05T04:58:28Z

by "check point file" you mean those "SST files" in the local backend?

yes。 I use immediate files instead.

kennytm · 2020-11-06T11:16:26Z

Seems we can use https://pkg.go.dev/github.com/cockroachdb/pebble#DB.EstimateDiskUsage to fetch the disk usage.

Abstract

Periodically, before a WriteRows, check every engine's total estimated disk usage. If the total disk usage exceeds the "(soft) disk quota", we block the write to the largest engines until the remaining total is less than the quota, and flush the blocked engines' content into TiKV. The engine UUIDs are reused.

This will cause subsequent imports to suffer from range overlapping, which we have to accept as trade-off.

Checkpoint validity

The flushing design must be compatible with checkpoints, that is no data will be lost if we Ctrl+C → resume in the middle of a process. Checkpoints may be earlier than the actual progress, so some data (process) duplication should be acceptable and ignored.

Now let's consider the flush process:

... parallel WriteRows ...
detected quota overflow, start emergency ingest to TiKV
CloseEngine()
1. Flush()
2. saveEngineMeta()
ImportEngine()
1. readAndSplitIntoRange()
2. loop:
  - SplitAndScatterRegionByRanges()
  - WriteAndIngestByRanges()
Reset engine to empty
... parallel WriteRows ...

Let's consider what happens regarding the place of interruption (I) and actual saved checkpoint (C):

Case I=3, C<3

Currently, with Local backend, a checkpoint is flushed only when the entire engine is written because Flush() is expensive (#326 (comment)). So the end of step 3 is a good point to save the checkpoint.

If step 3's checkpoint is not recorded, we will restart from the beginning, while the engine contained some incomplete data. This makes us to hit step 2 quicker, and some "future" data will be ingested. But this is still fine since those duplicated KV in the future are ignored.

Case I=4, C<4

If step 4 is actually completed, all data will have been copied to TiKV. So whether C=1 (restart from scratch) or C=3 (import again) should be fine in terms of data, just slower.

Case I=5, C<5

If step 5 is actually completed, the local data is cleaned up. Starting from C=1 should be fine. Starting from C=3 or C=4 will lead to importing an empty database, which is also fine because the data are already sent to TiKV.

Considering these, it should be fine to place a checkpoint immediately before flushing, importing and resetting the engine.

Implementation

Every engine provides a StorageSize() uint64 method. TiDB and Importer backend implement that by returning 0, Local backend implement that by calculating total occupied size.
Periodically (how?), compute the StorageSize() for every engine, and sort the result in ascending order. At the point the total storage size exceeds the "quota", mark those engines for flushing.
- The "Period" depends on how expensive it is to compute StorageSize().
For every engine marked for flush,
- Acquire a write lock from the engine's "flush" RWMutex.
- Do the flush + ingest + clean, writing checkpoint in between
- If the engine is a data-engine, perform a Flush() on the corresponding index-engine too.
- Release the write lock
For every deliveryLoop,
- Before WriteRows(), try to acquire a read lock from the engine's "flush" RWMutex.
- if the read lock is immediately acquired, do WriteRows() as usual, and continue.
- otherwise, do the actual read lock acquisition.
- after the read lock is acquired, immediately write the current file offset to the checkpoint.
- do WriteRows() as usual.

overvenus · 2020-11-06T12:13:42Z

StorageSize() seems to be fast if we calculate the size of the full range.

Also, I suggest we should maintain an approximate size which can be last calculated storage size + written bytes. The "written bytes" means the size of bytes we have written to DB since the last storage size calculation. By this way, we can avoid overwriting accidentally.

kennytm · 2020-11-07T09:03:43Z

By this way, we can avoid overwriting accidentally.

could you elaborate how this works?

shuijing198799 added the feature-request This issue is a feature request label Nov 4, 2020

overvenus changed the title ~~Hope to specify a disk size to check point~~ Support to specify a disk quote for intermediate files Nov 4, 2020

glorv mentioned this issue Nov 9, 2020

Ingest failed due to EpochNotMatch error #436

Open

kennytm mentioned this issue Nov 10, 2020

lightning stuck when disk is full #463

Open

kennytm changed the title ~~Support to specify a disk quote for intermediate files~~ Support to specify a disk quota for intermediate files Nov 16, 2020

kennytm mentioned this issue Nov 26, 2020

backend: implement disk quota #493

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support to specify a disk quota for intermediate files #446

Support to specify a disk quota for intermediate files #446

shuijing198799 commented Nov 4, 2020 •

edited by overvenus

Loading

kennytm commented Nov 4, 2020 •

edited

Loading

shuijing198799 commented Nov 5, 2020 •

edited

Loading

kennytm commented Nov 6, 2020

overvenus commented Nov 6, 2020

kennytm commented Nov 7, 2020

Support to specify a disk quota for intermediate files #446

Support to specify a disk quota for intermediate files #446

Comments

shuijing198799 commented Nov 4, 2020 • edited by overvenus Loading

Feature Request

kennytm commented Nov 4, 2020 • edited Loading

shuijing198799 commented Nov 5, 2020 • edited Loading

kennytm commented Nov 6, 2020

Abstract

Checkpoint validity

Case I=3, C<3

Case I=4, C<4

Case I=5, C<5

Implementation

overvenus commented Nov 6, 2020

kennytm commented Nov 7, 2020

shuijing198799 commented Nov 4, 2020 •

edited by overvenus

Loading

kennytm commented Nov 4, 2020 •

edited

Loading

shuijing198799 commented Nov 5, 2020 •

edited

Loading