
Store snapshot of manifest to object store #579

Closed
ShiKaiWi opened this issue Jan 17, 2023 · 3 comments · Fixed by #607
Labels: breaking-change (Contains user-facing changes), feature (New feature or request)

@ShiKaiWi
Member

ShiKaiWi commented Jan 17, 2023

Describe This Problem

Currently, both snapshots and normal updates of the manifest are appended to the WAL, which leads to the following problems:

  • Handling concurrent writes to the WAL from snapshotting and from normal updates is complex;
  • A single large snapshot may exceed the maximum length of one WAL entry;

Proposal

The basic idea of the proposal is simple: store the snapshot in the object store.

Additional Context

This is a breaking change #402.

ShiKaiWi added the feature (New feature or request) and breaking-change (Contains user-facing changes) labels Jan 17, 2023
ShiKaiWi self-assigned this Jan 17, 2023
Rachelint mentioned this issue Jan 28, 2023
@ShiKaiWi
Member Author

ShiKaiWi commented Feb 2, 2023

Overview

The diagram below describes the new manifest storage, which combines the WAL and Object Storage:

                      ┌─────────────────────────────────────────────────────────────────┐ 
                      │                                                                 │ 
                      │    ┌────────┐                                                   │ 
                      │    │    3   │  /manifest/snapshot/{space_id}/{table_id}/current │ 
┌──────┐              │    └────────┘                                                   │ 
│  4   │              │         │                                                       │ 
├──────┘              │         │                                                       │ 
   3   │─────────┐    │         ▼                                                       │ 
├ ─ ─ ─          │    │    ┌────────┐                                                   │ 
   2   │─────────┼────┼───▶│Snapshot│     /manifest/snapshot/{space_id}/{table_id}/3    │ 
├ ─ ─ ─          │    │    └────────┘                                                   │ 
   1   │─────────┘    │                                                                 │ 
└ ─ ─ ─               └─────────────────────────────────────────────────────────────────┘ 
                                                                                          
  WAL                                          Object Storage                             
                                                                                          
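As a rough illustration of the layout above, the two kinds of object keys could be built as follows. This is a minimal sketch, not CeresDB's code: the helper names snapshot_path and current_path are hypothetical, and space_id, table_id and end_seq are assumed to be plain integers.

```rust
/// Hypothetical helpers building the object keys shown in the diagram.
/// The id types (u32/u64) are assumptions for illustration.
fn snapshot_path(space_id: u32, table_id: u64, end_seq: u64) -> String {
    // One immutable object per snapshot, named by the last WAL sequence it covers.
    format!("/manifest/snapshot/{space_id}/{table_id}/{end_seq}")
}

fn current_path(space_id: u32, table_id: u64) -> String {
    // A small, mutable pointer object whose content is the latest snapshot's path.
    format!("/manifest/snapshot/{space_id}/{table_id}/current")
}

fn main() {
    // Mirrors the diagram: WAL entries 1..=3 are folded into snapshot "3".
    println!("{} -> {}", current_path(0, 1), snapshot_path(0, 1, 3));
}
```

Keeping each snapshot under its own immutable key means the only object that is ever overwritten is the small current pointer.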

Update

New updates are simply appended to the WAL.
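A minimal sketch of the update path, assuming a hypothetical in-memory Wal type in place of the real WAL: each update is appended and receives the next sequence number, which is presumably what {end_seq} in a snapshot path later refers to.

```rust
/// Hypothetical in-memory stand-in for the WAL; the real WAL is a separate
/// storage component. Each appended update gets the next sequence number.
struct Wal<T> {
    entries: Vec<(u64, T)>,
    next_seq: u64,
}

impl<T> Wal<T> {
    fn new() -> Self {
        Self { entries: Vec::new(), next_seq: 1 }
    }

    /// Append one update and return the sequence number assigned to it.
    fn append(&mut self, update: T) -> u64 {
        let seq = self.next_seq;
        self.entries.push((seq, update));
        self.next_seq += 1;
        seq
    }
}

fn main() {
    let mut wal = Wal::new();
    for update in ["add table", "add sst 1", "add sst 2", "add sst 3"] {
        let seq = wal.append(update);
        println!("seq {seq}: {update}");
    }
}
```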

Snapshot

  • Generate the latest snapshot from the previous snapshot (if any) and the updates written after it;
  • Store the latest snapshot in Object Storage as /manifest/snapshot/{space_id}/{table_id}/{end_seq};
  • Overwrite /manifest/snapshot/{space_id}/{table_id}/current so that it maps to the path of the latest snapshot;
  • Delete the WAL logs covered by the latest snapshot, as well as the previous snapshot in Object Storage. Errors in this stage can be ignored, because recovery only reads the snapshot referenced by current and the logs after it.
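The snapshot steps can be sketched against in-memory stand-ins. Everything here is an assumption for illustration: the Store type, the HashMap playing the object store, the BTreeMap playing the WAL, the newline-based encode/decode, and the fixed directory for space_id = 0, table_id = 1. The numbered comments follow the four bullets, and step 4 is treated as best-effort cleanup that runs only after current has been switched.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical snapshot directory for space_id = 0, table_id = 1.
const DIR: &str = "/manifest/snapshot/0/1";

/// In-memory stand-ins (assumptions): `objects` plays the object store
/// (path -> contents) and `wal` plays the WAL (sequence -> update).
struct Store {
    objects: HashMap<String, String>,
    wal: BTreeMap<u64, String>,
}

/// Toy encoding: the first line is end_seq, each further line is one update.
fn encode(end_seq: u64, state: &[String]) -> String {
    let mut out = end_seq.to_string();
    for update in state {
        out.push('\n');
        out.push_str(update);
    }
    out
}

fn decode(data: &str) -> (u64, Vec<String>) {
    let mut lines = data.lines();
    let end_seq = lines.next().and_then(|l| l.parse::<u64>().ok()).expect("corrupt snapshot");
    (end_seq, lines.map(String::from).collect())
}

impl Store {
    fn do_snapshot(&mut self) {
        let current = format!("{DIR}/current");
        // 1. Start from the previous snapshot (if any) and fold in later updates.
        let prev_path = self.objects.get(&current).cloned();
        let (prev_seq, mut state) = match &prev_path {
            Some(path) => decode(&self.objects[path]),
            None => (0, Vec::new()),
        };
        let mut end_seq = prev_seq;
        for (&seq, update) in self.wal.range(prev_seq + 1..) {
            state.push(update.clone());
            end_seq = seq;
        }
        if end_seq == prev_seq {
            return; // nothing new since the last snapshot
        }
        // 2. Write the new snapshot under its own immutable path.
        let new_path = format!("{DIR}/{end_seq}");
        self.objects.insert(new_path.clone(), encode(end_seq, &state));
        // 3. Switch `current`; a real object store must make this put atomic.
        self.objects.insert(current, new_path);
        // 4. Best-effort cleanup: covered WAL entries and the previous snapshot.
        //    With a real store, failures here would just be logged and ignored.
        self.wal.retain(|&seq, _| seq > end_seq);
        if let Some(path) = prev_path {
            self.objects.remove(&path);
        }
    }
}

fn main() {
    let mut store = Store { objects: HashMap::new(), wal: BTreeMap::new() };
    for (seq, update) in [(1, "add table"), (2, "add sst 1"), (3, "add sst 2")] {
        store.wal.insert(seq, update.to_string());
    }
    store.do_snapshot();
    println!("current -> {}", store.objects[&format!("{DIR}/current")]);
}
```

The ordering is what makes the scheme crash-safe in this model: the new snapshot is written before current is switched, and old data is deleted last, so a crash at any point leaves current pointing at a complete snapshot whose missing updates are still in the WAL.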

Recover

  • Fetch /manifest/snapshot/{space_id}/{table_id}/current from Object Storage to get the path of the current snapshot;
  • Fetch the current snapshot;
  • Read the logs written after the current snapshot from the WAL and combine both parts to rebuild the complete metadata of the table;
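Recovery can be sketched in the same style. Again, the in-memory maps, the newline-based snapshot format and the recover function are hypothetical. The example keeps a stale WAL entry (sequence 2) to show why a failed cleanup during snapshotting is harmless: only entries after the snapshot's end_seq are replayed.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy snapshot encoding (assumption): first line is end_seq, then one update per line.
fn decode(data: &str) -> (u64, Vec<String>) {
    let mut lines = data.lines();
    let end_seq = lines.next().and_then(|l| l.parse::<u64>().ok()).expect("corrupt snapshot");
    (end_seq, lines.map(String::from).collect())
}

/// Rebuild a table's manifest state from `current`, the snapshot, and the WAL tail.
fn recover(objects: &HashMap<String, String>, wal: &BTreeMap<u64, String>, dir: &str) -> Vec<String> {
    // 1. Read `current` to find the latest snapshot path; no `current` means no snapshot yet.
    // 2. Fetch and decode that snapshot (a real implementation would return an error
    //    instead of panicking if the object were missing).
    let (snap_seq, mut state) = match objects.get(&format!("{dir}/current")) {
        Some(path) => decode(&objects[path]),
        None => (0, Vec::new()),
    };
    // 3. Replay only WAL entries newer than the snapshot; older leftovers are skipped.
    state.extend(wal.range(snap_seq + 1..).map(|(_, update)| update.clone()));
    state
}

fn main() {
    let dir = "/manifest/snapshot/0/1";
    let mut objects = HashMap::new();
    objects.insert(format!("{dir}/3"), "3\nadd table\nadd sst 1\nadd sst 2".to_string());
    objects.insert(format!("{dir}/current"), format!("{dir}/3"));
    let mut wal = BTreeMap::new();
    wal.insert(2, "add sst 1".to_string()); // stale: its best-effort deletion failed
    wal.insert(4, "add sst 3".to_string()); // written after snapshot 3
    println!("{:?}", recover(&objects, &wal, dir));
}
```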

@jiacai2050
Contributor

Overwrite the /manifest/snapshot/{space_id}/{table_id}/current to map the current snapshot to the path of the latest snapshot;

How will you ensure this operation is atomic?

@ShiKaiWi
Member Author

ShiKaiWi commented Feb 3, 2023

Overwrite the /manifest/snapshot/{space_id}/{table_id}/current to map the current snapshot to the path of the latest snapshot;

How will you ensure this operation is atomic?

I think the ObjectStore trait should guarantee that this operation is atomic.

Currently, for real object storage services, e.g. AWS S3 and Aliyun OSS, the consistency model guarantees that a put/update is atomic. As for the current local-file-system-based "object store" in CeresDB, atomic put/update is not guaranteed yet, but it should provide such a guarantee to meet the requirements of the ObjectStore trait.
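For the local-file-system object store mentioned above, one common way to get an atomic overwrite is to write the new contents to a temporary file and then rename it over the destination. The sketch below assumes a POSIX filesystem, where rename(2) within a single filesystem atomically replaces the target, and a single writer per table (the fixed .tmp name would collide under concurrent writers). atomic_put is a hypothetical helper, not an existing CeresDB API, and on other platforms the atomicity of std::fs::rename should be checked separately.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `path` with `contents` so readers never see a partial file.
fn atomic_put(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Write to a temporary file in the same directory so the rename stays on one filesystem.
    let tmp = path.with_extension("tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(contents)?;
        // Flush the bytes to disk before they become visible under the final name.
        file.sync_all()?;
    }
    // On POSIX, rename(2) atomically replaces an existing destination: readers see
    // either the old or the new contents of `current`, never a half-written file.
    fs::rename(&tmp, path)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("manifest_demo");
    fs::create_dir_all(&dir)?;
    let current = dir.join("current");
    atomic_put(&current, b"/manifest/snapshot/0/1/3")?;
    atomic_put(&current, b"/manifest/snapshot/0/1/4")?;
    println!("{}", fs::read_to_string(&current)?);
    Ok(())
}
```

Full crash durability would additionally require syncing the parent directory after the rename, which this sketch omits.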

Reference: S3 consistency model
