[Feature Request]: Virtual file system #1184

Closed
JinHai-CN opened this issue May 7, 2024 · 5 comments
Labels
feature request New feature or request

Comments

@JinHai-CN
Contributor
Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

Infinity's internal data consists of segments and blocks, where each block is made up of a number of block columns. In the current implementation, each block column is persisted as a separate file on disk, regardless of its size. This can result in a very large number of files for a single table. This feature request aims to solve that problem: Infinity would read and write through a virtual file system, while on disk several block column files are packed into a single actual file. This avoids creating a large number of files and reduces the chance of hitting a 'too many open files' error.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

JinHai-CN added the feature request label May 7, 2024
JinHai-CN mentioned this issue May 7, 2024
@JinHai-CN
Contributor Author

The goal of the virtual file system is to provide a virtual layer through which each generated block column file, index file, delete file, etc. is stored. Through this layer, Infinity can be backed by the local file system, or by an object store such as S3.

Therefore, the virtual file system needs to provide the following interfaces:
Open/Read/Write/Seek/Truncate/Close.

In the concrete implementation, the VFS needs a metadata store that provides the mapping between physical files and virtual file blocks, as well as which virtual file blocks hold the data of each virtual file. Metadata reads and writes are mainly key-value lookups, so the metadata store can be a KV store.

Each file block should have a fixed size, for example 64KB. A physical file does not have a fixed size, but its size should stay within a bounded range, for example between 16MB and 24MB.

With files being created and deleted constantly, the physical files will accumulate fragments that need to be cleaned up. Considering that S3 will be used as the actual storage, this virtual file system layer should treat physical storage as append-only. Fragment merging and cleanup operations should be logged in the database WAL, just like the create/delete/update/write operations of the VFS.
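
To make the interface concrete, here is a minimal sketch of what such a layer could look like in C++. Class, method, and parameter names are hypothetical; only the operation set (Open/Read/Write/Seek/Truncate/Close), the KV metadata lookup, and the fixed block size come from the description above.

```cpp
// Minimal sketch only, not Infinity's actual VFS API.
#include <cstdint>
#include <memory>
#include <string>

// Handle to a virtual file; offsets are within the virtual file, while the
// data physically lives in fixed-size blocks (e.g. 64KB) of packed files.
struct VirtualFile {
    virtual ~VirtualFile() = default;
    virtual int64_t Read(void *buf, int64_t nbytes) = 0;
    virtual int64_t Write(const void *buf, int64_t nbytes) = 0;
    virtual int64_t Seek(int64_t offset) = 0;
    virtual void Truncate(int64_t new_size) = 0;
    virtual void Close() = 0;
};

struct VirtualFileSystem {
    virtual ~VirtualFileSystem() = default;
    // Looks up the KV metadata store to map the virtual path to its
    // physical file blocks, then returns a handle.
    virtual std::unique_ptr<VirtualFile> Open(const std::string &path, int flags) = 0;
};
```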

@JinHai-CN
Contributor Author

JinHai-CN commented May 16, 2024

Considering the complexity of this feature and the main goals of 0.2.0, we have decided to move this feature out of the 0.2.0 scope for now.

@yuzhichang
Member

yuzhichang commented Jul 11, 2024

Objective

The main idea of the Feature Request is

  • Package small files with a coupled lifecycle into one middle-sized (~200MB uncompressed) file.
  • Keep big files as they are.

It is not

  • A generic distributed file system. Systems such as Ceph and CubeFS merge/split files into fixed-size blocks without regard for their lifecycle, and need a complete and complicated GC mechanism.

Summary

Small files of a segment have a coupled lifecycle:

  • BlockEntry version file
  • BlockColumnEntry file, HeapChunk

A coupled file consists of one or more of the above files from a single SegmentEntry. A small file never spreads over multiple coupled files.

Other files will not be impacted:

  • All index files, such as the fulltext dictionary file
  • Catalog files
  • DeltaOp files

A TableEntry maintains the set of file ids.
BlockEntry, BlockColumnEntry, and ChunkIndexEntry maintain the related file id, offset, size, created_ts, and deleted_ts.
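
For illustration, the per-part location metadata kept by these entries could be sketched as follows. This is a hypothetical struct; only the listed fields come from the description above.

```cpp
// Illustrative sketch, not the actual entry layout.
#include <cstdint>

struct FilePartAddr {
    uint64_t file_id{};     // which coupled file holds this part
    uint64_t offset{};      // byte offset of the part inside that file
    uint64_t size{};        // byte length of the part
    uint64_t created_ts{};  // timestamp when the part was written
    uint64_t deleted_ts{};  // timestamp when the part became garbage
};
```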

Details

insert

  • insert in memory -- no impact

delete

  • delete in memory -- no impact
  • delete a row in an existing coupled file -- read, modify

full checkpoint

A full checkpoint is for the whole system:
Dump the metadata JSON.
Dump changed data to coupled files.

delta checkpoint

A delta checkpoint is for the whole system. The delta log file stays as it is.

Segment Compaction

  • Determine segment list
  • Determine block list
  • Determine file list
  • Do the compaction to generate new files and set their create_ts.

Drop table

Touch only metadata.

Drop index

Touch only metadata.

Create index

All index files, such as the fulltext dictionary file.

Build index

All index files, such as the fulltext dictionary file.
Files dumped by MemoryIndexer::Dump() are lazily persisted to S3.

Cleanup

Iterate over the file set and delete any file whose delete_ts is no larger than the given timestamp.
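
A rough sketch of that cleanup pass, assuming a simple in-memory map from file id to its metadata (names here are hypothetical):

```cpp
// Sketch of the cleanup rule described above, not the actual implementation.
#include <cstdint>
#include <unordered_map>

struct FileMeta {
    uint64_t deleted_ts{};  // 0 while the file is still live
};

// Drop every file whose deleted_ts is no larger than the given timestamp.
void Cleanup(std::unordered_map<uint64_t, FileMeta> &file_set, uint64_t given_ts) {
    for (auto it = file_set.begin(); it != file_set.end();) {
        if (it->second.deleted_ts != 0 && it->second.deleted_ts <= given_ts) {
            // The backing physical file / object would also be removed here.
            it = file_set.erase(it);
        } else {
            ++it;
        }
    }
}
```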

@yuzhichang
Member

yuzhichang commented Jul 11, 2024

PersistenceManager is a newly introduced class (a declaration sketch follows the list below):

  • PersistenceManager(const String &workspace, SizeT coupled_capacity, SizeT alone_capacity) constructs a PersistenceManager. coupled_capacity applies to the cache of coupled files (each object maps to one or more original files). alone_capacity applies to the cache of alone files (each object maps to exactly one original file). Each cache uses an LRU eviction mechanism bounded by its capacity.
  • String CreateObj() generates a UUID as the object_key of a new object.
  • int ObjRoom(const String &object_key) returns the remaining room (capacity - sum_of_parts_size) of the object. Callers should check it before each ObjAppend operation. capacity is a constant, for example 100MB.
  • void ObjAppend(const String &object_key, char *body, SizeT body_len) appends body to the object. Callers should compress body before each ObjAppend operation.
  • void ObjFinalize(const String &object_key) finalizes an object. Subsequent ObjAppend calls on this object are forbidden. PersistenceManager shall upload the whole object to S3 in the background.
  • Tuple<String, SizeT, SizeT> Persist(const String &file_path) appends the content of the given file to some open object and returns the location.
  • Pair<UniquePtr<FileHandler>, Status> ObjOpen(const String &object_key) downloads the whole object from S3 if it is not in the cache, then opens the cached object.
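
Gathered into one declaration, the interface above could look roughly like this. It is a sketch only; the aliases and stub types at the top stand in for Infinity's own String, SizeT, Pair, Tuple, UniquePtr, FileHandler, and Status.

```cpp
// Sketch of the PersistenceManager interface listed above, not the real code.
#include <cstddef>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

using String = std::string;
using SizeT = std::size_t;
template <typename T> using UniquePtr = std::unique_ptr<T>;
template <typename... Ts> using Tuple = std::tuple<Ts...>;
template <typename A, typename B> using Pair = std::pair<A, B>;
struct FileHandler {};  // placeholder for Infinity's file handle type
struct Status {};       // placeholder for Infinity's status type

class PersistenceManager {
public:
    PersistenceManager(const String &workspace, SizeT coupled_capacity, SizeT alone_capacity);

    String CreateObj();                                    // new object_key (UUID)
    int ObjRoom(const String &object_key);                 // capacity - sum_of_parts_size
    void ObjAppend(const String &object_key, char *body, SizeT body_len);  // body is pre-compressed
    void ObjFinalize(const String &object_key);            // no more appends; upload to S3
    Tuple<String, SizeT, SizeT> Persist(const String &file_path);  // (object_key, offset, size)
    Pair<UniquePtr<FileHandler>, Status> ObjOpen(const String &object_key);
};
```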

FileWorker gains the following capabilities:

  • SetSource(const String &object_key, SizeT offset, SizeT len) sets the source object, offset, and length.
  • Tuple<String, SizeT, SizeT> GetDest() gets the destination object, offset, and length.

Every FileWorker subclass's ReadFromFileImpl() shall call PersistenceManager::ObjOpen() to open the given object, then decompress and parse it as needed.
Every FileWorker subclass's WriteToFileImpl(bool to_spill, bool &prepare_success) shall write to a local file. If to_spill is false, it then calls PersistenceManager::Persist(file_path) to persist the file to S3 and records the destination location so that a later GetDest() call can return it.
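
A rough, simplified sketch of that write path. Class, member, and helper names here are hypothetical; only the to_spill branch and the Persist/GetDest interplay come from the description above.

```cpp
// Simplified sketch of a FileWorker-like write path; not the actual code.
#include <cstddef>
#include <string>
#include <tuple>

struct PersistenceManagerStub {
    // Stands in for PersistenceManager::Persist(file_path).
    std::tuple<std::string, std::size_t, std::size_t> Persist(const std::string &file_path);
};

class SomeFileWorker {
public:
    void WriteToFileImpl(bool to_spill, bool &prepare_success) {
        // Always write the data to a local file first.
        std::string local_path = WriteLocalFile();  // hypothetical helper
        prepare_success = true;

        if (!to_spill) {
            // Hand the local file to the PersistenceManager, which appends it
            // to an open object that is later uploaded to S3 ...
            auto [obj_key, offset, size] = pm_->Persist(local_path);
            // ... and remember where it landed so GetDest() can answer later.
            obj_key_ = obj_key;
            obj_offset_ = offset;
            obj_size_ = size;
        }
    }

    std::tuple<std::string, std::size_t, std::size_t> GetDest() const {
        return {obj_key_, obj_offset_, obj_size_};
    }

private:
    std::string WriteLocalFile();  // writes the payload, returns the file path

    PersistenceManagerStub *pm_{};
    std::string obj_key_;
    std::size_t obj_offset_{};
    std::size_t obj_size_{};
};
```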

BlockColumnEntry gains the following capability:

  • An entry may have multiple BufferObjects (for varchar, sparse, tensor, and tensor array), and each BufferObject has a FileWorker, so an entry knows the persisted locations of the block column. Each Flush() operation changes these locations. We don't store the history of locations, since the latest one contains the full data visibility.

yuzhichang mentioned this issue Jul 16, 2024
yuzhichang added a commit that referenced this issue Jul 17, 2024
Introduced PersistenceManager.
Adapted FileWorker for PersistenceManager.

Issue link: #1184

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
@yuzhichang
Member

yuzhichang commented Aug 19, 2024

Completed. VFS is disabled by default. The following piece in the config file enables the VFS:

[persistence]
persistence_dir          = "/var/infinity/persistence"
