dragonflyoss · changweige · Dec 6, 2021 · Dec 6, 2021
@@ -2,11 +2,11 @@
 
 # I. High Level Design
 ##    0. Overview
-Dragonfly image service is named as `nydus`, [github repo](https://github.com/dragonflyoss/image-service)
+Dragonfly image service is named as `nydus`, [GitHub repo](https://github.com/dragonflyoss/image-service)
 
 Nydus consists of two parts,
 * a userspace filesystem called `rafs` on top of a container image format
-* an image manifest that is compatible with oci spec of image and distribution
+* an image manifest that is compatible with OCI spec of image and distribution
 
 Its key features include:
 
@@ -26,7 +26,7 @@ Nydus takes in either [FUSE](https://www.kernel.org/doc/html/latest/filesystems/
 ![architecture](images/nydusd-arch.png)
 
 ##    2. Rafs
-Rafs presents to users a userspace filesystem with seperating filesystem's metadata with data.  In a typical rafs filesystem, the metadata is stored in `bootstrap` while the data is stored in `blobfile`. Nydus splits container image into two parts, metadata and data, where metadata contains everything a container needs to start with, while data is stored in chunk with chunk size being 1MB.
+Rafs presents to users a userspace filesystem with separating filesystem's metadata with data.  In a typical rafs filesystem, the metadata is stored in `bootstrap` while the data is stored in `blobfile`. Nydus splits container image into two parts, metadata and data, where metadata contains everything a container needs to start with, while data is stored in chunks with chunk size being 1MB.
 
 ![rafs](./images/rafs-format.png)
 
@@ -79,7 +79,7 @@ Validation of the metadata takes place at runtime when metadata is accessed.  By
 
 The read verification is doing sanity checking on metadata's fields and determining whether digest validating is necessary.  If it is, the digest is calculated with the chosen hash algorithm and compared against the value stored in the object itself.  If any of these checks fail, then the buffer is considered corrupt and the EINVAL error is set appropriately.
 ### 3.2  Data Integrity Validation
-Data is split into chunks and each chunk has a saved digest in chunk info, the way of metadata digest validation applies to chunk as well.
+Data is split into chunks and each chunk has a saved digest in chunk info, the way of metadata digest validation applies to chunks as well.
 
 ##    4. Prefetch
 As a lazily fetch solution, prefetch plays an important role to mitigate the impact of failing to fetch data after containers run.   In order to do it, we need to record hints in container image about which files and directories need prefetching, according to the information, at runtime nydus daemon will fetch these files and directories in the background into local storage.
@@ -96,7 +96,8 @@ Nydus can be configured to set up a cache for blob, called `blobcache`.  With `b
 Nydus can be configured to save either compressed chunk or noncompressed chunk, with compressed chunk is the default configuration.
 
 The compression algorithm is lz4 and gzip, `None` stands for noncompression.
-```
+
+```rust
 pub enum Algorithm {
     None,
     LZ4Block,
@@ -107,8 +108,9 @@ pub enum Algorithm {
 # II. Global Structures
    ## 1. Rafs Superblock
    Rafs superblock is located at the first 8K of  the `bootstrap` file.
-   ```
-   pub struct OndiskSuperBlock {
+
+```rust
+pub struct OndiskSuperBlock {
     /// RAFS super magic
     s_magic: u32,
     /// RAFS version
@@ -142,7 +144,8 @@ pub enum Algorithm {
 ```
 
 `s_flags` offers several flags to choose which compression algorithm, metadata hash algorithm and xattr will be used.
-```
+
+```rust
 bitflags! {
     pub struct RafsSuperFlags: u64 {
         /// Data chunks are not compressed.
@@ -169,24 +172,23 @@ bitflags! {
 
    ## 2. Rafs Inode
 
-   ```
-   pub struct OndiskInodeWrapper<'a> {
+```rust
+pub struct OndiskInodeWrapper<'a> {
     pub name: &'a OsStr,
     pub symlink: Option<&'a OsStr>,
     pub inode: &'a OndiskInode,
 }
-   ```
+```
 
 The OndiskInode struct size is padded to 128 bytes.
 
-* If it's a directory, all its children is indexed contiguously in `inode table`, and `i_child_index` is the index of  the first child and `i_child_count` is the amount of its children.
+* If it's a directory, all its children are indexed contiguously in `inode table`, and `i_child_index` is the index of the first child and `i_child_count` is the amount of its children.
 * If it's a file, `i_child_index` is not used.
 *`i_name_size` is the length of its name.
 * `i_symlink_size` is the length of its symlink path.
 
-
-```
-   pub struct OndiskInode {
+```rust
+pub struct OndiskInode {
     /// sha256(sha256(chunk) + ...), [char; RAFS_SHA256_LENGTH]
     pub i_digest: RafsDigest, // 32
     /// parent inode number
@@ -213,12 +215,12 @@ The OndiskInode struct size is padded to 128 bytes.
     pub i_symlink_size: u16, // 104
     pub i_reserved: [u8; 24], // 128
 }
-   ```
+```
 
 `i_flags` indicates whether the inode is a symlink or a hardlink, whether it has xattr, and whether it has hole between its chunks.
 
-  ```
-  bitflags! {
+```rust
+bitflags! {
     pub struct RafsInodeFlags: u64 {
         /// Inode is a symlink.
         const SYMLINK = 0x0000_0001;
@@ -228,15 +230,18 @@ The OndiskInode struct size is padded to 128 bytes.
         const XATTR = 0x0000_0004;
         /// Inode chunks has holes.
         const HAS_HOLE = 0x0000_0008;
-   }
+    }
 }
-  ```
-  `OndiskXAttrs` and xattr are stored right after `OndiskInodeWrapper` in the boostrap file.
- ```
- pub struct OndiskXAttrs {
+```
+
+  `OndiskXAttrs` and xattr are stored right after `OndiskInodeWrapper` in the bootstrap file.
+
+```rust
+pub struct OndiskXAttrs {
     pub size: u64,
 }
- ```
+```
+
 A list of `OndiskChunkInfo` is also stored after xattr if the inode contains file data.  Each chunk info tells us where to find data in blob file, it contains 
 - the hash value `block_id` calculated from the chunk data,
 - the blob file it belongs to,
@@ -246,8 +251,8 @@ A list of `OndiskChunkInfo` is also stored after xattr if the inode contains fil
 - the file offset.
 
 
-```
-  pub struct OndiskChunkInfo {
+```rust
+pub struct OndiskChunkInfo {
     /// sha256(chunk), [char; RAFS_SHA256_LENGTH]
     pub block_id: RafsDigest,
     /// blob index (blob_id = blob_table[blob_index])
@@ -277,26 +282,28 @@ bitflags! {
         const HOLECHUNK = 0x0000_0002;
     }
 }
-  ```
-
+```
    ## 3. Rafs Inode Table
 Inode table is a mapping from inode index to `OndiskInode`, specifically a hardlink file shares the same inode number but has a different inode index.
-```
-   pub struct OndiskInodeTable {
+
+```rust
+pub struct OndiskInodeTable {
     pub(crate) data: Vec<u32>,
 }
-   ```
+```
    ## 4. Rafs Prefetch Table
   This is where we record hints in container image about which files and directories need prefetching upon starting.
-```
+
+```rust
 pub struct PrefetchTable {
     pub inode_indexes: Vec<u32>,
 }
-   ```
+```
    ## 5. Rafs Blob Table
    Blob table is the mapping from blob index of `OndiskInode` to blob id so that we don't have to record blob id inside `OndiskInode` (note that different inodes' data chunk can reside in the same blob).
-```
-   pub struct OndiskBlobTableEntry {
+
+```rust
+pub struct OndiskBlobTableEntry {
     pub readahead_offset: u32,
     pub readahead_size: u32,
     pub blob_id: String,
@@ -305,13 +312,14 @@ pub struct PrefetchTable {
 pub struct OndiskBlobTable {
     pub entries: Vec<OndiskBlobTableEntry>,
 }
-   ```
+```
 # III. Manifest of Nydus Format Image
 Nydus manifest is designed to be fully compatible with OCI image spec and distribution spec by adding an extra manifest file to store the pointers of nydus bootstrap (i.e. metadata) and blobfile (i.e. data).
 
   ## 1. Image Index
-  A typical image index enabling nydus points to two manifest files, one is the traditional OCIv1 image manifest, the other is the nydus manifest that takes advantage of `platform` and puts `os.features: ["nydus.remoteimage.v1"]` field under `platform`.
-  ```
+  A typical image index enabling nydus points to two manifest files, one is the traditional OCI v1 image manifest, the other is the nydus manifest that takes advantage of `platform` and puts `os.features: ["nydus.remoteimage.v1"]` field under `platform`.
+
+```json
   {
   "schemaVersion": 2,
   "manifests": [
@@ -338,7 +346,7 @@ Nydus manifest is designed to be fully compatible with OCI image spec and distri
     }
   ]
 }
-  ```
+```
   ## 2. Image Manifest
   A typical image manifest of nydus consists of `config.json`, one nydus metadata layer (`"mediaType": "application/vnd.oci.image.layer.v1.tar.gz"`) and one or more nydus data layers (`"mediaType": "application/vnd.oci.image.layer.nydus.blob.v1"`).
   * nydus metadata layer
@@ -348,7 +356,8 @@ Nydus manifest is designed to be fully compatible with OCI image spec and distri
   This layer refers to the data part, please note that the data layers of an image can be owned solely by this image or shared by others, similarly, each data layer is annotated with `"containerd.io/snapshot/nydus-blob": "true"`, which can be used to tell containerd's snapshotter to skip downloading them.
 
   The manifest is designed to be compatible with the dependency architect and garbage collection algorithm widely used by containerd and registry.
-  ```
+
+```json
   {
   "schemaVersion": 2,
   "mediaType": "",
@@ -392,4 +401,4 @@ Nydus manifest is designed to be fully compatible with OCI image spec and distri
     }
   ]
 }
-  ```
+```
@@ -24,7 +24,7 @@ Nydus-image tool writes data portion into a file which is generally called `blob
 
 - With `--blob <BLOB_FILE>` option, nydus-image tool will write blob contents into the custom file path `BLOB_FILE`
 
-- With `--blob-dir BLOB_DIR` provided to command, nydus-image tool creates the blob file named as its sha-256 digest. This is useful when you don't want to set a custom name or you are building a layered nydus image. Please create the `BLOB_DIR` before perform the command.
+- With `--blob-dir BLOB_DIR` provided to command, nydus-image tool creates the blob file named as its sha-256 digest. This is useful when you don't want to set a custom name or you are building a layered nydus image. Please create the `BLOB_DIR` before performing the command.
 
 Generally, this is regular file which blob content will be dumped into. It can also be a fifo(named pipe) from which nydusify or other tool can receive blob content.
 
@@ -77,4 +77,4 @@ nydus-image create \
   /path/to/stargz.index.upper.json
 ```
 
-Note: the argument value of image layer id specified in nydus-image CLI should omit `sha256:` prefix.
+**Note**: the argument value of image layer id specified in nydus-image CLI should omit `sha256:` prefix.
@@ -2,8 +2,9 @@
 
 `nydus-overlayfs` is a FUSE(Filesystem in UserSpacE) mount helper command for containerd to use with Nydus. The document explains in a nutshell on how it works.
 
-When the `--enable-nydus-overlayfs` option is specified, `nydus-snapshotter` `Mount()` method returns a mount slice like
-```shell
+When the `--enable-nydus-overlayfs` option is specified, `nydus-snapshotter`'s `Mount()` method returns a mount slice like
+
+```json
 [
     {
         Type: "fuse.nydus-overlayfs",
@@ -35,6 +36,6 @@ And `nydus-overlayfs` parses the mount options, filters out `extraoption`, and c
 mount -t overlay overlay ./foo/merged -o lowerdir=./foo/lower2:./foo/lower1,upperdir=./foo/upper,workdir=./foo/work,dev,suid
 ```
 
-Meanwhile, when ncontainerd passes the `nydus-snapshotter` mount slice to `containerd-shim-kata-v2`, it can parse the mount slice and pass the `extraoption` to `nydusd`, to support nydus image format natively.
+Meanwhile, when containerd passes the `nydus-snapshotter` mount slice to `containerd-shim-kata-v2`, it can parse the mount slice and pass the `extraoption` to `nydusd`, to support nydus image format natively.
 
 So in summary, `containerd` and `containerd-shim-runc-v2` rely on the `nydus-overlay` mount helper to handle the mount slice returned by `nydus-snapshotter`, while `containerd-shim-kata-v2` can parse and handle it on its own.
@@ -18,7 +18,7 @@ With option `prefetch-policy`, `nydus-image` tries to read stdin to gather a lis
 
 Note that, `fs_prefetch` has to be enabled in rafs configuration file if prefetch is required.
 
-### 1 File System Level
+### 1. File System Level
 
 Nydus issues prefetch requests to backend and pulls needed chunks to local storage. So read IO can hit the blobcache which was previously filled by prefetch. Speaking of file system level prefetch, the prefetch request is issued from Rafs layer. So it is easier to better understand about files layout on disk, the relationship between files and directories. Prefetch works on top of file system is born of agility and very nimble.
 
@@ -37,7 +37,7 @@ Prefetch is configurable by Rafs configuration file.
   In unit of bytes.
   In order to mitigate possible backend bandwidth contention, we can give a bandwidth ratelimit to prefetch. Note that the `bandwidth_rate` sets the limit to the aggregated backend bandwidth consumed by all the threads configured by `threads_count`. So with a lower `bandwidth_rate` limit, more prefetch threads might be meaningless.
 
-A rafs configuration file (only $.fs_prefetch shows, other properties are omitted) follows:
+A rafs configuration file (only `$.fs_prefetch` shows, other properties are omitted) follows:
 
 ```json
 {
@@ -67,7 +67,7 @@ Nydus can now only prefetch data from backend by an explicit hint either from pr
 - User IO triggered, block-level readahead.
 - Prefetch the parent directory if one of its child is read.
 
-### 2 Blob Level
+### 2. Blob Level
 
 Not like file system level prefetch, blob level prefetch directly pre-fetches a contiguous region from blob when nydusd started. This prefetch procedure is not aware of files, directories and chunks structures. When creating nydus image, a range descriptor composed of `readahead_offset` and `readahead_length` is written bootstrap. But blob level prefetch **won't** cache any data into blobcache or any other kind of cache. It works at `StorageBackend` level which is lower than `RafsCache`. For now, blob level prefetch only benefits `LocalFs` specific backend. In particular, `LocalFs` backend can perform Linux system call `readahead(2)` to load data from `readahead_offset` up to `readahead_length` bytes.