docs: fix some typos and grammatical problem #235

Merged
merged 1 commit into from
Dec 6, 2021
91 changes: 50 additions & 41 deletions docs/nydus-design.md
@@ -2,11 +2,11 @@

# I. High Level Design
## 0. Overview
Dragonfly image service is named as `nydus`, [github repo](https://github.com/dragonflyoss/image-service)
Dragonfly image service is named as `nydus`, [GitHub repo](https://github.com/dragonflyoss/image-service)

Nydus consists of two parts,
* a userspace filesystem called `rafs` on top of a container image format
* an image manifest that is compatible with oci spec of image and distribution
* an image manifest that is compatible with OCI spec of image and distribution

Its key features include:

@@ -26,7 +26,7 @@ Nydus takes in either [FUSE](https://www.kernel.org/doc/html/latest/filesystems/
![architecture](images/nydusd-arch.png)

## 2. Rafs
Rafs presents to users a userspace filesystem with seperating filesystem's metadata with data. In a typical rafs filesystem, the metadata is stored in `bootstrap` while the data is stored in `blobfile`. Nydus splits container image into two parts, metadata and data, where metadata contains everything a container needs to start with, while data is stored in chunk with chunk size being 1MB.
Rafs presents to users a userspace filesystem with separating filesystem's metadata with data. In a typical rafs filesystem, the metadata is stored in `bootstrap` while the data is stored in `blobfile`. Nydus splits container image into two parts, metadata and data, where metadata contains everything a container needs to start with, while data is stored in chunks with chunk size being 1MB.

![rafs](./images/rafs-format.png)

@@ -79,7 +79,7 @@ Validation of the metadata takes place at runtime when metadata is accessed. By

Read verification performs sanity checks on the metadata's fields and determines whether digest validation is necessary. If it is, the digest is calculated with the chosen hash algorithm and compared against the value stored in the object itself. If any of these checks fails, the buffer is considered corrupt and the `EINVAL` error is set.
### 3.2 Data Integrity Validation
Data is split into chunks and each chunk has a saved digest in chunk info, the way of metadata digest validation applies to chunk as well.
Data is split into chunks and each chunk has a saved digest in chunk info, the way of metadata digest validation applies to chunks as well.
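
As a rough illustration of the per-chunk check described above, the sketch below recomputes a digest for a chunk and compares it with the saved one. The names `Digest`, `ChunkInfo`, `digest_of`, and `verify_chunk` are hypothetical, and the standard library's `DefaultHasher` is only a dependency-free stand-in for the real hash algorithm; it is not what nydus uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for the digest saved in chunk info.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Digest(u64);

/// Hypothetical, simplified chunk info holding only the saved digest.
struct ChunkInfo {
    block_id: Digest,
}

/// Placeholder digest function (assumption: NOT SHA-256, illustration only).
fn digest_of(data: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    Digest(hasher.finish())
}

/// Returns Ok(()) when the chunk data matches the saved digest,
/// and an error describing the mismatch otherwise.
fn verify_chunk(info: &ChunkInfo, data: &[u8]) -> Result<(), String> {
    if digest_of(data) == info.block_id {
        Ok(())
    } else {
        Err("chunk digest mismatch".to_string())
    }
}
```

A caller would run `verify_chunk` after reading a chunk from the blob and treat an `Err` as corrupted data.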

## 4. Prefetch
Because nydus fetches data lazily, prefetch plays an important role in mitigating the impact of failing to fetch data after containers start running. To support it, hints are recorded in the container image about which files and directories need prefetching. At runtime, the nydus daemon uses these hints to fetch those files and directories into local storage in the background.
@@ -96,7 +96,8 @@ Nydus can be configured to set up a cache for blob, called `blobcache`. With `b
Nydus can be configured to save either compressed or uncompressed chunks; compressed chunks are the default.

The supported compression algorithms are lz4 and gzip; `None` stands for no compression.
```

```rust
pub enum Algorithm {
None,
LZ4Block,
@@ -107,8 +108,9 @@ pub enum Algorithm {
# II. Global Structures
## 1. Rafs Superblock
Rafs superblock is located at the first 8K of the `bootstrap` file.
```
pub struct OndiskSuperBlock {

```rust
pub struct OndiskSuperBlock {
/// RAFS super magic
s_magic: u32,
/// RAFS version
@@ -142,7 +144,8 @@ pub enum Algorithm {
```

`s_flags` offers several flags to choose which compression algorithm, metadata hash algorithm and xattr will be used.
```

```rust
bitflags! {
pub struct RafsSuperFlags: u64 {
/// Data chunks are not compressed.
@@ -169,24 +172,23 @@ bitflags! {

## 2. Rafs Inode

```
pub struct OndiskInodeWrapper<'a> {
```rust
pub struct OndiskInodeWrapper<'a> {
pub name: &'a OsStr,
pub symlink: Option<&'a OsStr>,
pub inode: &'a OndiskInode,
}
```
```

The OndiskInode struct size is padded to 128 bytes.

* If it's a directory, all its children is indexed contiguously in `inode table`, and `i_child_index` is the index of the first child and `i_child_count` is the amount of its children.
* If it's a directory, all its children are indexed contiguously in `inode table`, and `i_child_index` is the index of the first child and `i_child_count` is the amount of its children.
* If it's a file, `i_child_index` is not used.
* `i_name_size` is the length of its name.
* `i_symlink_size` is the length of its symlink path.
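
To make the directory layout above concrete, here is a minimal sketch of how a directory's children could be located in the inode table. `DirInode` and `child_range` are hypothetical names; only the two fields `i_child_index` and `i_child_count` come from the text, and the sketch assumes the index uses the same base as the inode table itself.

```rust
use std::ops::Range;

/// Hypothetical, trimmed-down view of the two directory fields discussed above.
struct DirInode {
    i_child_index: u32,
    i_child_count: u32,
}

/// Returns the inode-table indexes of a directory's children, relying on the
/// children being stored contiguously starting at `i_child_index`.
fn child_range(dir: &DirInode) -> Range<u32> {
    // saturating_add avoids an overflow panic on corrupted field values.
    dir.i_child_index..dir.i_child_index.saturating_add(dir.i_child_count)
}
```

Iterating over the returned range visits every child exactly once, and a directory with `i_child_count == 0` yields an empty range.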


```
pub struct OndiskInode {
```rust
pub struct OndiskInode {
/// sha256(sha256(chunk) + ...), [char; RAFS_SHA256_LENGTH]
pub i_digest: RafsDigest, // 32
/// parent inode number
@@ -213,12 +215,12 @@ The OndiskInode struct size is padded to 128 bytes.
pub i_symlink_size: u16, // 104
pub i_reserved: [u8; 24], // 128
}
```
```

`i_flags` indicates whether the inode is a symlink or a hardlink, whether it has xattr, and whether it has hole between its chunks.

```
bitflags! {
```rust
bitflags! {
pub struct RafsInodeFlags: u64 {
/// Inode is a symlink.
const SYMLINK = 0x0000_0001;
@@ -228,15 +230,18 @@ The OndiskInode struct size is padded to 128 bytes.
const XATTR = 0x0000_0004;
/// Inode chunks has holes.
const HAS_HOLE = 0x0000_0008;
}
}
}
```
`OndiskXAttrs` and xattr are stored right after `OndiskInodeWrapper` in the boostrap file.
```
pub struct OndiskXAttrs {
```

`OndiskXAttrs` and xattr are stored right after `OndiskInodeWrapper` in the bootstrap file.

```rust
pub struct OndiskXAttrs {
pub size: u64,
}
```
```

A list of `OndiskChunkInfo` is also stored after xattr if the inode contains file data. Each chunk info tells us where to find the data in the blob file. It contains
- the hash value `block_id` calculated from the chunk data,
- the blob file it belongs to,
@@ -246,8 +251,8 @@ A list of `OndiskChunkInfo` is also stored after xattr if the inode contains fil
- the file offset.


```
pub struct OndiskChunkInfo {
```rust
pub struct OndiskChunkInfo {
/// sha256(chunk), [char; RAFS_SHA256_LENGTH]
pub block_id: RafsDigest,
/// blob index (blob_id = blob_table[blob_index])
@@ -277,26 +282,28 @@ bitflags! {
const HOLECHUNK = 0x0000_0002;
}
}
```

```
## 3. Rafs Inode Table
The inode table is a mapping from inode index to `OndiskInode`. Note that hardlinked files share the same inode number but have different inode indexes.
```
pub struct OndiskInodeTable {

```rust
pub struct OndiskInodeTable {
pub(crate) data: Vec<u32>,
}
```
```
## 4. Rafs Prefetch Table
This is where we record hints in container image about which files and directories need prefetching upon starting.
```

```rust
pub struct PrefetchTable {
pub inode_indexes: Vec<u32>,
}
```
```
## 5. Rafs Blob Table
Blob table is the mapping from blob index of `OndiskInode` to blob id so that we don't have to record blob id inside `OndiskInode` (note that different inodes' data chunk can reside in the same blob).
```
pub struct OndiskBlobTableEntry {

```rust
pub struct OndiskBlobTableEntry {
pub readahead_offset: u32,
pub readahead_size: u32,
pub blob_id: String,
@@ -305,13 +312,14 @@ pub struct PrefetchTable {
pub struct OndiskBlobTable {
pub entries: Vec<OndiskBlobTableEntry>,
}
```
```
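
The blob table lookup described in this section can be sketched as follows. `BlobTable` here is a hypothetical, simplified type that keeps only the blob ids, and `blob_id` is an assumed helper name; the idea that a chunk's blob index selects an entry follows the `blob_id = blob_table[blob_index]` comment on `OndiskChunkInfo`.

```rust
/// Hypothetical, simplified blob table: the position in the vector is the blob index.
struct BlobTable {
    blob_ids: Vec<String>,
}

impl BlobTable {
    /// Resolves a blob index to its blob id, or returns `None` if the index
    /// is out of range (for example, when the metadata is corrupted).
    fn blob_id(&self, blob_index: u32) -> Option<&str> {
        self.blob_ids.get(blob_index as usize).map(String::as_str)
    }
}
```

Storing a small integer index per chunk and resolving it through one shared table keeps the per-chunk record compact, which matches the motivation given above.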
# III. Manifest of Nydus Format Image
Nydus manifest is designed to be fully compatible with OCI image spec and distribution spec by adding an extra manifest file to store the pointers of nydus bootstrap (i.e. metadata) and blobfile (i.e. data).

## 1. Image Index
A typical image index enabling nydus points to two manifest files, one is the traditional OCIv1 image manifest, the other is the nydus manifest that takes advantage of `platform` and puts `os.features: ["nydus.remoteimage.v1"]` field under `platform`.
```
A typical image index enabling nydus points to two manifest files, one is the traditional OCI v1 image manifest, the other is the nydus manifest that takes advantage of `platform` and puts `os.features: ["nydus.remoteimage.v1"]` field under `platform`.

```json
{
"schemaVersion": 2,
"manifests": [
@@ -338,7 +346,7 @@ Nydus manifest is designed to be fully compatible with OCI image spec and distri
}
]
}
```
```
## 2. Image Manifest
A typical image manifest of nydus consists of `config.json`, one nydus metadata layer (`"mediaType": "application/vnd.oci.image.layer.v1.tar.gz"`) and one or more nydus data layers (`"mediaType": "application/vnd.oci.image.layer.nydus.blob.v1"`).
* nydus metadata layer
@@ -348,7 +356,8 @@ Nydus manifest is designed to be fully compatible with OCI image spec and distri
This layer refers to the data part, please note that the data layers of an image can be owned solely by this image or shared by others, similarly, each data layer is annotated with `"containerd.io/snapshot/nydus-blob": "true"`, which can be used to tell containerd's snapshotter to skip downloading them.

The manifest is designed to be compatible with the dependency architecture and garbage collection algorithm widely used by containerd and registry.
```

```json
{
"schemaVersion": 2,
"mediaType": "",
@@ -392,4 +401,4 @@ Nydus manifest is designed to be fully compatible with OCI image spec and distri
}
]
}
```
```
4 changes: 2 additions & 2 deletions docs/nydus-image.md
@@ -24,7 +24,7 @@ Nydus-image tool writes data portion into a file which is generally called `blob

- With `--blob <BLOB_FILE>` option, nydus-image tool will write blob contents into the custom file path `BLOB_FILE`

- With `--blob-dir BLOB_DIR` provided to command, nydus-image tool creates the blob file named as its sha-256 digest. This is useful when you don't want to set a custom name or you are building a layered nydus image. Please create the `BLOB_DIR` before perform the command.
- With `--blob-dir BLOB_DIR` provided to command, nydus-image tool creates the blob file named as its sha-256 digest. This is useful when you don't want to set a custom name or you are building a layered nydus image. Please create the `BLOB_DIR` before performing the command.

Generally, this is a regular file into which blob content is dumped. It can also be a FIFO (named pipe) from which nydusify or another tool can receive blob content.

@@ -77,4 +77,4 @@ nydus-image create \
/path/to/stargz.index.upper.json
```

Note: the argument value of image layer id specified in nydus-image CLI should omit `sha256:` prefix.
**Note**: the argument value of image layer id specified in nydus-image CLI should omit `sha256:` prefix.
7 changes: 4 additions & 3 deletions docs/nydus-overlayfs.md
@@ -2,8 +2,9 @@

`nydus-overlayfs` is a FUSE (Filesystem in Userspace) mount helper command for containerd to use with Nydus. This document explains in a nutshell how it works.

When the `--enable-nydus-overlayfs` option is specified, `nydus-snapshotter` `Mount()` method returns a mount slice like
```shell
When the `--enable-nydus-overlayfs` option is specified, `nydus-snapshotter`'s `Mount()` method returns a mount slice like

```json
[
{
Type: "fuse.nydus-overlayfs",
Expand Down Expand Up @@ -35,6 +36,6 @@ And `nydus-overlayfs` parses the mount options, filters out `extraoption`, and c
mount -t overlay overlay ./foo/merged -o lowerdir=./foo/lower2:./foo/lower1,upperdir=./foo/upper,workdir=./foo/work,dev,suid
```
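
The option filtering described above can be sketched as a small function. This is only an illustration, not the helper's actual code: `filter_extraoption` is a hypothetical name, and the sketch assumes the extra option appears as a single comma-separated `extraoption=...` entry in the option string.

```rust
/// Splits a comma-separated mount option string and drops any
/// `extraoption=...` entry (assumed format), keeping the remaining
/// options in their original order.
fn filter_extraoption(options: &str) -> Vec<&str> {
    options
        .split(',')
        .filter(|opt| !opt.is_empty() && !opt.starts_with("extraoption="))
        .collect()
}
```

Joining the returned options with `,` gives a string that could be passed to `mount -o`, as in the overlay mount command shown above.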

Meanwhile, when ncontainerd passes the `nydus-snapshotter` mount slice to `containerd-shim-kata-v2`, it can parse the mount slice and pass the `extraoption` to `nydusd`, to support nydus image format natively.
Meanwhile, when containerd passes the `nydus-snapshotter` mount slice to `containerd-shim-kata-v2`, it can parse the mount slice and pass the `extraoption` to `nydusd`, to support nydus image format natively.

So in summary, `containerd` and `containerd-shim-runc-v2` rely on the `nydus-overlayfs` mount helper to handle the mount slice returned by `nydus-snapshotter`, while `containerd-shim-kata-v2` can parse and handle it on its own.
6 changes: 3 additions & 3 deletions docs/prefetch.md
@@ -18,7 +18,7 @@ With option `prefetch-policy`, `nydus-image` tries to read stdin to gather a lis

Note that `fs_prefetch` must be enabled in the rafs configuration file if prefetch is required.

### 1 File System Level
### 1. File System Level

Nydus issues prefetch requests to the backend and pulls the needed chunks to local storage, so that later read IO can hit the blobcache previously filled by prefetch. With file system level prefetch, the request is issued from the Rafs layer, which understands the on-disk file layout and the relationships between files and directories. Prefetch working on top of the file system is therefore agile and flexible.

@@ -37,7 +37,7 @@ Prefetch is configurable by Rafs configuration file.
In unit of bytes.
To mitigate possible backend bandwidth contention, a bandwidth rate limit can be applied to prefetch. Note that `bandwidth_rate` limits the aggregated backend bandwidth consumed by all the threads configured by `threads_count`, so with a low `bandwidth_rate` limit, adding more prefetch threads may bring no benefit.

A rafs configuration file (only $.fs_prefetch shows, other properties are omitted) follows:
A rafs configuration file (only `$.fs_prefetch` shows, other properties are omitted) follows:

```json
{
@@ -67,7 +67,7 @@ Nydus can now only prefetch data from backend by an explicit hint either from pr
- User IO triggered, block-level readahead.
- Prefetch the parent directory if one of its children is read.

### 2 Blob Level
### 2. Blob Level

Unlike file system level prefetch, blob level prefetch directly prefetches a contiguous region from the blob when nydusd starts. This procedure is not aware of file, directory, or chunk structures. When creating a nydus image, a range descriptor composed of `readahead_offset` and `readahead_length` is written to the bootstrap. But blob level prefetch **won't** cache any data into blobcache or any other kind of cache. It works at the `StorageBackend` level, which is lower than `RafsCache`. For now, blob level prefetch only benefits the `LocalFs` backend. In particular, the `LocalFs` backend can perform the Linux system call `readahead(2)` to load data from `readahead_offset` up to `readahead_length` bytes.
