PD: supports multiple level meta data space #87

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
52 changes: 52 additions & 0 deletions text/0083-multi-level-meta-data-space.md
@@ -0,0 +1,52 @@
# Multi-level Meta Data Space

## Summary

PD can handle multiple users' meta data; each user sees only its own meta data.

## Motivation

PD is the meta data center of a TiKV cluster. The meta data includes TiKV instances' addresses, user data range (region)
information, the number of replicas of data, etc.

But PD currently can only handle the single key space [min-key, max-key], which means PD can only handle one user's
meta data; this does not work when multiple users want to share the same PD cluster as their meta system. There are
two different scenarios that require PD to handle multiple users' meta data:

1. Multiple TiKV clusters share the same PD cluster. The minimal deployment of a TiKV cluster is 3 TiKV nodes plus 3 PD nodes,
and it is not cost-effective for every small cluster to have 3 dedicated meta data nodes.
2. There are multiple tenants in the same TiKV cluster; each tenant has its own meta data, and each tenant's key range can
Contributor @nolouch · Jan 13, 2022

Does the keyspace in API v2 match this?

Member Author

No, the v2 API cannot satisfy multiple TiDB tenants.

Member Author @zhangjinpeng87 · Jan 14, 2022

When there are multiple TiDB tenants, each TiDB should have its own ddl-owner, gc-safepoint, and other meta data, and this meta data should be stored in PD separately. This RFC is more about how PD stores multiple users' meta data.

contain any key in the range of [min-key, max-key].
Member

To make this practical, every API needs to accept a user prefix. And each user's data obviously can't be stored in the same RocksDB. This also requires PD to have knowledge of the underlying storage engine and to avoid scheduling replicas from different users to the same storage engine. And TiKV needs to split all in-memory meta data by user; for example, the range index becomes HashMap<UserKey, BTreeMap<Vec<u8>, u64>>.

In my opinion, using a prefix is more straightforward and simpler.
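
For illustration only, here is a minimal sketch of the per-user range index this comment describes; `UserKey`, the struct, and its methods are assumptions made up for the example, not TiKV code:

```rust
use std::collections::{BTreeMap, HashMap};

// Illustrative tenant identifier; the real type is not defined in this thread.
type UserKey = Vec<u8>;

// Per-user range index as described in the comment above: each user gets its
// own ordered map from region start key to region id.
#[derive(Default)]
struct RangeIndex {
    by_user: HashMap<UserKey, BTreeMap<Vec<u8>, u64>>,
}

impl RangeIndex {
    // Record that `user`'s region `region_id` starts at `start_key`.
    fn insert(&mut self, user: UserKey, start_key: Vec<u8>, region_id: u64) {
        self.by_user.entry(user).or_default().insert(start_key, region_id);
    }

    // Find the region covering `key` for `user`: the entry with the greatest
    // start key that is <= `key`.
    fn lookup(&self, user: &[u8], key: &[u8]) -> Option<u64> {
        self.by_user
            .get(user)?
            .range(..=key.to_vec())
            .next_back()
            .map(|(_, &id)| id)
    }
}

fn main() {
    let mut idx = RangeIndex::default();
    idx.insert(b"user1".to_vec(), b"a".to_vec(), 1);
    idx.insert(b"user1".to_vec(), b"m".to_vec(), 2);
    // Key "k" falls in user1's region starting at "a".
    assert_eq!(idx.lookup(b"user1", b"k"), Some(1));
    // user2 has no regions yet.
    assert_eq!(idx.lookup(b"user2", b"k"), None);
}
```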

Member Author @zhangjinpeng87 · Jan 14, 2022

> And each user's data can't be stored in the same RocksDB obviously

This is what I expected. After TiKV implements the Multiple-RocksDB feature, data from different tenants should be stored in different RocksDB instances. The tenant is meta data, and including it in every row of data is redundant; we can store the tenant id in the RocksDB instance's directory name, like u0001_rangeid. Going further, the table id is essentially also meta data, so it can be stored in the directory name like u0001_rangeid_tableid, leaving the data key in RocksDB as just the row_id. In this way, we can satisfy the compatibility requirement with old clusters' data.
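
A tiny illustrative sketch of that directory-name encoding; the exact format (zero padding, separators) is an assumption extrapolated from the u0001_rangeid_tableid example above:

```rust
// Illustrative only: encode tenant id, range (region) id and table id into a
// RocksDB instance directory name, following the u0001_rangeid_tableid
// example above, so the keys inside the instance need no per-row prefix.
fn rocksdb_dir_name(tenant_id: u32, range_id: u64, table_id: i64) -> String {
    format!("u{:04}_{}_{}", tenant_id, range_id, table_id)
}

fn main() {
    // Tenant 1, range 42, table 7 -> "u0001_42_7".
    assert_eq!(rocksdb_dir_name(1, 42, 7), "u0001_42_7");
}
```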

Member

Using a prefix can also achieve the same improvement. The difference between using a prefix and using separate explicit meta data is that PD/TiKV/TiDB need to take good care of the meta data in the latter case.


## Detailed design

Change the meta data layout from
```
{meta data}
```
to
```
user1 {meta data}
user2 {meta data}
user3 {meta data}
...
```
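
For illustration, a minimal sketch of this layout, assuming meta entries are stored under a `<user>/<meta key>` path; the separator, key names, and values are made up for the example and are not the actual PD implementation:

```rust
use std::collections::BTreeMap;

// Illustrative only: store each meta entry under "<user>/<meta key>" so that
// every user (meta data space) sees only its own entries.
fn space_meta_key(user: &str, meta_key: &str) -> String {
    format!("{}/{}", user, meta_key)
}

fn main() {
    let mut store: BTreeMap<String, String> = BTreeMap::new();
    store.insert(space_meta_key("user1", "gc_safepoint"), "400".to_string());
    store.insert(space_meta_key("user2", "gc_safepoint"), "350".to_string());

    // Listing user1's meta data only touches keys with the "user1/" prefix.
    let user1_entries: Vec<_> = store
        .range("user1/".to_string()..)
        .take_while(|(k, _)| k.starts_with("user1/"))
        .collect();
    assert_eq!(user1_entries.len(), 1);
}
```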

### Compatibility

When upgrading from an old version, all legacy meta data belongs to the default meta data space.
```
{meta data}
```
becomes
```
default {meta data}
user1 {meta data}
user2 {meta data}
...
```
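
A hedged sketch of what the upgrade step could look like, assuming legacy entries are simply reassigned to a space literally named "default"; the helper name and types are illustrative:

```rust
use std::collections::BTreeMap;

// Illustrative only: on upgrade from an old version, every legacy
// (un-prefixed) meta entry is assigned to the "default" meta data space,
// while new users' entries already carry their own space name.
fn migrate_legacy(legacy: BTreeMap<String, String>) -> BTreeMap<(String, String), String> {
    legacy
        .into_iter()
        .map(|(key, value)| (("default".to_string(), key), value))
        .collect()
}

fn main() {
    let mut legacy = BTreeMap::new();
    legacy.insert("gc_safepoint".to_string(), "400".to_string());

    let migrated = migrate_legacy(legacy);
    assert!(migrated.contains_key(&("default".to_string(), "gc_safepoint".to_string())));
}
```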

## Alternatives

In the multi-tenant scenario, a tenant can add a {tenant-id} prefix to each data key, but the tenant-id
is essentially meta data, and having a tenant-id prefix on every data key may cost more disk space & memory
Member

Any perf stats to show the cost?
Member Author

The insert QPS with a prefix shows a 4% regression compared with no prefix.

Member Author @zhangjinpeng87 · Jan 16, 2022

The bigger key size will consume more raft log or WAL space and more CPU when comparing keys.

Member

What prefix is used for testing? Note a two-byte prefix can support 32768 tenants already.

in TiKV.