Hybrid storage format #77

jiacai2050 · 2022-06-30T16:07:14Z

Description

Currently, data within one SST file (Parquet format for now) is ordered by the timestamp column by default, with each tag/field stored as a column.

Timestamp	Device ID	Status Code	Tag 1	Tag 2
12:01	A	0	v1	v1
12:01	B	0	v2	v2
12:02	A	0	v1	v1
12:02	B	1	v2	v2
12:03	A	0	v1	v1
12:03	B	0	v2	v2
…

This design is good for OLAP queries, since only the relevant columns are scanned, and CeresDB can take advantage of the ordering to skip unnecessary files, reducing IO further.

But for time-series use cases like IoT or DevOps, this may not be the best format. Those queries typically group their results first by series id (or device id), then by timestamp. This ordering doesn't match the ordering in the SST, so many random IOs are incurred.

A common approach is to store two copies of the data: one ordered by timestamp first, and the other ordered by series id first.

Obviously this isn't very cost-effective, and it requires a replication algorithm to keep the copies in sync, which is very error-prone. It would be best to solve this ordering issue within one format.

Proposal

This issue proposes one potential hybrid format (OLAP and time-series):

Device ID	Timestamp	Status Code	Tag 1	Tag 2	minTime	maxTime
A	[12:01,12:02,12:03]	[0,0,0]	v1	v1	12:01	12:03
B	[12:01,12:02,12:03]	[0,1,0]	v2	v2	12:01	12:03
…

In the above schema, instead of storing timestamps row by row, we put all timestamps of one device id into a single array. The corresponding values are stored as arrays too, so entries can be matched by index. The table is ordered by device ID.
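As a rough illustration of this layout (not CeresDB's actual writer), the following Python sketch collapses the timestamp-ordered rows from the first table into one record per device. The name `to_hybrid` and the dictionary keys are made up for this example, and it assumes tag values are constant within a device, as they are in the sample data.

```python
from collections import defaultdict

# Timestamp-ordered rows from the first table:
# (timestamp, device_id, status_code, tag1, tag2)
ROWS = [
    ("12:01", "A", 0, "v1", "v1"),
    ("12:01", "B", 0, "v2", "v2"),
    ("12:02", "A", 0, "v1", "v1"),
    ("12:02", "B", 1, "v2", "v2"),
    ("12:03", "A", 0, "v1", "v1"),
    ("12:03", "B", 0, "v2", "v2"),
]


def to_hybrid(rows):
    """Collapse rows into one record per device, ordered by device ID.

    Assumption: tag values do not change within a device, so the first
    value seen is kept as a scalar.
    """
    points = defaultdict(list)  # device_id -> [(timestamp, status_code)]
    tags = {}                   # device_id -> (tag1, tag2)
    for ts, device, status, tag1, tag2 in rows:
        points[device].append((ts, status))
        tags.setdefault(device, (tag1, tag2))

    hybrid = []
    for device in sorted(points):
        # Sort by timestamp so both arrays stay aligned index by index.
        ordered = sorted(points[device], key=lambda p: p[0])
        timestamps = [ts for ts, _ in ordered]
        hybrid.append({
            "device_id": device,
            "timestamps": timestamps,
            "status_codes": [status for _, status in ordered],
            "tag1": tags[device][0],
            "tag2": tags[device][1],
            "min_time": timestamps[0],
            "max_time": timestamps[-1],
        })
    return hybrid


HYBRID = to_hybrid(ROWS)
```

For the sample rows this yields two records, A and B, with the same arrays and min/max times as the proposal table.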

This way, we avoid random IO when querying one specific device, since its data is stored together. The format is also beneficial for OLAP queries, since the reader can use minTime/maxTime to skip unnecessary chunks.
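The min/max pruning could look roughly like the sketch below. It is only an illustration of the idea, not the reader in CeresDB: the function name `scan` and the tuple layout are hypothetical, and device C is a made-up row added purely to show a record being skipped.

```python
# Hybrid rows: (device_id, timestamps, status_codes, min_time, max_time).
# A and B come from the proposal table; C is hypothetical.
# HH:MM strings compare correctly as plain strings.
HYBRID_ROWS = [
    ("A", ["12:01", "12:02", "12:03"], [0, 0, 0], "12:01", "12:03"),
    ("B", ["12:01", "12:02", "12:03"], [0, 1, 0], "12:01", "12:03"),
    ("C", ["12:10", "12:11"], [0, 0], "12:10", "12:11"),
]


def scan(rows, start, end):
    """Return (device_id, timestamp, status_code) points in [start, end]."""
    result = []
    for device, timestamps, statuses, min_time, max_time in rows:
        # Skip the whole row when its time range cannot overlap the query,
        # without looking inside the arrays.
        if max_time < start or min_time > end:
            continue
        for ts, status in zip(timestamps, statuses):
            if start <= ts <= end:
                result.append((device, ts, status))
    return result


POINTS = scan(HYBRID_ROWS, "12:02", "12:05")
```

Here the query range 12:02 to 12:05 touches the arrays of A and B only; row C is rejected by its min/max values alone.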

Additional context

Some references

Building columnar compression in a row-oriented database

jiacai2050 · 2022-07-18T08:38:40Z

I ran a benchmark in my local environment. The hybrid format reads the same data roughly twice as fast as the old one.

The tables below summarize the read cost of each format (each read ten times).

Hybrid

cost	row nums
615ms	10367743
576ms	10367743
585ms	10367743
511ms	10367743
558ms	10367743
569ms	10367743
568ms	10367743
555ms	10367743
557ms	10367743
584ms	10367743

Old

cost	row nums
1304ms	10367743
1283ms	10367743
1276ms	10367743
1286ms	10367743
1275ms	10367743
1272ms	10367743
1273ms	10367743
1275ms	10367743
1275ms	10367743
1270ms	10367743
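To make the comparison concrete, the small Python snippet below averages the ten runs of each table. The numbers are copied from the tables above; the variable names are just for this example.

```python
# Read costs in milliseconds, copied from the two tables above.
HYBRID_MS = [615, 576, 585, 511, 558, 569, 568, 555, 557, 584]
OLD_MS = [1304, 1283, 1276, 1286, 1275, 1272, 1273, 1275, 1275, 1270]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


hybrid_mean = mean(HYBRID_MS)
old_mean = mean(OLD_MS)
speedup = old_mean / hybrid_mean

print(f"hybrid mean: {hybrid_mean:.1f} ms")
print(f"old mean:    {old_mean:.1f} ms")
print(f"speedup:     {speedup:.2f}x")
```

This gives a mean of 567.8 ms for the hybrid format and 1278.9 ms for the old one, a speedup of about 2.25x for this particular data set and machine.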

How it was tested

My test environment is:

Linux 5.17.7-arch1-1 SMP PREEMPT Thu, 12 May 2022 18:55:54 +0000 x86_64 GNU/Linux
6 cores, 16 GB memory (6c16g)
commit: https://github.com/jiacai2050/ceresdb/tree/d9577d5d417a811d37ff54239b81b44eff1f499c
- bench-hybrid.rs

Data is generated using tsbs with the config below:

data-source:
  simulator:
    debug: 0
    initial-scale: "0"
    log-interval: 10s
    max-data-points: "0"
    max-metric-count: "1"
    scale: "50000"
    seed: 100
    timestamp-start: "2022-07-02T00:00:00Z"
    timestamp-end: "2022-07-02T01:00:00Z"
    use-case: devops-generic
  type: SIMULATOR

This means the generated data set contains one metric over one hour, with a point interval of 10s and 50k series in total.

Data sample

{
  "arch": "x86",
  "region": "ap-southeast-1",
  "service_environment": "test",
  "team": "SF",
  "value": 473.0,
  "service_version": "0",
  "datacenter": "ap-southeast-1b",
  "timestamp": 1656720000000,
  "os": "Ubuntu16.04LTS",
  "hostname": "host_3349",
  "rack": "80",
  "service": "6",
  "tsid": 1123006250071095
}

Next step

Rebase onto upstream master, then apply this hybrid format to string columns (currently only fixed-length columns have been tested).

jiacai2050 · 2022-08-17T02:12:15Z

Checklist

Write (feat: write hybrid storage format #185)
Read (feat: read hybrid format #208)
Add table option for storage format (feat: add new table_option storage format #218)
Docs (docs: improve options section #222)

jiacai2050 · 2022-08-24T07:43:52Z

There are some more things that need to be done for good performance. I'm leaving them here as a note for myself, and I hope others who are interested can get involved.

Write

Support variable-length type for ListArray
Support tables without tsid; only a row id is required

Read

Support basic read (without any filter pushdown), WIP
Support timestamp column filter; some extra columns may be needed
Support variable-length type for ListArray
Enable a total ordering, to support paginated queries
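One way to picture the read path and the total-ordering item is the sketch below. It is an assumption about how these pieces could fit together, not a description of the planned implementation: `expand` and `page` are hypothetical names, and the pagination shown is simple keyset pagination over (row id, timestamp).

```python
# Hybrid rows: (row_id, timestamps, values), deliberately not sorted by row_id.
HYBRID_ROWS = [
    ("B", ["12:01", "12:02", "12:03"], [0, 1, 0]),
    ("A", ["12:01", "12:02", "12:03"], [0, 0, 0]),
]


def expand(hybrid_rows):
    """Flatten hybrid rows into (row_id, timestamp, value) tuples."""
    flat = []
    for row_id, timestamps, values in hybrid_rows:
        for ts, value in zip(timestamps, values):
            flat.append((row_id, ts, value))
    # A total order on (row_id, timestamp) makes page boundaries stable.
    flat.sort(key=lambda r: (r[0], r[1]))
    return flat


def page(flat_rows, after, limit):
    """Return up to `limit` rows strictly after the (row_id, timestamp) key.

    `after` is None for the first page.
    """
    rest = [r for r in flat_rows if after is None or (r[0], r[1]) > after]
    return rest[:limit]


FLAT = expand(HYBRID_ROWS)
FIRST = page(FLAT, None, 4)
# Continue from the key of the last row of the previous page.
SECOND = page(FLAT, (FIRST[-1][0], FIRST[-1][1]), 4)
```

Because every expanded row has a unique (row id, timestamp) key in this example, the second page continues exactly where the first one stopped, with no rows repeated or lost.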

Misc

Ensure the row group size is large enough, in case the list length within the same row_id is too small
Use a dictionary array type to represent non-collapsible columns to reduce memory usage
Benchmark the two formats
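The dictionary array item can be pictured with the generic sketch below. It assumes that "non-collapsible columns" means columns kept as one value per hybrid row (such as the tags in the proposal table), whose value would otherwise be repeated for every expanded point; that reading is an assumption. The helper names are hypothetical and this is plain Python, not the Arrow API.

```python
def dictionary_encode(values):
    """Return (dictionary, indices) with dictionary[indices[i]] == values[i]."""
    dictionary = []  # distinct values in first-seen order
    positions = {}   # value -> index into dictionary
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from its dictionary encoding."""
    return [dictionary[i] for i in indices]


# Tag 1 column after expanding devices A and B (three points each):
TAG1 = ["v1", "v1", "v1", "v2", "v2", "v2"]
DICTIONARY, INDICES = dictionary_encode(TAG1)
```

Each distinct tag value is stored once, and the expanded column carries only small integer indices, which is where the memory saving would come from for long or highly repetitive tag values.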

chunshao90 · 2022-08-24T09:29:08Z

Checklist

Write feat: write hybrid storage format #185

Read

Add table option for storage format

More testcases for write/read

This checklist is outdated.

jiacai2050 added the feature label Jun 30, 2022

waynexia assigned jiacai2050 Jul 4, 2022

waynexia added the A-analytic-engine label Jul 20, 2022

waynexia mentioned this issue Jul 27, 2022

Release v0.3 #153

Closed

jiacai2050 mentioned this issue Aug 10, 2022

feat: write hybrid storage format #185

Merged

This was referenced Aug 18, 2022

feat: read hybrid format #207

Merged

feat: read hybrid format #208

Merged

jiacai2050 changed the title ~~[PoC] Hybrid storage format~~ Hybrid storage format Aug 24, 2022

This was referenced Aug 25, 2022

feat: add new table_option storage format #218

Merged

docs: improve options section #222

Merged

Enhance to hybrid storage format #227

Closed

jiacai2050 closed this as completed Aug 30, 2022
