This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

TSDB data import tool #671

Open
wants to merge 6 commits into
base: master

dipack95 commented Aug 2, 2019

Created a tool to import data formatted according to the Prometheus exposition format. The tool can be accessed via the TSDB CLI.

Addresses prometheus/prometheus#535

Signed-off-by: Dipack P Panjabi dpanjabi@hudson-trading.com

Dipack P Panjabi added 3 commits August 2, 2019 11:38
@dipack95
Author

dipack95 commented Aug 2, 2019

The Linux build seems to have failed because it could not download a package.

@dipack95
Author

dipack95 commented Aug 5, 2019

@krasi-georgiev It's not strictly related, but I was going through the tsdb dump code and noticed that it outputs data in a style that differs noticeably from the Prometheus exposition format. If we want to allow users to import data, I think we should use a single format for all such operations, e.g. by changing the dump code to output data according to expfmt.

What is your opinion on this?

@krasi-georgiev
Contributor

@juliusv added the tsdb dump code so maybe he can share his use case for that.

Can you show an example file of how the data would look for this import tool, and maybe some example code showing how to write such a file so that it can be used to import the data?

This should give a better idea on which format to use.

My first impression is that it would be easier for people to write a tool that produces a file with JSON data than one that produces the Prometheus format, but some examples would help us make this decision.

@dipack95
Author

dipack95 commented Aug 6, 2019

@krasi-georgiev The exporter I used to export data in expfmt is here https://gist.github.com/dipack95/171b7e3ac226f296f49f0e320eb486bf

You can run it using python3 exporter.py --entries 10.

The data that it exports looks like this:

# HELP dummy_events_0 This is dummy events counter number 0
# TYPE dummy_events_0 gauge
dummy_events_0{foo="I am here",uniqlo="uniqlo"} 9984.0 1564681573925
# HELP dummy_events_1 This is dummy events counter number 1
# TYPE dummy_events_1 gauge
dummy_events_1{foo="I am here",uniqlo="uniqlo"} 9002.0 1564681573925
# HELP dummy_events_2 This is dummy events counter number 2
# TYPE dummy_events_2 gauge
dummy_events_2{foo="I am here",uniqlo="uniqlo"} 1624.0 1564681573925
# HELP dummy_events_3 This is dummy events counter number 3
# TYPE dummy_events_3 gauge
dummy_events_3{foo="I am here",uniqlo="uniqlo"} 4227.0 1564681573925
# HELP dummy_events_4 This is dummy events counter number 4
# TYPE dummy_events_4 gauge
dummy_events_4{foo="I am here",uniqlo="uniqlo"} 4386.0 1564681573925
# HELP dummy_events_5 This is dummy events counter number 5
# TYPE dummy_events_5 gauge
dummy_events_5{foo="I am here",uniqlo="uniqlo"} 7354.0 1564681573925
# HELP dummy_events_6 This is dummy events counter number 6
# TYPE dummy_events_6 gauge
dummy_events_6{foo="I am here",uniqlo="uniqlo"} 7805.0 1564681573925
# HELP dummy_events_7 This is dummy events counter number 7
# TYPE dummy_events_7 gauge
dummy_events_7{foo="I am here",uniqlo="uniqlo"} 2952.0 1564681573925
# HELP dummy_events_8 This is dummy events counter number 8
# TYPE dummy_events_8 gauge
dummy_events_8{foo="I am here",uniqlo="uniqlo"} 6429.0 1564681573925
# HELP dummy_events_9 This is dummy events counter number 9
# TYPE dummy_events_9 gauge
dummy_events_9{foo="I am here",uniqlo="uniqlo"} 5266.0 1564681573925
dummy_events_0{foo="I am here",uniqlo="uniqlo"} 2468.0 1564681588927
dummy_events_1{foo="I am here",uniqlo="uniqlo"} 9606.0 1564681588927
dummy_events_2{foo="I am here",uniqlo="uniqlo"} 6031.0 1564681588927
dummy_events_3{foo="I am here",uniqlo="uniqlo"} 3613.0 1564681588927

It's quite easy to export data in this format, provided you use the Prometheus client libs. That is why I prefer it over JSON, for which we would have to write an intermediate step that converts the data back to expfmt before the text parsers can accept it.
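For anyone who wants to produce such a file without the client libs, here is a minimal sketch of a writer for the sample lines shown above. The function names (`escape_label_value`, `format_sample`) are made up for illustration and are not part of this PR or of any Prometheus library. The escaping covers the three characters the text exposition format requires escaping in label values (backslash, double quote, newline).

```python
def escape_label_value(value: str) -> str:
    """Escape a label value for the text exposition format."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_sample(name: str, labels: dict, value: float, timestamp_ms: int) -> str:
    """Render one sample as `name{labels} value timestamp_ms`.

    Hypothetical helper: labels are sorted by name only to make the
    output deterministic; the format itself does not require an order.
    """
    label_str = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    series = f"{name}{{{label_str}}}" if label_str else name
    return f"{series} {float(value)} {timestamp_ms}"


if __name__ == "__main__":
    print(
        format_sample(
            "dummy_events_0",
            {"foo": "I am here", "uniqlo": "uniqlo"},
            9984,
            1564681573925,
        )
    )
```

The `# HELP` and `# TYPE` comment lines are left out of the sketch; they would be written once per metric name ahead of its samples.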

@brian-brazil
Contributor

What would that look like for samples across multiple blocks?

@dipack95
Author

dipack95 commented Aug 6, 2019

@brian-brazil I don't quite understand what you mean by samples across multiple blocks. Do you mean different metrics exposed in the same text file?

@brian-brazil
Contributor

No, I mean how do you handle data that overlaps multiple blocks?

@dipack95
Author

dipack95 commented Aug 6, 2019

To prevent any issues when importing into an existing TSDB instance, I have a step before the actual import that checks for any overlaps, and if there are any, it aborts the import process.

If you wanted to go ahead and import data that overlaps with what is present in the target TSDB instance (because you have AllowOverlappingBlocks enabled), you could skip the check during the import using the --skip-import-check flag.
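To make the check concrete, here is a rough sketch of the idea rather than the code in this PR. It assumes each existing block is described by a half-open time range `[mint, maxt)` in milliseconds and that the incoming data is summarised by its smallest and largest sample timestamps. The names `overlaps_existing`, `check_import` and `skip_check` are hypothetical; `skip_check` only mirrors the intent of the `--skip-import-check` flag.

```python
from typing import Iterable, Tuple

BlockRange = Tuple[int, int]  # (mint, maxt), half-open: mint <= t < maxt


def overlaps_existing(
    incoming_mint: int, incoming_maxt: int, existing: Iterable[BlockRange]
) -> bool:
    """Return True if any incoming timestamp could fall inside an existing block.

    incoming_mint and incoming_maxt are the inclusive bounds of the new samples.
    """
    return any(
        incoming_mint < block_maxt and incoming_maxt >= block_mint
        for block_mint, block_maxt in existing
    )


def check_import(
    incoming_mint: int,
    incoming_maxt: int,
    existing: Iterable[BlockRange],
    skip_check: bool = False,
) -> None:
    """Abort (raise) on overlap unless the caller explicitly skips the check."""
    if skip_check:
        return
    if overlaps_existing(incoming_mint, incoming_maxt, existing):
        raise ValueError("incoming data overlaps existing blocks; aborting import")
```

With `skip_check=True` the function returns without looking at the ranges, which is only sensible when overlapping blocks are allowed on the target instance.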

@brian-brazil
Contributor

you could skip the check during the import using the --skip-import-check flag.

This will then result in massive blocks, which isn't usually desirable. You want new blocks that match up with the existing blocks.

@dipack95
Author

dipack95 commented Aug 6, 2019

You're right about that; importing a lot of data will create a large block. I assumed, though, that the block sizes would line up over time during compaction anyway, so this wouldn't be much of an issue?

Alternatively, we could call dbapp.Commit() after appending X (1000?) samples, to ensure that new blocks are created, and data is distributed as evenly as possible.

@dipack95
Author

dipack95 commented Aug 6, 2019

Actually, I misspoke; calling db.Snapshot(..) makes more sense, as it actually does create separate blocks.

@dipack95
Author

dipack95 commented Aug 6, 2019

Going down that route still gives me blocks of similar sizes. I'm not sure there is a clean way to separate the samples into (almost) even blocks. Do you have any suggestions?

@brian-brazil
Contributor

You can either use the existing blocks, or just go with 2h.

@dipack95
Author

dipack95 commented Aug 6, 2019

I opted against using the existing blocks because, with this method, we can import data directly into a live instance, and it will be picked up as usual.

As for the 2h block range option, I'm currently creating a temp TSDB instance, using the default exponential block range, and then creating a snapshot from it. Shouldn't the Snapshot(..) refer to the options set and create an equivalent number of blocks?

@brian-brazil
Contributor

That depends on the flags passed to the running Prometheus.

@codesome
Contributor

codesome commented Aug 7, 2019

Using the time ranges of those blocks to create new blocks would be ideal as @brian-brazil said. You can get those time ranges by opening the DB in read only mode.

If possible, it would also be better to avoid opening a DB instance and creating blocks through it. Instead, I suggest reusing the existing functions from compact.go and writing the blocks directly, which would be more efficient.

codesome closed this Aug 7, 2019
codesome reopened this Aug 7, 2019
@codesome
Contributor

codesome commented Aug 7, 2019

Closed by mistake. Reopened.

@codesome
Contributor

codesome commented Aug 7, 2019

Also I think @krasi-georgiev is still gathering opinions on the data format for the import data.

@dipack95
Author

dipack95 commented Aug 7, 2019

@codesome One of the use cases for importing data, for us, is to back-populate data for a new metric that we've just started recording. In that instance, I think it makes sense to create new blocks entirely.

To ensure that we don't run into issues regarding overlapping data, I open the target TSDB instance in RO mode, and validate that the most recent datapoint in the incoming data is before the start of the current data. The user can choose to skip this step, however.

Regarding the large block size issue, I am working off of @brian-brazil's recommendation, and splitting the incoming data into chunks of 2h each, to keep block sizes down.
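As an illustration of the 2h split (a sketch only, not the importer's actual code), each sample can be assigned to a window whose start is its timestamp rounded down to a multiple of two hours. This assumes millisecond timestamps and windows aligned to the Unix epoch; `block_range` and `group_into_blocks` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

TWO_HOURS_MS = 2 * 60 * 60 * 1000

Sample = Tuple[int, float]  # (timestamp_ms, value)


def block_range(timestamp_ms: int, duration_ms: int = TWO_HOURS_MS) -> Tuple[int, int]:
    """Return the half-open [mint, maxt) window that contains the timestamp."""
    mint = timestamp_ms - timestamp_ms % duration_ms
    return mint, mint + duration_ms


def group_into_blocks(
    samples: Iterable[Sample], duration_ms: int = TWO_HOURS_MS
) -> Dict[Tuple[int, int], List[Sample]]:
    """Group samples by window; the result is ordered by window start."""
    blocks: Dict[Tuple[int, int], List[Sample]] = {}
    for ts, value in samples:
        blocks.setdefault(block_range(ts, duration_ms), []).append((ts, value))
    return dict(sorted(blocks.items()))
```

The two timestamps in the example data earlier in this thread (1564681573925 and 1564681588927) are 15 seconds apart and land in the same 2h window under this scheme.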

@dipack95
Author

dipack95 commented Aug 7, 2019

Blocks are now created with a max duration of 2h each, and are written directly to disk using the compaction functions, instead of creating a new TSDB instance.

Contributor

@brian-brazil brian-brazil left a comment

I'm not sure pulling everything into RAM is a good idea.

@dipack95
Author

dipack95 commented Aug 7, 2019

The blocks are now written as soon as they're cut, to prevent potentially going OOM by parsing too many samples at once.
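Roughly, the streaming approach looks like the sketch below. It is illustrative only and assumes the input samples arrive sorted by timestamp; with unsorted input the same window would be emitted more than once. `cut_blocks` is a hypothetical name, and the caller is expected to write each yielded block to disk before pulling the next one, so at most one window of samples is held in memory.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

TWO_HOURS_MS = 2 * 60 * 60 * 1000

Sample = Tuple[int, float]  # (timestamp_ms, value)
Block = Tuple[int, int, List[Sample]]  # (mint, maxt, samples)


def cut_blocks(
    samples: Iterable[Sample], duration_ms: int = TWO_HOURS_MS
) -> Iterator[Block]:
    """Yield one block per 2h window as soon as the window is complete."""
    current_start: Optional[int] = None
    pending: List[Sample] = []
    for ts, value in samples:
        start = ts - ts % duration_ms
        if current_start is not None and start != current_start:
            # The sample belongs to a new window: hand over the finished block.
            yield current_start, current_start + duration_ms, pending
            pending = []
        current_start = start
        pending.append((ts, value))
    if current_start is not None:
        # Flush the last, possibly partial, window.
        yield current_start, current_start + duration_ms, pending
```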

The Windows build seems to have timed out before it began.

@codesome
Contributor

codesome commented Aug 8, 2019

Smaller blocks don't rule out the possibility of a big block at the end. It is very much possible.

All your blocks:            |------------|--------|----|----|----|

Newly written small blocks:          |----|----|----|----|----|

Result after compaction:    |------------------------------------|

This will result in a huge index, and the index size limit right now is 64 GiB (soon to be lifted, but such a big index will degrade performance).

@brian-brazil
Contributor

I'm presuming we're aligning them in the usual way.

@dipack95
Author

dipack95 commented Aug 8, 2019

@codesome Wouldn't the scenario that you've pointed out happen anyway over the course of normal operation? Given enough time, obviously.

@codesome
Contributor

codesome commented Aug 8, 2019

@dipack95 The block size is limited to 1 month, or retention_duration/10, whichever is lower. So no, the entire database won't turn into a single block. The scenario above has the potential to cross that limit.
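As a quick worked example of that cap (a sketch, taking "1 month" as 31 days, which is the figure used elsewhere in this thread; `max_block_duration` is a hypothetical helper, not a TSDB function):

```python
from datetime import timedelta

MAX_BLOCK_CAP = timedelta(days=31)


def max_block_duration(retention: timedelta) -> timedelta:
    """Largest block time range: the lower of 31 days and retention / 10."""
    return min(MAX_BLOCK_CAP, retention / 10)


if __name__ == "__main__":
    print(max_block_duration(timedelta(days=90)))   # a tenth of the retention
    print(max_block_duration(timedelta(days=365)))  # capped at 31 days
```

For a 90-day retention the cap works out to 9 days per block; for a 365-day retention a tenth would be 36.5 days, so the 31-day limit applies instead.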

@dipack95
Author

dipack95 commented Aug 8, 2019

Based on my understanding, we could inflate the index files massively if we set the retention duration long enough. I don't think this importer really creates this problem, as it already exists, depending on how you configure Prometheus.

As the comments in prometheus/prometheus#535 suggest, there are a lot of use cases where bulk importing of data is useful. In practice, the primary purpose of Prometheus is to represent the recent state of a system, and it should be quite difficult for users to hit the index limits you've pointed out. For example, at ~60 million series per block with 5 labels each, the index size is around 5-6 GiB; there wasn't much lag when querying this data.

@codesome
Contributor

we could inflate the index files massively if we set the retention duration long enough

Yes. Also, the block size is capped at 31 days, so you cannot inflate beyond that.

at ~60 million series per block with 5 labels each, the index size is around 5-6 GiB

I think the block in your example doesn't span a large time range. Chunk references are also part of the index. I have personally seen an index hit 20G for 8-10M series with a retention duration of 90 days (which means a block is capped at 9 days).

The idea here is to align the time ranges of the newly created blocks with those of the existing blocks, to avoid the case I described in #671 (comment). Overlapping blocks are not kept as they are in the database; they are all compacted into a single huge block. Using small block durations alone won't take care of this.

@dipack95
Author

dipack95 commented Aug 12, 2019

If I understand correctly, you're looking for a block structure as follows:

1. Overlapping time range

Existing data: |---------|-----------|----------|
New data:      |----     |-----      |--------  |

2. Non-overlapping time range

Existing data:             |---------|-----------|----------|
New data:      |----|----| (in 2h blocks)
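In code, the two cases above could be sketched as follows. This is only my reading of the suggestion, not what the PR currently does. It assumes the existing block ranges are available as a sorted, non-overlapping list of half-open `(mint, maxt)` pairs in milliseconds, and that those ranges are themselves aligned to 2h boundaries so that the fallback windows cannot straddle them. `target_range` is a hypothetical name.

```python
import bisect
from typing import List, Tuple

TWO_HOURS_MS = 2 * 60 * 60 * 1000

BlockRange = Tuple[int, int]  # (mint, maxt), half-open: mint <= t < maxt


def target_range(
    timestamp_ms: int, existing: List[BlockRange], fallback_ms: int = TWO_HOURS_MS
) -> BlockRange:
    """Pick the time range of the new block that should hold this sample.

    Case 1: the sample falls inside an existing block, so reuse that range.
    Case 2: it falls outside all existing blocks, so use an aligned 2h window.
    """
    starts = [mint for mint, _ in existing]
    i = bisect.bisect_right(starts, timestamp_ms) - 1
    if i >= 0 and existing[i][0] <= timestamp_ms < existing[i][1]:
        return existing[i]
    mint = timestamp_ms - timestamp_ms % fallback_ms
    return mint, mint + fallback_ms
```

Samples that map to the same returned range would then be written into one new block, so that later compaction merges it only with the existing block covering that range.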
