
[1.0.0-beta1] Full compaction never stops #6885

Closed

kjeacle opened this issue Jun 21, 2016 · 8 comments
kjeacle commented Jun 21, 2016

We are running 1.0.0-beta1 with just over 1 million series populated from collectd. The server has 16 cores and 32 GB of RAM, and we are writing fewer than 5,000 points per second.

When full compaction starts after 24 hours, it never stops. The logs show full compaction running continuously, yet the files appear to be compacted without any change.

Here's an extract from the logs. Each compaction run takes about 80 seconds.

[tsm1] 2016/06/21 10:09:06 compacted full group (0) into /var/lib/influxdb/data/snmp/default/127/000001067-000054314.tsm.tmp (#0)
[tsm1] 2016/06/21 10:09:06 compacted full group (0) into /var/lib/influxdb/data/snmp/default/127/000001067-000054315.tsm.tmp (#1)
[tsm1] 2016/06/21 10:09:06 compacted full 2 files into 2 files in 1m19.731138921s

[tsm1] 2016/06/21 10:09:07 compacting full group (0) /var/lib/influxdb/data/snmp/default/127/000001067-000054314.tsm (#0)
[tsm1] 2016/06/21 10:09:07 compacting full group (0) /var/lib/influxdb/data/snmp/default/127/000001067-000054315.tsm (#1)

[tsm1] 2016/06/21 10:10:26 compacted full group (0) into /var/lib/influxdb/data/snmp/default/127/000001067-000054316.tsm.tmp (#0)
[tsm1] 2016/06/21 10:10:26 compacted full group (0) into /var/lib/influxdb/data/snmp/default/127/000001067-000054317.tsm.tmp (#1)
[tsm1] 2016/06/21 10:10:26 compacted full 2 files into 2 files in 1m18.539984493s

[tsm1] 2016/06/21 10:10:27 compacting full group (0) /var/lib/influxdb/data/snmp/default/127/000001067-000054316.tsm (#0)
[tsm1] 2016/06/21 10:10:27 compacting full group (0) /var/lib/influxdb/data/snmp/default/127/000001067-000054317.tsm (#1)
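For illustration only, here is a minimal, hypothetical Go sketch of the failure mode this log pattern suggests: a planner that keeps re-selecting a group whose compacted output is identical to its input never converges. This is not InfluxDB's actual planner code; the generation, plan, and compact names are invented for the example.

package main

import "fmt"

// A toy model of the failure: a "generation" is a set of TSM files that
// belong together; real file names encode generation and sequence
// (e.g. 000001067-000054314.tsm).
type generation struct {
	files []string
	seq   int // next sequence number to hand out
}

// plan keeps selecting the group for a full compaction as long as it holds
// more than one file; it has no notion of "this group cannot shrink further".
func plan(g *generation) bool {
	return len(g.files) > 1
}

// compact writes the same data back out into the same number of files,
// only with new sequence numbers, which is the pattern the logs above show.
func compact(g *generation) {
	out := make([]string, len(g.files))
	for i := range g.files {
		out[i] = fmt.Sprintf("000001067-%09d.tsm", g.seq)
		g.seq++
	}
	g.files = out
}

func main() {
	g := &generation{
		files: []string{"000001067-000054314.tsm", "000001067-000054315.tsm"},
		seq:   54316,
	}
	// Without a termination condition such as "stop when the output did not
	// shrink", this loop never exits; it is capped at 3 passes for the demo.
	for pass := 1; pass <= 3 && plan(g); pass++ {
		before := len(g.files)
		compact(g)
		fmt.Printf("pass %d: compacted full %d files into %d files: %v\n", pass, before, len(g.files), g.files)
	}
}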

Here are the files being updated on disk:

-rw-r--r--  1 influxdb influxdb 2148106613 Jun 21 10:06 000001067-000054312.tsm
-rw-r--r--  1 influxdb influxdb 1169976570 Jun 21 10:07 000001067-000054313.tsm
-rw-r--r--  1 influxdb influxdb 2148106613 Jun 21 10:08 000001067-000054314.tsm.tmp
-rw-r--r--  1 influxdb influxdb 1169976570 Jun 21 10:08 000001067-000054315.tsm.tmp

-rw-r--r--  1 influxdb influxdb 2148106613 Jun 21 10:08 000001067-000054314.tsm
-rw-r--r--  1 influxdb influxdb 1169976570 Jun 21 10:08 000001067-000054315.tsm
-rw-r--r--  1 influxdb influxdb 2148106613 Jun 21 10:09 000001067-000054316.tsm.tmp
-rw-r--r--  1 influxdb influxdb  394264576 Jun 21 10:10 000001067-000054317.tsm.tmp

-rw-r--r--  1 influxdb influxdb 2148106613 Jun 21 10:09 000001067-000054316.tsm
-rw-r--r--  1 influxdb influxdb 1169976570 Jun 21 10:10 000001067-000054317.tsm
-rw-r--r--  1 influxdb influxdb 1207959552 Jun 21 10:10 000001067-000054318.tsm.tmp

000001067-000054314 and 000001067-000054315 have been compacted into 000001067-000054316 and 000001067-000054317, but the newly compacted files are identical to the old ones:

$ sum 000001067-000054314.tsm 000001067-000054315.tsm
32095 2097761 000001067-000054314.tsm
60311 1142556 000001067-000054315.tsm

$ sum 000001067-000054316.tsm 000001067-000054317.tsm
32095 2097761 000001067-000054316.tsm
60311 1142556 000001067-000054317.tsm
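A stronger check than BSD sum (which reports only a 16-bit checksum and a block count) is to compare cryptographic digests of the old and new files. A minimal sketch, assuming the four file names above sit in the current directory:

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// fileSHA256 returns the hex-encoded SHA-256 digest of the file at path.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	// Old file vs. the file it was supposedly compacted into.
	pairs := [][2]string{
		{"000001067-000054314.tsm", "000001067-000054316.tsm"},
		{"000001067-000054315.tsm", "000001067-000054317.tsm"},
	}
	for _, p := range pairs {
		a, errA := fileSHA256(p[0])
		b, errB := fileSHA256(p[1])
		if errA != nil || errB != nil {
			fmt.Println("error:", errA, errB)
			continue
		}
		fmt.Printf("%s == %s ? %v\n", p[0], p[1], a == b)
	}
}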

A closer look at the files:

$ influx_inspect dumptsmdev 000001067-000054314.tsm
Summary:
  File: 000001067-000054314.tsm
  Time Range: 2016-05-27T17:00:52.455141Z - 2016-05-29T23:59:59.616308Z
  Duration: 54h59m7.161167s   Series: 620409   File Size: 2148106613

Statistics
  Blocks:
    Total: 620409 Size: 2065239459 Min: 177 Max: 8089 Avg: 3328
  Index:
    Total: 620409 Size: 82867141
  Points:
    Total: 364429544
  Encoding:
    Timestamp:  none: 0 (0%)    s8b: 620409 (100%)
    Float:      none: 0 (0%)    gor: 620409 (100%)
  Compression:
    Per block: 5.67 bytes/point
    Total: 5.89 bytes/point

$ influx_inspect dumptsmdev 000001067-000054315.tsm
Summary:
  File: 000001067-000054315.tsm
  Time Range: 2016-05-27T17:00:52.46263Z - 2016-05-29T23:59:59.616308Z
  Duration: 54h59m7.153678s   Series: 370638   File Size: 1169976570

Statistics
  Blocks:
    Total: 370638 Size: 1118210098 Min: 177 Max: 7924 Avg: 3016
  Index:
    Total: 370638 Size: 51766459
  Points:
    Total: 204533419
  Encoding:
    Timestamp:  none: 0 (0%)    s8b: 370638 (100%)
    Float:      none: 0 (0%)    gor: 370638 (100%)
  Compression:
    Per block: 5.47 bytes/point
    Total: 5.72 bytes/point
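For reference, the two compression figures in each summary are simple ratios: "Per block" is the total block size divided by the point count, and "Total" is the file size divided by the point count. A quick check of the numbers above, with the values copied straight from the summaries:

package main

import "fmt"

func main() {
	// Figures taken from the dumptsmdev summaries above.
	files := []struct {
		name       string
		blockBytes int64
		points     int64
		fileBytes  int64
	}{
		{"000001067-000054314.tsm", 2065239459, 364429544, 2148106613},
		{"000001067-000054315.tsm", 1118210098, 204533419, 1169976570},
	}
	for _, f := range files {
		perBlock := float64(f.blockBytes) / float64(f.points) // ~5.67 and ~5.47
		total := float64(f.fileBytes) / float64(f.points)     // ~5.89 and ~5.72
		fmt.Printf("%s: per block %.2f bytes/point, total %.2f bytes/point\n", f.name, perBlock, total)
	}
}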
jwilder added this to the 1.0.0 milestone Jun 22, 2016
pkittenis commented

Seeing the same behaviour in 0.13.0 and previous versions; not sure exactly when it started.

[screenshot: influxdb_writes_2]

The log output is similar to yours: constant full compactions, each taking a similar time of almost 2 minutes, per the attachment below.

[attachment: influxdb_write_logs.txt]

pkittenis commented

Confirmed that this is triggered by compactions run after the compact-full-write-cold-duration interval has passed.

With it set to 48h, the issue occurs 48 hours after a DB restart, as shown in the graph below.

[screenshot: influxdb_writes1]
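For illustration, a minimal, hypothetical Go sketch (not InfluxDB's actual code) of the kind of write-cold trigger that compact-full-write-cold-duration describes: once a shard has seen no writes for the configured interval, a full compaction is planned. The shard, lastWrite, and shouldFullCompact names are invented for the example.

package main

import (
	"fmt"
	"time"
)

// shard is a toy stand-in holding only what the trigger needs.
type shard struct {
	lastWrite time.Time
}

// shouldFullCompact reports whether the shard has been "write cold" for at
// least the configured duration (e.g. 48h, as in the comment above).
func shouldFullCompact(s shard, coldDuration time.Duration, now time.Time) bool {
	return now.Sub(s.lastWrite) >= coldDuration
}

func main() {
	cold := 48 * time.Hour
	s := shard{lastWrite: time.Now().Add(-49 * time.Hour)} // last write 49h ago
	if shouldFullCompact(s, cold, time.Now()) {
		fmt.Println("planning full compaction")
	}
}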


jwilder commented Jul 14, 2016

The changes in #6952 may have fixed this. Would you be able to test that build and see if the issue is still occurring?

jwilder self-assigned this Jul 14, 2016

kjeacle commented Jul 17, 2016

As our series cardinality was over 1 million, we split our database in two and have two instances of influxd running now. After this change, we are not hitting the bug any more. I still have a copy of the files above that were being continuously compacted in case they are of any use.


jwilder commented Jul 18, 2016

This should be fixed via #6952.

jwilder closed this as completed Jul 18, 2016

kjeacle commented Sep 7, 2016

Unfortunately, that fix doesn't appear to have worked.

We are again seeing non-stop full compaction as above on 1.0.0-rc2.


jwilder commented Sep 7, 2016

The fix is not in rc2. It will be in the 1.0 final and nightly master tomorrow. How are you testing it?


kjeacle commented Sep 7, 2016

Not testing as such; we just noticed that we are seeing the problem again while running 1.0.0-rc2 with production data.
