TSM1: "panic:runtime error: index out of range" on query, but appears to be caught #4525

Closed
wrigtim opened this issue Oct 21, 2015 · 4 comments

@wrigtim

wrigtim commented Oct 21, 2015

Just upgraded to 0.9.5-nightly-ff997a7, deleted all data and restarted. I noticed a stream of references to panics in the HTTP log, although the server appears to catch them and provides no trace.

Apologies for the vague issue, but I cannot easily reproduce right now. I'll see if this reappears as we will be restarting often due to WAL size limitations anyway...

[http] 2015/10/21 05:27:30 7.132.32.32 - root [21/Oct/2015:05:27:30 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+%22name%22+%3D+%27dm-4%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C
+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+%22name%22+%3D+%27md0%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+%22name%22+%3D+%27md1%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28n
one%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+%22name%22+%3D+%27md2%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+%22name%22+%3D+%27md3%2
7+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/metrics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36 55a88f48-77de-11e5-800f-000000000000 1.507653ms [panic:runtime error: index out of range]
[http] 2015/10/21 05:27:30 7.132.32.32 - root [21/Oct/2015:05:27:30 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+%22name%22+%3D+%27dm-4%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C
+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+%22name%22+%3D+%27md0%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+%22name%22+%3D+%27md1%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28n
one%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+%22name%22+%3D+%27md2%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%2C+1s%29+FROM+%22io_writes%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+%22name%22+%3D+%27md3%2
7+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/metrics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36 55b8ef17-77de-11e5-8010-000000000000 710.706µs [panic:runtime error: index out of range]
[http] 2015/10/21 05:27:31 7.132.32.32 - root [21/Oct/2015:05:27:31 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+non_negative_derivative%28mean%28value%29%29+FROM+%22net_drop_in%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%29+FROM+%22net_drop_in%22+WHERE+%22h
ost%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%29+FROM+%22net_drop_in%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%29+FROM+%22net_drop_out%22+WHERE+%22host%2
2+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%29+FROM+%22net_drop_out%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+non_negative_derivative%28mean%28value%29%29+FROM+%22net_drop_out%22+WHERE+%22host%22+%3
D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/metrics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36 55d07e81-77de-11e5-8011-000000000000 1.792126ms [panic:runtime error: index out of range]
[http] 2015/10/21 05:27:31 7.132.32.32 - root [21/Oct/2015:05:27:31 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+mean%28value%29+FROM+%22mem_free%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29%0ASELECT+mean%28value%29+FROM+%22mem_used%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%2
85s%29%0ASELECT+mean%28value%29+FROM+%22mem_available%22+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/metrics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36 55df9f46-77de-11e5-8012-000000000000 612.336µs [
panic:runtime error: index out of range]
[http] 2015/10/21 05:27:31 7.132.32.32 - root [21/Oct/2015:05:27:31 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+mean%28value%29+FROM+%22mem_free%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29%0ASELECT+mean%28value%29+FROM+%22mem_used%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%2
85s%29%0ASELECT+mean%28value%29+FROM+%22mem_available%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/metrics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36 55e287ff-77de-11e5-8013-000000000000 817.656µs [
panic:runtime error: index out of range]
[http] 2015/10/21 05:27:31 7.132.32.32 - root [21/Oct/2015:05:27:31 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+mean%28value%29+FROM+%22mem_free%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29%0ASELECT+mean%28value%29+FROM+%22mem_used%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%2
85s%29%0ASELECT+mean%28value%29+FROM+%22mem_available%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/metrics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36 55f306df-77de-11e5-8014-000000000000 550.219µs [
panic:runtime error: index out of range]
[http] 2015/10/21 05:27:36 7.132.32.32 - root [21/Oct/2015:05:27:36 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+mean%28value%29+FROM+%22cpu_usage_user%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_nice%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now
%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_system%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_iowait%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASEL
ECT+mean%28value%29+FROM+%22cpu_usage_softirq%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_steal%22+WHERE+%22host%22+%3D+%27influxserverl6%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/met
rics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36 58e8ef72-77de-11e5-8015-000000000000 2.002789ms [panic:runtime error: index out of range]
[http] 2015/10/21 05:27:36 7.132.32.32 - root [21/Oct/2015:05:27:36 -0500] GET /query?db=telegraf&epoch=ms&q=SELECT+mean%28value%29+FROM+%22cpu_usage_user%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_nice%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now
%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_system%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_iowait%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASEL
ECT+mean%28value%29+FROM+%22cpu_usage_softirq%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29%0ASELECT+mean%28value%29+FROM+%22cpu_usage_steal%22+WHERE+%22host%22+%3D+%27influxserverl7%27+AND+time+%3E+now%28%29+-+1h+GROUP+BY+time%285s%29+fill%28none%29 HTTP/1.1 200 23 http://grafanaserverl4:3000/dashboard/db/met
rics-backend Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36 58f4ee39-77de-11e5-8016-000000000000 864.964µs [panic:runtime error: index out of range]
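
The request targets in the log above are URL-encoded, which makes the failing statements hard to read. The following is a minimal sketch, not part of the original report, that decodes the `q` parameter and splits it into individual InfluxQL statements. The helper name `statements_from_request` is hypothetical, and the sample string is a shortened request built from the `mem_free`/`mem_used` queries in the log; it assumes the statements are separated by `%0A` (newline), as they are in these entries.

```python
from urllib.parse import urlsplit, parse_qs


def statements_from_request(target: str) -> list[str]:
    """Return the statements carried in the q parameter of a request target."""
    # parse_qs decodes both '+' (space) and %XX escapes in parameter values.
    params = parse_qs(urlsplit(target).query)
    q = params.get("q", [""])[0]
    # Grafana batches several statements into one request, one per line.
    return [stmt for stmt in q.split("\n") if stmt]


# Shortened request target assembled from the log entries above (illustrative).
sample = (
    "/query?db=telegraf&epoch=ms&q=SELECT+mean%28value%29+FROM+%22mem_free%22"
    "+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h"
    "+GROUP+BY+time%285s%29%0ASELECT+mean%28value%29+FROM+%22mem_used%22"
    "+WHERE+%22host%22+%3D+%27influxserverl5%27+AND+time+%3E+now%28%29+-+1h"
    "+GROUP+BY+time%285s%29"
)

for stmt in statements_from_request(sample):
    print(stmt)
```

Running each decoded statement on its own against the server may help narrow down which one triggers the panic.
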
@otoolep
Contributor

otoolep commented Oct 21, 2015

Did you set your engine to tsm1?

@wrigtim
Author

wrigtim commented Oct 22, 2015

Yes - this was seen during an upgrade of nightly-cc54cba to nightly-ff997a7. The config file was unchanged:

[data]
  dir = "/data/influxdb-data/metrics"
  engine = "tsm1"

  # WAL config options for b1 (introduced in 0.9.2)
  #max-wal-size = 10485760000
  #wal-flush-interval = "30s"
  #wal-partition-flush-delay = "2s"

  # WAL configuration options for bz1 (introduced in 0.9.3)
  wal-dir = "/data/influxdb-wal" # Issue 4498 to make this work for tsm1
  #wal-logging-enabled = true
  #wal-ready-series-size = 512000
  #wal-compaction-threshold = 0.5
  #wal-max-series-size = 1048576
  #wal-flush-cold-interval = "5s"
  #wal-partition-size-threshold = 1000000000

  # WAL configuration options for tsm1 introduced in 0.9.5
  wal-flush-memory-size-threshold = 26214400 # Default 5MB; ours 25MB
  wal-max-memory-size-threshold = 2147483648 # Default 100MB; ours 2GB

  # compaction options for tsm1 introduced in 0.9.5

  # IndexCompactionAge specifies the duration after the data file creation time
  # at which it is eligible to be compacted
  index-compaction-age = 120

  # IndexMinimumCompactionInterval specifies the minimum amount of time that must
  # pass after a compaction before another compaction is run
  index-min-compaction-interval = 10

  # IndexCompactionFileCount specifies the minimum number of data files that
  # must be eligible for compaction before actually running one
  index-compaction-min-file-count = 5

  # IndexCompactionFullAge specifies how long after the last write was received
  # in the WAL that a full compaction should be performed.
  index-compaction-full-age = 7200

  query-log-enabled = false
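
As a quick sanity check on the two tsm1 WAL thresholds in the config above, the sketch below (not part of the original comment) converts the raw byte values to binary units. The helper name `describe` is hypothetical. It suggests the "25MB" and "2GB" in the config comments refer to binary multiples (MiB and GiB).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the [data] section above.
thresholds = {
    "wal-flush-memory-size-threshold": 26214400,
    "wal-max-memory-size-threshold": 2147483648,
}


def describe(n: int) -> str:
    """Render a byte count in the largest binary unit that divides it evenly."""
    if n % GIB == 0:
        return f"{n // GIB} GiB"
    if n % MIB == 0:
        return f"{n // MIB} MiB"
    return f"{n} bytes"


for name, value in thresholds.items():
    print(f"{name} = {value} ({describe(value)})")
```
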

@jwilder
Contributor

jwilder commented Oct 27, 2015

This might be fixed by #4587. The panic error message looks similar to the one in #4513, but it might be occurring at query time rather than at WAL flush time.

@wrigtim
Author

wrigtim commented Oct 29, 2015

@jwilder We haven't actually seen this since moving to 0.9.5-nightly-2fe5e6b, so I believe we can close this out. Thank you!
