InfluxDB crashes with OOM #6991

Closed
deepujain opened this issue Jul 10, 2016 · 1 comment

deepujain commented Jul 10, 2016

Versions: influxdb-0.13.0-1 and influxdb-0.11.0-1

Machine Details (Single Node System)

cat $INFLUXDB_HOME
cat: /root/healthmonitor/influxdb-0.13.0-1: Is a directory
root@influxdb-hybrid-860058:~# free
             total       used       free     shared    buffers     cached
Mem:      12305660    1735684   10569976        388       8456     157632
-/+ buffers/cache:    1569596   10736064
Swap:            0          0          0
root@influxdb-hybrid-860058:~# uname -a
Linux influxdb-hybrid-860058 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@influxdb-hybrid-860058:~#

Query Set.

$INFLUXDB_HOME/usr/bin/influx -host localhost -port 8086 -precision rfc3339 -database ep -execute 'SELECT count(distinct(eId)) as units INTO totalAlerts FROM epdetail WHERE time > now() - 220d GROUP BY time(1d) fill(none)'

$INFLUXDB_HOME/usr/bin/influx -host localhost -port 8086 -precision rfc3339 -database ep -execute 'SELECT count(distinct(eId)) as units INTO bualerts FROM epdetail WHERE time > now() - 220d GROUP BY time(1d), businessUnit fill(none)'

$INFLUXDB_HOME/usr/bin/influx -host localhost -port 8086 -precision rfc3339 -database ep -execute 'SELECT count(distinct(eId)) as units INTO siteIdAlerts FROM epdetail WHERE time > now() - 220d GROUP BY time(1d), siteId fill(none)'

$INFLUXDB_HOME/usr/bin/influx -host localhost -port 8086 -precision rfc3339 -database ep -execute 'SELECT count(distinct(eId)) as units INTO channelIdAlerts FROM epdetail WHERE time > now() - 220d GROUP BY time(1d), channelId fill(none)'

After I ingest the raw data, I run the commands above to create a high-level view of the system, broken down at various levels. On version 0.13 the system crashes with OOM; on version 0.11 it barely runs.
Our entire monitoring system is built on Grafana, with InfluxDB as the backend.
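
For reference, the four commands above differ only in the target measurement and the optional extra GROUP BY tag. A minimal sketch of generating them in a loop follows. The function names `build_query` and `run_rollups` and the `DRY_RUN` switch are made up for illustration; the statements and the `influx` flags are the ones already shown above.

```shell
#!/usr/bin/env bash
# Sketch only: rebuilds the four roll-up statements from this issue.
# build_query, run_rollups and DRY_RUN are illustrative names, not InfluxDB features.

# build_query TARGET [TAG] -> prints one InfluxQL statement
build_query() {
  local target="$1" tag="${2:-}"
  local group="time(1d)"
  if [ -n "$tag" ]; then
    group="time(1d), $tag"   # add the extra tag to the GROUP BY clause
  fi
  printf 'SELECT count(distinct(eId)) as units INTO %s FROM epdetail WHERE time > now() - 220d GROUP BY %s fill(none)\n' "$target" "$group"
}

# target:tag pairs copied from the commands above (empty tag = no extra grouping)
ROLLUPS="totalAlerts: bualerts:businessUnit siteIdAlerts:siteId channelIdAlerts:channelId"

# With DRY_RUN=1 (the default) the statements are only printed;
# with DRY_RUN=0 each one is passed to the influx CLI as in the issue.
run_rollups() {
  local pair target tag query
  for pair in $ROLLUPS; do
    target="${pair%%:*}"
    tag="${pair#*:}"
    query="$(build_query "$target" "$tag")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$query"
    else
      "$INFLUXDB_HOME/usr/bin/influx" -host localhost -port 8086 -precision rfc3339 -database ep -execute "$query"
    fi
  done
}
```

Calling `run_rollups` prints the four statements; `DRY_RUN=0 run_rollups` would execute them one after another, which makes it easy to see which of the four triggers the crash.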

Questions
0. The OOM occurs immediately on version 0.13.0-1 but not on 0.11.0-1. On 0.11.0-1, however, used memory climbed to 11 GB (out of 12 GB), which means it could crash at any time.

  1. How do I fix this OOM issue?
  2. What changed in version 0.13.0-1 that causes the OOM?
  3. Is all the data stored in memory? If so, I am limited by the size of memory. I currently have only 12 GB of RAM (cloud, so bigger boxes are hard to get), and it is a single-node system.
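
Regarding question 1, one thing that might be worth trying is splitting the 220-day roll-up into smaller, day-aligned windows so that each statement scans less data. This is an untested assumption, not a confirmed fix: nothing in this issue shows that it lowers peak memory. The sketch below only prints the statements. The function name `windowed_queries` is hypothetical, and it assumes the server accepts epoch time literals with an `s` suffix in the WHERE clause (RFC3339 strings would be an alternative).

```shell
#!/usr/bin/env bash
# Sketch only: prints one roll-up statement per window of WINDOW_DAYS days.
# windowed_queries is an illustrative name; whether windowing avoids the OOM is untested.

DAY=86400   # seconds per day

# windowed_queries TARGET START_EPOCH_S END_EPOCH_S WINDOW_DAYS
windowed_queries() {
  local target="$1" start="$2" end="$3" window_days="$4"
  local step=$(( window_days * DAY ))
  # Align the start down to a UTC day boundary so that every window
  # begins on the edge of a GROUP BY time(1d) bucket.
  local from=$(( start / DAY * DAY ))
  local to
  while [ "$from" -lt "$end" ]; do
    to=$(( from + step ))
    if [ "$to" -gt "$end" ]; then
      to="$end"   # last window stops at END (may be a partial day if END is not aligned)
    fi
    # Assumption: epoch-second literals such as 1451606400s are accepted here.
    printf 'SELECT count(distinct(eId)) as units INTO %s FROM epdetail WHERE time >= %ds AND time < %ds GROUP BY time(1d) fill(none)\n' "$target" "$from" "$to"
    from="$to"
  done
}
```

For example, `windowed_queries totalAlerts 1451606400 1452816000 7` prints two statements covering 14 days in 7-day windows; each line could then be passed to `influx -execute` the same way as the original commands.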

Please advise.

[httpd] 2016/07/09 11:58:54 10.103.178.199 - root [09/Jul/2016:11:58:54 -0700] GET /query?db=ep&epoch=ms&q=select++%28failureCount%2FtotalCount%29%2A100+from+epsummary+where+alertId%3D%273%27+AND+time+%3E+now%28%29+-+100d HTTP/1.1 200 782 http://healthmonitor-860059.lvs01.eaz.ebayc3.com:8080/dashboard/db/experimentation-anomalies Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36 2ee3c79f-4607-11e6-81dc-000000000000 69.318753ms
[httpd] 2016/07/09 11:58:54 10.103.178.199 - root [09/Jul/2016:11:58:54 -0700] GET /query?db=ep&epoch=ms&q=select++%28failureCount%2FtotalCount%29%2A100+from+epsummary+where+alertId%3D%274%27+AND+time+%3E+now%28%29+-+100d HTTP/1.1 200 659 http://healthmonitor-860059.lvs01.eaz.ebayc3.com:8080/dashboard/db/experimentation-anomalies Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36 2ee82d39-4607-11e6-81de-000000000000 62.329095ms
[httpd] 2016/07/09 12:03:01 127.0.0.1 - - [09/Jul/2016:12:03:01 -0700] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.13.0 c23f0b0b-4607-11e6-81e2-000000000000 106.072µs
[query] 2016/07/09 12:03:01 SELECT count(distinct(eId)) AS units INTO ep."default".totalAlerts FROM ep."default".epdetail WHERE time > now() - 220d GROUP BY time(1d) fill(none)
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0xce4740, 0x16)
/usr/local/go/src/runtime/panic.go:547 +0x90
runtime.sysMap(0xcad9c70000, 0x100000, 0x7fcfd1fb8c00, 0x10efab8)
/usr/local/go/src/runtime/mem_linux.go:206 +0x9b
runtime.(*mheap).sysAlloc(0x10d4de0, 0x100000, 0x0)
/usr/local/go/src/runtime/malloc.go:429 +0x191
runtime.(*mheap).grow(0x10d4de0, 0x8, 0x0)
/usr/local/go/src/runtime/mheap.go:651 +0x63
runtime.(*mheap).allocSpanLocked(0x10d4de0, 0x3, 0x7fcfaa8d14a0)
/usr/local/go/src/runtime/mheap.go:553 +0x4f6
runtime.(*mheap).alloc_m(0x10d4de0, 0x3, 0x2000000002c, 0x7fcfaa8d14a0)
/usr/local/go/src/runtime/mheap.go:437 +0x119
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:502 +0x41
runtime.systemstack(0x7fcfd1fb8d60)
/usr/local/go/src/runtime/asm_amd64.s:307 +0xab
runtime.(*mheap).alloc(0x10d4de0, 0x3, 0x1000000002c, 0x412674)
/usr/local/go/src/runtime/mheap.go:503 +0x63
runtime.(*mcentral).grow(0x10d73f0, 0x0)
/usr/local/go/src/runtime/mcentral.go:209 +0x93
runtime.(*mcentral).cacheSpan(0x10d73f0, 0xc85dbae1f0)
/usr/local/go/src/runtime/mcentral.go:89 +0x47d
runtime.(*mcache).refill(0x7fcfd50d9000, 0x2c, 0xc83664d6c0)
/usr/local/go/src/runtime/mcache.go:119 +0xcc
runtime.mallocgc.func2()
/usr/local/go/src/runtime/malloc.go:642 +0x2b
runtime.systemstack(0xc82001e000)
/usr/local/go/src/runtime/asm_amd64.s:291 +0x79
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1051

A very long stack trace follows this. I can share it if required.


deepujain commented Jul 10, 2016

Free memory has now gone up and InfluxDB is running. The version is 0.11.0-1.

free
             total       used       free     shared    buffers     cached
Mem:      12305660    3542112    8763548        400      39028     121044
