elasticsearch-executors do not respect memory limit (in jar mode) #572
Do you have |
Yes, mesos is running with |
It looks like the elasticsearch java processes are launched outside of the cgroup that has the memory limit applied:
root 7407 0.0 0.0 115244 1484 ? Ss 12:48 0:00 sh -c mkdir -p /var/lib/mesos/slave/elasticsearch ./.; chown -R nobody /var/lib/mesos/slave/elasticsearch ./.; su -s /bin/sh -c "./elasticsearch-*/bin/elasticsearch --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --default.node.local=false --default.index.number_of_replicas=0 --default.index.auto_expand_replicas=0-all --path.home=././es_home --default.path.data=/var/lib/mesos/slave/elasticsearch/mantl/9c5cad70-5d2a-41a0-aab8-b660828cdf22-S3 --path.conf=./. --default.bootstrap.mlockall=true --default.network.bind_host=0.0.0.0 --default.network.publish_host=_non_loopback:ipv4_ --default.gateway.recover_after_nodes=1 --default.gateway.expected_nodes=1 --default.indices.recovery.max_bytes_per_sec=100mb --default.discovery.type=zen --default.discovery.zen.fd.ping_timeout=30s --default.discovery.zen.fd.ping_interval=1s --default.discovery.zen.fd.ping_retries=30 --default.discovery.zen.ping.multicast.enabled=false" nobody
root 7410 0.0 0.0 185732 2396 ? S 12:48 0:00 su -s /bin/sh -c ./elasticsearch-*/bin/elasticsearch --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --default.node.local=false --default.index.number_of_replicas=0 --default.index.auto_expand_replicas=0-all --path.home=././es_home --default.path.data=/var/lib/mesos/slave/elasticsearch/mantl/9c5cad70-5d2a-41a0-aab8-b660828cdf22-S3 --path.conf=./. --default.bootstrap.mlockall=true --default.network.bind_host=0.0.0.0 --default.network.publish_host=_non_loopback:ipv4_ --default.gateway.recover_after_nodes=1 --default.gateway.expected_nodes=1 --default.indices.recovery.max_bytes_per_sec=100mb --default.discovery.type=zen --default.discovery.zen.fd.ping_timeout=30s --default.discovery.zen.fd.ping_interval=1s --default.discovery.zen.fd.ping_retries=30 --default.discovery.zen.ping.multicast.enabled=false nobody
nobody 7411 3.6 2.8 4065000 422124 ? Ssl 12:48 2:14 /bin/java -Xms384m -Xmx384m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/var/lib/mesos/slaves/9c5cad70-5d2a-41a0-aab8-b660828cdf22-S3/frameworks/60061f47-c9ea-43bf-bd4e-592d227ce521-0001/executors/elasticsearch_resching-aws-worker-003.node.consul_20160627T124841.934Z/runs/ad4e267f-0566-429b-9a4f-046264076afa/elasticsearch-2.2.0 -cp /var/lib/mesos/slaves/9c5cad70-5d2a-41a0-aab8-b660828cdf22-S3/frameworks/60061f47-c9ea-43bf-bd4e-592d227ce521-0001/executors/elasticsearch_resching-aws-worker-003.node.consul_20160627T124841.934Z/runs/ad4e267f-0566-429b-9a4f-046264076afa/elasticsearch-2.2.0/lib/elasticsearch-2.2.0.jar:/var/lib/mesos/slaves/9c5cad70-5d2a-41a0-aab8-b660828cdf22-S3/frameworks/60061f47-c9ea-43bf-bd4e-592d227ce521-0001/executors/elasticsearch_resching-aws-worker-003.node.consul_20160627T124841.934Z/runs/ad4e267f-0566-429b-9a4f-046264076afa/elasticsearch-2.2.0/lib/* org.elasticsearch.bootstrap.Elasticsearch start --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --default.node.local=false --default.index.number_of_replicas=0 --default.index.auto_expand_replicas=0-all --path.home=././es_home --default.path.data=/var/lib/mesos/slave/elasticsearch/mantl/9c5cad70-5d2a-41a0-aab8-b660828cdf22-S3 --path.conf=./. --default.bootstrap.mlockall=true --default.network.bind_host=0.0.0.0 --default.network.publish_host=_non_loopback:ipv4_ --default.gateway.recover_after_nodes=1 --default.gateway.expected_nodes=1 --default.indices.recovery.max_bytes_per_sec=100mb --default.discovery.type=zen --default.discovery.zen.fd.ping_timeout=30s --default.discovery.zen.fd.ping_interval=1s --default.discovery.zen.fd.ping_retries=30 --default.discovery.zen.ping.multicast.enabled=false
# systemd-cgls --no-pager
Working Directory /sys/fs/cgroup/memory:
├─ 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 20
├─600 /sbin/agetty --noclear tty1 linux
├─607 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
├─mesos
│ └─ad4e267f-0566-429b-9a4f-046264076afa
│   ├─7390 /usr/libexec/mesos/mesos-executor
│   └─7407 sh -c mkdir -p /var/lib/mesos/slave/elasticsearch ./.; chown -R nobody /var/lib/mesos/slave/elasticsearch ./.; su -s /bin/sh -c "./elasticsearch-*/bin/elasticsearch --default.http.port=4000 --d...
├─docker
│ ├─6a97761fb9d04708053ab82ae23f3123cb87727e646d960367bd701a5250c139
│ │ └─9820 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Xmx500m -Xss2048k -Djffi.boot.library.pa...
│ ├─0ad157e799ba4b606d4da76766c4ae36dd2a3690b8731961bd2bfd68773df90a
│ │ ├─7716 /bin/bash /launch.sh
│ │ └─7728 /bin/mantl-api
│ ├─fa9c031723bf39042fe40ebb7bb3241931ff5ba48b9b5f55dfb898ede5c47a30
│ │ └─22466 /bin/mesos-consul --zk=zk://resching-aws-control-03:2181,resching-aws-control-02:2181,resching-aws-control-01:2181/mesos --consul-ssl --consul-ssl-verify=false --mesos-ip-order=mesos,host --ref...
│ ├─c055c7056ff0e12a716fd4476557093d93f19eba6ad06c06170e7ca7100a8c33
│ │ ├─21521 /sbin/runsvdir -P /etc/service
│ │ ├─21576 runsv bird
│ │ ├─21577 runsv bird6
│ │ ├─21578 runsv confd
│ │ ├─21579 runsv felix
│ │ ├─21580 svlogd -tt /var/log/calico/bird
│ │ ├─21581 bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
│ │ ├─21583 svlogd /var/log/calico/felix
│ │ ├─21584 svlogd /var/log/calico/confd
│ │ ├─21585 /usr/bin/python /usr/bin/calico-felix
│ │ ├─21586 svlogd -tt /var/log/calico/bird6
│ │ ├─21588 confd -confdir=/etc/calico/confd -interval=5 -watch --log-level=debug -node=http://10.1.3.77:2379 -client-key= -client-cert= -client-ca-keys=
│ │ ├─21590 bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
│ │ └─21609 /usr/bin/python -m calico.etcddriver /run/felix-driver.sck
│ ├─ac9cf2f5d5dee83cf405a4d9aeb0326b2ab6d0e02f986a7ec4c4f623fb764865
│ │ └─18578 /skydns --addr=0.0.0.0:53
│ └─31669e2b12cec745daf0a21b5eb202799b491ae56e97df25e1277e6527493d46
│   ├─18351 /bin/bash /scripts/launch.sh
│   ├─18362 consul-template -log-level warn -config /consul-template/config.d
│   ├─18363 tail -f /dev/null
│   ├─18379 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
│   └─18380 nginx: worker process
├─user.slice
│ ├─ 7410 su -s /bin/sh -c ./elasticsearch-*/bin/elasticsearch --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --...
│ ├─ 7411 /bin/java -Xms384m -Xmx384m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryEr...
│ ├─13586 sshd: centos [priv]
│ ├─13589 sshd: centos@pts/0
│ ├─13590 -bash
│ ├─13610 sshd: centos [priv]
│ ├─13613 sshd: centos@pts/1
│ ├─13614 -bash
│ ├─13633 sudo journalctl -lfu mesos-agent
│ ├─13634 journalctl -lfu mesos-agent
│ ├─14154 sudo su
│ ├─14155 su
│ ├─14156 bash
│ └─17775 systemd-cgls --no-pager
...
# pwd
/sys/fs/cgroup/memory/mesos/ad4e267f-0566-429b-9a4f-046264076afa
# cat cgroup.procs
7390
7407
# cat memory.limit_in_bytes
570425344
# pwd
/sys/fs/cgroup/memory/user.slice
# cat cgroup.procs
7410
7411
13586
13589
13590
13610
13613
13614
13633
13634
14154
14155
14156
19667
# cat memory.limit_in_bytes
9223372036854775807 |
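For anyone who wants to repeat the check above without walking /sys/fs/cgroup by hand, here is a minimal sketch in Python. It assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory, as in the output above. The function names (memory_cgroup, describe_limit, report) are made up for illustration and are not part of Mesos or this framework, and the "unlimited" threshold is a heuristic rather than a documented constant.

```python
from pathlib import Path
from typing import Optional

# Assumption: cgroup v1 with the memory controller mounted here, as in the output above.
MEMORY_ROOT = Path("/sys/fs/cgroup/memory")


def memory_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the memory-controller cgroup path from the text of /proc/<pid>/cgroup."""
    for line in cgroup_text.splitlines():
        # Each cgroup v1 line looks like "hierarchy-id:controller-list:path".
        parts = line.split(":", 2)
        if len(parts) == 3 and "memory" in parts[1].split(","):
            return parts[2]
    return None


def describe_limit(limit_bytes: int) -> str:
    """Render memory.limit_in_bytes; very large values are treated as 'no limit'."""
    # Heuristic assumption: anything at or above 2**62 bytes means no limit was set.
    if limit_bytes >= 2**62:
        return "unlimited"
    return f"{limit_bytes / 2**20:.1f} MiB"


def report(pid: int) -> str:
    """Read the live files for one PID and summarise its memory cgroup and limit."""
    cgroup = memory_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
    if cgroup is None:
        return f"{pid}: no memory cgroup found"
    limit_file = MEMORY_ROOT / cgroup.lstrip("/") / "memory.limit_in_bytes"
    limit = int(limit_file.read_text())
    return f"{pid}: {cgroup} ({describe_limit(limit)})"
```

Going by the values pasted above, calling report(7390) on that agent should show the /mesos/ad4e267f-0566-429b-9a4f-046264076afa cgroup with a 544.0 MiB limit, while report(7411) should show /user.slice with no limit.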
@ryane Thanks for the debugging. This is interesting, since we simply start the jar inside a Mesos command. I would have expected Mesos to ensure that started processes have the correct parent. I suspect adding an It would be massively appreciated if you can try adding an elasticsearch/scheduler/src/main/java/org/apache/mesos/elasticsearch/scheduler/Configuration.java Line 292 in f8e405d
Thanks. |
I'd be glad to but I am unable to build the project (either on master or the 1.0.1 tag) following the instructions at http://mesos-elasticsearch.readthedocs.io/en/latest/. I get various scheduler test failures. Do you have updated build instructions? |
Those commands are trying to run elasticsearch on minimesos. I haven't tried it in a while. A |
I was able to build the image but, unfortunately, using |
When running under JAR mode, the elasticsearch nodes do not respect the memory limit specified by --elasticsearchRam. Mesos tracks the initially launched process (24037, in the example below) and always reports memory usage of 3-4 MBs. The actual elasticsearch process does not appear to be monitored by mesos and memory usage (cpu too, probably) can exceed the specified limit.
# sudo ps aux | grep -i elastic
root 10844 0.0 0.0 112648 992 pts/0 S+ 21:21 0:00 grep --color=auto -i elastic
root 24037 0.0 0.0 115244 1484 ? Ss 19:48 0:00 sh -c mkdir -p /var/lib/mesos/slave/elasticsearch ./.; chown -R nobody /var/lib/mesos/slave/elasticsearch ./.; su -s /bin/sh -c "./elasticsearch-*/bin/elasticsearch --default.discovery.zen.ping.unicast.hosts resching-aws-worker-002:4001 --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --default.node.local=false --default.index.number_of_replicas=0 --default.index.auto_expand_replicas=0-all --path.home=././es_home --default.path.data=/var/lib/mesos/slave/elasticsearch/mantl/35472449-fe61-4c10-9d10-fa76c194dcd1-S2 --path.conf=./. --default.bootstrap.mlockall=true --default.network.bind_host=0.0.0.0 --default.network.publish_host=_non_loopback:ipv4_ --default.gateway.recover_after_nodes=1 --default.gateway.expected_nodes=1 --default.indices.recovery.max_bytes_per_sec=100mb --default.discovery.type=zen --default.discovery.zen.fd.ping_timeout=30s --default.discovery.zen.fd.ping_interval=1s --default.discovery.zen.fd.ping_retries=30 --default.discovery.zen.ping.multicast.enabled=false" nobody
root 24040 0.0 0.0 185736 2408 ? S 19:48 0:00 su -s /bin/sh -c ./elasticsearch-*/bin/elasticsearch --default.discovery.zen.ping.unicast.hosts resching-aws-worker-002:4001 --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --default.node.local=false --default.index.number_of_replicas=0 --default.index.auto_expand_replicas=0-all --path.home=././es_home --default.path.data=/var/lib/mesos/slave/elasticsearch/mantl/35472449-fe61-4c10-9d10-fa76c194dcd1-S2 --path.conf=./. --default.bootstrap.mlockall=true --default.network.bind_host=0.0.0.0 --default.network.publish_host=_non_loopback:ipv4_ --default.gateway.recover_after_nodes=1 --default.gateway.expected_nodes=1 --default.indices.recovery.max_bytes_per_sec=100mb --default.discovery.type=zen --default.discovery.zen.fd.ping_timeout=30s --default.discovery.zen.fd.ping_interval=1s --default.discovery.zen.fd.ping_retries=30 --default.discovery.zen.ping.multicast.enabled=false nobody
nobody 24041 68.4 4.7 5067052 715864 ? Ssl 19:48 63:43 /bin/java -Xms384m -Xmx384m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/var/lib/mesos/slaves/35472449-fe61-4c10-9d10-fa76c194dcd1-S2/frameworks/548eda30-0e65-4fa8-9ab9-d80618ce28f8-0002/executors/elasticsearch_resching-aws-worker-003.node.consul_20160624T194817.615Z/runs/f7dee160-2292-4e8f-a795-13f5e2e51301/elasticsearch-2.2.0 -cp /var/lib/mesos/slaves/35472449-fe61-4c10-9d10-fa76c194dcd1-S2/frameworks/548eda30-0e65-4fa8-9ab9-d80618ce28f8-0002/executors/elasticsearch_resching-aws-worker-003.node.consul_20160624T194817.615Z/runs/f7dee160-2292-4e8f-a795-13f5e2e51301/elasticsearch-2.2.0/lib/elasticsearch-2.2.0.jar:/var/lib/mesos/slaves/35472449-fe61-4c10-9d10-fa76c194dcd1-S2/frameworks/548eda30-0e65-4fa8-9ab9-d80618ce28f8-0002/executors/elasticsearch_resching-aws-worker-003.node.consul_20160624T194817.615Z/runs/f7dee160-2292-4e8f-a795-13f5e2e51301/elasticsearch-2.2.0/lib/* org.elasticsearch.bootstrap.Elasticsearch start --default.discovery.zen.ping.unicast.hosts resching-aws-worker-002:4001 --default.http.port=4000 --default.transport.tcp.port=4001 --default.cluster.name=mantl --default.node.master=true --default.node.data=true --default.node.local=false --default.index.number_of_replicas=0 --default.index.auto_expand_replicas=0-all --path.home=././es_home --default.path.data=/var/lib/mesos/slave/elasticsearch/mantl/35472449-fe61-4c10-9d10-fa76c194dcd1-S2 --path.conf=./. --default.bootstrap.mlockall=true --default.network.bind_host=0.0.0.0 --default.network.publish_host=_non_loopback:ipv4_ --default.gateway.recover_after_nodes=1 --default.gateway.expected_nodes=1 --default.indices.recovery.max_bytes_per_sec=100mb --default.discovery.type=zen --default.discovery.zen.fd.ping_timeout=30s --default.discovery.zen.fd.ping_interval=1s --default.discovery.zen.fd.ping_retries=30 --default.discovery.zen.ping.multicast.enabled=false
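To put rough numbers on the gap between the process Mesos tracks and the actual JVM, the RSS column of the ps output can be converted to MiB. The snippet below is only a sketch: pid_and_rss_mib is a hypothetical helper, and the two sample lines are taken from the ps output in this report with their long command lines shortened.

```python
from typing import Tuple


def pid_and_rss_mib(ps_line: str) -> Tuple[int, float]:
    """Extract (PID, RSS in MiB) from one line of `ps aux` output."""
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = ps_line.split(None, 10)
    return int(fields[1]), int(fields[5]) / 1024  # RSS is reported in KiB


# Two lines from the ps output in this report, with the long command lines shortened.
tracked = "root 24037 0.0 0.0 115244 1484 ? Ss 19:48 0:00 sh -c mkdir -p ..."
actual = "nobody 24041 68.4 4.7 5067052 715864 ? Ssl 19:48 63:43 /bin/java -Xms384m -Xmx384m ..."

for line in (tracked, actual):
    pid, rss = pid_and_rss_mib(line)
    print(f"pid {pid}: {rss:.1f} MiB resident")
```

This prints 1.4 MiB for the tracked shell (24037) and 699.1 MiB for the java process (24041), which is well above the 384m heap setting and is not counted against the task.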