Support ES 2.0 #384

Closed
martinb3 opened this issue Oct 16, 2015 · 42 comments

@martinb3
Contributor

Currently in beta.

@martinb3 martinb3 added this to the 2.0 release milestone Oct 16, 2015
@martinb3 martinb3 added the ready label Oct 22, 2015
@paulschwarz

Has this been addressed in 2.0.0_wip? voxpupuli/puppet-elasticsearch#472

@martinb3
Contributor Author

Hi @paulschwarz. We moved to using the same init scripts and files as the packages shipped from elasticsearch.org, as of RC1 of ES 2.0. If it's fixed there, we should be fixed as well. We can check that before we release as well, based on the comments in this issue.

@majormoses

@martinb3 I was going to try updating it for 2.0 (non-RC), but then I noticed that we are using sha256 for the checksums. I am looking here: https://www.elastic.co/downloads/past-releases/elasticsearch-2-0-0 and they appear to be given as sha1. Is there somewhere else you have been grabbing them from?

@martinb3
Contributor Author

martinb3 commented Nov 2, 2015

@majormoses I have been downloading the binaries and calculating the checksums myself, as Chef's resources are all based on sha256 checksums and not sha1. If we provide a sha1 to Chef, it will just re-download the file over and over. Hope this helps, - Martin
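
For anyone reproducing this step, here is a minimal Ruby sketch of computing the SHA-256 of a downloaded artifact. The helper name `sha256_of` and the example path are illustrative assumptions, not part of the cookbook.

```ruby
require 'digest'

# Hypothetical helper: returns the SHA-256 hex digest of a local file.
# Digest::SHA256.file reads the file in chunks, so large tarballs are fine.
def sha256_of(path)
  Digest::SHA256.file(path).hexdigest
end

# Example usage (the path is an assumption; point it at the file you downloaded):
# puts sha256_of('/tmp/elasticsearch-2.0.0.tar.gz')
```

The resulting 64-character hex string is the kind of value a sha256-based checksum comparison expects. A sha1 digest is only 40 characters long, so it can never match.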

@paulschwarz

Is there an ETA on a 2.0.0 (non rc)?

@martinb3
Contributor Author

martinb3 commented Nov 2, 2015

@paulschwarz I haven't really had any testing feedback yet. If anyone could send feedback on the cookbook's 2.0.0_wip branch, I would be much obliged! :)

It also looks like maybe we should grab the newest init scripts from the packages, since some updates were made in ES 2.0.0.

@majormoses

@martinb3 bumping it to 2.0 and going to start testing it

@majormoses

I am looking into it, but Elasticsearch is definitely broken despite converging. I am still looking for the root cause. When I try to start the service it says it's starting, but nothing is logged and the process immediately dies. When I start it manually I see this:

[2015-11-03 02:46:11,817][WARN ][bootstrap                ] Unable to lock JVM Memory: error=12,reason=Cannot allocate memory
[2015-11-03 02:46:11,818][WARN ][bootstrap                ] This can result in part of the JVM being swapped out.
[2015-11-03 02:46:11,818][WARN ][bootstrap                ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
[2015-11-03 02:46:11,818][WARN ][bootstrap                ] These can be adjusted by modifying /etc/security/limits.conf, for example: 
    # allow user 'elasticsearch' mlockall
    elasticsearch soft memlock unlimited
    elasticsearch hard memlock unlimited
[2015-11-03 02:46:11,818][WARN ][bootstrap                ] If you are logged in interactively, you will have to re-login for the new limits to take effect.
[2015-11-03 02:46:12,157][INFO ][node                     ] [ip-10-55-150-146.us-west-2.compute.internal] version[2.0.0], pid[1657], build[de54438/2015-10-22T08:09:48Z]
[2015-11-03 02:46:12,157][INFO ][node                     ] [ip-10-55-150-146.us-west-2.compute.internal] initializing ...
[2015-11-03 02:46:12,563][INFO ][plugins                  ] [ip-10-55-150-146.us-west-2.compute.internal] loaded [license], sites [kopf, head]
[2015-11-03 02:46:14,503][INFO ][node                     ] [ip-10-55-150-146.us-west-2.compute.internal] initialized
[2015-11-03 02:46:14,504][INFO ][node                     ] [ip-10-55-150-146.us-west-2.compute.internal] starting ...
[2015-11-03 02:46:14,569][INFO ][transport                ] [ip-10-55-150-146.us-west-2.compute.internal] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2015-11-03 02:46:14,576][INFO ][discovery                ] [ip-10-55-150-146.us-west-2.compute.internal] es-app/8JSZS_15QEO2oEDXBbHrUw
[2015-11-03 02:46:14,674][WARN ][transport.netty          ] [ip-10-55-150-146.us-west-2.compute.internal] exception caught on transport layer [[id: 0x14ef5bf6, /10.55.150.146:40704 => /10.55.150.125:9300]], closing connection
java.lang.NullPointerException
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:206)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:201)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:136)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I manually set the following in /etc/security/limits.conf, rebooted, and verified that it was working:

elasticsearch    soft    nofile      9000
elasticsearch    hard    nofile      650000 
elasticsearch    soft    memlock     unlimited
elasticsearch    hard    memlock     unlimited
elasticsearch@ip-10-55-150-146:/$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63794
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 9000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63794
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
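
The same limit can also be read from Ruby, which may be handy in a test or diagnostic script. This is only a sketch: `describe_limit` is a hypothetical helper, and the lookup is guarded because `RLIMIT_MEMLOCK` is not defined on every platform.

```ruby
# Hypothetical helper: render an rlimit value the way `ulimit` labels it.
def describe_limit(value)
  value == Process::RLIM_INFINITY ? 'unlimited' : value.to_s
end

# RLIMIT_MEMLOCK is not available everywhere, so guard the lookup.
if Process.const_defined?(:RLIMIT_MEMLOCK)
  soft, hard = Process.getrlimit(Process::RLIMIT_MEMLOCK)
  # getrlimit reports bytes, whereas `ulimit -l` prints kbytes.
  puts "memlock soft=#{describe_limit(soft)} hard=#{describe_limit(hard)}"
end
```

Limits are per process, so the script only reflects the elasticsearch user's limits when it runs in a session started as that user.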

@martinb3
Contributor Author

martinb3 commented Nov 3, 2015

We'll need to look into what the packages do. The cookbook used to set limits, but we saw that the 1.x packages didn't; in an effort to keep parity between the packages and Chef, we removed it.

@majormoses

@martinb3 I spun up a brand new cluster to test with as opposed to an upgrade (so there apparently is not a nice upgrade path even if you run the plugin first). I am able to get the process to launch without errors manually but the init scripts are still failing with no logging. I will report back what I find.

babrams@ip-10-55-150-212:~$ sudo -u elasticsearch /usr/local/elasticsearch/bin/elasticsearch --path.conf=/usr/local/etc/elasticsearch/
[2015-11-03 17:19:57,396][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] version[2.0.0], pid[31686], build[de54438/2015-10-22T08:09:48Z]
[2015-11-03 17:19:57,397][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] initializing ...
[2015-11-03 17:19:58,454][INFO ][plugins                  ] [ip-10-55-150-212.us-west-2.compute.internal] loaded [], sites [kopf, head]
[2015-11-03 17:19:58,947][INFO ][env                      ] [ip-10-55-150-212.us-west-2.compute.internal] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [4.9gb], net total_space [7.7gb], spins? [no], types [ext4]
[2015-11-03 17:20:07,701][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] initialized
[2015-11-03 17:20:07,705][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] starting ...
[2015-11-03 17:20:08,063][INFO ][transport                ] [ip-10-55-150-212.us-west-2.compute.internal] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2015-11-03 17:20:08,090][INFO ][discovery                ] [ip-10-55-150-212.us-west-2.compute.internal] elasticsearch/TbDgWQiYQRG-ijnwI2bFtA
[2015-11-03 17:20:11,188][INFO ][cluster.service          ] [ip-10-55-150-212.us-west-2.compute.internal] new_master {ip-10-55-150-212.us-west-2.compute.internal}{TbDgWQiYQRG-ijnwI2bFtA}{127.0.0.1}{127.0.0.1:9300}{max_local_storage_nodes=1}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2015-11-03 17:20:11,254][INFO ][http                     ] [ip-10-55-150-212.us-west-2.compute.internal] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}
[2015-11-03 17:20:11,254][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] started
[2015-11-03 17:20:11,270][INFO ][gateway                  ] [ip-10-55-150-212.us-west-2.compute.internal] recovered [0] indices into cluster_state

@majormoses

@martinb3 also, I forgot to mention that I bumped my Java to 8, where previously it was 7.

@majormoses

Interesting: I am now seeing "no route to host" errors even though I can ping myself just fine:

[2015-11-03 18:08:44,128][WARN ][bootstrap                ] Unable to lock JVM Memory: error=12,reason=Cannot allocate memory
[2015-11-03 18:08:44,130][WARN ][bootstrap                ] This can result in part of the JVM being swapped out.
[2015-11-03 18:08:44,143][WARN ][bootstrap                ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
[2015-11-03 18:08:44,143][WARN ][bootstrap                ] These can be adjusted by modifying /etc/security/limits.conf, for example: 
    # allow user 'elasticsearch' mlockall
    elasticsearch soft memlock unlimited
    elasticsearch hard memlock unlimited
[2015-11-03 18:08:44,146][WARN ][bootstrap                ] If you are logged in interactively, you will have to re-login for the new limits to take effect.
[2015-11-03 18:08:45,845][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] version[2.0.0], pid[15171], build[de54438/2015-10-22T08:09:48Z]
[2015-11-03 18:08:45,863][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] initializing ...
[2015-11-03 18:08:48,666][INFO ][plugins                  ] [ip-10-55-150-212.us-west-2.compute.internal] loaded [], sites [kopf, head]
[2015-11-03 18:08:49,353][INFO ][env                      ] [ip-10-55-150-212.us-west-2.compute.internal] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [4.8gb], net total_space [7.7gb], spins? [no], types [ext4]
[2015-11-03 18:09:05,902][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] initialized
[2015-11-03 18:09:05,951][INFO ][node                     ] [ip-10-55-150-212.us-west-2.compute.internal] starting ...
[2015-11-03 18:09:06,839][INFO ][transport                ] [ip-10-55-150-212.us-west-2.compute.internal] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2015-11-03 18:09:06,863][INFO ][discovery                ] [ip-10-55-150-212.us-west-2.compute.internal] es-app/qj-YdGluTGGRTSLxhTC-bg
[2015-11-03 18:09:10,648][WARN ][transport.netty          ] [ip-10-55-150-212.us-west-2.compute.internal] exception caught on transport layer [[id: 0x67cac6a5]], closing connection
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

DNS resolution and basic connectivity:

babrams@ip-10-55-150-212:~$ ping -c 1 ip-10-55-150-212.us-west-2.compute.internal
PING ip-10-55-150-212.us-west-2.compute.internal (10.55.150.212) 56(84) bytes of data.
64 bytes from ip-10-55-150-212.us-west-2.compute.internal (10.55.150.212): icmp_seq=1 ttl=64 time=0.012 ms

--- ip-10-55-150-212.us-west-2.compute.internal ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.012/0.012/0.012/0.000 ms

@majormoses

my config that chef outputs: http://pastebin.com/WCxHkRi1

@majormoses

@martinb3 Found why the init script is broken: elastic/elasticsearch#13772

@majormoses

I will see about fixing it

@martinb3
Contributor Author

martinb3 commented Nov 3, 2015

Hi folks -- I have a pretty big changeset that I'm about to push as well, to 2.0.0_wip.

martinb3 added a commit that referenced this issue Nov 3, 2015
Add new checksums and default to v2.0.0 of Elasticsearch. RE: #384.
@martinb3
Contributor Author

martinb3 commented Nov 3, 2015

Changes pushed to the 2.0.0_wip branch. I think I've fixed a few bugs. Please give the latest commits a try! :)

@paulschwarz

I'll give feedback tomorrow morning, thanks for the effort!

@majormoses

I have a quick and dirty, non-backwards-compatible solution for the init script. I will work on making it less terrible...

@majormoses

@martinb3 I have tested and the service starts though I am unable to get my cluster up.

@majormoses

I don't see any communication between the nodes in the logs.

@majormoses

@martinb3 I got a cluster up. I needed to set two more settings that are new compared to previous versions of ES:

'network.publish_host' => '_non_loopback:ipv4_',
'network.bind_host' => '_non_loopback:ipv4_',

They changed ES to listen only on the local loopback by default. I am torn on whether those should be the cookbook's defaults, as I would imagine 90% of people will want them. But ES felt that they should set a new default, so maybe I am missing something.
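
As a rough illustration of how a wrapper might layer these two settings over whatever configuration it starts from, here is a plain-Ruby sketch. `BASE_CONFIGURATION`, its contents, and `merged_configuration` are hypothetical and do not reflect the cookbook's actual defaults or API.

```ruby
# Hypothetical base settings; NOT the cookbook's real defaults.
BASE_CONFIGURATION = {
  'cluster.name' => 'es-app'
}.freeze

# Caller-supplied keys win over the base settings; the base hash is not mutated.
def merged_configuration(overrides = {})
  BASE_CONFIGURATION.merge(overrides)
end

config = merged_configuration({
  'network.publish_host' => '_non_loopback:ipv4_',
  'network.bind_host' => '_non_loopback:ipv4_'
})

config.each { |key, value| puts "#{key}: #{value}" }
```

Keeping the overrides on the caller's side means that a caller who passes nothing gets only the base settings, so the bind address stays whatever Elasticsearch itself defaults to.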

@martinb3
Contributor Author

martinb3 commented Nov 4, 2015

@majormoses I hadn't run across that one yet. For the most part, I think the caller of the Chef resources should probably set those (using Chef search, $clustering_solution, or whatever else they prefer) and we should just implement them. What do you think about deferring that as much as possible? I'm going for least surprise wherever possible.

@majormoses

@martinb3 maybe I am missing something, but this is just setting the addresses the ES process binds to and publishes. Currently, if we don't set it, ES will bind to 127.0.0.1 and ::1 by default, which I don't think is what most people expect, especially since I believe previous versions of ES bound to all interfaces. From your comment on Chef search, I think you may be referring to the unicast host list, which, yes, I implement in my wrapper via a Chef search on roles.

Here is the doc that goes over what I am talking about: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html

I am thinking that adding them to the default attributes here: https://github.com/elastic/cookbook-elasticsearch/blob/master/libraries/resource_configure.rb#L49-L67

and then making them overridable via: https://github.com/elastic/cookbook-elasticsearch/blob/master/libraries/resource_configure.rb#L72

would make sense to me. It was a gotcha for me to have to shell into the node to realize that it wasn't bound to the other interfaces like I expected.

@martinb3
Contributor Author

martinb3 commented Nov 4, 2015

@majormoses I think most people expect the Chef cookbook to not fundamentally alter the defaults for Elasticsearch. This was one of the most common complaints before we re-wrote it to be nearer to the deb/rpm/package behaviors.

If the Elasticsearch 2.0.0 defaults have significantly changed from 1.x to 2.x, I'm not sure the cookbook should change that for users. Most users expect us to default to the same defaults as a vanilla/package ES install. I'm not sure we know better than the Elasticsearch project what the defaults should be.

If we adjust the ES defaults, we're likely to see people file issues here, such as, "I only have one node. Why are you changing the bind address from the ES default? I wasn't expecting that and now my ES instance is compromised."

I feel like the ES defaults are both the most secure and the least surprise for folks coming to this cold. I feel like we can offer documentation on how to change things, for those who want to do so, to address all of the possible use cases.

@karmi
Contributor

karmi commented Nov 4, 2015

Thanks for the summary, @martinb3, I agree on the approach.

@majormoses

@martinb3 fair enough. Do you want to document it, or would you like a PR?

@martinb3
Contributor Author

martinb3 commented Nov 4, 2015

@majormoses I plan to document more, before the release, but I'd also welcome any PRs if you think there should be some specific language I shouldn't miss. Did everything else with the branch go okay?

@majormoses

@martinb3 overall everything is standing up and working as far as I can see. I haven't hooked an app up to it yet, so I may uncover more later, but I think we are probably good. I have not re-tested an upgrade of an existing 1.x system since then, so we should definitely do some testing there or at least call it out....

@paulschwarz

I've tried it out with a very minimal configuration on Ubuntu and it works. It installed 2.0.0 using the tarball. Great!

@dvinograd

Great job, guys. I was able to install ES 2.0.0 using this cookbook on Amazon Linux AMI release 2014.09, with one minor fix: templates/amazon/initscript.erb is missing and there's no templates/default/initscript.erb, so I had to copy templates/redhat/initscript.erb to templates/amazon/initscript.erb.
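
To make the failure mode concrete, here is a simplified Ruby sketch of a platform-directory lookup with a fallback. It is not how Chef or this cookbook resolves templates; `PLATFORM_FALLBACKS`, `template_candidates`, and `resolve_template` are hypothetical names used only for illustration.

```ruby
# Hypothetical fallback table: try the redhat template when there is no amazon one.
PLATFORM_FALLBACKS = { 'amazon' => 'redhat' }.freeze

# Candidate paths in priority order: platform, its fallback (if any), then default.
def template_candidates(platform, name)
  dirs = [platform, PLATFORM_FALLBACKS[platform], 'default'].compact.uniq
  dirs.map { |dir| File.join('templates', dir, name) }
end

# Returns the first candidate present in `available`, or nil if none matches.
def resolve_template(platform, name, available)
  template_candidates(platform, name).find { |path| available.include?(path) }
end

available = ['templates/redhat/initscript.erb']
puts resolve_template('amazon', 'initscript.erb', available)
```

Without a fallback entry or a templates/default copy, the lookup for the amazon platform returns nothing, which matches the missing-template situation described above.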

@martinb3
Contributor Author

@dvinograd that's great feedback; exactly what I was looking for; thank you!

@martinb3
Contributor Author

@dvinograd I've added the amazon platform template to the 2.0.0_wip branch.

@martinb3
Contributor Author

Does anyone have other comments on the 2.0.0_wip branch? I'd like to write a blog post to address #293 before we release.

@majormoses

I can't think of anything else.

@spuder
Contributor

spuder commented Nov 20, 2015

I'll put in a vote for some updated examples in the readme

related #389

It would be nice if there were more examples guiding people down the packages route. As it stands, it is easy for a user to infer that the tarball installation is the recommended path.

I'm willing to submit a pull request with what I think is a simple and clean recipe once I get my new cluster deployed.

@martinb3
Contributor Author

Hi all -- I've pushed some final updates to 2.0.0_wip, including some changes I'd love your feedback on before we do a final release. Specifically, changes include:

  • default to package install
  • plugin resource accepts URL parameter, to make it truly idempotent by plugin name
  • pass through Chef::Resource::Service actions from elasticsearch_service to service (enables notifications!)
  • lots of documentation updates

spuder added a commit to spuder/cookbook-elasticsearch that referenced this issue Nov 22, 2015
According to this comment, packages are now the default install

sous-chefs#384 (comment)
@martinb3
Contributor Author

Version 2.0.0 of this cookbook has been released.

@martinb3
Contributor Author

Thank you all for your help!!

@paulschwarz

Thanks Martin

@majormoses

thanks, glad to be of help.

@davidski

💃 💃 💃
