Update Slurm ElasticSearch logging playbook for log4shell #1079

Merged (4 commits) on Jan 12, 2022

Collaborator

@ajdecon ajdecon commented Dec 13, 2021

Summary
-------

- Update Ansible Galaxy requirements to use a different set of roles
(due to the old ones not working)

- Update logging.yml playbook to make use of the new Galaxy roles

- Add mitigations for CVE-2021-44228 as documented in
https://discuss.elastic.co/t/apache-log4j2-remote-code-execution-rce-vulnerability-cve-2021-44228-esa-2021-31/291476

Note that mitigations are applied for ElasticSearch and Logstash.
Kibana and Filebeat are confirmed to not be impacted.

Test plan
---------

Successful execution of the logging.yml playbook
@ajdecon ajdecon added the next-release label (Critical for the next release) Jan 5, 2022
Contributor

@dholt dholt left a comment

The logging playbook is getting hung up for me during the restart-logstash handler. For some reason the logstash process is refusing to shut down:

Jan  7 19:32:10 virtual-login01 logstash[109965]: [2022-01-07T19:32:10,365][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>32, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}, {"thread_id"=>33, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}, {"thread_id"=>34, "name"=>"[main]>worker2", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}, {"thread_id"=>35, "name"=>"[main]>worker3", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}]}}

When I logged on to the machine and manually killed the logstash process, the playbook continued successfully.

There are also some errors related to licensing and not being able to connect to Elasticsearch (which is running):

Jan  7 19:26:10 virtual-login01 logstash[109965]: [2022-01-07T19:26:10,379][ERROR][logstash.outputs.elasticsearch][main] Unable to get license information {:url=>"http://localhost:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '503' contacting Elasticsearch at URL 'http://localhost:9200/_license'"}
Jan  7 19:26:10 virtual-login01 logstash[109965]: [2022-01-07T19:26:10,380][ERROR][logstash.outputs.elasticsearch][main] Could not connect to a compatible version of Elasticsearch {:url=>"http://localhost:9200/"}
Jan  7 19:26:10 virtual-login01 logstash[109965]: [2022-01-07T19:26:10,379][ERROR][logstash.outputs.elasticsearch][main] Unable to get license information {:url=>"http://localhost:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '503' contacting Elasticsearch at URL 'http://localhost:9200/_license'"}
Jan  7 19:26:10 virtual-login01 logstash[109965]: [2022-01-07T19:26:10,380][ERROR][logstash.outputs.elasticsearch][main] Could not connect to a compatible version of Elasticsearch {:url=>"http://localhost:9200/"}

Did you see anything like this?

Contributor

dholt commented Jan 7, 2022

I ran into another issue running the playbook a second time, suggesting it's not idempotent:

TASK [configure logstash to mitigate CVE-2021-44228] ***********************************************************************************************************************************************************************************
fatal: [virtual-login01]: FAILED! => changed=true
  cmd: zip -q -d /usr/share/logstash/logstash-core/lib/jars/log4j-core-2.* org/apache/logging/log4j/core/lookup/JndiLookup.class
  delta: '0:00:00.012805'
  end: '2022-01-07 19:40:00.699683'
  msg: non-zero return code
  rc: 12
  start: '2022-01-07 19:40:00.686878'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |2-

    zip error: Nothing to do! (/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.17.0.jar)
  stdout_lines: <omitted>

      notify:
        - restart-elasticsearch
    - name: configure logstash to mitigate CVE-2021-44228
      shell: zip -q -d /usr/share/logstash/logstash-core/lib/jars/log4j-core-2.* org/apache/logging/log4j/core/lookup/JndiLookup.class
Contributor

Needs something after this line like:

    args:
        creates: /usr/share/logstash/logstash-core/lib/jars/log4j-core-2.17.0.jar

It looks like you can use log4j-core-*.jar as the filename, but I'm not sure whether that would pick up false positives.

Collaborator Author

We're actually removing a class from the live zip file here, not creating anything new. Which is pretty ugly, but it's the recommended mitigation from Elastic. /sigh

I'll look into adding a query step here so that we can avoid errors or idempotency issues.
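
One way to add such a query step is to look for the class inside the jar first and run `zip -d` only while it is still present. The following is a minimal sketch, not necessarily the fix that was pushed to this PR. The first task, the `jndi_lookup_check` variable name, and the `notify` target on the second task are assumptions made for illustration, and the sketch assumes the glob matches exactly one jar.

```yaml
# Sketch only: task names and the registered variable are hypothetical.
- name: check whether JndiLookup.class is still present in the logstash log4j jar
  # grep -q exits 0 when the class is listed, 1 when it is not
  shell: unzip -l /usr/share/logstash/logstash-core/lib/jars/log4j-core-2.* | grep -q JndiLookup.class
  register: jndi_lookup_check
  changed_when: false                                  # a pure query never reports a change
  failed_when: jndi_lookup_check.rc not in [0, 1]      # only unexpected exit codes are failures

- name: configure logstash to mitigate CVE-2021-44228
  shell: zip -q -d /usr/share/logstash/logstash-core/lib/jars/log4j-core-2.* org/apache/logging/log4j/core/lookup/JndiLookup.class
  when: jndi_lookup_check.rc == 0                      # skip when the class was already removed
  notify:
    - restart-logstash                                 # assumed handler; the name comes from the review above
```

With this shape, a second run skips the `zip -d` task instead of failing with `rc: 12` ("Nothing to do!"), and the handler is only notified when the jar actually changes.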

Collaborator Author

ajdecon commented Jan 7, 2022

> The logging playbook is getting hung up for me during the restart-logstash handler
> There are also some errors related to licensing and not being able to connect to Elasticsearch (which is running):

Hmm, I didn't see these. I'll investigate.

Contributor

dholt commented Jan 7, 2022

> > The logging playbook is getting hung up for me during the restart-logstash handler
> > There are also some errors related to licensing and not being able to connect to Elasticsearch (which is running):
>
> Hmm, I didn't see these. I'll investigate.

It worked fine on a clean run... I'm good with considering it a fluke and approving if you can fix the idempotency issue.

Collaborator Author

ajdecon commented Jan 7, 2022

> It worked fine on a clean run... i'm good considering it a fluke and approving if you can fix the idempotent issue

I also saw this on 1 of 3 runs just now.... 🤔 So I'm not sure it's a fluke, but it's definitely inconsistent. I'll poke around and see if I can come up with a reliable reproducer and prevent it.

Just pushed a proposed fix for the idempotency issue.

Collaborator Author

ajdecon commented Jan 11, 2022

@dholt: Let me know when you can re-review.

Weirdly, 8540c4f seems to help here, and subsequent restarts worked well in my tests.

Contributor

@dholt dholt left a comment

Working great for me, thanks!

@ajdecon ajdecon merged commit 1f94804 into NVIDIA:master Jan 12, 2022