splunk.service: Refusing to accept PID outside of service control group #185

Closed
JonoRicci opened this issue Oct 15, 2020 · 8 comments

JonoRicci commented Oct 15, 2020

@jjm and I have encountered the following problem and would be very grateful for any assistance.

Expected Behaviour

I want to install the Splunk Universal Forwarder in my AWS EC2 environment.

I am using a wrapper cookbook that only determines the host OS and passes the appropriate private installation URL through to the chef-splunk cookbook. In my wrapper cookbook I call the chef-splunk::client recipe directly.

Actual Behaviour

On Ubuntu 16.04, 18.04 and 20.04 (using the latest images via the ec2-driver) my Kitchen Test in my wrapper cookbook fails to converge.

Below is the error output from Ubuntu 20.04.

Error output
Recipe: chef-splunk::client
        * execute[/opt/splunkforwarder/bin/splunk stop] action run (skipped due to not_if)
      Recipe: chef-splunk::service
        * service[splunk] action start

          ================================================================================
          Error executing action `start` on resource 'service[splunk]'
          ================================================================================

          Mixlib::ShellOut::ShellCommandFailed
          ------------------------------------
          Expected process to exit with [0], but received '1'
          ---- Begin output of /usr/bin/systemctl --system start splunk ----
          STDOUT:
          STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
          See "systemctl status splunk.service" and "journalctl -xe" for details.
          ---- End output of /usr/bin/systemctl --system start splunk ----
          Ran /usr/bin/systemctl --system start splunk returned 1

          Resource Declaration:
          ---------------------
          # In /tmp/kitchen/cache/cookbooks/chef-splunk/recipes/service.rb

          116: service 'splunk' do
          117:   action node['init_package'] == 'systemd' ? %i(start enable) : :start
          118:   supports status: true, restart: true
          119:   notifies :run, "execute[#{splunk_cmd} stop]", :before unless correct_runas_user?
          120:   provider splunk_service_provider
          121: end
          122:

          Compiled Resource:
          ------------------
          # Declared in /tmp/kitchen/cache/cookbooks/chef-splunk/recipes/service.rb:116:in `from_file'

          service("splunk") do
            provider Chef::Provider::Service::Systemd
            action [:start, :enable]
            updated true
            default_guard_interpreter :default
            service_name "splunk"
            enabled true
            running true
            masked false
            pattern "splunk"
            declared_type :service
            cookbook_name "chef-splunk"
            recipe_name "service"
            supports {:status=>true, :restart=>true}
          end

          System Info:
          ------------
          chef_version=14.15.6
          platform=ubuntu
          platform_version=20.04
          ruby=ruby 2.5.8p224 (2020-03-31 revision 67882) [x86_64-linux]
          program_name=/opt/chef/bin/chef-client
          executable=/opt/chef/bin/chef-client


      Running handlers:
      [2020-10-15T09:42:36+00:00] ERROR: Running exception handlers
      Running handlers complete
      [2020-10-15T09:42:36+00:00] ERROR: Exception handlers complete
      Chef Client failed. 3 resources updated in 01 seconds
      [2020-10-15T09:42:36+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
      [2020-10-15T09:42:36+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
      [2020-10-15T09:42:36+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: service[splunk] (chef-splunk::service line 116) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
      ---- Begin output of /usr/bin/systemctl --system start splunk ----
      STDOUT:
      STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
      See "systemctl status splunk.service" and "journalctl -xe" for details.
      ---- End output of /usr/bin/systemctl --system start splunk ----
      Ran /usr/bin/systemctl --system start splunk returned 1
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Converge failed on instance <client-ubuntu-2004>.  Please see .kitchen/logs/client-ubuntu-2004.log for more details
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

Further investigation reveals:

systemctl status splunk.service
ubuntu@ip-10-0-0-47:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Thu 2020-10-15 09:42:36 UTC; 1h 15min ago
    Process: 3603 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)

Oct 15 09:42:35 ip-10-0-0-47 systemd[1]: Starting Splunk...
Oct 15 09:42:36 ip-10-0-0-47 splunk[3603]: The splunk daemon (splunkd) is already running.
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: splunk.service: Failed with result 'protocol'.
Oct 15 09:42:36 ip-10-0-0-47 systemd[1]: Failed to start Splunk.
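
The two journal messages above ("already running" followed by "Refusing to accept PID outside of service control group") would be consistent with splunkd having been started outside the unit before systemd tried to start it, although that is an interpretation rather than something the log states outright. A minimal Ruby sketch that flags this pattern in journal text follows; the helper name `started_outside_unit?` and the constants are hypothetical, and the sample text is copied from the output above.

```ruby
# Hypothetical helper (not part of chef-splunk): look for the two telltale
# messages from the journal output above.
ALREADY_RUNNING = /The splunk daemon \(splunkd\) is already running/
PID_REFUSED     = /Refusing to accept PID outside of service control group/

# Returns true when both messages appear in the given journal text.
def started_outside_unit?(journal_text)
  journal_text.match?(ALREADY_RUNNING) && journal_text.match?(PID_REFUSED)
end

# Sample lines taken from the systemctl status output in this report.
sample = <<~LOG
  splunk[3603]: The splunk daemon (splunkd) is already running.
  systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder/var/run/splunk/splunkd.pid
LOG

puts started_outside_unit?(sample)
```

In practice the text could come from `journalctl -u splunk.service`; the check is only a heuristic and says nothing about what started the stray process.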

Workaround

I have a manual workaround:

  1. kitchen login
  2. kill the splunk process.
  3. service splunk start

This successfully launches the splunk service:

Shell output
ubuntu@ip-10-0-0-47:~$ ps -ef | grep splunk
root        2974       1  0 09:31 ?        00:00:08 splunkd -p 8089 restart
root        2975    2974  0 09:31 ?        00:00:00 [splunkd pid=2974] splunkd -p 8089 restart [process-runner]
ubuntu      3859    3787  0 11:14 pts/0    00:00:00 grep --color=auto splunk
ubuntu@ip-10-0-0-47:~$ sudo kill -9 2974
ubuntu@ip-10-0-0-47:~$ ps -ef | grep splunk
ubuntu      3885    3787  0 11:14 pts/0    00:00:00 grep --color=auto splunk
ubuntu@ip-10-0-0-47:~$ sudo service splunk start
ubuntu@ip-10-0-0-47:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-10-15 11:14:37 UTC; 10s ago
    Process: 3895 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)
   Main PID: 3966 (splunkd)
      Tasks: 40 (limit: 4710)
     Memory: 52.2M
     CGroup: /system.slice/splunk.service
             ├─3966 splunkd -p 8089 start
             └─3967 [splunkd pid=3966] splunkd -p 8089 start [process-runner]

Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]:         Done
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]:         Checking default conf files for edits...
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]:         Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-8.0.6-152fb4b2bb96-linux-2.6-x86_64-manifest'
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]:         All installed files intact.
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]:         Done
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: All preliminary checks passed.
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Starting splunk server daemon (splunkd)...
Oct 15 11:14:37 ip-10-0-0-47 splunk[3895]: Done
Oct 15 11:14:37 ip-10-0-0-47 systemd[1]: splunk.service: Failed to parse PID from file /opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument
Oct 15 11:14:37 ip-10-0-0-47 systemd[1]: Started Splunk.

You will notice that a PID-related message (splunk.service: Failed to parse PID from file /opt/splunkforwarder/var/run/splunk/splunkd.pid: Invalid argument) is still present even on a successful start.

This leaves me unsure whether the PID file is the root cause or a red herring in this case.
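
One way to narrow this down might be to check whether the PID recorded in splunkd.pid belongs to the service's control group, which the successful status output shows as /system.slice/splunk.service. The Ruby sketch below is an assumption-laden illustration: the helper names are hypothetical, it reads the Linux `/proc/<pid>/cgroup` file (lines of the form `hierarchy-id:controllers:path`), and it takes only the first line of the PID file on the assumption that the file may hold more than one entry.

```ruby
# Hypothetical helper: does this /proc/<pid>/cgroup text place the process
# inside the given systemd unit's control group?
def in_unit_cgroup?(cgroup_text, unit = "splunk.service")
  cgroup_text.each_line.any? do |line|
    # Each line has the form "hierarchy-id:controllers:path".
    path = line.chomp.split(":", 3)[2].to_s
    path == "/system.slice/#{unit}" || path.start_with?("/system.slice/#{unit}/")
  end
end

# Hypothetical helper: read the first PID from the PID file and look up its
# cgroup. Returns false if the file is missing, unreadable, or unparsable.
def pid_in_unit?(pid_file, unit = "splunk.service")
  pid = Integer(File.read(pid_file).lines.first.to_s.strip)
  in_unit_cgroup?(File.read("/proc/#{pid}/cgroup"), unit)
rescue ArgumentError, SystemCallError
  false
end

# Illustrative cgroup text (made up for this example, not captured from the instance).
inside  = "1:name=systemd:/system.slice/splunk.service\n0::/system.slice/splunk.service\n"
outside = "1:name=systemd:/user.slice/user-1000.slice/session-1.scope\n"
puts in_unit_cgroup?(inside)
puts in_unit_cgroup?(outside)
```

On the instance, `pid_in_unit?("/opt/splunkforwarder/var/run/splunk/splunkd.pid")` returning false while splunkd is running would point towards the process living outside the unit's cgroup.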

Stack trace

[2020-10-15T09:42:36+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
[2020-10-15T09:42:36+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
Generated at 2020-10-15 13:48:48 +0000
Mixlib::ShellOut::ShellCommandFailed: service[splunk] (chef-splunk::service line 116) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service/systemd.rb:106:in `start_service'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:135:in `block in action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:227:in `converge_by'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:134:in `action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:182:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource.rb:578:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:74:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `block in run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `each'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:132:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:130:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:720:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `catch'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:754:in `converge_and_save'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:286:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:261:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application/client.rb:449:in `run_application'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/bin/chef-client:25:in `<top (required)>'
/opt/chef/bin/chef-client:81:in `load'
/opt/chef/bin/chef-client:81:in `<main>'

>>>> Caused by Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system start splunk ----
STDOUT:
STDERR: Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system start splunk ----
Ran /usr/bin/systemctl --system start splunk returned 1
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:297:in `invalid!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/mixlib-shellout-2.4.4/lib/mixlib/shellout.rb:284:in `error!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:202:in `shell_out_compacted!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/shell_out.rb:124:in `shell_out!'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service/systemd.rb:106:in `start_service'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:135:in `block in action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/mixin/why_run.rb:51:in `add_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:227:in `converge_by'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider/service.rb:134:in `action_start'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/provider.rb:182:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource.rb:578:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:74:in `run_action'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `block in run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `each'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:108:in `run_all_actions'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:132:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:94:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:114:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:103:in `iterate'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/resource_collection/resource_list.rb:92:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/runner.rb:130:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:720:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `catch'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:715:in `converge'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:754:in `converge_and_save'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/client.rb:286:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:303:in `run_with_graceful_exit_option'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:279:in `block in run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:261:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application/client.rb:449:in `run_application'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/2.5.0/gems/chef-14.15.6/bin/chef-client:25:in `<top (required)>'
/opt/chef/bin/chef-client:81:in `load'
@JonoRicci
Author

JonoRicci commented Oct 15, 2020

Reproducing the error with Chef-Splunk Kitchen test with EC2-driver

I can run the chef-splunk kitchen test with dokken successfully without reproducing the error.

If I swap the dokken driver for the ec2 driver and add a very simple InSpec test, I can reproduce our error in the chef-splunk cookbook. (I also ran the InSpec test with the dokken driver, which produced the same outcome.)

The Inspec test:

describe service('splunk') do
  it { should be_installed }
  it { should be_enabled }
  it { should be_running }
end

The result:

  Service splunk
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ×  is expected to be running
     expected that `Service splunk` is running

Investigating on the instance:

Shell output
ubuntu@ip-10-0-0-150:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
ubuntu@ip-10-0-0-150:~$ sudo systemctl start splunk.service
Job for splunk.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status splunk.service" and "journalctl -xe" for details.
ubuntu@ip-10-0-0-150:~$ systemctl status splunk.service
● splunk.service - Splunk
     Loaded: loaded (/etc/systemd/system/splunk.service; enabled; vendor preset: enabled)
     Active: failed (Result: protocol) since Thu 2020-10-15 13:33:28 UTC; 2s ago
    Process: 3266 ExecStart=/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt (code=exited, status=0/SUCCESS)

Oct 15 13:33:27 ip-10-0-0-150 systemd[1]: Starting Splunk...
Oct 15 13:33:28 ip-10-0-0-150 splunk[3266]: The splunk daemon (splunkd) is already running.
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder>
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: splunk.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /opt/splunkforwarder>
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: splunk.service: Failed with result 'protocol'.
Oct 15 13:33:28 ip-10-0-0-150 systemd[1]: Failed to start Splunk.

kitchen.yml
---
driver:
  name: ec2
  region: eu-west-1
  interface: public
  instance_type: t2.medium
  require_chef_omnibus: true
  subnet_filter:
    ...
  security_group_filter:
    ...
  tags:
    ...

transport:
  max_threads: 5
  connection_timeout: 10
  connection_retries: 36
  connection_retry_sleep: 10
  max_wait_until_ready: 1200

provisioner:
  name: chef_zero
  log_level: auto
  product_name: chef
  product_version: 14
  max_retries: 3
  wait_for_retry: 90
  retry_on_exit_code:
    - 35 # chef-client's reboot scheduled exit status
  chef_license: accept
  attributes:
    dev_mode: true
    splunk:
      accept_license: true
      enable_ssl: false
      ssl_options:
        enableSplunkWebSSL: 0
        httpport: 8000
        startwebserver: 1
      web_port: 8000

verifier:
  name: inspec
  sudo: true
  root_path: '/opt/verifier'

platforms:
  - name: ubuntu-2004
    driver:
      image_search:
        owner-id: "099720109477"
        name: ubuntu/images/*/ubuntu-*-20.04*
  - name: ubuntu-1804
    driver:
      image_search:
        owner-id: "099720109477"
        name: ubuntu/images/*/ubuntu-*-18.04*
  - name: ubuntu-1604
    driver:
      image_search:
        owner-id: "099720109477"
        name: ubuntu/images/*/ubuntu-*-16.04*

suites:
  - name: client
    run_list:
      - recipe[chef-splunk::default]
    attributes:
      dev_mode: true
      splunk:
        accept_license: true
    verifier:
      inspec_tests:
        - path: test/integration/default

@haidangwa
Contributor

@JonoRicci can you show what your systemd unit file looks like? If you're calling the client recipe directly, you may not be setting up the splunk auth attributes. There is logic in the default recipe that reads the splunk admin user/pass from a data bag or from chef-vault.

@haidangwa
Contributor

You need to have this in an encrypted data bag or chef-vault item:

vault_item = chef_vault_item(node['splunk']['data_bag'], "splunk_#{node.chef_environment}")

@jjm
Contributor

jjm commented Oct 15, 2020

Hi @haidangwa, I've created PR #186 that adds some inspec tests to chef-splunk for the client suite that shows the issue we are seeing without our wrapper cookbook.

The output of verify on ubuntu-2004 is as follows:

  System Package splunkforwarder
     ✔  should be installed
  Service splunk
     ✔  should be installed
     ✔  should be enabled
     ×  should be running
     expected that `Service splunk` is running
  Port 8089
     ✔  should be listening
     ✔  protocols should include "tcp"
  Processes splunkd
     ✔  should exist

Test Summary: 6 successful, 1 failure, 0 skipped

To me, the cause seems to be the initial start of Splunk to accept the license. If I log in to the Docker container, stop Splunk with /opt/splunkforwarder/bin/splunk stop, and then run service splunk start, all the tests pass:

  System Package splunkforwarder
     ✔  should be installed
  Service splunk
     ✔  should be installed
     ✔  should be enabled
     ✔  should be running
  Port 8089
     ✔  should be listening
     ✔  protocols should include "tcp"
  Processes splunkd
     ✔  should exist

Test Summary: 7 successful, 0 failures, 0 skipped

Edit: Made it clearer we see these issues directly with chef-splunk and added summary to the verify output.
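
The stop-then-start sequence described above can be sketched in Ruby as follows. This is only an illustration of the manual steps, not the cookbook's fix: the method name `restart_under_service_manager` and the injectable `runner` parameter are hypothetical, and the two commands are the ones quoted in this comment.

```ruby
# Hypothetical helper: stop the splunkd that was started directly, then start
# it again through the service manager. The runner is injectable so the
# sequence can be dry-run without touching the system.
def restart_under_service_manager(splunk_home: "/opt/splunkforwarder",
                                  runner: ->(cmd) { system(*cmd) })
  commands = [
    [File.join(splunk_home, "bin", "splunk"), "stop"],
    %w[service splunk start]
  ]
  # all? short-circuits, so the service is not started if the stop fails.
  commands.all? { |cmd| runner.call(cmd) }
end

# Dry run: print the commands instead of executing them.
restart_under_service_manager(runner: ->(cmd) { puts cmd.join(" "); true })
```

Calling the method without a `runner` would execute the commands for real and needs root privileges.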

@haidangwa
Copy link
Contributor

@jjm Have you accepted the license? There is one way and only one way to accept the license: https://github.com/chef-cookbooks/chef-splunk#license-acceptance

@jjm
Contributor

jjm commented Oct 15, 2020

@haidangwa Yes, it's done by this line of the kitchen.yml file:

https://github.com/chef-cookbooks/chef-splunk/blob/98a95a26472f8e04cfef207bc50276154f068d71/kitchen.yml#L18

EDIT: Linked to chef license acceptance, not splunk.

@jjm
Contributor

jjm commented Oct 15, 2020

haidangwa added a commit that referenced this issue Oct 17, 2020
  * a startup issue was resolved for SplunkForwarder installations with an improved
    systemd unit file (fix below)
  * Adds Inspec tests to verify from SplunkForwarder starts (thanks, @jjm)
- Fixes Issue [#187](#187)
  * the systemd unit file is now relegated to the `splunk enable boot-start` command to manage
- Adds Inspec tests and sets the verifier in Test Kitchen

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>
@haidangwa haidangwa mentioned this issue Oct 17, 2020
4 tasks
haidangwa added a commit that referenced this issue Oct 18, 2020
  * a startup issue was resolved for SplunkForwarder installations with an improved
    systemd unit file (fix below)
  * Adds Inspec tests to verify from SplunkForwarder starts (thanks, @jjm)
- Fixes Issue [#187](#187)
  * the systemd unit file is now relegated to the `splunk enable boot-start` command to manage
- Adds Inspec tests and sets the verifier in Test Kitchen for some test suites; some are still in serverspec
- Render the user-seed.conf with a file resource rather than a template
- The default recipe no longer includes the disable recipe; to disable splunk, add `recipe[chef-splunk::disabled]` to a run list explicitly
- Disabling splunk will no longer uninstall Splunk Enterprise nor the Splunk Universal Forwarder
- Adds `#SecretsHelper` to aid with secrets rotation and maintaining idempotency for handling Splunk's hashed secret values
- Improved guards to prevent `service[splunk]` restart/start when it should be disabled.

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>
@ehvidal

ehvidal commented Oct 19, 2020

The same is happening here.

In my case, kitchen converge completes without an error:

Recipe: chef-splunk::service
  * service[splunk] action restart
    - restart service service[splunk]

Running handlers:
Running handlers complete
Chef Infra Client finished, 22/44 resources updated in 36 seconds

But after that, if I run kitchen verify:

  System Package splunkforwarder
     ✔  is expected to be installed
  Service splunk
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ×  is expected to be running
     expected that `Service splunk` is running

Test Summary: 3 successful, 1 failure, 0 skipped

The interesting thing is what ps -aux shows me:

root@default-ubuntu-1804:/# ps -aux | grep splunk
root         839  1.6  1.0 294276 80752 ?        Sl   20:28   0:00 splunkd -p 8089 restart
root         840  0.0  0.1  87852 13584 ?        Ss   20:28   0:00 [splunkd pid=839] splunkd -p 8089 restart [process-runner]
root         965  0.0  0.0  11460  1028 pts/0    S+   20:29   0:00 grep --color=auto splunk

It seems to me that the problem is in the restart of the service. If I kill those processes and converge again, then everything is fine:

  System Package splunkforwarder
     ✔  is expected to be installed
  Service splunk
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ✔  is expected to be running

Test Summary: 4 successful, 0 failures, 0 skipped

haidangwa added a commit that referenced this issue Oct 20, 2020
* - Fixes Issue [#185](#185)
  * a startup issue was resolved for SplunkForwarder installations with an improved
    systemd unit file (fix below)
  * Adds Inspec tests to verify from SplunkForwarder starts (thanks, @jjm)
- Fixes Issue [#187](#187)
  * the systemd unit file is now relegated to the `splunk enable boot-start` command to manage
- Adds Inspec tests and sets the verifier in Test Kitchen for some test suites; some are still in serverspec
- Render the user-seed.conf with a file resource rather than a template
- The default recipe no longer includes the disable recipe; to disable splunk, add `recipe[chef-splunk::disabled]` to a run list explicitly
- Disabling splunk will no longer uninstall Splunk Enterprise nor the Splunk Universal Forwarder
- Adds `#SecretsHelper` to aid with secrets rotation and maintaining idempotency for handling Splunk's hashed secret values
- Improved guards to prevent `service[splunk]` restart/start when it should be disabled.

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* fix some chefspecs

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* updates test matrix in Github actions ci workflow
Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* fixes this error condition when executing the `#splunk_login_successful?` helper method:
```
    Errno::ENOENT
    -------------
    No such file or directory - /opt/splunk/bin/splunk
```

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* fixes a typo: the question is "should_not" not "shuold_not"

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* - uses `#splunk_secret_inspect` in the search head clustering server.conf.erb
- updates inspec
- disables Splunk's file locking verification on startup during Test Kitchen runs

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* fixes inspec tests for uninstall_forwarder and server-cluster-master suites

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>

* fixes inspec tests

Signed-off-by: Dang H. Nguyen <dang.nguyen@disney.com>