Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metricbeat service times-out at startup on Windows #3018

Closed
Cylindric opened this issue Nov 16, 2016 · 9 comments
Closed

Metricbeat service times-out at startup on Windows #3018

Cylindric opened this issue Nov 16, 2016 · 9 comments
Labels
bug help wanted Indicates that a maintainer wants help on an issue or pull request Metricbeat Metricbeat :Windows

Comments

@Cylindric
Copy link

I am running MetricBeat on about 30 Windows servers, and on one of them it's having trouble. Running it manually seems to work fine, although I get a whole bunch of "Skip process pid" messages due to missing pids, but I think that's fine.

If I start it as a service however, it appears to start correctly, start to read pids, and then just silently terminates. In the Service Manager, it seems to take ages to start, at which point the SM gives up with a timeout error.

This is the debug log output. The "error getting process details" ones repeat about 60 times.

2016-11-16T15:19:45Z INFO Home path: [C:\Beats\metricbeat] Config path: [C:\Beats\metricbeat] Data path: [C:\ProgramData\metricbeat] Logs path: [C:\Beats\metricbeat\logs]
2016-11-16T15:19:45Z INFO Setup Beat: metricbeat; Version: 6.0.0-alpha1
2016-11-16T15:19:45Z INFO Elasticsearch url: http://logstash:9200
2016-11-16T15:19:45Z INFO Elasticsearch url: http://elastic1:9200
2016-11-16T15:19:45Z INFO Activated elasticsearch as output plugin.
2016-11-16T15:19:45Z INFO Publisher name: ca-app-rc01
2016-11-16T15:19:45Z INFO Flush Interval set to: 1s
2016-11-16T15:19:45Z INFO Max Bulk Size set to: 50
2016-11-16T15:19:45Z INFO Register [ModuleFactory:[system], MetricSetFactory:[apache/status, beats/filebeat, beats/libbeat, docker/container, docker/cpu, docker/diskio, docker/info, docker/memory, docker/network, haproxy/info, haproxy/stat, mongodb/status, mysql/status, nginx/stubstatus, postgresql/activity, postgresql/bgwriter, postgresql/database, redis/info, redis/keyspace, system/core, system/cpu, system/diskio, system/filesystem, system/fsstat, system/memory, system/network, system/process, zookeeper/mntr]]
2016-11-16T15:19:45Z INFO Metricbeat process and system info: {"OSVersion":{"Major":6,"Minor":2,"Build":9200},"Arch":"amd64","NumCPU":4,"User":{"SID":"S-1-5-18","Account":"SYSTEM","Domain":"NT AUTHORITY","Type":1},"ProcessPrivs":{"SeAssignPrimaryTokenPrivilege":{"enabled":false},"SeAuditPrivilege":{"enabled_by_default":true,"enabled":true},"SeBackupPrivilege":{"enabled":false},"SeChangeNotifyPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateGlobalPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePagefilePrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePermanentPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateSymbolicLinkPrivilege":{"enabled_by_default":true,"enabled":true},"SeDebugPrivilege":{"enabled_by_default":true,"enabled":true},"SeImpersonatePrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseBasePriorityPrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseQuotaPrivilege":{"enabled":false},"SeIncreaseWorkingSetPrivilege":{"enabled_by_default":true,"enabled":true},"SeLoadDriverPrivilege":{"enabled":false},"SeLockMemoryPrivilege":{"enabled_by_default":true,"enabled":true},"SeManageVolumePrivilege":{"enabled":false},"SeProfileSingleProcessPrivilege":{"enabled_by_default":true,"enabled":true},"SeRestorePrivilege":{"enabled":false},"SeSecurityPrivilege":{"enabled":false},"SeShutdownPrivilege":{"enabled":false},"SeSystemEnvironmentPrivilege":{"enabled":false},"SeSystemProfilePrivilege":{"enabled_by_default":true,"enabled":true},"SeSystemtimePrivilege":{"enabled":false},"SeTakeOwnershipPrivilege":{"enabled":false},"SeTcbPrivilege":{"enabled_by_default":true,"enabled":true},"SeTimeZonePrivilege":{"enabled_by_default":true,"enabled":true},"SeUndockPrivilege":{"enabled":false}}}
2016-11-16T15:19:45Z INFO SeDebugPrivilege is enabled. SeDebugPrivilege=(Default, Enabled)
2016-11-16T15:19:45Z INFO Connected to Elasticsearch version 2.4.1
2016-11-16T15:19:45Z INFO Connected to Elasticsearch version 2.4.1
2016-11-16T15:19:57Z ERR Error getting process details pid=11368: error getting process arguments for pid=11368: ProcArgs failed for pid=11368: could not get Win32_Process WHERE ProcessId = 11368: Process not found
2016-11-16T15:19:57Z ERR Error getting process details pid=6748: error getting process arguments for pid=6748: ProcArgs failed for pid=6748: could not get Win32_Process WHERE ProcessId = 6748: Process not found
2016-11-16T15:19:58Z ERR Error getting process details pid=6616: error getting process arguments for pid=6616: ProcArgs failed for pid=6616: could not get Win32_Process WHERE ProcessId = 6616: Process not found
2016-11-16T15:19:58Z ERR Error getting process details pid=12664: error getting process arguments for pid=12664: ProcArgs failed for pid=12664: could not get Win32_Process WHERE ProcessId = 12664: Process not found
2016-11-16T15:19:58Z ERR Error getting process details pid=8072: error getting process arguments for pid=8072: ProcArgs failed for pid=8072: could not get Win32_Process WHERE ProcessId = 8072: Process not found


It gets to that first ERR line REALLY quickly, so it does look like it's not sending a service signal correctly for some reason. Interestingly, on the other ~20 servers I have, it's not a problem - just on two.
They are all Windows Server 2012 R2.

I originally brought this up on the forum, and @andrewkroh may have found a possible cause:

I'm pretty sure the problem is this line which requests all process stats in the module initialization phase. The New method should run quickly, but it's probably blocking other initialization (namely the goroutine that responds to the service manager signals).

@andrewkroh andrewkroh added help wanted Indicates that a maintainer wants help on an issue or pull request bug Metricbeat Metricbeat :Windows labels Nov 16, 2016
@andrewkroh andrewkroh changed the title MetricBeat on one of my Windows servers fails to start Metricbeat service times-out at startup on Windows Nov 16, 2016
@qhlHelo
Copy link

qhlHelo commented Nov 18, 2016

When i run the "Start-Service metricbeat" command in PowerShell,it was failed as bellow:

  • Start-Service metricbeat
  • - CategoryInfo          : OpenError: (System.ServiceProcess.ServiceController:ServiceController) [Start-Service], ServiceCommandException
    - FullyQualifiedErrorId : CouldNotStartService,Microsoft.PowerShell.Commands.StartServiceCommand
    
    

The OS's version is windows 7.

@martinscholz83
Copy link
Contributor

martinscholz83 commented Nov 21, 2016

@qhlHelo, try to run Powershell with administratives privileges.

@qhlHelo
Copy link

qhlHelo commented Nov 21, 2016

@maddin2016
Yes,i had run Powershell with administravives privileges, but the problem was still not solved.
I accidentally found the issue and download metricbeat-6.0.0-alpha1-SNAPSHOT-windows-x86_64.zip,now the problem is solved.
Thank you~

@martinscholz83
Copy link
Contributor

@qhlHelo, do you mean you had the same problem like @Cylindric? Or only to start the service with powershell.

@qhlHelo
Copy link

qhlHelo commented Nov 21, 2016

@maddin2016, just a moment ago,i had a test with the current version(5.0.1) again, this time i could start the service with powershell correctly. I think I have known why the problem occurred before.

At the time of the previous test, I opened the local services window with "services.msc" and I usually observed the status of all kinds of services, then I started the service with powershell while the service window remained opening. I think this is the cause.

Nothing but I had started the service, I met the err info in metricbeat's log as below:

2016-11-21T20:28:54+08:00 ERR Error getting process details pid=94136: error getting process arguments for pid=94136: ProcArgs failed for pid=94136: could not get Win32_Process WHERE ProcessId = 94136: Process not found

I will test it with metricbeat-6.0.0-alpha1-SNAPSHOT-windows-x86_64.zip

@martinscholz83
Copy link
Contributor

@andrewkroh, i have look trough the codebase and i think there is too much work until the service gives a signal that he is running. For example all that init stuff to run through all the modules i would put into the Run method and not into New. Can i open a PR to show you what i mean. I have tested it on my machine and it really speed up to start the process.

@trevorndodds
Copy link

Looks like the same issue I've been seeing from the start - #2817

@andrewkroh
Copy link
Member

@maddin2016

For example all that init stuff to run through all the modules i would put into the Run method and not into New.

I don't advise doing this because it will break the configuration validation (-configcheck).

Each module's factory function is supposed to run fast and only validate config and do minimal setup (no external connections, no fetching, etc.). The system-process MetricSet is violating this concept here. I suspect that if this is moved into the Fetch method and only run once on the first Fetch call that it will resolve the problem. If you'd like to test this and open a PR that would be appreciated. Thanks

@martinscholz83
Copy link
Contributor

Ok, sounds good. I will test and probably open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help wanted Indicates that a maintainer wants help on an issue or pull request Metricbeat Metricbeat :Windows
Projects
None yet
Development

No branches or pull requests

5 participants