Metricbeat 6.0-alpha2 Vsphere module: not authenticated (auth not being reissued) #4673

Closed
alextxm opened this issue Jul 14, 2017 · 18 comments

@alextxm

alextxm commented Jul 14, 2017

I've configured the vSphere module in Metricbeat 6.0.0-alpha2 (on a Windows 2008 R2 machine) and let it run for a while. Please note that my vSphere configuration requires authentication (with insecure:true).
Metricbeat gathered data for 12+ hours, then Elasticsearch started filling up with events whose error.message field was set to NotAuthenticated (error.message:NotAuthenticated).
It happened when vSphere went "offline" for a scheduled backup activity; when it came back "online", the vsphere module apparently did not authenticate again, so ES started being populated with events carrying the NotAuthenticated error. I'm attaching a screenshot from Kibana that shows the described flow.
Please also note that no error messages can be found in the Metricbeat log itself.
Is there a way to make the vsphere module authenticate again?
[Screenshot: vsphere-error]

@exekias
Contributor

exekias commented Jul 17, 2017

Thank you for reporting, @alextxm. I have some questions:

  • Did you change username/password for your beats user?
  • Did you restart VSphere server at any moment?

I'll have a look at our client library (govmomi); perhaps the session expired and it doesn't authenticate on reconnect.

@exekias added the bug and Metricbeat labels on Jul 17, 2017
@alextxm
Author

alextxm commented Jul 20, 2017

Hi @exekias,
the username/password didn't change, but the vSphere server services had been restarted by the nightly scheduled vSphere DB backup activity... this should be the cause of the session expiration.
Please note that this scenario could be pretty common IMHO, since AFAIK a vSphere backup requires a stop/restart of the related services; as such, a session expiration/reconnect handling mechanism in the client lib (or the module itself) would be really useful.

@exekias
Contributor

exekias commented Jul 21, 2017

So I think the client library is not issuing a new Login after reconnect. We will need to confirm that and fix it.

@alextxm
Author

alextxm commented Aug 7, 2017

Hi, is there any news on this? Can I help with further testing?

@amandahla

amandahla commented Aug 15, 2017

@exekias

First I tried using iptables to block access, and that worked fine: Metricbeat reconnected after I deleted the rule.
Then I tried terminating the Metricbeat session using the vSphere Web Client. This way I got the same "NotAuthenticated" error all the time.
I'm not sure this is the best way to fix it and reconnect, but at least in the test I ran, it worked.

Update: I was getting a nil pointer, but I just needed to recreate the view after re-authenticating.
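
Editor's sketch: the behaviour described above (log in again when the server answers NotAuthenticated, then recreate the view before retrying) could look roughly like the following. This is a minimal illustration under stated assumptions, not the code from the PR: the `API`, `View`, `Collector` and `fakeAPI` types are hypothetical stand-ins and do not mirror govmomi's real interfaces.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotAuthenticated stands in for the vSphere NotAuthenticated fault (assumption).
var ErrNotAuthenticated = errors.New("NotAuthenticated")

// API is a hypothetical, minimal client interface; it is not the govmomi API.
type API interface {
	Login() error
	NewView() (View, error)
}

// View is a hypothetical handle that becomes invalid when its session ends.
type View interface {
	Retrieve() ([]string, error)
}

// Collector keeps a view and rebuilds it when the session is lost.
type Collector struct {
	api  API
	view View
}

// reconnect logs in and recreates the view, so no stale (or nil) view is reused.
func (c *Collector) reconnect() error {
	if err := c.api.Login(); err != nil {
		return fmt.Errorf("re-login: %w", err)
	}
	v, err := c.api.NewView()
	if err != nil {
		return fmt.Errorf("recreate view: %w", err)
	}
	c.view = v
	return nil
}

// Fetch retrieves data; on NotAuthenticated it reconnects and retries once.
func (c *Collector) Fetch() ([]string, error) {
	if c.view == nil {
		if err := c.reconnect(); err != nil {
			return nil, err
		}
	}
	data, err := c.view.Retrieve()
	if !errors.Is(err, ErrNotAuthenticated) {
		return data, err
	}
	if err := c.reconnect(); err != nil {
		return nil, err
	}
	return c.view.Retrieve()
}

// fakeAPI simulates a server; bumping session without Login invalidates old views.
type fakeAPI struct {
	logins  int
	session int
}

type fakeView struct {
	api     *fakeAPI
	session int
}

func (f *fakeAPI) Login() error { f.logins++; f.session++; return nil }

func (f *fakeAPI) NewView() (View, error) {
	return &fakeView{api: f, session: f.session}, nil
}

func (v *fakeView) Retrieve() ([]string, error) {
	if v.session != v.api.session {
		return nil, ErrNotAuthenticated
	}
	return []string{"vm-1"}, nil
}

func main() {
	api := &fakeAPI{}
	c := &Collector{api: api}
	fmt.Println(c.Fetch())
	api.session++ // simulate the session being terminated on the server
	fmt.Println(c.Fetch())
	fmt.Println("logins:", api.logins)
}
```

Retrying only once keeps a permanently rejected login from looping forever; the next scheduled fetch would simply try again.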

@exekias
Contributor

exekias commented Aug 16, 2017

@amandahla from what I see, we could benefit from using Session instead of Client? It would handle keepalive and some more goodies. What do you think? https://github.com/vmware/vic/blob/master/pkg/vsphere/session/session.go#L50-L87
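
Editor's sketch: the keepalive idea mentioned above can be illustrated with a ticker loop that pings the server and logs in again when the ping fails. This is only a rough model of the concept; `Session`, `check`, `KeepAlive` and `fakeSession` are hypothetical names and are not taken from the vic or govmomi packages.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Session is a hypothetical interface, not the vic session type.
type Session interface {
	Ping() error  // cheap call that fails once the session is gone
	Login() error // establishes a fresh session
}

// check performs one keepalive round: ping, and log in again if the ping fails.
func check(s Session) error {
	if err := s.Ping(); err == nil {
		return nil
	}
	return s.Login()
}

// KeepAlive runs check every interval until ctx is cancelled.
// Failed re-logins are reported through onErr (which may be nil).
func KeepAlive(ctx context.Context, s Session, interval time.Duration, onErr func(error)) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := check(s); err != nil && onErr != nil {
				onErr(err)
			}
		}
	}
}

// fakeSession simulates a server-side session that can be dropped.
type fakeSession struct {
	active bool
	logins int
}

func (f *fakeSession) Ping() error {
	if !f.active {
		return errors.New("the session is not authenticated")
	}
	return nil
}

func (f *fakeSession) Login() error {
	f.active = true
	f.logins++
	return nil
}

func main() {
	s := &fakeSession{} // starts without an active session
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	// Runs synchronously here; a real module would start it in a goroutine.
	KeepAlive(ctx, s, 10*time.Millisecond, func(err error) { fmt.Println("keepalive:", err) })
	fmt.Println("active:", s.active, "logins:", s.logins)
}
```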

@amandahla

@exekias I'm just not sure it will work on both versions (6.0/6.5), because I see that it uses finder to populate. That is the same thing I had to change because of the 'datastore (or host) '*' not found' error. But I believe it's worth a try. I'll try to follow this to make the changes and test. What do you think?

@exekias
Contributor

exekias commented Aug 16, 2017

Sounds good to me, I guess we can wait for that before merging #4883?

@amandahla

Yes, I think it would be better.

Could you please help me with something? I ran a test and now I get this:

./metricbeat flag redefined: version
panic: ./metricbeat flag redefined: version

goroutine 1 [running]:
flag.(*FlagSet).Var(0xc42001c120, 0x38f87a0, 0x396f426, 0x29f1b1c, 0x7, 0x2a026f9, 0x11)
	/home/user/Documentos/CDGIN/2017/ESTALEIRO/metricbeat/go-1.8.3/go/src/flag/flag.go:793 +0x420
flag.BoolVar(0x396f426, 0x29f1b1c, 0x7, 0x0, 0x2a026f9, 0x11)
	/home/user/Documentos/CDGIN/2017/ESTALEIRO/metricbeat/go-1.8.3/go/src/flag/flag.go:572 +0x72
github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/pkg/version.init.1()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/pkg/version/version.go:56 +0x5c
github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/pkg/version.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/pkg/version/version.go:146 +0x5d
github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/lib/config/executor.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/lib/config/executor/network_interface.go:85 +0x58
github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/lib/config.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/lib/config/virtual_container_host.go:351 +0x67
github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/pkg/vsphere/session.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/module/vsphere/vendor/github.com/vmware/vic/pkg/vsphere/session/session.go:330 +0x8e
github.com/elastic/beats/metricbeat/module/vsphere/host.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/module/vsphere/host/host.go:128 +0x6c
github.com/elastic/beats/metricbeat/include.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/include/list.go:114 +0x1ec
github.com/elastic/beats/metricbeat/cmd.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/cmd/root.go:30 +0x71
main.init()
	/home/user/monvm1/src/github.com/elastic/beats/metricbeat/main.go:21 +0x49

I needed to import "github.com/vmware/vic/pkg/vsphere/session", so I added it to the vsphere vendor folder. I'm not sure how to resolve this. :-(

@exekias
Contributor

exekias commented Aug 16, 2017

You can temporarily patch https://github.com/vmware/vic/blob/master/pkg/version/version.go#L56 in the vendor folder and keep going; then we can deal with that issue once (and if) it's working. Try changing "version" to something else.

@amandahla

Thanks @exekias, I tested with both versions and it was fine. Now, when the session is deactivated, it re-authenticates after the keepalive time.

For this commit, I changed 'version' to 'version1' in https://github.com/vmware/vic/blob/master/pkg/version/version.go#L56

Log:

WARN[0060] session keepalive error: ServerFaultCode: The session is not authenticated. 
INFO[0060] session keepalive re-authenticated

@alextxm
Author

alextxm commented Oct 23, 2017

Hi @amandahla,
I've been testing the vsphere module in 6.0.0-rc1 since last week and the bug still seems to be present:
[Screenshot: err-600rc1]

@alextxm
Author

alextxm commented Oct 26, 2017

Hi all,
I can confirm it still happens regularly due to vSphere daily backup activities: vSphere closes connections, Metricbeat then starts logging "unable to connect" errors, after a while switches to "not authenticated", and stays stuck in that condition even when vSphere is available again. The only way to get Metricbeat to collect data again is to restart it.

@amandahla

Hi @alextxm. The fix has not been merged yet; still working on it.
#4883

@exekias
Contributor

exekias commented Oct 27, 2017

I gave #4883 a try, without success :(

I have another idea: what about initializing a new client on every fetch? That would mean moving https://github.com/elastic/beats/blob/master/metricbeat/module/vsphere/virtualmachine/virtualmachine.go#L58 to the Fetch method. Any opinion on this, @amandahla?
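
Editor's sketch: the "new client on every fetch" idea could be shaped like this. Each fetch connects, collects, and logs out, so no session outlives a vSphere restart. All names here (`Client`, `Dial`, `MetricSet`, `fakeServer`) are hypothetical and only model the pattern; they are not the Metricbeat or govmomi types.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical per-fetch connection, not the govmomi client.
type Client interface {
	Collect() ([]string, error)
	Logout() error
}

// Dial is a hypothetical factory that connects and logs in.
type Dial func() (Client, error)

// MetricSet holds only the factory, never a long-lived client.
type MetricSet struct {
	dial Dial
}

// Fetch creates a fresh client, collects, and always logs out afterwards
// so sessions do not pile up on the server.
func (m *MetricSet) Fetch() (events []string, err error) {
	c, err := m.dial()
	if err != nil {
		return nil, fmt.Errorf("connect: %w", err)
	}
	defer func() {
		// Surface a logout failure only if collection itself succeeded.
		if lerr := c.Logout(); lerr != nil && err == nil {
			err = fmt.Errorf("logout: %w", lerr)
		}
	}()
	return c.Collect()
}

// fakeServer counts open sessions and can be switched "down".
type fakeServer struct {
	open, dials int
	down        bool
}

type fakeClient struct{ srv *fakeServer }

func (s *fakeServer) dial() (Client, error) {
	if s.down {
		return nil, errors.New("unable to connect")
	}
	s.dials++
	s.open++
	return &fakeClient{srv: s}, nil
}

func (c *fakeClient) Collect() ([]string, error) { return []string{"vm-1"}, nil }
func (c *fakeClient) Logout() error              { c.srv.open--; return nil }

func main() {
	srv := &fakeServer{}
	m := &MetricSet{dial: srv.dial}
	fmt.Println(m.Fetch())
	srv.down = true // simulate the backup window
	fmt.Println(m.Fetch())
	srv.down = false
	fmt.Println(m.Fetch())
	fmt.Println("open sessions:", srv.open)
}
```

The trade-off is one login per collection period, in exchange for never holding a stale session across a server restart.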

@alextxm
Author

alextxm commented Nov 8, 2017

Hi all,
I understand this is still a WIP, but is there any chance of getting a fix in time for 6.0 GA? It would be really nice!
Thank you

@amandahla

amandahla commented Nov 8, 2017

@exekias I'll try that and see if logout, as it's used here, really works fine.

@amandahla

amandahla commented Nov 8, 2017

Still using the PR #4883
