Set DRb conn pool to [] after closing connections #17267

Merged: 1 commit, Apr 9, 2018

@agrare (Member) commented Apr 9, 2018

After forking and closing the open DRb pooled connections, set the pool to `[]`
to prevent other calls from calling `conn.close` on an already closed
connection. This prevents the following:

```
[NoMethodError]: undefined method `close' for nil:NilClass
Did you mean?  clone
/opt/rh/rh-ruby23/root/usr/share/ruby/drb/drb.rb:1258:in `close'
/opt/rh/rh-ruby23/root/usr/share/ruby/drb/drb.rb:1237:in `block in open'
/opt/rh/rh-ruby23/root/usr/share/ruby/sync.rb:234:in `block in sync_synchronize'
/opt/rh/rh-ruby23/root/usr/share/ruby/sync.rb:231:in `handle_interrupt'
/opt/rh/rh-ruby23/root/usr/share/ruby/sync.rb:231:in `sync_synchronize'
/opt/rh/rh-ruby23/root/usr/share/ruby/drb/drb.rb:1235:in `open'
/opt/rh/rh-ruby23/root/usr/share/ruby/drb/drb.rb:1141:in `block in method_missing'
/opt/rh/rh-ruby23/root/usr/share/ruby/drb/drb.rb:1160:in `with_friend'
/opt/rh/rh-ruby23/root/usr/share/ruby/drb/drb.rb:1140:in `method_missing'
lib/gems/pending/VMwareWebService/MiqVimBroker.rb:419:in `getMiqVim'
```
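
A minimal sketch of the failure and the fix, under simplifying assumptions: `FakeConn` and `FakeConnPool` are hypothetical stand-ins for `DRb::DRbConn` and its class-level pool (this is neither ManageIQ's actual patch nor Ruby's drb code), `close_after_fork` is a hypothetical after-fork hook, and `POOL_SIZE = 2` is an arbitrary small limit. When the child closes the inherited connections but leaves them in the pool, the next successful call evicts one of them and closes it a second time, raising `NoMethodError`; resetting the pool to `[]` avoids that.

```ruby
# Hypothetical stand-ins for DRb::DRbConn and its pool. They only model the
# behavior described in this PR; they are not Ruby's drb.rb source.
class FakeProtocol
  def close; end
end

class FakeConn
  def initialize
    @protocol = FakeProtocol.new
  end

  # Unguarded close, as in the trace: a second call sends #close to nil.
  def close
    @protocol.close
    @protocol = nil
  end
end

module FakeConnPool
  POOL_SIZE = 2 # assumption: tiny limit so eviction happens right away

  @pool = []

  class << self
    attr_accessor :pool

    # After a successful call the connection is put back at the front and the
    # oldest connections are popped off and closed while the pool is too big.
    def checkin(conn)
      @pool.unshift(conn)
      @pool.pop.close while @pool.size > POOL_SIZE
    end

    # Hypothetical after-fork hook: close the connections inherited from the
    # parent and, when reset_pool is true, replace the pool with [].
    def close_after_fork(reset_pool:)
      @pool.each(&:close)
      @pool = [] if reset_pool
    end
  end
end

# Simulates a forked child that inherited two pooled connections and then
# completes one successful call. Returns :ok or the class of the error raised.
def run_scenario(reset_pool:)
  FakeConnPool.pool = [FakeConn.new, FakeConn.new]
  FakeConnPool.close_after_fork(reset_pool: reset_pool)
  FakeConnPool.checkin(FakeConn.new)
  :ok
rescue NoMethodError => e
  e.class
end

puts run_scenario(reset_pool: false) # prints NoMethodError
puts run_scenario(reset_pool: true)  # prints ok
```

Resetting the pool, rather than guarding each `close`, keeps the workaround inside the forked worker and leaves the stdlib untouched.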

After a successful call, `DRbConn.open` pops a connection off the pool and closes it: https://github.com/ruby/ruby/blob/v2_3_7/lib/drb/drb.rb#L1237
And `#close` doesn't check whether `@protocol` is nil (as it is for a connection that was already closed) before calling `@protocol.close`: https://github.com/ruby/ruby/blob/v2_3_7/lib/drb/drb.rb#L1258
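
For comparison, the missing nil check can be sketched as an idempotent `close`. `CountingProtocol` and `SketchConn` are hypothetical classes, not Ruby's `drb.rb`: with `guarded: false` a second `close` raises the same kind of `NoMethodError` as the trace above, while with `guarded: true` the underlying protocol is closed once and the repeat call is a no-op.

```ruby
# Hypothetical protocol that counts how often it is closed.
class CountingProtocol
  attr_reader :closes

  def initialize
    @closes = 0
  end

  def close
    @closes += 1
  end
end

# Hypothetical connection; guarded: true adds the nil check on @protocol.
class SketchConn
  def initialize(protocol, guarded:)
    @protocol = protocol
    @guarded = guarded
  end

  def close
    # Idempotent variant: nothing left to close once @protocol is nil.
    return if @guarded && @protocol.nil?

    @protocol.close
    @protocol = nil
  end
end

# Closes the same connection twice. Returns how many times the protocol was
# closed, or the class of the error raised on the second call.
def double_close(guarded:)
  protocol = CountingProtocol.new
  conn = SketchConn.new(protocol, guarded: guarded)
  2.times { conn.close }
  protocol.closes
rescue NoMethodError => e
  e.class
end

puts double_close(guarded: false) # prints NoMethodError
puts double_close(guarded: true)  # prints 1
```

The PR itself does not patch `drb`; it sidesteps the double close by emptying the pool after fork.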

https://bugzilla.redhat.com/show_bug.cgi?id=1562401
https://bugzilla.redhat.com/show_bug.cgi?id=1566198

@miq-bot (Member) commented Apr 9, 2018

Checked commit agrare@76309b9 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 👍

@agrare (Member, Author) commented Apr 9, 2018

/cc @jrafanie

@jrafanie (Member) commented Apr 9, 2018

For future reference, this was introduced in PR #16953. It was backported to gaprindashvili, fine, and euwe, and has the darga/yes flag.

@jrafanie merged commit 142129e into ManageIQ:master on Apr 9, 2018
@jrafanie (Member) commented Apr 9, 2018

And #close doesn't check if @protocol is nil before calling @protocol.close

😅 🤦‍♂️

@jrafanie added this to the Sprint 83 Ending Apr 9, 2018 milestone on Apr 9, 2018
@agrare deleted the bz_1562401_empty_pool_after_fork branch on April 10, 2018 at 12:21
simaishi pushed a commit that referenced this pull request Apr 11, 2018
Set DRb conn pool to [] after closing connections
(cherry picked from commit 142129e)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1566256
simaishi pushed a commit that referenced this pull request Apr 12, 2018
Set DRb conn pool to [] after closing connections
(cherry picked from commit 142129e)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1566255
@simaishi (Contributor) commented:

Gaprindashvili backport details:

```
$ git log -1
commit 4be11da658819f2811a5217b8db0c61b51065aa4
Author: Joe Rafaniello <jrafanie@users.noreply.github.com>
Date:   Mon Apr 9 16:36:15 2018 -0400

    Merge pull request #17267 from agrare/bz_1562401_empty_pool_after_fork

    Set DRb conn pool to [] after closing connections
    (cherry picked from commit 142129eddf87212b71b60222d3dc41b2ed2e6e65)

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1566255
```

@simaishi (Contributor) commented:

Fine backport details:

```
$ git log -1
commit edcb7c5d3a0527d59d4b763bd6321a069e1c4360
Author: Joe Rafaniello <jrafanie@users.noreply.github.com>
Date:   Mon Apr 9 16:36:15 2018 -0400

    Merge pull request #17267 from agrare/bz_1562401_empty_pool_after_fork

    Set DRb conn pool to [] after closing connections
    (cherry picked from commit 142129eddf87212b71b60222d3dc41b2ed2e6e65)

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1566256
```

d-m-u pushed a commit to d-m-u/manageiq that referenced this pull request Jun 6, 2018
…fter_fork

Set DRb conn pool to [] after closing connections
(cherry picked from commit 142129e)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1566256
@dfmanguia commented:

Hello, I have this issue. I added the line to miq_worker.rb. Do I have to do something additional? The problem is still present when I try to run a SmartState Analysis.

@agrare (Member, Author) commented Jun 21, 2018

Hey @dfmanguia, what version are you using?

@dfmanguia commented:

Hello again, I'm using version 5.9.2.4.20180501195858_35dc609. Note that I'm using CloudForms from Red Hat. What do you think?

@jrafanie (Member) commented:

@dfmanguia From a quick glance, that version should have that fix already so you shouldn't have needed to do that. Are you sure it's that version and the line didn't already exist there? It's probably a different source of the error. Can you contact Red Hat support with the logs showing the backtrace for the error?

@dfmanguia commented:

Actually, I updated CF and the version is 5.9.2.4; yes, the code is there by default. Yesterday I was out of date, which is why the line wasn't there. But the problem is still there.

@jrafanie (Member) commented:

@dfmanguia did you stop all processes before updating? We fork processes so if the parent evm_server process isn't restarted, any new workers that are restarted will not pick up any code changes provided by an update.
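
The point about forking can be illustrated with a small, hypothetical sketch: `WorkerCode` and the temp file stand in for the server and worker code, and it requires a platform with `fork` (so not Windows). The parent loads the code, the file on disk is then updated, and a child forked afterwards still runs the parent's in-memory definition.

```ruby
require "tempfile"

# Hypothetical "worker code": a file defining WorkerCode.version.
def write_worker_code(file, version)
  File.write(file.path, "module WorkerCode; def self.version; #{version.inspect}; end; end\n")
end

file = Tempfile.new(["worker_code", ".rb"])
write_worker_code(file, "before update")
load file.path # the parent process loads the code once

write_worker_code(file, "after update") # the update changes the file on disk

reader, writer = IO.pipe
pid = fork do
  reader.close
  # The child starts from a copy of the parent's memory, so it still sees the
  # definition the parent loaded, not the file now on disk.
  writer.write(WorkerCode.version)
  writer.close
  exit!(0) # skip at_exit handlers inherited from the parent
end
writer.close
child_version = reader.read
reader.close
Process.wait(pid)
file.close!

puts child_version # prints before update
```

Restarting the parent process is what makes newly forked workers pick up the updated files.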

@agrare (Member, Author) commented Jun 21, 2018

@dfmanguia, is it the same backtrace or just the same exception?
👍 to opening a support case and uploading logs.

@agrare (Member, Author) commented Jun 21, 2018

If it is repeatedly happening on VMs and not hosts, I'd bet it is not the same issue as this one. This one was extremely rare and hard to reproduce on a live appliance.

@jrafanie (Member) commented:

@dfmanguia Yeah, I suggest you open a support ticket and possibly remove the screenshot containing hostnames, etc. I hid the comment, but it's still there: there are no private comments on public GitHub.

@dfmanguia commented:

Is there any extra process to add SCVMM as a provider to CloudForms? Maybe I'm missing something. I can see the hosts and the VMs, but when I try to run SmartState Analysis on a VM, I get this problem.
I have other providers, RHEV and VMware, and everything works except for the SCVMM provider.

@dfmanguia commented:

I'm working on this, and this appears in the logs:

```
[----] I, [2018-06-22T09:12:09.842417 #21991:b2b114] INFO -- : Q-task_id([job_dispatcher]) MIQ(ManageIQ::Providers::Microsoft::InfraManager::Vm.connect_to_ems): Connecting to [host(directly):] for VM:[vmname]
[----] E, [2018-06-22T09:12:09.843098 #21991:b2b114] ERROR -- : Q-task_id([job_dispatcher]) MIQ(ManageIQ::Providers::Microsoft::InfraManager::Vm.connect_to_ems): Connection to [host(directly):] failed for VM:[vmname] with error [no Hostname defined] after [0.000738802] seconds
[----] E, [2018-06-22T09:12:09.845441 #21991:b2b114] ERROR -- : Q-task_id([job_dispatcher]) MIQ(ManageIQ::Providers::Microsoft::InfraManager::Vm#perform_metadata_scan): undefined method `close' for nil:NilClass
Did you mean? clone
[----] E, [2018-06-22T09:12:09.845925 #21991:b2b114] ERROR -- : Q-task_id([job_dispatcher]) MIQ(MiqServer#scan_metadata) undefined method `close' for nil:NilClass
Did you mean? clone
```

It looks like the error is [no Hostname defined]. Where do I have to configure this?

@agrare (Member, Author) commented Jun 26, 2018

@dfmanguia, this PR is not the place to discuss this: it is not related to your issue and is cluttering the history. Please open a support case with Red Hat support, or at least open a separate GitHub issue for the problem you're having.
