
HGFS issue after updating kernel drivers #4362

Closed · dcramer opened this issue Aug 19, 2014 · 21 comments

Comments

@dcramer

dcramer commented Aug 19, 2014

I've raised this issue before, but I'm now confident that the problem happens consistently when upgrading kernel drivers.

How to reproduce:

  • Install the hashicorp/precise64 box
  • Install linux-signed-generic-lts-trusty
  • vagrant reload

VMware's GUI doesn't exhibit the same issues as Vagrant, but I believe the key issue is that VMware Tools needs to be reconfigured. I don't know how the GUI gets around this, but it does.

@kikitux
Contributor

kikitux commented Aug 19, 2014

You need to reinstall VMware Tools:

mount the ISO
untar the VMware Tools tarball

then

perl <vmwaretools_folder/vmware_toolsfile.pl> -default
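
In practice that looks roughly like the following, assuming the standard tarball layout on the Tools ISO (exact file and device names vary by Tools version and guest):

    # Attach the Tools ISO from the host first (VM menu > Install VMware Tools),
    # then, inside the guest:
    sudo mount /dev/cdrom /mnt        # or /dev/sr0, depending on the guest
    tar -xzf /mnt/VMwareTools-*.tar.gz -C /tmp
    sudo perl /tmp/vmware-tools-distrib/vmware-install.pl --default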

Alvaro.

@dcramer
Author

dcramer commented Aug 19, 2014

FYI, so far I've yet to get this working with the upgraded kernel.

I stuffed this into /etc/rc.local (based on some random searches):

rkernel=$(uname -r)

echo "running vmware-config-tools.pl"
/usr/bin/vmware-config-tools.pl -d
echo "vmware-tools now compiled for running kernel $rkernel"

echo "restarting vmware-tools"
/etc/init.d/vmware-tools restart
echo "vmware-tools restarted"

echo "restarting networking"
# the init script is "networking" on Ubuntu ("network" is the RHEL name)
/etc/init.d/networking restart
echo "network restarted"

@dcramer
Author

dcramer commented Aug 19, 2014

FYI, that script plus upgrading the plugin and Vagrant fixed my issues.
I will try testing without the script as well and report back, but if reconfiguring VMware Tools is required, it might make sense to just do it forcefully (or make it a simple flag), since upgrading kernels is far from an edge case.


@mitchellh
Contributor

This same issue exists in VirtualBox. As Alvaro said (and you verified), you have to reconfigure/reinstall the tools, since they work via kernel modules that depend on the kernel version you have.

I'm not sure how in scope it is for Vagrant to handle this, but Vagrant has been doing more and more for the user automatically over time, so it probably isn't far out of scope. Plus, with the new capabilities system in Vagrant 1.1+ we could probably do it with relatively little pain, although the maintenance overhead would most likely be enormous, since we'd need to figure out how to get the latest tools (VirtualBox/VMware don't have APIs for this) and automatically try to install them.
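
For context, a guest capability in the 1.1+ plugin API looks roughly like this. This is a hypothetical skeleton; the plugin and capability names are illustrative, not an existing plugin:

    # plugin.rb -- sketch of a plugin that could register a rebuild capability
    class RebuildToolsPlugin < Vagrant.plugin("2")
      name "vmware-tools-rebuild"

      # Registers a "rebuild_vmware_tools" capability for Linux guests;
      # the class returned by the block would do the actual work.
      guest_capability("linux", "rebuild_vmware_tools") do
        require_relative "cap/rebuild_vmware_tools"
        Cap::RebuildVMwareTools
      end
    end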

@dcramer
Author

dcramer commented Aug 19, 2014

@mitchellh FWIW, VMware's tools are already present on the system in my case. All I had to do was call out to them (I didn't need to mount the CD-ROM, etc.).

I actually tried to resolve this with VirtualBox as well, and it had seemingly more serious issues that I failed to grok.

@mitchellh
Contributor

The more I think about it, the more I think this should be delegated to a plugin. If a plugin can prove this feature out, then I would merge it in, but there are a lot of moving pieces to get there (detecting kernel versions, determining which versions the installed tools/additions support, downloading new versions, installing new versions, checking for errors in the install, getting dependencies such as kernel headers), and I just wouldn't feel confident putting that sort of logic into Vagrant proper.

Historically we've just asked that, as you upgrade kernels, you include scripts that also verify and reinstall the tools. As more of our users have moved to Packer-based workflows this has actually become less of an issue, but that really just masks the real issue rather than solving it.

Some questions, specific to VMware, to determine whether this is feasible:

  • Checking kernel versions is easy enough (uname).
  • How do you verify which kernel the current VMware modules were built for? Do you just assume that failure means they aren't for this version? (One possible heuristic is sketched after this list.)
  • Is that failure check safe across many versions? Or will it require constant maintenance?
  • How do you get the VMware tools if they aren't present? (I imagine they'd have to be mounted in.) One issue here is that the VMware tools are actually downloaded on demand, and there is no way to trigger that demand from the CLI/API (Packer errors out and asks the user to download them once). Annoying.
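
On the "which kernel were the modules built for" question, one heuristic (a sketch, not something Vagrant does today) is to compare the running kernel against the vermagic string of the installed module:

    running=$(uname -r)
    vermagic=$(modinfo -F vermagic vmhgfs 2>/dev/null | awk '{print $1}')
    if [ "$vermagic" != "$running" ]; then
      echo "vmhgfs built for '${vermagic:-unknown}' but running '$running'; rebuild needed"
    fi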

Given all the edge cases I'm not realistically seeing any meaningful progress being made on this.

Thoughts?

@dcramer
Author

dcramer commented Aug 19, 2014

I at least agree that it should be the responsibility of the provider plugin. Now that I think about it, the reason the tools exist on my images is that they're all based off of the HashiCorp-provided boxes. If there's no way to trigger a download, and there's no guarantee that they'll exist somewhere on the machine, then this isn't a solvable problem.

Focusing on the VMware case: when I went to mount the tools (which I expected to end up at /mnt/cdrom), nothing happened. I ended up running the commands provided on the help page (i.e., sudo mount /dev/sr0 /mnt/cdrom) and was then able to run everything headless. If that's achievable for cases where the tools have already been downloaded, it at least seems like a reasonable feature to adopt.

I sort of agree that this can be better solved in the Packer case, but the problem is that a lot of provisioning happens outside of Packer, and with the end goal being to replicate what happens in production (our case), we just want to apply our provisioners from a base machine.


@tehranian

Sorry to butt in; I want to add my two cents from my VMware experience.

The HGFS (Host-Guest File System) driver provides VMware's shared folder support. Vagrant hangs and times out waiting for the HGFS driver to load when booting with an upgraded kernel because the new kernel does not (yet) have the HGFS kernel driver available.
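
You can see this from inside the guest (illustrative commands; the exact module path varies by Tools version):

    uname -r                                        # the running kernel
    lsmod | grep vmhgfs                             # is the HGFS module loaded?
    find /lib/modules/$(uname -r) -name 'vmhgfs*'   # was it built for this kernel?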

Two thoughts:

  • re: automatic rebuilds - VMware Tools has this feature already built in, or at least it used to. From vmware-config-tools.pl:

    VMware automatic kernel modules enables automatic building and installation of
    VMware kernel modules at boot that are not already present. This feature can be
    enabled/disabled by re-running vmware-config-tools.pl.
    
    Would you like to enable VMware automatic kernel modules?
    [no]
    

    I think we just have to figure out how to enable this feature, which is disabled by default. I'm looking through the Perl code now to figure out what answering "yes" does under the covers...

  • If you're not using Vagrant with shared folders at all, or you've ditched VMware's shared folder implementation in favor of an alternative (e.g., NFS or rsync), then you can bypass the HGFS wait loop by disabling the default /vagrant shared folder. I just tried this and it works. From my Vagrantfile:

    config.vm.synced_folder ".", "/vagrant", disabled: true
    

@tehranian

I figured it out.

TL;DR

Set answer AUTO_KMODS_ENABLED yes in /etc/vmware-tools/locations. Missing VMware kernel modules will then be rebuilt automatically at boot.

Details
  • The answers from vmware-config-tools.pl are stored in /etc/vmware-tools/locations as an append-only text-file "database" (their terminology).

  • Those settings are read by /etc/vmware-tools/services.sh, the start/stop/status/restart script that controls VMware's services. In /etc/vmware-tools/services.sh we can see the code that checks whether AUTO_KMODS is enabled in the aforementioned database file, and how, in turn, the missing kernel modules are built:

    vmware_auto_kmods_enabled() {
       echo "$vmdb_answer_AUTO_KMODS_ENABLED"
    }
    
    vmware_auto_kmods() {
       # Check if mods are confed, but not installed.
       vmware_exec_selinux "$vmdb_answer_LIBDIR/sbin/vmware-modconfig-console \
                               --configured-mods-installed" && exit 0
    
   # Check that we have PBMs, or if not, then kernel headers and gcc.  Otherwise don't waste time
       if ! vmware_exec_selinux "$vmdb_answer_LIBDIR/sbin/vmware-modconfig-console --pbm-available vmmemctl"; then
           vmware_exec_selinux "$vmdb_answer_LIBDIR/sbin/vmware-modconfig-console \
                               --get-kernel-headers" || (echo "No kernel headers" && exit 1)
           vmware_exec_selinux "$vmdb_answer_LIBDIR/sbin/vmware-modconfig-console \
                               --get-gcc" || (echo "No gcc" && exit 1)
       fi
    
       # We assume config.pl has already been run since our init script is at this point.
       # If so, then lets build whatever mods are configured.
       vmware_exec_selinux "$vmdb_answer_BINDIR/vmware-config-tools.pl --default --modules-only --skip-stop-start"
    }
    
  • Updating the lines in /etc/vmware-tools/locations containing AUTO_KMODS_ENABLED from no to yes, running sudo apt-get install linux-signed-generic-lts-trusty, and then a vagrant reload, I get a VM booted with the new kernel and no HGFS driver timeout. Logging into the VM, I can see that the HGFS driver was automatically built and loaded. The proof is in the pudding:

    vagrant@precise64:~$ grep AUTO /etc/vmware-tools/locations
    answer AUTO_KMODS_ENABLED_ANSWER yes
    answer AUTO_KMODS_ENABLED yes
    remove_answer AUTO_KMODS_ENABLED_ANSWER
    answer AUTO_KMODS_ENABLED_ANSWER yes
    remove_answer AUTO_KMODS_ENABLED
    answer AUTO_KMODS_ENABLED yes
    
    vagrant@precise64:~$ sudo apt-get install -y linux-signed-generic-lts-trusty
    ...
    
    vagrant@precise64:~$ exit
    
    $ time vagrant reload
    ==> default: Attempting graceful shutdown of VM...
    ==> default: Checking if box 'hashicorp/precise64' is up to date...
    ==> default: Verifying vmnet devices are healthy...
    ==> default: Preparing network adapters...
    ==> default: Fixed port collision for 22 => 2222. Now on port 2200.
    ==> default: Starting the VMware VM...
    ==> default: Waiting for machine to boot. This may take a few minutes...
        default: SSH address: 192.168.181.142:22
        default: SSH username: vagrant
        default: SSH auth method: private key
    ==> default: Machine booted and ready!
    ==> default: Forwarding ports...
        default: -- 22 => 2200
    ==> default: Configuring network adapters within the VM...
    ==> default: Waiting for HGFS kernel module to load...
    ==> default: Enabling and configuring shared folders...
        default: -- /Users/tehranian/Downloads/boxes/hashi-precise: /vagrant
    ==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
    ==> default: to force provisioning. Provisioners marked to run always will still run.
    
    real    1m5.324s
    user    0m9.940s
    sys 0m1.691s
    
    $ vagrant ssh -- uname -a
    Linux precise64 3.13.0-34-generic #60~precise1-Ubuntu SMP Wed Aug 13 15:55:33 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
    
    $ vagrant ssh -- lsmod | grep hgfs
    vmhgfs                 54572  2
    vmw_vmci               68525  2 vmhgfs,vmw_vsock_vmci_transport
    
    

So the trick here is to just update that /etc/vmware-tools/locations file to contain answer AUTO_KMODS_ENABLED yes.

@dcramer
Author

dcramer commented Aug 19, 2014

I ended up doing the following, and it's working perfectly:

# Ensure that VMware Tools recompiles kernel modules when we update the Linux images
$fix_vmware_tools_script = <<SCRIPT
sed -i.bak 's/answer AUTO_KMODS_ENABLED_ANSWER no/answer AUTO_KMODS_ENABLED_ANSWER yes/g' /etc/vmware-tools/locations
sed -i.bak 's/answer AUTO_KMODS_ENABLED no/answer AUTO_KMODS_ENABLED yes/g' /etc/vmware-tools/locations
SCRIPT

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # ...
  config.vm.provision :shell, :inline => $fix_vmware_tools_script
end

@mitchellh
Contributor

This is fantastic. Great job. I think in the short term we can fix this by adding a "guides" or FAQ section to the docs and adding this there.

@tehranian

@dcramer Cool!
@mitchellh Thanks!

FWIW - I managed to optimize the solution a little bit:

echo "answer AUTO_KMODS_ENABLED yes" | sudo tee -a /etc/vmware-tools/locations

This works because key/value pairs appended at the end of the locations file take precedence over earlier definitions. Of course, this means that if someone comes back and runs vmware-config-tools.pl -d, the default setting disabling automatic kernel modules will be put back in, but that's true for either solution that depends on this automatic-kernel-modules feature.
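
If you put the append in a provisioner, an idempotent variant avoids stacking duplicate lines on every run (a sketch):

    grep -q '^answer AUTO_KMODS_ENABLED yes' /etc/vmware-tools/locations || \
      echo 'answer AUTO_KMODS_ENABLED yes' | sudo tee -a /etc/vmware-tools/locations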

I wrote this up at: http://dantehranian.wordpress.com/2014/08/19/vagrant-vmware-resolving-waiting-for-hgfs-kernel-module-timeouts/

@kikitux
Contributor

kikitux commented Aug 20, 2014

You should be able to do something like:

service vmware-tools status
if [ $? -ne 0 ]; then

  rkernel=$(uname -r)

  echo "running vmware-config-tools.pl"
  /usr/bin/vmware-config-tools.pl -d
  echo "vmware-tools now compiled for running kernel $rkernel"

  echo "restarting vmware-tools"
  /etc/init.d/vmware-tools restart
  echo "vmware-tools restarted"

  echo "restarting networking"
  /etc/init.d/networking restart
  echo "network restarted"

fi

to avoid doing this on every single start.


@mitchellh mitchellh added the docs label Aug 29, 2014
@StannisSmith

As a Vagrant newbie, I spent hours trying every suggestion I could find on the net to get this working. Just running a yum update on a fresh precise64 box would cause the problem on the next VM start. I was ready to give up when I reread this page a bit more carefully. What did work was following what dcramer spelled out for the Vagrantfile changes, and installing Perl on the VM. Just appending a line to /etc/vmware-tools/locations did not work in my attempts... perhaps I missed something. It may have taken an additional halt/up sequence, but now the VM is happy and so am I.

Hope my elation holds as I start to play around. That was a bit of a tough go for a first-timer messing with a Vagrantfile! I would very much encourage this to be documented on the site. My Vagrant & VMware provider were brand spanking new, so it was pretty damn frustrating.

@kikitux
Contributor

kikitux commented Sep 21, 2014

Hello,

You're getting onto a difficult path if you try too many things in one shot.

The expectation is that you start from a base box that is usable; if you then do updates/upgrades, it's your task to make sure the box stays fit for use. That is, after the next boot, update/compile the VMware or VirtualBox guest drivers.

Once you have this new working box, after that couple of reboots and updates, you may want to export it as an updated base box.

Try to avoid doing many things in parallel, since that makes it harder to troubleshoot if something fails.

That would be my advice for keeping your Vagrant experience as simple as possible.

Alvaro.


@StannisSmith

My comments were meant to encourage the Vagrant maintainers to document the "best" fix for this problem, and to point out that what dcramer suggested worked for me (while the locations file append did not). Alvaro, if the fix you suggested is better (I confess I haven't tried it), it would be terrific if that were clearly demonstrated. In my many attempts to fix the problem, the VM start would often loop until failure at the SSH step, the boot process would appear hung, or, when complete, vagrant ssh would simply return nothing.

Just running a yum update on a fresh box clearly isn't trying too many things at once; it's one of the simplest things to start off with. In fact, the problem appears to be something everyone using the VMware provider will encounter right off the bat. I had no idea that one might need to update/compile the VMware drivers after VM kernel changes, nor did I know off the top of my head how to accomplish that. I'm guessing that many other Vagrant/VMware users will not either. Sadly, it even took me a while to figure out that I could manually hunt down and open the VM in Workstation to watch directly for problems during boot (d'oh!).

Most users at the level of trying Vagrant with the VMware provider will be able to plod through it like I did... if they persevere through the frustrations. However, that could all be avoided if this were spelled out clearly in the Vagrant docs (or is it, and I've just missed it?).

@therootcause

This might help, might not.
I was faced with vagrant reload not working on my CentOS 7 Packer-built box after a kernel upgrade had been performed.

When using VMware Tools and auto-kmods, the moment the new kernel gets installed, the system needs a set of development tools installed. These tools might have been installed, but then cleaned up, during the provisioning process.

In my CentOS 7 Packer code, this amounted to removing a couple of lines of cleanup in the vmtools.sh and cleanup.sh scripts.
Instead of removing these packages, I'm asserting that they are installed:
yum -y install gcc cpp kernel-devel kernel-headers perl
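
The same assertion works as a Vagrant shell provisioner if you'd rather not bake it into the Packer templates (a sketch; package names are for CentOS/RHEL):

    config.vm.provision :shell, :inline => <<-SHELL
      # toolchain + headers so auto-kmods can rebuild modules at the next boot
      yum -y install gcc cpp kernel-devel kernel-headers perl
    SHELL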

My vagrant reload behavior is now what I've come to expect [and love].

cheers

(I should note that I previously worked around this on CentOS 6.x by using the traditional ESX-packaged VMware Tools offering. Now I can go back to open-vm-tools everywhere.)

@mitchellh
Contributor

I've updated the website with a full page on how to handle kernel upgrades for VMware as well as @dcramer's helpful script. Please let me know any information I can add to this to improve it. It will be deployed with the next version of Vagrant.

@dragon788
Contributor

Just a note for anyone who comes back here with concerns that this no longer works under the latest kernel, 3.13.0-46-generic, on Ubuntu (and potentially other distros). The fix still works; the issue is that VMware Tools itself hasn't caught up with some code changes in the kernel. According to https://communities.vmware.com/message/2477575#2477575 it will be fixed soon for Workstation 11 and Fusion 7, and potentially backported to Workstation 10 and Fusion 6.

This thread has some potential workarounds, the best of which is downgrading your kernel to -45 or earlier and rebuilding VMware Tools:
http://askubuntu.com/questions/586221/vmhgfs-module-not-compilable-for-vmware-tools-9-9-0-fusion7-1-after-ubuntu-lin
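
For reference, the downgrade on Ubuntu looks something like this (a sketch; the exact version strings depend on your release):

    # install the older kernel image and matching headers, then boot into it
    sudo apt-get install linux-image-3.13.0-45-generic linux-headers-3.13.0-45-generic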

If anybody is using dist-upgrade in their Vagrant or Packer scripts, this will probably bite them shortly. Here's an excellent article on why system administrators don't dist-upgrade automatically: http://askubuntu.com/a/226213

davidalger pushed a commit to davidalger/devenv that referenced this issue Aug 18, 2015
@fireproofsocks

There has to be a better way to debug/troubleshoot this issue. It fundamentally destroys the usefulness of Vagrant in the first place, and at present it takes massive amounts of time to fix. I would fully support any work to make this functionality, and any glitches in it, more transparent.

@rehmanzile

The workaround for automatic kernel-module rebuilds does not work on Ubuntu 14.04.4. I had to go back to 14.04.3.
