Error setting hostname when setting up a centos-7.2 box #7533

Closed
peetkes opened this issue Jun 29, 2016 · 8 comments · Fixed by #7605

Comments

peetkes commented Jun 29, 2016

Vagrant version

vagrant version 1.8.4

Host operating system

MacOS 10.11.4

Guest operating system

CentOS 7.2

Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

# Vagrant by default saves VM info in a .vagrant/ folder next to where Vagrantfile lives.
# You can however start vagrant from a subfolder, in which case VM info from different VMs
# could get mixed up. Make sure to put .vagrant/ in CWD rather than next to Vagrantfile
# if there is no Vagrantfile in CWD.
VAGRANTFILE = ENV["VAGRANT_VAGRANTFILE"] || "Vagrantfile"
VAGRANTFILE_PATH = Dir.getwd + "/" + VAGRANTFILE

@command = ARGV[0]

if @command != "global-status" && # skip this for global-status
  !File.exist?(VAGRANTFILE_PATH) && # Vagrantfile must be in a higher dir
    ENV["VAGRANT_DOTFILE_PATH"].nil? # and .vagrant location is not set explicitly

  VAGRANT_DOTFILE_PATH = Dir.getwd + "/.vagrant"
  puts "Setting VAGRANT_DOTFILE_PATH to " + VAGRANT_DOTFILE_PATH
  puts ""

  ENV["VAGRANT_DOTFILE_PATH"] = VAGRANT_DOTFILE_PATH
  system "vagrant " + ARGV.join(" ")
  ENV["VAGRANT_DOTFILE_PATH"] = nil # for good measure

  abort "Finished"
end

def load_properties(properties_filename, prefix = "")
  properties = {}

  if File.exist? properties_filename
    File.open(properties_filename, "r") do |properties_file|
      properties_file.read.each_line do |line|
        line.strip!
        if (line[0] != ?#) && (line[0] != ?=) && (line[0] != "")
          i = line.index("=")
          if i
            key = prefix + line[0..i - 1].strip.upcase
            value = line[i + 1..-1].strip
            value.gsub!(/^"(.*)"$/, '\1')
            properties[key] = value
          end
        end
      end
    end
  else
    puts "WARN: Properties file #{properties_filename} not found.." unless @command == "global-status"
  end

  properties
end

def get_vm_name(i)
  name = "#{@vm_name}"
  name.gsub!(/\{project_name\}/, @project_name)
  name.gsub!(/\{vm_version\}/, @vm_version)
  name.gsub!(/\{ml_version\}/, @ml_version)
  name.gsub!(/\{i\}/, i.to_s)
  name
end

def inc_ip(ip, i)
  # Take the last octet as an integer, then replace it with octet + i.
  nr = ip.to_s[/\d+$/].to_i
  ip.to_s.sub(/\.\d+$/, ".#{nr + i}")
end

@properties = load_properties("project.properties")
@project_name = ENV["MLV_PROJECT_NAME"] || @properties["PROJECT_NAME"] || File.basename(Dir.getwd)

@vm_name = ENV["MLV_VM_NAME"] || @properties["VM_NAME"] || "{project_name}-ml{i}"
@vm_version = ENV["MLV_VM_VERSION"] || @properties["VM_VERSION"] || "6.7"
@ml_version = ENV["MLV_ML_VERSION"] || @properties["ML_VERSION"] || "8"
@nr_hosts = (ENV["MLV_NR_HOSTS"] || @properties["NR_HOSTS"] || "3").to_i
@master_memory = (ENV["MLV_MASTER_MEMORY"] || @properties["MASTER_MEMORY"] || "2048").to_i
@master_cpus = (ENV["MLV_MASTER_CPUS"] || @properties["MASTER_CPUS"] || "2").to_i
@slave_memory = (ENV["MLV_SLAVE_MEMORY"] || @properties["SLAVE_MEMORY"] || @master_memory.to_s).to_i
@slave_cpus = (ENV["MLV_SLAVE_CPUS"] || @properties["SLAVE_CPUS"] || @master_cpus.to_s).to_i
@ml_installer = ENV["MLV_ML_INSTALLER"] || @properties["ML_INSTALLER"] || ""
@mlcp_installer = ENV["MLV_MLCP_INSTALLER"] || @properties["MLCP_INSTALLER"] || ""
@public_network = ENV["MLV_PUBLIC_NETWORK"] || @properties["PUBLIC_NETWORK"] || ""
@priv_net_ip = ENV["MLV_PRIV_NET_IP"] || @properties["PRIV_NET_IP"] || ""
@shared_folder_host = ENV["MLV_SHARED_FOLDER_HOST"] || @properties["SHARED_FOLDER_HOST"] || ""
@shared_folder_guest = ENV["MLV_SHARED_FOLDER_GUEST"] || @properties["SHARED_FOLDER_GUEST"] || ""
@net_proxy = ENV["MLV_NET_PROXY"] || @properties["NET_PROXY"] || ""
@no_proxy = ENV["MLV_NO_PROXY"] || @properties["NO_PROXY"] || "localhost,127.0.0.1"

unless @net_proxy.empty? or Vagrant.has_plugin?("vagrant-proxyconf")
  abort 'To use net_proxy setting, run "vagrant plugin install vagrant-proxyconf" first.'
end

@vm_name = get_vm_name("{i}")

puts "Loading project #{@project_name}.." unless @command == "global-status"
if @command == "status" or @command == "up" or @command == "provision"
  puts ""
  puts "vm_name=#{@vm_name}"
  puts "vm_version=#{@vm_version}"
  puts "ml_version=#{@ml_version}"
  puts "nr_hosts=#{@nr_hosts}"
  puts "master_memory=#{@master_memory}"
  puts "master_cpus=#{@master_cpus}"
  if @nr_hosts > 1
    puts "slave_memory=#{@slave_memory}"
    puts "slave_cpus=#{@slave_cpus}"
  end
  if @ml_installer != ""
    puts "ml_installer=#{@ml_installer}"
  end
  if @mlcp_installer != ""
    puts "mlcp_installer=#{@mlcp_installer}"
  end
  if @shared_folder_host != "" and  @shared_folder_guest != ""
    puts "shared_folder_host=#{@shared_folder_host}"
    puts "shared_folder_guest=#{@shared_folder_guest}"
  end
  if @public_network != ""
    puts ""
    puts "WARN: Using DHCP on Public Network '#{@public_network}'!"
  elsif @priv_net_ip != ""
    puts "priv_net_ip=#{@priv_net_ip}"
  else
    puts ""
    puts "Using DHCP for Private Network"
  end
  puts ""

  if @command == "up" and @vm_version == "5.11" and @ml_version == "8"
    puts "MarkLogic 8 NOT supported on CentOS 5! Try MarkLogic 7, or CentOS 6.."
    abort "Bailing out.."
  end
end

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  mastername = get_vm_name(1)

  unless @net_proxy.empty?
    hostnames = (1..@nr_hosts).map{|i| get_vm_name(i)}.join(",")
    config.proxy.ftp = @net_proxy
    config.proxy.http = @net_proxy
    config.proxy.https = @net_proxy
    config.proxy.no_proxy = "#{@no_proxy},#{hostnames}"
  end

  config.hostmanager.enabled = false
  config.hostmanager.manage_host = true
  config.hostmanager.include_offline = true
  config.hostmanager.ignore_private_ip = false
  config.hostmanager.ip_resolver = proc do |machine|
      result = ""
      begin
        machine.communicate.execute("/sbin/ifconfig eth1") do |type, data|
          result << data if type == :stdout
        end
      rescue
        result = "# NOT-UP"
        puts "Getting IP from #{ machine.name } ... not running"
        next
      end
      ip = /^\s*inet .*?(\d+\.\d+\.\d+\.\d+)\s+/.match(result)[1]
      puts "Getting IP from #{ machine.name } ... #{ip}"
      ip
  end

  # Customize the virtual machine environments
  config.vm.provider :virtualbox do |vb|
      vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
      vb.customize ["modifyvm", :id, "--nictype2", "virtio"]
      #vb.gui = true # for debugging
  end


  config.vm.define mastername do |master|
      master.vm.box = "bento/centos-7.2"   #{}"grtjn/centos-#{@vm_version}"
      master.vm.provider "virtualbox" do |v|
          v.name = mastername
          v.memory = @master_memory
          v.cpus = @master_cpus
      end
      master.vm.hostname = mastername
      if @public_network != ""
        master.vm.network "public_network", bridge: @public_network
      elsif @priv_net_ip != ""
        master.vm.network "private_network", ip: @priv_net_ip
      else
        master.vm.network "private_network", type: "dhcp"
      end
      master.vm.synced_folder Dir.getwd, "/vagrant"
      master.vm.synced_folder "/opt/vagrant", "/opt/vagrant"
      master.vm.synced_folder "/space/software", "/space/software"
      if @shared_folder_host != "" and  @shared_folder_guest != ""
        master.vm.synced_folder @shared_folder_host, @shared_folder_guest, :create => true
      end
      master.vm.provision :hostmanager
  end
end

Use the following properties file:

##
# Project name - defaults to current directory name
#
#project_name=example

##
# VM naming pattern - defaults to {project_name}-ml{i}, also allowed: {ml_version}
#
# IMPORTANT: DON'T CHANGE ONCE YOU HAVE CREATED THE VM'S!!
#
vm_name={project_name}-ml{i}

##
# CentOS base VM version - defaults to 6.7, allowed: 5.11/6.5/6.6/6.7/7.0/7.1
#
vm_version=7.2

##
# Number of hosts in the cluster - defaults to 3, minimum for failover support
#
nr_hosts=1

##
# Memory assigned to master node in cluster (first vm) - defaults to 2048
#
master_memory=4096

##
# Number of cpus assigned to master node in cluster (first vm) - defaults to 2
#
master_cpus=2

##
# Memory assigned to each slave node in cluster - defaults to same as master_memory
#
#slave_memory=2048

##
# Number of cpus assigned to each slave node in cluster - defaults to same as master_cpus
#
#slave_cpus=2

##
# Name of public_network to use in Vagrant - defaults to ""
#
# Note: enabling this makes your VMs accessible from outside, beware of security leaks
#
#public_network="en0: Wi-Fi (AirPort)"

##
# Assign dedicated private IP to master node - slaves get same ip + i
#
#priv_net_ip=

##
# Network proxy - requires vagrant-proxyconf plugin
#
#net_proxy=http://proxy:8080/

##
# Hosts that do not require network proxy
#
#no_proxy=localhost,127.0.0.1

##
# Mount an extra folder from host on vm - project dir is automatically shared as /vagrant
#
#shared_folder_host=
#shared_folder_guest=

##
# Run full OS updates - defaults to false
#
# Note: doing this with CentOS 6.5 or 7.0 will take it up to the very latest minor release (6.7+ resp 7.1+)
#
update_os=false

Debug output

https://gist.github.com/peetkes/f4896b524c0baaea5b9f9ded43d41854

Expected behavior

Normal startup of a vagrant box

Actual behavior

The box starts, but Vagrant reports an error while setting the hostname (see the debug output above).

Steps to reproduce

  1. Copy the Vagrantfile and the properties file into a folder.
  2. Run the following command in that folder:
     vagrant up --no-provision

References

DanHam commented Jul 11, 2016

Hi,

I have been seeing the same issue with my own CentOS 7.2 box (built with Packer). If I use the VMware files created by Packer directly (rather than packaging them into a box and running it with Vagrant), the VM starts fine, which suggests that the issue is with Vagrant rather than anything else.

Vagrant Version

Installed Version: 1.8.4

Vagrant plugins

vagrant-aws (0.7.2)
vagrant-share (1.1.5, system)
vagrant-vmware-fusion (4.0.10)

VMware Fusion

Professional Version 8.1.1 (3771013)

Apologies if you are already aware of all this, but...

Having taken a closer look into the issue, the root cause (at least at the level of the VM) seems to be that there is a network script configured for the ens33 interface (/etc/sysconfig/network-scripts/ifcfg-ens33) while the actual interface in the Vagrant VM is ens32 (/sys/class/net/ens32). The consequence is that the network.service reports errors due to the 'missing' ens33 interface.

When Vagrant does its stuff to set the host name, it restarts the network.service as part of the process - it's at this point that we see the error.

I used a bit of a dirty hack: with guestfish, I renamed the /etc/sysconfig/network-scripts/ifcfg-ens33 file to /etc/sysconfig/network-scripts/ifcfg-ens32 and edited its contents accordingly, replacing every reference to ens33 with ens32. I then packaged the altered vmdk file into a Vagrant box and was able to start the VM using 'vagrant up' without error.
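
The rename-and-substitute step described above can be sketched in Ruby, assuming the guest's filesystem has already been mounted or extracted at some local path. The helper name `rename_ifcfg` and its parameters are hypothetical; this is not how guestfish was invoked in the original hack, only an illustration of the same edit.

```ruby
require "fileutils"

# Hypothetical helper: rename ifcfg-<old_if> to ifcfg-<new_if> inside
# scripts_dir and replace every occurrence of the old interface name
# in the file's contents. Returns the path of the resulting file.
def rename_ifcfg(scripts_dir, old_if, new_if)
  old_path = File.join(scripts_dir, "ifcfg-#{old_if}")
  return old_path if old_if == new_if # nothing to do

  new_path = File.join(scripts_dir, "ifcfg-#{new_if}")
  contents = File.read(old_path).gsub(old_if, new_if)
  File.write(new_path, contents)
  FileUtils.rm(old_path)
  new_path
end
```

With the guest's /etc/sysconfig/network-scripts directory available at a local path, `rename_ifcfg(path, "ens33", "ens32")` would perform the change described in this comment.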

Hope that helps some with figuring out what is going on (and where) within Vagrant to cause this issue!

@VinceMacBuche

Hello there,

I encountered the same issue using the box geerlingguy/centos7 (CentOS 7.2). Details below:

Vagrant version

Vagrant 1.8.4

Vagrant plugins

vagrant-cachier (1.2.1)
vagrant-libvirt (0.0.33)
vagrant-mutate (1.1.0)
vagrant-share (1.1.5, system)

Host operating system

Debian 8

Guest operating system

CentOS 7.2

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "geerlingguy/centos7"
  config.vm.hostname = "server"
end

Debug output

vagrant up --debug:
https://gist.github.com/VinceMacBuche/b08afdd7c845571e6a4c6d9050fc1885

Some more logs from running the command on the guest after vagrant ssh (which works):
https://gist.github.com/VinceMacBuche/4a5ba924949170c34858641ab08a8a9a

@VinceMacBuche

More information: I tested with older versions of Vagrant:

1.8.3, broken too
1.8.1 => It works!

Following @DanHam's remark, I also checked whether the network interfaces were correctly defined, and it seems that a file created by vagrant-libvirt is not correct. It created a network interface eth0, but created a script /etc/sysconfig/network-scripts/ifcfg-enp0s3 (and there is no interface enp0s3). Renaming the interface in the script to eth0 repaired it, but I really lack some skills here.

@VinceMacBuche

The "service network restart" was introduced in b91c167, but I can't say whether something smart can be done to repair broken boxes (the file ifcfg-enp0s3 shouldn't be there...).

DanHam commented Jul 18, 2016

@VinceMacBuche

Hi Vince,

So the point I was trying to make (badly!!) above was that something strange was going on to cause a change in the name of the network interface between the time Packer/VMware built the box and Vagrant/VMware ran it.

To the best of my knowledge Vagrant doesn't actually do very much at all with regard to setting up the default network interface. Instead it just expects, and in fact requires, the network interface to be correctly configured. In other words the configuration of the interface is done when the box is built, not when it is run with Vagrant.

It is the guest itself that determines the name for the network interface - not Vagrant. It used to be the case that network device names were arbitrarily assigned on first boot and then persisted through the use of a udev script. More recently this practice has changed and network interface names are now determined through the use of a tool called biosdevname. Essentially biosdevname uses the location of the network device (as reported by the system BIOS) to set the device name. So, for example, a dual port network card installed in PCI slot 1 on the motherboard will result in the two interfaces named p1p1 and p1p2 respectively - see the link to the Red Hat doc for (better!) details.

Clearly, virtual machines don't have physical slots or physical cards; instead, everything is emulated. The 'physical' location of devices on the virtual motherboard is then specified within a configuration file. For VirtualBox this is an XML file with the .vbox extension; VMware stores its settings in a .vmx file. When a Vagrant box is built, this file is bundled into the .box file along with the VM's hard disk file and a few others. This file should, and at least for me does, include the physical location of the network interface:

...
ethernet0.pcislotnumber = "33"
...

Essentially my box was built with its network card located in PCI slot 33. This results in a network interface of name ens33 and corresponding network configuration script at /etc/sysconfig/network-scripts/ifcfg-ens33.
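
As a rough illustration of the relationship described above, the following Ruby sketch pulls the slot number out of .vmx text and derives the interface name seen in this thread. The helper names are hypothetical, the sample text is made up for the example, and the "ens" plus slot number rule is only what was observed here (slot 33 giving ens33), not a general naming specification.

```ruby
# Hypothetical helper: extract the PCI slot number of the first virtual NIC
# from the text of a .vmx file. Keys are matched case-insensitively because
# this thread shows both "pcislotnumber" and "pciSlotNumber" spellings.
def ethernet0_pci_slot(vmx_text)
  vmx_text.each_line do |line|
    match = line.match(/^\s*ethernet0\.pcislotnumber\s*=\s*"(\d+)"/i)
    return match[1].to_i if match
  end
  nil
end

# Interface name as observed in this thread: "ens" followed by the slot number.
def expected_interface_name(slot)
  "ens#{slot}"
end

# Sample .vmx text, made up for this example.
vmx = <<~VMX
  ethernet0.present = "TRUE"
  ethernet0.pcislotnumber = "33"
VMX

puts expected_interface_name(ethernet0_pci_slot(vmx)) # prints "ens33"
```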

When the box is run with Vagrant, the required files for the new VM are copied, imported, and created as required. Unfortunately, the setting specifying which PCI slot the network interface was plugged into is not preserved from the original .vmx file bundled with the box. From my Vagrant machine's .vmx file:

...
ethernet0.pciSlotNumber = "32"
...

This means my Vagrant VM now has a network interface named ens32. The configuration scripts within the VM (/etc/sysconfig/network-scripts/ifcfg-ens33) are not run since the interface clearly doesn't exist. The error we've all seen is the network.service complaining about the fact that this interface is configured - there is an ifcfg-ens33 file - but it is missing from the system. Since the ens32 interface has no corresponding configuration script nothing happens with it and networking for our VM fails!

Since I create my own boxes, I have been able to get around this problem (and fix the errors seen) by explicitly setting the PCI slot the network card uses in both the Packer build configuration file and the Vagrantfile bundled inside the created box. Explicitly setting this within the user-configured or VM-specific Vagrantfile should work as well. For VMware, the relevant setting can be defined explicitly like this:

...
# VMware provider specific options
config.vm.provider 'vmware_fusion' do |vf|
  vf.vmx['ethernet0.pcislotnumber'] = '33'
end
...

I believe settings for VirtualBox can be configured in a similar way. Of course, you do need to know the original setting the box was built with, but since Vagrant boxes are essentially just tar archives, you should be able to untar the box and take a peek at the .vbox or .vmx file to see what the setting should be! This could be used as a possible workaround until this is fixed...
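
Peeking inside a box can also be done without unpacking it to disk. The sketch below uses Ruby's bundled `Gem::Package::TarReader`; the helper names are hypothetical, and it assumes the box is a tar archive (optionally gzip-compressed) containing a file whose name ends in .vmx or .vbox.

```ruby
require "rubygems/package"
require "zlib"

# Hypothetical helper: return the text of the first .vmx or .vbox entry
# found in an uncompressed tar stream, or nil if there is none.
def provider_config_from_tar(io)
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      next unless entry.file?
      return entry.read if entry.full_name =~ /\.(vmx|vbox)\z/
    end
  end
  nil
end

# Hypothetical wrapper: open a .box file from disk. Try gzip first and
# fall back to treating the file as a plain tar archive.
def provider_config_from_box(path)
  File.open(path, "rb") do |file|
    begin
      provider_config_from_tar(Zlib::GzipReader.new(file))
    rescue Zlib::GzipFile::Error
      file.rewind
      provider_config_from_tar(file)
    end
  end
end
```

The returned text can then be searched for the ethernet0.pcislotnumber line to find the value the box was built with.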

@sethvargo All told, if (!!) I've got all this right then this issue should really be solved by Vagrant reading and using the network card location from the .vbox or .vmx file bundled with the box. This would ensure the network device name is the same as when the box was created and should resolve this issue. Of course, I could have all this wrong and the root cause of the issue may be something else entirely :). However, I can say that what I've done has worked for me...

Hope that helps.

Dan

DanHam commented Jul 18, 2016

@peetkes The workaround above works OK for me when using the same bento/centos-7.2 box you are using with VMware. Unfortunately, I can't figure out how to set the PCI slot for the NIC with VirtualBox...

The workaround does not work so well for boxes with multiple virtual NICs - it looks like Vagrant expects the default NIC it uses to communicate with the VM to be in slot 32! With a multi-NIC VM, it looks as though Vagrant blows away the default ifcfg-ens33 with what should be the second NIC's network settings...

Fixing the default NIC so that it is located in PCI slot 32, both in the Packer build configuration file and in the box's bundled Vagrantfile, works well enough and again fixes the issue - even for multi-NIC Vagrant machines. Clearly this doesn't help much if you don't have control over the build of the box...

DanHam commented Jul 20, 2016

@sethvargo Hi, I'm afraid this hasn't been fixed by the changes made in the 1.8.5 release of Vagrant. I also think that #7514 is related to, or caused by, the same core issue - although it should be noted that the root cause has nothing to do with setting the host name per se.

As stated in previous comments, the error is seen when the network service is restarted. This is because the network service is in a failed state. As such any script restarting the network service will fail (with set -e).

==> default: Setting hostname...
The following SSH command responded with a non-zero exit status.
... blah ...
Restarting network (via systemctl):  [FAILED]
... blah ...
Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.

Running journalctl -xe shows us why the network.service is in a failed state:

Jul 20 22:00:58 bento-centos72.localdomain network[11799]: Bringing up interface ens33:  Error: Connection activation failed: No suitable device found for this connection.
Jul 20 22:00:58 bento-centos72.localdomain network[11799]: [FAILED]
... blah ...
-- Subject: Unit network.service has failed

The ens33 device does not exist. Instead we have ens32.

# ls /sys/class/net
ens32  lo
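
The mismatch shown above (a configured ens33, an actual ens32) can be checked for programmatically. This is a minimal sketch with hypothetical helper names; the two directory paths are the ones quoted in this thread, and the sample data mirrors the output above.

```ruby
# Hypothetical helper: given the interface names present on the system and
# the file names found in the network-scripts directory, return the
# interfaces that have an ifcfg-* script but do not exist.
def missing_interfaces(present, script_names)
  configured = script_names.map { |name| name[/\Aifcfg-(.+)\z/, 1] }.compact
  configured - present
end

# Hypothetical convenience wrapper, intended to be run on the guest itself.
def missing_interfaces_on_guest
  present = Dir.children("/sys/class/net")
  scripts = Dir.children("/etc/sysconfig/network-scripts")
  missing_interfaces(present, scripts)
end

p missing_interfaces(%w[ens32 lo], %w[ifcfg-ens33 ifcfg-lo]) # prints ["ens33"]
```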

When the Vagrant box was built the Ethernet device was located in PCI slot 33. This can be seen in the .vmx file bundled with the box.

Now that we are running the box with Vagrant, the Ethernet device is in PCI slot 32. Again, this can be seen in the .vmx file under the .vagrant directory or in the debug output.

Clearly this means that biosdevname will change the name of the network interface. The ifcfg-ens33 network-script intended to configure the network when the box is brought up is never run since the device is now missing. The end result of all this is the failed network.service.

Again, I've gotten around this issue with my own CentOS boxes by explicitly setting the PCI slot for the network card to 32 (via vmx settings) both in Packer's .json file and in the Vagrantfile bundled with the box.

For the bento/centos-7.2 box in question, the issue can be reproduced and worked around with the following Vagrantfile:

#-*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(2) do |config|
  # Box
  config.vm.box = 'bento/centos-7.2'
  # Configure the guest's hostname
  config.vm.hostname = 'bento-centos72.localdomain'

  # VMware Fusion specific workaround for #7533 and #7514
  # Commenting this section out will reproduce the errors
  config.vm.provider 'vmware_fusion' do |vf|
     vf.vmx['ethernet0.pcislotnumber'] = '33'
  end
end

Obviously this is a workaround and not a fix...

Dan

DanHam commented Jul 26, 2016

Apparently this has been known about for some time...

It's mentioned in #4590 and on the chef/bento GitHub page.

Reading some of the comments in #4590, it seems Packer hard-codes the ethernet0.pcislot.

@ghost ghost locked and limited conversation to collaborators Apr 4, 2020
@ghost ghost unassigned sethvargo Apr 4, 2020