
Virtualization Host Configuration and Guest Installation Guide

Chapter 12. PCI device configuration

Red Hat Enterprise Linux 6 exposes three classes of device to its virtual machines:
  • Emulated devices are purely virtual devices that mimic real hardware, allowing unmodified guest operating systems to work with them using their standard in-box drivers.
  • Virtio devices are purely virtual devices designed to work optimally in a virtual machine. Virtio devices are similar to emulated devices; however, non-Linux virtual machines do not include the drivers they require by default. Virtualization management software like the Virtual Machine Manager (virt-manager) and the Red Hat Enterprise Virtualization Hypervisor install these drivers automatically for supported non-Linux guest operating systems.
  • Assigned devices are physical devices that are exposed to the virtual machine. This method is also known as 'passthrough'. Device assignment allows virtual machines exclusive access to PCI devices for a range of tasks, and allows PCI devices to appear and behave as if they were physically attached to the guest operating system.
    Device assignment is supported on PCI Express devices, except graphics cards. Parallel PCI devices may be supported as assigned devices, but they have severe limitations due to security and system configuration conflicts.
Red Hat Enterprise Linux 6 supports 32 PCI device slots per virtual machine, and 8 PCI functions per device slot. This gives a theoretical maximum of 256 configurable PCI functions per guest.
However, this theoretical maximum is subject to the following limitations:
  • Each virtual machine supports a maximum of 8 assigned device functions.
  • 4 PCI device slots are configured with emulated devices by default. However, users can explicitly remove 2 of the emulated devices that are configured by default (the video adapter device in slot 2, and the memory balloon driver device in slot 3). This gives users a supported functional maximum of 30 PCI device slots per virtual machine.
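For example, one of these default emulated devices can be removed with virsh edit. The following sketch, which assumes a guest named guest1-rhel6-64, replaces the default memory balloon entry in the <devices> section with a disabled model; the exact XML accepted can vary by libvirt version:
# virsh edit guest1-rhel6-64
<memballoon model='none'/>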
Red Hat Enterprise Linux 6.0 and newer supports hot plugging assigned PCI devices into virtual machines. However, PCI device hot plugging operates at the slot level and therefore does not support multi-function PCI devices. Multi-function PCI devices are recommended for static device configuration only.

Note

Red Hat Enterprise Linux 6.0 limited guest operating system driver access to a device's standard and extended configuration space. Limitations that were present in Red Hat Enterprise Linux 6.0 are significantly reduced in Red Hat Enterprise Linux 6.1, and enable a much larger set of PCI Express devices to be successfully assigned to KVM guests.
Secure device assignment also requires interrupt remapping support. If a platform does not support interrupt remapping, device assignment will fail. To use device assignment without interrupt remapping support in a development environment, set the allow_unsafe_assigned_interrupts KVM module parameter to 1.
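As a sketch, the parameter can be set at runtime through sysfs, or persistently with an options line in a modprobe.d file; the file name kvm.conf below is illustrative:
# echo 1 > /sys/module/kvm/parameters/allow_unsafe_assigned_interrupts
# echo "options kvm allow_unsafe_assigned_interrupts=1" >> /etc/modprobe.d/kvm.conf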
PCI device assignment is only available on hardware platforms supporting either Intel VT-d or AMD IOMMU. These Intel VT-d or AMD IOMMU specifications must be enabled in BIOS for PCI device assignment to function.

Procedure 12.1. Preparing an Intel system for PCI device assignment

  1. Enable the Intel VT-d specifications

    The Intel VT-d specifications provide hardware support for directly assigning a physical device to a virtual machine. These specifications are required to use PCI device assignment with Red Hat Enterprise Linux.
    The Intel VT-d specifications must be enabled in the BIOS. Some system manufacturers disable these specifications by default. The terms used to refer to these specifications can differ between manufacturers; consult your system manufacturer's documentation for the appropriate terms.
  2. Activate Intel VT-d in the kernel

    Activate Intel VT-d in the kernel by adding the intel_iommu=on parameter to the kernel line in the /boot/grub/grub.conf file.
    The example below is a modified grub.conf file with Intel VT-d activated.
    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title Red Hat Enterprise Linux Server (2.6.32-330.x86_64)
            root (hd0,0)
            kernel /vmlinuz-2.6.32-330.x86_64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet intel_iommu=on
            initrd /initrd-2.6.32-330.x86_64.img
  3. Ready to use

    Reboot the system to enable the changes. Your system is now capable of PCI device assignment.
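    To confirm that the IOMMU is active after the reboot, you can check the kernel messages. The exact output varies by platform, but lines mentioning DMAR or IOMMU (for example, Intel-IOMMU: enabled) indicate that Intel VT-d was initialized:
    # dmesg | grep -e DMAR -e IOMMU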

Procedure 12.2. Preparing an AMD system for PCI device assignment

  1. Enable the AMD IOMMU specifications

    The AMD IOMMU specifications are required to use PCI device assignment in Red Hat Enterprise Linux. These specifications must be enabled in the BIOS. Some system manufacturers disable these specifications by default.
  2. Enable IOMMU kernel support

    Append amd_iommu=on to the kernel command line in /boot/grub/grub.conf so that AMD IOMMU specifications are enabled at boot.
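    The following kernel line sketch mirrors the Intel example above; the kernel version and root device are illustrative:
    title Red Hat Enterprise Linux Server (2.6.32-330.x86_64)
            root (hd0,0)
            kernel /vmlinuz-2.6.32-330.x86_64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet amd_iommu=on
            initrd /initrd-2.6.32-330.x86_64.img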

12.1. Assigning a PCI device with virsh

These steps cover assigning a PCI device to a virtual machine on a KVM hypervisor.
This example uses a PCIe network controller with the PCI identifier pci_0000_01_00_0 and a fully virtualized guest machine named guest1-rhel6-64.

Procedure 12.3. Assigning a PCI device to a guest virtual machine with virsh

  1. Identify the device

    First, identify the PCI device designated for device assignment to the virtual machine. Use the lspci command to list the available PCI devices. You can refine the output of lspci with grep.
    This example uses the Ethernet controller highlighted in the following output:
    # lspci | grep Ethernet
    00:19.0 Ethernet controller: Intel Corporation 82567LM-2 Gigabit Network Connection
    01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    This Ethernet controller is shown with the short identifier 00:19.0. We need to find out the full identifier used by virsh in order to assign this PCI device to a virtual machine.
    To do so, combine the virsh nodedev-list command with the grep command to list all devices of a particular type (pci) that are attached to the host machine. Then look at the output for the string that maps to the short identifier of the device you wish to use.
    This example highlights the string that maps to the Ethernet controller with the short identifier 00:19.0. Note that the : and . characters are replaced with underscores in the full identifier.
    # virsh nodedev-list --cap pci
    pci_0000_00_00_0
    pci_0000_00_01_0
    pci_0000_00_03_0
    pci_0000_00_07_0
    pci_0000_00_10_0
    pci_0000_00_10_1
    pci_0000_00_14_0
    pci_0000_00_14_1
    pci_0000_00_14_2
    pci_0000_00_14_3
    pci_0000_00_19_0
    pci_0000_00_1a_0
    pci_0000_00_1a_1
    pci_0000_00_1a_2
    pci_0000_00_1a_7
    pci_0000_00_1b_0
    pci_0000_00_1c_0
    pci_0000_00_1c_1
    pci_0000_00_1c_4
    pci_0000_00_1d_0
    pci_0000_00_1d_1
    pci_0000_00_1d_2
    pci_0000_00_1d_7
    pci_0000_00_1e_0
    pci_0000_00_1f_0
    pci_0000_00_1f_2
    pci_0000_00_1f_3
    pci_0000_01_00_0
    pci_0000_01_00_1
    pci_0000_02_00_0
    pci_0000_02_00_1
    pci_0000_06_00_0
    pci_0000_07_02_0
    pci_0000_07_03_0
    Record the PCI device number that maps to the device you want to use; this is required in other steps.
  2. Review device information

    Information on the domain, bus, and function is available from the output of the virsh nodedev-dumpxml command:
    # virsh nodedev-dumpxml pci_0000_00_19_0
    <device>
      <name>pci_0000_00_19_0</name>
      <parent>computer</parent>
      <driver>
        <name>e1000e</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>0</bus>
        <slot>25</slot>
        <function>0</function>
        <product id='0x1502'>82579LM Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <capability type='virt_functions'>
        </capability>
      </capability>
    </device>
  3. Determine required configuration details

    Refer to the output from the virsh nodedev-dumpxml pci_0000_00_19_0 command for the values required for the configuration file.
    Optionally, convert the slot and function values from decimal to hexadecimal to get the PCI bus addresses. Prepend "0x" to the converted values to indicate that they are hexadecimal numbers.
    The example device has the following values: bus = 0, slot = 25 and function = 0. The decimal configuration uses those three values:
    bus='0' slot='25' function='0'
    If you want to convert to hexadecimal values, you can use the printf utility to convert from decimal values, as shown in the following example:
    $ printf %x 0
    0
    $ printf %x 25
    19
    $ printf %x 0
    0
    The example device would use the following hexadecimal values in the configuration file:
    bus='0x0' slot='0x19' function='0x0'
  4. Add configuration details

    Run virsh edit, specifying the virtual machine name, and add a <hostdev> entry in the <devices> section of the XML to assign the PCI device to the guest virtual machine.
    # virsh edit guest1-rhel6-64
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0' bus='0x0' slot='0x19' function='0x0'/>
      </source>
    </hostdev>
    Alternatively, run virsh attach-device, specifying the virtual machine name and an XML file that contains the device definition:
    virsh attach-device guest1-rhel6-64 file.xml
  5. Allow device management

    Set an SELinux boolean to allow the management of the PCI device from the virtual machine:
    # setsebool -P virt_use_sysfs 1
  6. Start the virtual machine

    # virsh start guest1-rhel6-64
The PCI device should now be successfully assigned to the virtual machine, and accessible to the guest operating system.
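Inside the guest, you can verify the assignment with lspci. The slot at which the device appears and the exact description depend on the guest configuration, so the output below is only illustrative:

# lspci | grep Ethernet
00:07.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)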

12.2. Assigning a PCI device with virt-manager

PCI devices can be added to guest virtual machines using the graphical virt-manager tool. The following procedure adds a Gigabit Ethernet controller to a guest virtual machine.

Procedure 12.4. Assigning a PCI device to a guest virtual machine using virt-manager

  1. Open the hardware settings

    Open the guest virtual machine and click the Add Hardware button to add a new device to the virtual machine.
    The virtual machine hardware window with the Information button selected on the top taskbar and Overview selected on the left menu pane.

    Figure 12.1. The virtual machine hardware information window


  2. Select a PCI device

    Select PCI Host Device from the Hardware list on the left.
    Select an unused PCI device. Note that selecting PCI devices presently in use on the host causes errors. In this example, a spare 82576 network device is used. Click Finish to complete setup.
    The Add new virtual hardware wizard with PCI Host Device selected on the left menu pane, showing a list of host devices for selection in the right menu pane.

    Figure 12.2. The Add new virtual hardware wizard


  3. Add the new device

    The setup is complete and the guest virtual machine now has direct access to the PCI device.
    The virtual machine hardware window with the Information button selected on the top taskbar and Overview selected on the left menu pane, displaying the newly added PCI Device in the list of virtual machine devices in the left menu pane.

    Figure 12.3. The virtual machine hardware information window


12.3. PCI device assignment with virt-install

To use virt-install to assign a PCI device, use the --host-device parameter.

Procedure 12.5. Assigning a PCI device to a virtual machine with virt-install

  1. Identify the device

    Identify the PCI device designated for device assignment to the guest virtual machine.
    # lspci | grep Ethernet
    00:19.0 Ethernet controller: Intel Corporation 82567LM-2 Gigabit Network Connection
    01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    The virsh nodedev-list command lists all devices attached to the system, and identifies each PCI device with a string. To limit output to only PCI devices, run the following command:
    # virsh nodedev-list --cap pci
    pci_0000_00_00_0
    pci_0000_00_01_0
    pci_0000_00_03_0
    pci_0000_00_07_0
    pci_0000_00_10_0
    pci_0000_00_10_1
    pci_0000_00_14_0
    pci_0000_00_14_1
    pci_0000_00_14_2
    pci_0000_00_14_3
    pci_0000_00_19_0
    pci_0000_00_1a_0
    pci_0000_00_1a_1
    pci_0000_00_1a_2
    pci_0000_00_1a_7
    pci_0000_00_1b_0
    pci_0000_00_1c_0
    pci_0000_00_1c_1
    pci_0000_00_1c_4
    pci_0000_00_1d_0
    pci_0000_00_1d_1
    pci_0000_00_1d_2
    pci_0000_00_1d_7
    pci_0000_00_1e_0
    pci_0000_00_1f_0
    pci_0000_00_1f_2
    pci_0000_00_1f_3
    pci_0000_01_00_0
    pci_0000_01_00_1
    pci_0000_02_00_0
    pci_0000_02_00_1
    pci_0000_06_00_0
    pci_0000_07_02_0
    pci_0000_07_03_0
    Record the PCI device number; the number is needed in other steps.
    Information on the domain, bus, and function is available from the output of the virsh nodedev-dumpxml command:
    # virsh nodedev-dumpxml pci_0000_01_00_0
    <device>
      <name>pci_0000_01_00_0</name>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igb</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>1</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x10c9'>82576 Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <capability type='virt_functions'>
        </capability>
      </capability>
    </device>
  2. Add the device

    Use the PCI identifier output from the virsh nodedev-list command as the value for the --host-device parameter.
    virt-install \
        --name=guest1-rhel6-64 \
        --disk path=/var/lib/libvirt/images/guest1-rhel6-64.img,size=8 \
        --nonsparse --graphics spice \
        --vcpus=2 --ram=2048 \
        --location=http://example1.com/installation_tree/RHEL6.1-Server-x86_64/os \
        --nonetworks \
        --os-type=linux \
        --os-variant=rhel6 \
        --host-device=pci_0000_01_00_0
  3. Complete the installation

    Complete the guest installation. The PCI device should be attached to the guest.

12.4. Detaching an assigned PCI device

When a host PCI device has been assigned to a guest machine, the host can no longer use the device. Read this section to learn how to detach the device from the guest with virsh or virt-manager so it is available for host use.

Procedure 12.6. Detaching a PCI device from a guest with virsh

  1. Detach the device

    Use the following command to detach the PCI device from the guest, specifying the XML file that describes the device:
    # virsh detach-device name_of_guest file.xml
  2. Re-attach the device to the host (optional)

    If the device is in managed mode, skip this step. The device will be returned to the host automatically.
    If the device is not using managed mode, use the following command to re-attach the PCI device to the host machine:
    # virsh nodedev-reattach device
    For example, to re-attach the pci_0000_01_00_0 device to the host:
    # virsh nodedev-reattach pci_0000_01_00_0
    The device is now available for host use.
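    You can optionally confirm which driver the device is now bound to on the host. lspci -k prints the kernel driver in use (igb for the example 82576 controller); the exact output depends on your hardware:
    # lspci -k -s 01:00.0
    01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
            Kernel driver in use: igb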

Procedure 12.7. Detaching a PCI Device from a guest with virt-manager

  1. Open the virtual hardware details screen

    In virt-manager, double-click on the virtual machine that contains the device. Select the Show virtual hardware details button to display a list of virtual hardware.
    The Show virtual hardware details button.

    Figure 12.4. The virtual hardware details button


  2. Select and remove the device

    Select the PCI device to be detached from the list of virtual devices in the left panel.
    The PCI device details and the Remove button.

    Figure 12.5. Selecting the PCI device to be detached


    Click the Remove button to confirm. The device is now available for host use.

Chapter 13. SR-IOV

13.1. Introduction

Developed by the PCI-SIG (PCI Special Interest Group), the Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that allows a single device to be shared by multiple virtual machines. SR-IOV improves device performance for virtual machines.
How SR-IOV works

Figure 13.1. How SR-IOV works


SR-IOV enables a Single Root Function (for example, a single Ethernet port) to appear as multiple, separate physical devices. A physical device with SR-IOV capabilities can be configured to appear in the PCI configuration space as multiple functions. Each device has its own configuration space complete with Base Address Registers (BARs).
SR-IOV uses two PCI functions:
  • Physical Functions (PFs) are full PCIe devices that include the SR-IOV capabilities. Physical Functions are discovered, managed, and configured as normal PCI devices. Physical Functions configure and manage the SR-IOV functionality by assigning Virtual Functions.
  • Virtual Functions (VFs) are simple PCIe functions that only process I/O. Each Virtual Function is derived from a Physical Function. The number of Virtual Functions a device may have is limited by the device hardware. A single Ethernet port, the Physical Device, may map to many Virtual Functions that can be shared with virtual machines.
The hypervisor can map one or more Virtual Functions to a virtual machine. The Virtual Function's configuration space is then mapped to the configuration space presented to the guest.
Each Virtual Function can only be mapped to a single guest at a time, as Virtual Functions require real hardware resources. A virtual machine can have multiple Virtual Functions. A Virtual Function appears as a network card in the same way as a normal network card would appear to an operating system.
The SR-IOV drivers are implemented in the kernel. The core implementation is contained in the PCI subsystem, but there must also be driver support for both the Physical Function (PF) and Virtual Function (VF) devices. An SR-IOV capable device can allocate VFs from a PF. The VFs appear as PCI devices which are backed on the physical PCI device by resources such as queues and register sets.
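When VFs have been allocated, the kernel exposes them as virtfnN symbolic links under the Physical Function's sysfs directory, which is a quick way to confirm the PF/VF relationship. The PCI address below matches the 82576 example used later in this chapter and is otherwise illustrative:

# ls /sys/bus/pci/devices/0000:0b:00.0/ | grep virtfn
virtfn0
virtfn1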
Advantages of SR-IOV
SR-IOV devices can share a single physical port with multiple virtual machines.
Virtual Functions have near-native performance and provide better performance than para-virtualized drivers and emulated access. Virtual Functions provide data protection between virtual machines on the same physical server as the data is managed and controlled by the hardware.
These features allow for increased virtual machine density on hosts within a data center.
SR-IOV is better able to utilize the bandwidth of devices with multiple guests.

13.2. Using SR-IOV

This section covers the use of PCI passthrough to assign a Virtual Function of an SR-IOV capable multiport network card to a virtual machine as a network device.
SR-IOV Virtual Functions (VFs) can be assigned to virtual machines by adding a <hostdev> device entry with the virsh edit or virsh attach-device command. However, this can be problematic because, unlike a regular network device, an SR-IOV VF network device does not have a permanent unique MAC address; it is assigned a new MAC address each time the host is rebooted. Because of this, even if the guest is assigned the same VF after a reboot, the guest sees its network adapter as having a new MAC address. As a result, the guest believes new hardware is connected each time, and usually requires re-configuration of its network settings.
libvirt 0.9.10 and newer contain the <interface type='hostdev'> interface device. Using this interface device, libvirt first performs any network-specific hardware/switch initialization indicated (such as setting the MAC address, VLAN tag, or 802.1Qbh virtualport parameters), and then performs the PCI device assignment to the guest.
Using the <interface type='hostdev'> interface device requires:
  • an SR-IOV-capable network card,
  • host hardware that supports either the Intel VT-d or the AMD IOMMU extensions, and
  • the PCI address of the VF to be assigned.

Important

Assignment of an SR-IOV device to a virtual machine requires that the host hardware supports the Intel VT-d or the AMD IOMMU specification.
To attach an SR-IOV network device on an Intel or an AMD system, follow this procedure:

Procedure 13.1. Attach an SR-IOV network device on an Intel or AMD system

  1. Enable Intel VT-d or the AMD IOMMU specifications in the BIOS and kernel

    On an Intel system, enable Intel VT-d in the BIOS if it is not enabled already. Refer to Procedure 12.1, "Preparing an Intel system for PCI device assignment" for procedural help on enabling Intel VT-d in the BIOS and kernel.
    Skip this step if Intel VT-d is already enabled and working.
    On an AMD system, enable the AMD IOMMU specifications in the BIOS if they are not enabled already. Refer to Procedure 12.2, "Preparing an AMD system for PCI device assignment" for procedural help on enabling IOMMU in the BIOS.
  2. Verify support

    Verify that the PCI device with SR-IOV capabilities is detected. This example lists an Intel 82576 network interface card which supports SR-IOV. Use the lspci command to verify whether the device was detected.
    # lspci
    03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    Note that the output has been modified to remove all other devices.
  3. Start the SR-IOV kernel modules

    If the device is supported, the driver kernel module should be loaded automatically by the kernel. Optional parameters can be passed to the module using the modprobe command. The Intel 82576 network interface card uses the igb driver kernel module.
    # modprobe igb [<option>=<VAL1>,<VAL2>,]
    # lsmod | grep igb
    igb     87592  0
    dca      6708  1 igb
  4. Activate Virtual Functions

    The max_vfs parameter of the igb module allocates the maximum number of Virtual Functions. Setting max_vfs causes the driver to spawn up to that number of Virtual Functions. For this particular card the valid range is 0 to 7.
    Remove the module to change the variable.
    # modprobe -r igb
    Restart the module with the max_vfs set to 7 or any number of Virtual Functions up to the maximum supported by your device.
    # modprobe igb max_vfs=7
  5. Make the Virtual Functions persistent

    Add the line options igb max_vfs=7 to any file in /etc/modprobe.d to make the Virtual Functions persistent. For example:
    # echo "options igb max_vfs=7" >>/etc/modprobe.d/igb.conf
  6. Inspect the new Virtual Functions

    Using the lspci command, list the newly added Virtual Functions attached to the Intel 82576 network device. (Alternatively, pipe the output through grep Virtual Function to show only the Virtual Function entries.)
    # lspci | grep 82576
    0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    The identifier for the PCI device is found with the -n parameter of the lspci command. The Physical Functions correspond to 0b:00.0 and 0b:00.1. All Virtual Functions have Virtual Function in the description.
  7. Verify devices exist with virsh

    The libvirt service must recognize the device before adding a device to a virtual machine. libvirt uses a similar notation to the lspci output. All punctuation characters (: and .) in lspci output are changed to underscores (_).
    Use the virsh nodedev-list command and the grep command to filter the Intel 82576 network device from the list of available host devices. 0b is the filter for the Intel 82576 network devices in this example. This may vary for your system and may result in additional devices.
    # virsh nodedev-list | grep 0b
    pci_0000_0b_00_0
    pci_0000_0b_00_1
    pci_0000_0b_10_0
    pci_0000_0b_10_1
    pci_0000_0b_10_2
    pci_0000_0b_10_3
    pci_0000_0b_10_4
    pci_0000_0b_10_5
    pci_0000_0b_10_6
    pci_0000_0b_10_7
    pci_0000_0b_11_0
    pci_0000_0b_11_1
    pci_0000_0b_11_2
    pci_0000_0b_11_3
    pci_0000_0b_11_4
    pci_0000_0b_11_5
    The PCI device names for the Virtual Functions and Physical Functions should be in the list.
  8. Get device details with virsh

    pci_0000_0b_00_0 is one of the Physical Functions and pci_0000_0b_10_0 is the first corresponding Virtual Function for that Physical Function. Use the virsh nodedev-dumpxml command to get advanced output for both devices.
    # virsh nodedev-dumpxml pci_0000_0b_00_0
    <device>
      <name>pci_0000_0b_00_0</name>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igb</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>11</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x10c9'>82576 Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
      </capability>
    </device>
    # virsh nodedev-dumpxml pci_0000_0b_10_0
    <device>
      <name>pci_0000_0b_10_0</name>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igbvf</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>11</bus>
        <slot>16</slot>
        <function>0</function>
        <product id='0x10ca'>82576 Virtual Function</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
      </capability>
    </device>
    This example adds the Virtual Function pci_0000_0b_10_0 to the virtual machine in Step 9. Note the bus, slot and function parameters of the Virtual Function: these are required for adding the device.
    Copy these parameters into a temporary XML file, such as /tmp/new-interface.xml.
    <interface type='hostdev' managed='yes'>
      <source>
        <address type='pci' domain='0' bus='11' slot='16' function='0'/>
      </source>
    </interface>

    Note

    If you do not specify a MAC address, one will be automatically generated. The <virtualport> element is only used when connecting to an 802.1Qbh hardware switch. The <vlan> element is new in Red Hat Enterprise Linux 6.4 and transparently places the guest's device on the VLAN tagged 42.
    When the virtual machine starts, it should see a network device of the type provided by the physical adapter, with the configured MAC address. This MAC address will remain unchanged across host and guest reboots.
    The following <interface> example shows the syntax for the optional <mac address>, <virtualport>, and <vlan> elements. In practice, use either the <vlan> element or the <virtualport> element, not both at the same time as shown in the example:
    ...
    <devices>
      ...
      <interface type='hostdev' managed='yes'>
        <source>
          <address type='pci' domain='0' bus='11' slot='16' function='0'/>
        </source>
        <mac address='52:54:00:6d:90:02'/>
        <vlan>
          <tag id='42'/>
        </vlan>
        <virtualport type='802.1Qbh'>
          <parameters profileid='finance'/>
        </virtualport>
      </interface>
      ...
    </devices>
  9. Add the Virtual Function to the virtual machine

    Add the Virtual Function to the virtual machine using the following command with the temporary file created in the previous step. This attaches the new device immediately and saves it for subsequent guest restarts.
    virsh attach-device MyGuest /tmp/new-interface.xml --live --config
    Specifying the --live option with virsh attach-device attaches the new device to the running guest. Using the --config option ensures the new device is available after future guest restarts.

    Note

    The --live option is only accepted when the guest is running. virsh will return an error if the --live option is used on a non-running guest.
The virtual machine detects a new network interface card. This new card is the Virtual Function of the SR-IOV device.
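Inside the guest, the Virtual Function appears as an ordinary network device bound to the VF driver (igbvf for this adapter). The interface name and the slot shown below are illustrative and depend on the guest configuration:

# lspci | grep Ethernet
00:07.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
# ethtool -i eth1
driver: igbvf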

13.3. Troubleshooting SR-IOV

This section contains solutions for problems which may affect SR-IOV.
Error starting the guest
When starting a configured virtual machine, an error occurs as follows:
# virsh start test
error: Failed to start domain test
error: internal error unable to start guest: char device redirected to /dev/pts/2
get_real_device: /sys/bus/pci/devices/0000:03:10.0/config: Permission denied
init_assigned_device: Error: Couldn't get real device (03:10.0)!
Failed to initialize assigned device host=03:10.0
This error is often caused by a device that is already assigned to another guest or to the host itself.
Error migrating, saving, or dumping the guest
Attempts to migrate, save, or dump the virtual machine cause an error similar to the following:
# virsh dump --crash 5 /tmp/vmcore
error: Failed to core dump domain 5 to /tmp/vmcore
error: internal error unable to execute QEMU command 'migrate': An undefined error has occurred
Because device assignment uses hardware on the specific host where the virtual machine was started, guest migration and save are not supported when device assignment is in use. Currently, the same limitation also applies to core-dumping a guest; this may change in the future.
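One approach commonly used in practice, sketched here only as an illustration and not as a documented feature, is to detach the assigned device before migrating and re-attach an equivalent device afterwards. The guest name, device XML file, and destination URI below are illustrative, and the destination host must provide an equivalent device:

# virsh detach-device guest1-rhel6-64 /tmp/new-interface.xml --live
# virsh migrate --live guest1-rhel6-64 qemu+ssh://destination.example.com/system
# virsh attach-device guest1-rhel6-64 /tmp/new-interface.xml --live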

Chapter 14. KVM guest timing management

Virtualization involves several intrinsic challenges for time keeping in guest virtual machines. Interrupts cannot always be delivered simultaneously and instantaneously to all guest virtual machines, because interrupts in virtual machines are not true interrupts; they are injected into the guest virtual machine by the host machine. The host may be running another guest virtual machine, or a different process, meaning that the precise timing typically required by interrupts may not always be possible.
Guest virtual machines without accurate time keeping may experience issues with network applications and processes, as session validity, migration, and other network activities rely on timestamps to remain correct.
KVM avoids these issues by providing guest virtual machines with a para-virtualized clock (kvm-clock). However, it is still vital to test timing before attempting activities that may be affected by time keeping inaccuracies.

Note

Red Hat Enterprise Linux 5.5 and newer, and Red Hat Enterprise Linux 6.0 and newer, use kvm-clock as their default clock source. Running without kvm-clock requires special configuration, and is not recommended.

Important

The Network Time Protocol (NTP) daemon should be running on the host and the guest virtual machines. Enable the ntpd service:
# service ntpd start
Add the ntpd service to the default startup sequence:
# chkconfig ntpd on
The ntpd service will correct the effects of clock skew as long as the clock runs no more than 0.05% faster or slower than the reference time source. The ntp startup script corrects any clock offset from the reference time by adjusting the system clock at startup, if required.
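To check that the daemon is synchronizing, you can query its peers; an asterisk in the first column of the output marks the time source the daemon is currently synchronized to:
# ntpq -p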
Constant Time Stamp Counter (TSC)
Modern Intel and AMD CPUs provide a constant Time Stamp Counter (TSC). The count frequency of the constant TSC does not vary when the CPU core itself changes frequency, for example, to comply with a power saving policy. A CPU with a constant TSC frequency is necessary in order to use the TSC as a clock source for KVM guests.
Your CPU has a constant Time Stamp Counter if the constant_tsc flag is present. To determine if your CPU has the constant_tsc flag, run the following command:
$ cat /proc/cpuinfo | grep constant_tsc
If any output is given, your CPU has the constant_tsc bit. If no output is given, follow the instructions below.
Configuring hosts without a constant Time Stamp Counter
Systems without a constant TSC frequency cannot use the TSC as a clock source for virtual machines, and require additional configuration. Power management features interfere with accurate time keeping and must be disabled for guest virtual machines to accurately keep time with KVM.

Important

These instructions are for AMD revision F CPUs only.
If the CPU lacks the constant_tsc bit, disable all power management features (BZ#513138). Each system has several timers it uses to keep time. The TSC is not stable on such hosts; TSC instability is sometimes caused by cpufreq changes, deep C states, or migration to a host with a faster TSC. Deep C sleep states can stop the TSC. To prevent the kernel from using deep C states, append processor.max_cstate=1 to the kernel boot options in the grub.conf file on the host:
title Red Hat Enterprise Linux (2.6.32-330.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-330.x86_64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet processor.max_cstate=1
Disable cpufreq (only necessary on hosts without the constant_tsc flag) by editing the /etc/sysconfig/cpuspeed configuration file and changing the MIN_SPEED and MAX_SPEED variables to the highest frequency available. Valid limits can be found in the /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies files.
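A minimal sketch of this change follows; the frequencies are illustrative and must be taken from the scaling_available_frequencies file on your own host:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2400000 2000000 1600000
/etc/sysconfig/cpuspeed:
MIN_SPEED=2400000
MAX_SPEED=2400000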
Required parameters for Red Hat Enterprise Linux guests
For certain Red Hat Enterprise Linux guest virtual machines, additional kernel parameters are required. These parameters can be set by appending them to the end of the kernel line in the /boot/grub/grub.conf file of the guest virtual machine.
The table below lists versions of Red Hat Enterprise Linux and the parameters required on the specified systems.

Table 14.1. Kernel parameter requirements

Red Hat Enterprise Linux version                          Additional guest kernel parameters
6.0 AMD64/Intel 64 with the para-virtualized clock        Additional parameters are not required
6.0 AMD64/Intel 64 without the para-virtualized clock     notsc lpj=n
5.5 AMD64/Intel 64 with the para-virtualized clock        Additional parameters are not required
5.5 AMD64/Intel 64 without the para-virtualized clock     notsc lpj=n
5.5 x86 with the para-virtualized clock                   Additional parameters are not required
5.5 x86 without the para-virtualized clock                clocksource=acpi_pm lpj=n
5.4 AMD64/Intel 64                                        notsc
5.4 x86                                                   clocksource=acpi_pm
5.3 AMD64/Intel 64                                        notsc
5.3 x86                                                   clocksource=acpi_pm
4.8 AMD64/Intel 64                                        notsc
4.8 x86                                                   clock=pmtmr
3.9 AMD64/Intel 64                                        Additional parameters are not required
3.9 x86                                                   Additional parameters are not required

Note

The lpj parameter requires a numeric value equal to the loops per jiffy value of the specific CPU on which the guest virtual machine runs. If you do not know this value, do not set the lpj parameter.
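If the value is needed, it can usually be read from the host kernel's boot messages, where the delay-loop calibration prints an lpj= value; the number shown below is illustrative and must be taken from the host on which the guest will run:

# dmesg | grep -o 'lpj=[0-9]*'
lpj=2893000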

Warning

The divider kernel parameter was previously recommended for Red Hat Enterprise Linux 4 and 5 guest virtual machines that did not have high responsiveness requirements, or exist on systems with high guest density. It is no longer recommended for use with guests running Red Hat Enterprise Linux 4, or Red Hat Enterprise Linux 5 versions prior to version 5.8.
divider can improve throughput on Red Hat Enterprise Linux 5 versions equal to or later than 5.8 by lowering the frequency of timer interrupts. For example, if HZ=1000, and divider is set to 10 (that is, divider=10), the number of timer interrupts per second changes from the default value (1000) to 100 (the default value, 1000, divided by the divider value, 10).
BZ#698842 details a bug in the way that the divider parameter interacts with interrupt and tick recording. This bug is fixed as of Red Hat Enterprise Linux 5.8. However, the divider parameter can still cause kernel panic in guests using Red Hat Enterprise Linux 4, or Red Hat Enterprise Linux 5 versions prior to version 5.8.
This parameter was not implemented in Red Hat Enterprise Linux 3, so this bug does not affect Red Hat Enterprise Linux 3 guests.
Red Hat Enterprise Linux 6 does not have a fixed-frequency clock interrupt; it operates in tickless mode and uses the timer dynamically as required. The divider parameter is therefore not useful for Red Hat Enterprise Linux 6, and Red Hat Enterprise Linux 6 guests are not affected by this bug.
Using the Real-Time Clock with Windows Server 2003 and Windows XP guests
Windows uses both the Real-Time Clock (RTC) and the Time Stamp Counter (TSC). For Windows guest virtual machines, the Real-Time Clock can be used instead of the TSC for all time sources, which resolves guest timing issues.
To enable the Real-Time Clock for the PMTIMER clock source (the PMTIMER usually uses the TSC), add the following option to the Windows boot settings. Windows boot settings are stored in the boot.ini file. Add the following option to the end of the Windows boot line in the boot.ini file:
/usepmtimer
For more information on Windows boot settings and the usepmtimer option, refer to Available switch options for the Windows XP and the Windows Server 2003 Boot.ini files.
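A hedged sketch of the resulting boot.ini entry follows; the ARC path and description are illustrative and will differ on your system:

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /usepmtimer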
Using the Real-Time Clock with Windows Server 2008, Windows Server 2008 R2, and Windows 7 guests
Windows uses both the Real-Time Clock (RTC) and the Time Stamp Counter (TSC). For Windows guest virtual machines, the Real-Time Clock can be used instead of the TSC for all time sources, which resolves guest timing issues.
The boot.ini file is no longer used as of Windows Server 2008 and newer. Windows Server 2008, Windows Server 2008 R2, and Windows 7 do not use the TSC as a time source if the hypervisor-present bit is set. The Red Hat Enterprise Linux 6 KVM hypervisor enables this CPUID bit by default, so it is no longer necessary to use the Boot Configuration Data Editor (bcdedit.exe) to modify the Windows boot parameters.
  1. Open the Windows guest virtual machine.
  2. Open the Accessories menu of the start menu. Right-click on the Command Prompt application and select Run as Administrator.
  3. Confirm the security exception, if prompted.
  4. Set the boot manager to use the platform clock. This should instruct Windows to use the PM timer for the primary clock source. The system UUID ({default} in the example below) should be changed if the system UUID is different from the default boot device.
    C:\Windows\system32>bcdedit /set {default} USEPLATFORMCLOCK on
    The operation completed successfully
This fix should improve time keeping for Windows Server 2008 and Windows 7 guests.
Steal time accounting
Steal time is the amount of CPU time desired by a guest virtual machine that is not provided by the host. Steal time occurs when the host allocates these resources elsewhere: for example, to another guest.
Steal time is reported in the CPU time fields in /proc/stat as st. It is automatically reported by utilities such as top and vmstat, and cannot be switched off.
Large amounts of steal time indicate CPU contention, which can reduce guest performance. To relieve CPU contention, increase the guest's CPU priority or CPU quota, or run fewer guests on the host.
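Steal time can also be read directly from /proc/stat; the eighth numeric field on the cpu line is the cumulative steal time in USER_HZ ticks (the values below are illustrative). The same figure appears as the st column in vmstat and the %st field in top:

$ grep '^cpu ' /proc/stat
cpu  210922 752 40555 12345678 5204 0 361 10417 0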

Chapter 15. Network booting with libvirt

Guest virtual machines can be booted with PXE enabled. PXE allows guest virtual machines to boot and load their configuration off the network itself. This section demonstrates some basic configuration steps to configure PXE guests with libvirt.
This section does not cover the creation of boot images or PXE servers. It explains how to configure libvirt, in a private or bridged network, to boot a guest virtual machine with PXE booting enabled.

Warning

These procedures are provided only as an example. Ensure that you have sufficient backups before proceeding.

15.1. Preparing the boot server

To perform the steps in this chapter you will need:
  • A PXE Server (DHCP and TFTP) - This can be a libvirt internal server, manually-configured dhcpd and tftpd, dnsmasq, a server configured by Cobbler, or some other server.
  • Boot images - for example, PXELINUX configured manually or by Cobbler.

15.1.1. Setting up a PXE boot server on a private libvirt network

This example uses the default network. Perform the following steps:

Procedure 15.1. Configuring the PXE boot server

  1. Place the PXE boot images and configuration in /var/lib/tftp.
  2. Run the following commands:
    # virsh net-destroy default
    # virsh net-edit default
  3. Edit the <ip> element in the configuration file for the default network to include the appropriate address, network mask, DHCP address range, and boot file, where BOOT_FILENAME represents the file name you are using to boot the guest virtual machine.
    <ip address='192.168.122.1' netmask='255.255.255.0'>
      <tftp root='/var/lib/tftp' />
      <dhcp>
        <range start='192.168.122.2' end='192.168.122.254' />
        <bootp file='BOOT_FILENAME' />
      </dhcp>
    </ip>
  4. Boot the guest using PXE (refer to Section 15.2, "Booting a guest using PXE").

15.2. Booting a guest using PXE

This section demonstrates how to boot a guest virtual machine with PXE.

15.2.1. Using bridged networking

Procedure 15.2. Booting a guest using PXE and bridged networking

  1. Ensure bridging is enabled such that the PXE boot server is available on the network.
  2. Boot a guest virtual machine with PXE booting enabled. You can use the virt-install command to create a new virtual machine with PXE booting enabled, as shown in the following example command:
    virt-install --pxe --network bridge=breth0 --prompt
    Alternatively, ensure that the guest network is configured to use your bridged network, and that the XML guest configuration file has a <boot dev='network'/> element inside the <os> element, as shown in the following example:
    <os>
      <type arch='x86_64' machine='rhel6.2.0'>hvm</type>
      <boot dev='network'/>
      <boot dev='hd'/>
    </os>
    <interface type='bridge'>
      <mac address='52:54:00:5a:ad:cb'/>
      <source bridge='breth0'/>
      <target dev='vnet0'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

15.2.2. Using a private libvirt network

Procedure 15.3. Using a private libvirt network

  1. Configure PXE booting on libvirt as shown in Section 15.1.1, "Setting up a PXE boot server on a private libvirt network".
  2. Boot a guest virtual machine using libvirt with PXE booting enabled. You can use the virt-install command to create/install a new virtual machine using PXE:
    virt-install --pxe --network network=default --prompt
Alternatively, ensure that the guest network is configured to use your private libvirt network, and that the XML guest configuration file has a <boot dev='network'/> element inside the <os> element, as shown in the following example:
<os>
  <type arch='x86_64' machine='rhel6.2.0'>hvm</type>
  <boot dev='network'/>
  <boot dev='hd'/>
</os>
Also ensure that the guest virtual machine is connected to the private network:
<interface type='network'>
  <mac address='52:54:00:66:79:14'/>
  <source network='default'/>
  <target dev='vnet0'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

Chapter 16. QEMU Guest Agent

The QEMU Guest Agent allows the host machine to issue commands to the guest operating system. The guest operating system then responds to those commands asynchronously.
This section covers the options and commands available to the guest agent in detail. It also covers how to run the guest agent in the foreground, or as a daemon in the background.

16.1. Set Up Communication between Guest Agent and Host

The host machine communicates with the guest agent through a VirtIO serial connection between the host and guest machines. The VirtIO serial channel is exposed to the host through a character device (typically a Unix socket), and the guest listens on this serial channel. The following procedure shows how to set up the host and guest machines for guest agent use.

Procedure 16.1. Set Up Host-Agent Communication

  1. Launch QEMU with a character device driver

    Launch QEMU as usual, with additional definitions for the character device driver required to communicate with the guest agent.
    The following example launches QEMU to communicate over the Unix socket /tmp/qga.sock.
    /usr/libexec/qemu-kvm [...] -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
        -device virtio-serial \
        -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
  2. Start the Guest Agent

    On the guest, run the following command to start the Guest Agent:
    qemu-ga --path device_path --method method
    The guest agent now parses incoming QMP messages for commands, and acts upon them if valid.
    If no other method or path is specified with the --method or --path options respectively, the Guest Agent listens over virtio-serial, through the /dev/virtio-ports/org.qemu.guest_agent.0 device path.
You can now communicate with the guest by sending valid QMP commands over the established character device driver.
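For example, assuming the socket path from the earlier example and that the socat utility is installed on the host, a guest-ping command can be used as a quick connectivity test; a running agent replies with an empty return object:

# echo '{"execute":"guest-ping"}' | socat - UNIX-CONNECT:/tmp/qga.sock
{"return": {}}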
For further information about the guest agent, refer to the Red Hat Enterprise Linux 6 Virtualization Administration Guide.

NetKVM Driver Parameters

After the NetKVM driver is installed, you can configure it to better suit your environment. The parameters listed in this section can be configured in the Windows Device Manager (devmgmt.msc).

Important

Modifying the driver's parameters causes Windows to re-load that driver. This interrupts existing network activity.

Procedure A.1. Configuring NetKVM Parameters

  1. Open Device Manager

    Click on the Start button. In the right-hand pane, right-click on Computer, and click Manage. If prompted, click Continue on the User Account Control window. This opens the Computer Management window.
    In the left-hand pane of the Computer Management window, click Device Manager.
  2. Locate the correct device

    In the central pane of the Computer Management window, click on the + symbol beside Network adapters.
    Under the list of Red Hat VirtIO Ethernet Adapter devices, double-click on NetKVM. This opens the Properties window for that device.
  3. View device parameters

    In the Properties window, click on the Advanced tab.
  4. Modify device parameters

    Click on the parameter you wish to modify to display the options for that parameter.
    Modify the options as appropriate, then click on OK to save your changes.

A.1. Configurable parameters for NetKVM

Logging parameters

Logging.Enable
A Boolean value that determines whether logging is enabled. The default value is 1 (enabled).
Logging.Level
An integer that defines the logging level. As the integer increases, so does the verbosity of the log. The default value is 0 (errors only). Values 1-2 add configuration messages, 3-4 add packet flow information, and 5-6 add interrupt and DPC level trace information.

Important

High logging levels will slow down your guest virtual machine.
Logging.Statistics(sec)
An integer that defines whether log statistics are printed, and the time in seconds between each periodic statistics printout. The default value is 0 (no logging statistics).

Initial parameters

Assign MAC
A string that defines the locally-administered MAC address for the para-virtualized NIC. This is not set by default.
Init.ConnectionRate(Mb)
An integer that represents the connection rate in megabits. The default value for Windows 2008 and later is 10000.
Init.Do802.1PQ
A Boolean value that enables Priority/VLAN tag population and removal support. The default value is 1 (enabled).
Init.UseMergedBuffers
A Boolean value that enables merge-able RX buffers. The default value is 1 (enabled).
Init.UsePublishEvents
A Boolean value that enables published event use. The default value is 1 (enabled).
Init.MTUSize
An integer that defines the maximum transmission unit (MTU). The default value is 1500. Any value from 500 to 65500 is acceptable.
Init.IndirectTx
Controls whether indirect ring descriptors are in use. The default value is Disable, which disables use of indirect ring descriptors. Other valid values are Enable, which enables indirect ring descriptor usage; and Enable*, which enables conditional use of indirect ring descriptors.
Init.MaxTxBuffers
An integer that represents the number of TX ring descriptors that will be allocated. The default value is 1024. Valid values are: 16, 32, 64, 128, 256, 512, or 1024.
Init.MaxRxBuffers
An integer that represents the number of RX ring descriptors that will be allocated. The default value is 256. Valid values are: 16, 32, 64, 128, 256, 512, or 1024.
Offload.Tx.Checksum
Specifies the TX checksum offloading mode.
In Red Hat Enterprise Linux 6.4 and onward, the valid values for this parameter are All (the default), which enables IP, TCP and UDP checksum offloading for both IPv4 and IPv6; TCP/UDP(v4,v6), which enables TCP and UDP checksum offloading for both IPv4 and IPv6; TCP/UDP(v4), which enables TCP and UDP checksum offloading for IPv4 only; and TCP(v4), which enables TCP checksum offloading for IPv4 only.
In Red Hat Enterprise Linux 6.3 and earlier, the valid values for this parameter are TCP/UDP (the default value), which enables TCP and UDP checksum offload; TCP, which enables only TCP checksum offload; or Disable, which disables TX checksum offload.
Offload.Tx.LSO
A Boolean value that enables TX TCP Large Segment Offload (LSO). The default value is 1 (enabled).
Offload.Rx.Checksum
Specifies the RX checksum offloading mode.
In Red Hat Enterprise Linux 6.4 and onward, the valid values for this parameter are All (the default), which enables IP, TCP and UDP checksum offloading for both IPv4 and IPv6; TCP/UDP(v4,v6), which enables TCP and UDP checksum offloading for both IPv4 and IPv6; TCP/UDP(v4), which enables TCP and UDP checksum offloading for IPv4 only; and TCP(v4), which enables TCP checksum offloading for IPv4 only.
In Red Hat Enterprise Linux 6.3 and earlier, the valid values are Disable (the default), which disables RX checksum offloading; All, which enables TCP, UDP, and IP checksum offloading; TCP/UDP, which enables TCP and UDP checksum offloading; and TCP, which enables only TCP checksum offloading.

Test and debug parameters

Important

Test and debug parameters should only be used for testing or debugging; they should not be used in production.
TestOnly.DelayConnect(ms)
The period for which to delay connection upon startup, in milliseconds. The default value is 0.
TestOnly.DPCChecking
Sets the DPC checking mode. 0 (the default) disables DPC checking. 1 enables DPC checking; each hang test verifies DPC activity and acts as if the DPC was spawned. 2 clears the device interrupt status and is otherwise identical to 1.
TestOnly.Scatter-Gather
A Boolean value that determines whether scatter-gather functionality is enabled. The default value is 1 (enabled). Setting this value to 0 disables scatter-gather functionality and all dependent capabilities.
TestOnly.InterruptRecovery
A Boolean value that determines whether interrupt recovery is enabled. The default value is 1 (enabled).
TestOnly.PacketFilter
A Boolean value that determines whether packet filtering is enabled. The default value is 1 (enabled).
TestOnly.BatchReceive
A Boolean value that determines whether packets are received in batches or singly. The default value is 1, which enables batched packet receipt.
TestOnly.Promiscuous
A Boolean value that determines whether promiscuous mode is enabled. The default value is 0 (disabled).
TestOnly.AnalyzeIPPackets
A Boolean value that determines whether the checksum fields of outgoing IP packets are tested and verified for debugging purposes. The default value is 0 (no checking).
TestOnly.RXThrottle
An integer that determines the number of receive packets handled in a single DPC. The default value is 1000.
TestOnly.UseSwTxChecksum
A Boolean value that determines whether hardware checksumming is enabled. The default value is 0 (disabled).