
Virtualization Administration Guide

Chapter 9. Miscellaneous administration tasks

This chapter contains useful hints and tips to improve virtualization performance, scalability, and stability.

9.1. Automatically starting guests

This section covers how to make guests start automatically during the host system's boot phase.
This example uses virsh to set a guest, TestServer, to automatically start when the host boots.
# virsh autostart TestServer
Domain TestServer marked as autostarted
The guest now automatically starts with the host.
To stop a guest from automatically booting, use the --disable parameter:
# virsh autostart --disable TestServer
Domain TestServer unmarked as autostarted
The guest no longer automatically starts with the host.
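To confirm the current setting for a guest, the virsh dominfo command reports an Autostart field; for example (a quick check, with grep used only to trim the output):
# virsh dominfo TestServer | grep -i autostart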

9.2. Guest memory allocation

The following procedure shows how to allocate memory for a guest. This allocation and assignment works only at boot time, and any changes to the memory values will not take effect until the next reboot.
Valid memory units include:
  • b or bytes for bytes
  • KB for kilobytes (10^3 or blocks of 1,000 bytes)
  • k or KiB for kibibytes (2^10 or blocks of 1,024 bytes)
  • MB for megabytes (10^6 or blocks of 1,000,000 bytes)
  • M or MiB for mebibytes (2^20 or blocks of 1,048,576 bytes)
  • GB for gigabytes (10^9 or blocks of 1,000,000,000 bytes)
  • G or GiB for gibibytes (2^30 or blocks of 1,073,741,824 bytes)
  • TB for terabytes (10^12 or blocks of 1,000,000,000,000 bytes)
  • T or TiB for tebibytes (2^40 or blocks of 1,099,511,627,776 bytes)
Note that all values will be rounded up to the nearest kibibyte by libvirt, and may be further rounded to the granularity supported by the hypervisor. Some hypervisors also enforce a minimum, such as 4000 KiB (4000 x 2^10, or 4,096,000 bytes). The units for this value are determined by the optional memory unit attribute, which defaults to kibibytes (KiB), where the value given is multiplied by 2^10 (blocks of 1,024 bytes).
In cases where the guest crashes, the optional attribute dumpCore can be used to control whether the guest's memory should be included in the generated coredump (dumpCore='on') or excluded (dumpCore='off'). Note that the default setting is on, so unless the parameter is set to off, the guest memory will be included in the coredump file.
The currentMemory attribute determines the actual memory allocation for a guest. This value can be less than the maximum allocation, to allow for ballooning up the guest's memory on the fly. If omitted, it defaults to the same value as the memory element. The unit attribute behaves the same as for memory.
In all cases for this section, the domain XML needs to be altered as follows:
<domain>
  <memory unit='KiB' dumpCore='off'>524288</memory>
  <!-- sets the memory unit to KiB and excludes the guest's memory from the generated coredump file -->
  <currentMemory unit='KiB'>524288</currentMemory>
  <!-- sets the current memory allocation to 524288 KiB -->
  ...
</domain>
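For example, one way to apply and verify these settings is to edit the domain XML with virsh edit and, after the next guest restart, confirm the values with virsh dumpxml (a minimal sketch reusing the TestServer guest name from Section 9.1):
# virsh edit TestServer
# virsh dumpxml TestServer | grep -i memory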

9.3. Using qemu-img

The qemu-img command line tool is used for formatting, modifying, and verifying the various disk image formats used by KVM. qemu-img options and usage are listed below.
Check
Perform a consistency check on the disk image filename.
# qemu-img check [-f format] filename

Note

Only the qcow2 and vdi formats support consistency checks.
Commit
Commit any changes recorded in the specified file (filename) to the file's base image with the qemu-img commit command. Optionally, specify the file's format type (fmt).
 # qemu-img commit [-f fmt] [-t cache] filename
Convert
The convert option is used to convert one recognized image format to another image format.
Command format:
# qemu-img convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-S sparse_size] filename output_filename
The -p parameter shows the progress of the command (optional and not for every command) and -S indicates the consecutive number of bytes that must contain only zeros for qemu-img to create a sparse image during conversion.
Convert the disk image filename to disk image output_filename using format output_format. The disk image can be optionally compressed with the -c option, or encrypted with the -o option by setting -o encryption. Note that the options available with the -o parameter differ with the selected format.
Only the qcow2 format supports encryption or compression. qcow2 encryption uses the AES format with secure 128-bit keys. qcow2 compression is read-only, so if a compressed sector is converted from qcow2 format, it is written to the new format as uncompressed data.
Image conversion is also useful to get a smaller image when using a format which can grow, such as qcow or cow. The empty sectors are detected and suppressed from the destination image.
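For example, the following sketch converts a raw image to a compressed qcow2 image while displaying progress (the file names are hypothetical):
# qemu-img convert -c -p -f raw -O qcow2 /var/lib/libvirt/images/guest.img /var/lib/libvirt/images/guest.qcow2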
Create
Create the new disk image filename of size size and format format.
# qemu-img create [-f format] [-o options] filename [size]
If a base image is specified with -o backing_file=filename, the image will only record differences between itself and the base image. The backing file will not be modified unless you use the commit command. No size needs to be specified in this case.
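As an illustration, the following sketch creates a 10 GB qcow2 base image and then an overlay image that records only the differences from that base (the paths are hypothetical):
# qemu-img create -f qcow2 /var/lib/libvirt/images/base.qcow2 10G
# qemu-img create -f qcow2 -o backing_file=/var/lib/libvirt/images/base.qcow2 /var/lib/libvirt/images/overlay.qcow2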
Info
The info parameter displays information about a disk image filename. The format for the info option is as follows:
# qemu-img info [-f format] filename
This command is often used to discover the size reserved on disk, which can be different from the displayed size. If snapshots are stored in the disk image, they are also displayed.
Rebase
Changes the backing file of an image.
# qemu-img rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename
The backing file is changed to backing_file and (if the format of filename supports the feature) the backing file format is changed to backing_fmt.

Note

Only the qcow2 format supports changing the backing file (rebase).
There are two different modes in which rebase can operate: Safe and Unsafe.
Safe mode is used by default and performs a real rebase operation. The new backing file may differ from the old one, and the qemu-img rebase command will take care of keeping the guest-visible content of filename unchanged. In order to achieve this, any clusters that differ between backing_file and the old backing file of filename are merged into filename before any changes are made to the backing file.
Note that safe mode is an expensive operation, comparable to converting an image. The old backing file is required for it to complete successfully.
Unsafe mode is used if the -u option is passed to qemu-img rebase. In this mode, only the backing file name and format of filename are changed, without any checks taking place on the file contents. Make sure the new backing file is specified correctly or the guest-visible content of the image will be corrupted.
This mode is useful for renaming or moving the backing file. It can be used without an accessible old backing file. For instance, it can be used to fix an image whose backing file has already been moved or renamed.
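For instance, if a backing file has been moved to a new location, an unsafe rebase can point the overlay at the new path without reading the old backing file (a sketch with hypothetical paths; -F states the backing format explicitly):
# qemu-img rebase -u -b /new/path/base.qcow2 -F qcow2 /var/lib/libvirt/images/overlay.qcow2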
Resize
Change the disk image filename as if it had been created with size size. Only images in raw format can be resized regardless of version. Red Hat Enterprise Linux 6.1 and later adds the ability to grow (but not shrink) images in qcow2 format.
Use the following to set the size of the disk image filename to size bytes:
# qemu-img resize filename size
You can also resize relative to the current size of the disk image. To give a size relative to the current size, prefix the number of bytes with + to grow, or - to reduce the size of the disk image by that number of bytes. Adding a unit suffix allows you to set the image size in kilobytes (K), megabytes (M), gigabytes (G) or terabytes (T).
# qemu-img resize filename [+|-]size[K|M|G|T]

Warning

Before using this command to shrink a disk image, you must use file system and partitioning tools inside the VM itself to reduce allocated file systems and partition sizes accordingly. Failure to do so will result in data loss.
After using this command to grow a disk image, you must use file system and partitioning tools inside the VM to actually begin using the new space on the device.
Snapshot
List, apply, create, or delete an existing snapshot (snapshot) of an image (filename).
# qemu-img snapshot [ -l | -a snapshot | -c snapshot | -d snapshot ] filename
-l lists all snapshots associated with the specified disk image. The apply option, -a, reverts the disk image (filename) to the state of a previously saved snapshot. -c creates a snapshot (snapshot) of an image (filename). -d deletes the specified snapshot.
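For example, the following sketch creates a snapshot before a change, lists the snapshots in the image, and later reverts to it (the snapshot and file names are hypothetical; shut the guest down before applying a snapshot to its image):
# qemu-img snapshot -c before_update /var/lib/libvirt/images/guest.qcow2
# qemu-img snapshot -l /var/lib/libvirt/images/guest.qcow2
# qemu-img snapshot -a before_update /var/lib/libvirt/images/guest.qcow2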
Supported formats
qemu-img is designed to convert files to one of the following formats:
raw
Raw disk image format (default). This can be the fastest file-based format. If your file system supports holes (for example in ext2 or ext3 on Linux or NTFS on Windows), then only the written sectors will reserve space. Use qemu-img info to obtain the real size used by the image or ls -ls on Unix/Linux. Although Raw images give optimal performance, only very basic features are available with a Raw image (no snapshots etc.).
qcow2
QEMU image format, the most versatile format with the best feature set. Use it to have optional AES encryption, zlib-based compression, support of multiple VM snapshots, and smaller images, which are useful on file systems that do not support holes (non-NTFS file systems on Windows). Note that this expansive feature set comes at the cost of performance.
Although only the formats above can be used to run on a guest or host machine, qemu-img also recognizes and supports the following formats in order to convert from them into either raw or qcow2 format. The format of an image is usually detected automatically. In addition to converting these formats into raw or qcow2 , they can be converted back from raw or qcow2 to the original format.
bochs
Bochs disk image format.
cloop
Linux Compressed Loop image, useful only to reuse directly compressed CD-ROM images present for example in the Knoppix CD-ROMs.
cow
User Mode Linux Copy On Write image format. The cow format is included only for compatibility with previous versions. It does not work with Windows.
dmg
Mac disk image format.
nbd
Network block device.
parallels
Parallels virtualization disk image format.
qcow
Old QEMU image format. Only included for compatibility with older versions.
vdi
Oracle VM VirtualBox hard disk image format.
vmdk
VMware 3 and 4 compatible image format.
vpc
Windows Virtual PC disk image format. Also referred to as vhd, or Microsoft virtual hard disk image format.
vvfat
Virtual VFAT disk image format.

9.4. Verifying virtualization extensions

Use this section to determine whether your system has the hardware virtualization extensions. Virtualization extensions (Intel VT-x or AMD-V) are required for full virtualization.
  1. Run the following command to verify the CPU virtualization extensions are available:
    $ grep -E 'svm|vmx' /proc/cpuinfo
  2. Analyze the output.
    • The following output contains a vmx entry indicating an Intel processor with the Intel VT-x extension:
      flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
    • The following output contains an svm entry indicating an AMD processor with the AMD-V extensions:
      flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy ts fid vid ttp tm stc
    If any output is received, the processor has the hardware virtualization extensions. However, in some circumstances, manufacturers disable the virtualization extensions in BIOS.
    The "flags:" output content may appear multiple times, once for each hyperthread, core or CPU on the system.
    The virtualization extensions may be disabled in the BIOS. If the extensions do not appear, or full virtualization does not work, refer to Procedure 19.1, "Enabling virtualization extensions in BIOS".
  3. Ensure the KVM subsystem is loaded

    As an additional check, verify that the kvm modules are loaded in the kernel:
    # lsmod | grep kvm
    If the output includes kvm_intel or kvm_amd then the kvm hardware virtualization modules are loaded and your system meets requirements.

Note

If the libvirt package is installed, the virsh command can output a full list of virtualization system capabilities. Run virsh capabilities as root to receive the complete list.

9.5. Setting KVM processor affinities

Note

libvirt refers to a NUMA node as a cell.
This section covers setting processor and processing core affinities with libvirt and KVM guests.
By default, libvirt provisions guests using the hypervisor's default policy. For most hypervisors, the policy is to run guests on any available processing core or CPU. There are times when an explicit policy may be better, particularly for systems with a NUMA (Non-Uniform Memory Access) architecture. A guest on a NUMA system can be pinned to a processing core so that its memory allocations are always local to the node it is running on. This avoids cross-node memory transports which have less bandwidth and can significantly degrade performance.
On non-NUMA systems some form of explicit placement across the host's sockets, cores and hyperthreads may be more efficient.
Identifying CPU and NUMA topology
The first step in deciding which policy to apply is to determine the host's memory and CPU topology. The virsh nodeinfo command provides information about how many sockets, cores and hyperthreads are attached to a host.
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              8
CPU frequency:       1000 MHz
CPU socket(s):       2
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        2
Memory size:         8179176 kB
This output shows that the system has eight CPU cores and two sockets. Each CPU socket has four cores. This splitting of CPU cores across multiple sockets suggests that the system has Non-Uniform Memory Access (NUMA) architecture.
NUMA architecture can be more complex than other architectures. Use the virsh capabilities command to get additional output data about the CPU configuration.
# virsh capabilities
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
    </secmodel>
  </host>
[ Additional XML removed ]
</capabilities>
This output shows two NUMA nodes (also known as NUMA cells), each containing four logical CPUs (four processing cores). This system has two sockets, therefore it can be inferred that each socket is a separate NUMA node. For a guest with four virtual CPUs, it is optimal to lock the guest to physical CPUs 0 to 3, or 4 to 7, to avoid accessing non-local memory, which is significantly slower than accessing local memory.
If a guest requires eight virtual CPUs, you could run two sets of four virtual CPU guests and split the work between them, since each NUMA node only has four physical CPUs. Running across multiple NUMA nodes significantly degrades performance for physical and virtualized tasks.
Decide which NUMA node can run the guest
Locking a guest to a particular NUMA node offers no benefit if that node does not have sufficient free memory for that guest. libvirt stores information on the free memory available on each node. Use the virsh freecell --all command to display the free memory on all NUMA nodes.
# virsh freecell --all
0:     2203620 kB
1:     3354784 kB
If a guest requires 3 GB of RAM allocated, then the guest should be run on NUMA node (cell) 1. Node 0 only has 2.2 GB free, which may not be sufficient for certain guests.
Lock a guest to a NUMA node or physical CPU set
Once you have determined which node to run the guest on, refer to the capabilities data (the output of the virsh capabilities command) about NUMA topology.
  1. Extract from the virsh capabilities output.
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  2. Observe that node 1, <cell id='1'>, uses physical CPUs 4 to 7.
  3. The guest can be locked to a set of CPUs by adding the cpuset attribute to the vcpus element in the configuration file.
    1. While the guest is offline, open the configuration file with virsh edit.
    2. Locate the guest's virtual CPU count, defined in the vcpus element.
      <vcpus>4</vcpus>
      The guest in this example has four CPUs.
    3. Add a cpuset attribute with the CPU numbers for the relevant NUMA cell.
      <vcpus cpuset='4-7'>4</vcpus>
  4. Save the configuration file and restart the guest.
The guest has been locked to CPUs 4 to 7.
Automatically locking guests to CPUs with virt-install
The virt-install provisioning tool provides a simple way to automatically apply a 'best fit' NUMA policy when guests are created.
The cpuset option for virt-install can use a CPU set of processors or the parameter auto. The auto parameter automatically determines the optimal CPU locking using the available NUMA data.
For a NUMA system, use --cpuset=auto with the virt-install command when creating new guests.
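For example, a guest could be provisioned with automatic NUMA placement as follows (a hedged sketch; the guest name, sizes and installation media are hypothetical, and the exact options available depend on the virt-install version):
# virt-install --name guest1 --ram 4096 --vcpus 4 --cpuset=auto \
    --disk path=/var/lib/libvirt/images/guest1.img,size=8 \
    --cdrom /var/lib/libvirt/images/rhel6.iso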
Tuning CPU affinity on running guests
There may be times where modifying CPU affinities on running guests is preferable to rebooting the guest. The virsh vcpuinfo and virsh vcpupin commands can perform CPU affinity changes on running guests.
The virsh vcpuinfo command gives up to date information about where each virtual CPU is running.
In this example, guest1 is a guest with four virtual CPUs running on a KVM host.
# virsh vcpuinfo guest1
VCPU:           0
CPU:            3
State:          running
CPU time:       0.5s
CPU Affinity:   yyyyyyyy

VCPU:           1
CPU:            1
State:          running
CPU Affinity:   yyyyyyyy

VCPU:           2
CPU:            1
State:          running
CPU Affinity:   yyyyyyyy

VCPU:           3
CPU:            2
State:          running
CPU Affinity:   yyyyyyyy
The virsh vcpuinfo output (the yyyyyyyy value of CPU Affinity) shows that the guest can presently run on any CPU.
To lock the virtual CPUs to the second NUMA node (CPUs four to seven), run the following commands.
# virsh vcpupin guest1 0 4
# virsh vcpupin guest1 1 5
# virsh vcpupin guest1 2 6
# virsh vcpupin guest1 3 7
The virsh vcpuinfo command confirms the change in affinity.
# virsh vcpuinfo guest1
VCPU:           0
CPU:            4
State:          running
CPU time:       32.2s
CPU Affinity:   ----y---

VCPU:           1
CPU:            5
State:          running
CPU time:       16.9s
CPU Affinity:   -----y--

VCPU:           2
CPU:            6
State:          running
CPU time:       11.9s
CPU Affinity:   ------y-

VCPU:           3
CPU:            7
State:          running
CPU time:       14.6s
CPU Affinity:   -------y

9.6. Generating a new unique MAC address

In some cases you will need to generate a new and unique MAC address for a guest. There is no command line tool available to generate a new MAC address at the time of writing. The script provided below can generate a new MAC address for your guests. Save the script to your guest as macgen.py. Now from that directory you can run the script using ./macgen.py and it will generate a new MAC address. A sample output would look like the following:
$ ./macgen.py
00:16:3e:20:b0:11
#!/usr/bin/python
# macgen.py script to generate a MAC address for guests
#
import random
#
def randomMAC():
    mac = [ 0x00, 0x16, 0x3e,
            random.randint(0x00, 0x7f),
            random.randint(0x00, 0xff),
            random.randint(0x00, 0xff) ]
    return ':'.join(map(lambda x: "%02x" % x, mac))
#
print randomMAC()
Another method to generate a new MAC for your guest
You can also use the built-in modules of python-virtinst to generate a new MAC address and UUID for use in a guest configuration file:
# echo 'import virtinst.util ; print virtinst.util.uuidToString(virtinst.util.randomUUID())' | python
# echo 'import virtinst.util ; print virtinst.util.randomMAC()' | python
The script above can also be implemented as a script file as seen below.
#!/usr/bin/env python
# -*- mode: python; -*-
print ""
print "New UUID:"
import virtinst.util ; print virtinst.util.uuidToString(virtinst.util.randomUUID())
print "New MAC:"
import virtinst.util ; print virtinst.util.randomMAC()
print ""

9.7. Improving guest response time

Guests can sometimes be slow to respond with certain workloads and usage patterns. Examples of situations which may cause slow or unresponsive guests:
  • Severely overcommitted memory.
  • Overcommitted memory with high processor usage.
  • Other busy or stalled processes on the host (processes other than qemu-kvm).
These types of workloads may cause guests to appear slow or unresponsive. Usually, the guest's memory is eventually fully loaded into the host's main memory from swap. Once the guest is loaded in main memory, the guest will perform normally. Note that the process of loading a guest from swap to main memory may take several seconds per gigabyte of RAM assigned to the guest, depending on the type of storage used for swap and the performance of the components.
KVM guests function as Linux processes. Linux processes are not permanently kept in main memory (physical RAM). The kernel scheduler swaps process memory into virtual memory (swap). Swap, with conventional hard disk drives, is thousands of times slower than main memory in modern computers. If a guest is inactive for long periods of time, the guest may be placed into swap by the kernel.
KVM guest processes may be moved to swap regardless of whether memory is overcommitted or of the overall memory usage.
Using unsafe overcommit levels, or overcommitting with swap turned off for guest processes or other critical processes, is not recommended. Always ensure the host has sufficient swap space when overcommitting memory.
For more information on overcommitting with KVM, refer to Section 6.1, "Introduction".

Warning

Virtual memory allows a Linux system to use more memory than there is physical RAM on the system. Underused processes are swapped out which allows active processes to use memory, improving memory utilization. Disabling swap reduces memory utilization as all processes are stored in physical RAM.
If swap is turned off, do not overcommit guests. Overcommitting guests without any swap can cause guests or the host system to crash.
Turning off swap
Swap usage can be completely turned off to prevent guests from being unresponsive while they are moved back to main memory. Swap may also not be desired for guests as it can be resource-intensive on some systems.
The swapoff command can disable all swap partitions and swap files on a system.
# swapoff -a
To make this change permanent, remove swap lines from the /etc/fstab file and restart the host system.
Using SSDs for swap
Using Solid State Drives (SSDs) for swap storage may improve the performance of guests.
Using RAID arrays, faster disks or separate drives dedicated to swap may also improve performance.

9.8. Disable SMART disk monitoring for guests

SMART disk monitoring can be safely disabled as virtual disks and the physical storage devices are managed by the host.
# service smartd stop
# chkconfig --del smartd

9.9. Configuring a VNC Server

To configure a VNC server, use the Remote Desktop application in System > Preferences. Alternatively, you can run the vino-preferences command.
Use the following steps to set up a dedicated VNC server session:
If needed, create and then edit the ~/.vnc/xstartup file to start a GNOME session whenever vncserver is started. The first time you run the vncserver script, it will ask you for a password to use for your VNC session. For more information on vncserver files, refer to the Red Hat Enterprise Linux Installation Guide.
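The following is a minimal sketch of what an ~/.vnc/xstartup file might contain to start a GNOME session; it assumes a GNOME desktop is installed and may need to be adapted to your environment:
#!/bin/sh
# ~/.vnc/xstartup sketch: start a GNOME session for the VNC display (assumes GNOME is installed)
unset SESSION_MANAGER
exec gnome-session
After making the file executable (chmod +x ~/.vnc/xstartup), run the vncserver command to start the session; it prompts for the session password on first use.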

9.10. Gracefully shutting down guests

Installing virtualized Red Hat Enterprise Linux 6 guests with the Minimal installation option will not install the acpid package.
Without the acpid package, the Red Hat Enterprise Linux 6 guest does not shut down when the virsh shutdown command is executed. The virsh shutdown command is designed to gracefully shut down guests.
Using virsh shutdown is easier and safer for system administration. Without graceful shutdown via the virsh shutdown command, a system administrator must log into a guest manually or send the Ctrl-Alt-Del key combination to each guest.

Note

Other virtualized operating systems may be affected by this issue. The virsh shutdown command requires that the guest operating system is configured to handle ACPI shut down requests. Many operating systems require additional configuration on the guest operating system to accept ACPI shut down requests.

Procedure 9.1. Workaround for Red Hat Enterprise Linux 6

  1. Install the acpid package

    The acpid service listens for and processes ACPI requests.
    Log into the guest and install the acpid package on the guest:
    # yum install acpid
  2. Enable the acpid service

    Set the acpid service to start during the guest boot sequence and start the service:
    # chkconfig acpid on
    # service acpid start
The guest is now configured to shut down when the virsh shutdown command is used.
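With acpid running in the guest, graceful shutdown can be tested from the host (guest1 is a hypothetical guest name):
# virsh shutdown guest1
The guest should power off cleanly rather than requiring a forced stop with virsh destroy.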

9.11. Virtual machine timer management with libvirt

Accurate time keeping on guests is a key challenge for virtualization platforms. Different hypervisors attempt to handle the problem of time keeping in a variety of ways. Libvirt provides hypervisor independent configuration settings for time management, using the <clock> and <timer> elements in the domain XML. The domain XML can be edited using the virsh edit command. See Editing a guest's configuration file for details.
<clock>
The clock element is used to determine how the guest clock is synchronized with the host clock. The clock element has the following attributes:
  • offset
    Determines how the guest clock is offset from the host clock. The offset attribute has the following possible values:

    Table 9.1. Offset attribute values

    Value       Description
    utc         The guest clock will be synchronized to UTC when booted.
    localtime   The guest clock will be synchronized to the host's configured timezone when booted, if any.
    timezone    The guest clock will be synchronized to a given timezone, specified by the timezone attribute.
    variable    The guest clock will be synchronized to an arbitrary offset from UTC. The delta relative to UTC is specified in seconds, using the adjustment attribute. The guest is free to adjust the Real Time Clock (RTC) over time and expect that it will be honored following the next reboot. This is in contrast to utc mode, where any RTC adjustments are lost at each reboot.

    Note

    The value utc is set as the clock offset in a virtual machine by default. However, if the guest clock is run with the localtime value, the clock offset needs to be changed to a different value in order to have the guest clock synchronized with the host clock.
  • timezone
    The timezone to which the guest clock is to be synchronized.
  • adjustment
    The delta for guest clock synchronization. In seconds, relative to UTC.

Example 9.1. Always synchronize to UTC

<clock offset="utc" />

Example 9.2. Always synchronize to the host timezone

<clock offset="localtime" />

Example 9.3. Synchronize to an arbitrary timezone

<clock offset="timezone" timezone="Europe/Paris" />

Example 9.4. Synchronize to UTC + arbitrary offset

<clock offset="variable" adjustment="123456" />

<timer>
A clock element can have zero or more timer elements as children. The timer element specifies a time source used for guest clock synchronization. The timer element has the following attributes; only name is required, all other attributes are optional.
  • name
    The name of the time source to use.

    Table 9.2. name attribute values

    Value       Description
    platform    The master virtual time source, which may be used to drive the policy of other time sources.
    pit         Programmable Interval Timer - a timer with periodic interrupts.
    rtc         Real Time Clock - a continuously running timer with periodic interrupts.
    hpet        High Precision Event Timer - multiple timers with periodic interrupts.
    tsc         Time Stamp Counter - counts the number of ticks since reset, no interrupts.
    kvmclock    KVM clock - the recommended clock source for KVM guests. KVM pvclock, or kvm-clock, lets guests read the host's wall clock time.

  • track
    The track attribute specifies what is tracked by the timer. Only valid for a name value of platform or rtc.

    Table 9.3. track attribute values

    Value   Description
    boot    Corresponds to the old host option; this is an unsupported tracking option.
    guest   RTC always tracks guest time.
    wall    RTC always tracks host time.

  • tickpolicy
    The policy used to pass ticks on to the guest.

    Table 9.4. tickpolicy attribute values

    Value     Description
    delay     Continue to deliver ticks at the normal rate (i.e. ticks are delayed).
    catchup   Deliver ticks at a higher rate to catch up.
    merge     Ticks are merged into one single tick.
    discard   All missed ticks are discarded.

  • frequency
    Used to set a fixed frequency, measured in Hz. This attribute is only relevant for a name value of tsc. All other timers operate at a fixed frequency (pit, rtc), or at a frequency fully controlled by the guest (hpet).
  • mode
    Determines how the time source is exposed to the guest. This attribute is only relevant for a name value of tsc. All other timers are always emulated. Command is as follows: <timer name='tsc' frequency='NNN' mode='auto|native|emulate|smpsafe'/>. Mode definitions are given in the table.

    Table 9.5. mode attribute values

    Value     Description
    auto      Emulate if the TSC is unstable, otherwise allow native TSC access.
    native    Always allow native TSC access.
    emulate   Always emulate TSC.
    smpsafe   Always emulate TSC and interlock SMP.

  • present
    Used to override the default set of timers visible to the guest. For example, to enable or disable the HPET.

    Table 9.6. present attribute values

    Value   Description
    yes     Force this timer to be visible to the guest.
    no      Force this timer to not be visible to the guest.

Example 9.5. Clock synchronizing to local time with RTC and PIT timers, and the HPET timer disabled

<clock offset="localtime"><timer name="rtc" tickpolicy="catchup" track="guest" /><timer name="pit" tickpolicy="delay" /><timer name="hpet" present="no" /></clock>

9.12. Using PMU to monitor guest performance

In Red Hat Enterprise Linux 6.4, vPMU (virtual PMU) was introduced as a technical preview. vPMU is based on Intel's PMU (Performance Monitoring Units) and may only be used on Intel machines. PMU allows the tracking of statistics which indicate how a guest virtual machine is functioning.
Using performance monitoring allows developers to use the CPU's PMU counters with the perf tool for profiling. The virtual performance monitoring unit feature allows virtual machine users to identify sources of possible performance problems in their guest virtual machines, thereby improving the ability to profile a KVM guest virtual machine.
To enable the feature, the -cpu host flag must be set.
This feature is only supported with guests running Red Hat Enterprise Linux 6 and is disabled by default. This feature only works using the Linux perf tool. Make sure the perf package is installed with the following command:
# yum install perf
See the man page on perf for more information on the perf commands.
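For example, inside the guest the PMU counters can then be read with perf; the following sketch counts hardware cycles and instructions for a short, arbitrary workload:
# perf stat -e cycles,instructions sleep 5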

9.13. Guest virtual machine power management

It is possible to forcibly enable or disable BIOS advertisements to the guest virtual machine's operating system by changing the following parameters in the Domain XML for Libvirt:
...
<pm>
  <suspend-to-disk enabled='no'/>
  <suspend-to-mem enabled='yes'/>
</pm>
...
The pm element enables ('yes') or disables ('no') BIOS support for the S3 (suspend-to-mem) and S4 (suspend-to-disk) ACPI sleep states. If nothing is specified, the hypervisor is left with its default value.
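If the guest advertises S3 support as shown above, newer libvirt versions can trigger and wake the sleep state from the host with the dompmsuspend and dompmwakeup commands (a hedged sketch; guest1 is a hypothetical guest name and the guest operating system must handle the ACPI request):
# virsh dompmsuspend guest1 mem
# virsh dompmwakeup guest1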

9.14. QEMU Guest Agent Protocol

The QEMU guest agent protocol (QEMU-ga) uses the same protocol as QMP. qemu-ga is provided as a technical preview in Red Hat Enterprise Linux 6.4. There are a couple of issues regarding its isa-serial/virtio-serial transport, and the following caveats have been noted:
  • There is no way for qemu-ga to detect whether or not a client has connected to the channel.
  • There is no way for a client to detect whether or not qemu-ga has disconnected or reconnected to the backend.
  • If the virtio-serial device resets and qemu-ga has not connected to the channel as a result (generally caused by a reboot or hotplug), data from the client will be dropped.
  • If qemu-ga has connected to the channel following a virtio-serial device reset, data from the client will be queued (and eventually throttled if available buffers are exhausted), regardless of whether or not qemu-ga is still running/connected.
qemu-ga uses the guest-sync or guest-sync-delimited command to address the problem of re-synchronizing the channel after re-connection or client-side timeouts. These are described below.

9.14.1. guest-sync

The guest-sync request/response exchange is simple. The client provides a unique numerical token, the agent sends it back in a response:
   > { "execute": "guest-sync", "arguments": { "id": 123456 } }   < { "return": 123456}
A successful exchange guarantees that the channel is now in sync and no unexpected data/responses will be sent. Note that for the reasons mentioned above there's no guarantee this request will be answered, so a client should implement a timeout and re-issue this periodically until a response is received for the most recent request.
This alone does not handle synchronization issues in all cases. For example, if qemu-ga's parser previously received a partial request from a previous client connection, subsequent attempts to issue the guest-sync request can be misconstrued as being part of the previous partial request. Eventually qemu-ga will hit its recursion or token size limit and flush its parser state, at which point it will begin processing the backlog of requests, but there is no guarantee this will occur before the channel is throttled due to exhausting all available buffers. Thus, there is a potential for a deadlock to occur in certain instances.
To avoid this, qemu-ga/QEMU's JSON parser has special handling for the 0xFF byte, which is an invalid UTF-8 character. Client requests should precede the guest-sync request with a 0xFF byte to ensure that qemu-ga flushes its parser state as soon as possible. As long as all clients abide by this, the deadlock state should be reliably avoidable.
For more information see the qemu-ga wiki page on wiki.qemu.org.

9.14.2. guest-sync-delimited

If qemu-ga attempts to communicate with a client, and the client receives a partial response from a previous qemu-ga instance, the client might misconstrue responses to guest-sync as being part of this previous request. For client implementations that treat newlines as a delimiter for qemu-ga responses, use guest-sync-delimited.
Even for JSON stream-based implementations that do not rely on newline delimiters, it may be considered invasive to adjust the client's response/JSON handling, as it is the same deadlock scenario described previously. Using guest-sync-delimited on the client tells qemu-ga to place the same 0xFF character in front of the response, thereby preventing confusion.
> { "execute": "guest-sync-delimited", "arguments": { "id": 123456 } }< { "return": 123456}
Actual hex values sent:
> 7b 27 65 78 65 63 75 74 65 27 3a 27 67 75 65 73 74 2d 73 79 6e 63 2d 64 65
  6c 69 6d 69 74 65 64 27 2c 27 61 72 67 75 6d 65 6e 74 73 27 3a 7b 27 69 64
  27 3a 31 32 33 34 35 36 7d 7d 0a
< ff 7b 22 72 65 74 75 72 6e 22 3a 20 31 32 33 34 35 36 7d 0a
As stated above, the request should also be preceded with a 0xFF to flush qemu-ga's parser state.

9.15. Setting a limit on device redirection

To filter out certain devices from redirection, pass the filter property to -device usb-redir. The filter property takes a string consisting of filter rules, the format for a rule is:
<class>:<vendor>:<product>:<version>:<allow>
Use the value -1 to accept any value for a particular field. You may use multiple rules on the same command line, using | as a separator. Note that if a device matches none of the rules, redirecting it will not be allowed.

Example 9.6. An example of limiting redirection with a windows guest virtual machine

  1. Prepare a Windows 7 guest.
  2. Add the following code excerpt to the guest's domain XML file:
     <redirdev bus='usb' type='spicevmc'>
       <alias name='redir0'/>
       <address type='usb' bus='0' port='3'/>
     </redirdev>
     <redirfilter>
       <usbdev class='0x08' vendor='0x1234' product='0xBEEF' version='2.0' allow='yes'/>
       <usbdev class='-1' vendor='-1' product='-1' version='-1' allow='no'/>
     </redirfilter>
  3. Start the guest and confirm the setting changes by running the following:
    # ps -ef | grep $guest_name
    -device usb-redir,chardev=charredir0,id=redir0,filter=0x08:0x1234:0xBEEF:0x0200:1|-1:-1:-1:-1:0,bus=usb.0,port=3
  4. Plug a USB device into host, and use virt-viewer to connect to the guest.
  5. Click USB device selection in the menu, which will produce the following message: "Some USB devices are blocked by host policy". Click OK to confirm and continue.
    The filter takes effect.
  6. To make sure that the filter captures properly, check the USB device's vendor and product IDs, then make the following changes in the domain XML to allow USB redirection.
       <redirfilter>
         <usbdev class='0x08' vendor='0x0951' product='0x1625' version='2.0' allow='yes'/>
         <usbdev allow='no'/>
       </redirfilter>
  7. Restart the guest, then use virt-viewer to connect to the guest. The USB device will now redirect traffic to the guest.

9.16. Dynamically changing a host or a network bridge that is attached to a virtual NIC

This section demonstrates how to move the vNIC of a guest from one bridge to another while the guest is running, without compromising the guest.
  1. Prepare a guest with a configuration similar to the following:
    <interface type='bridge'>
      <mac address='52:54:00:4a:c9:5e'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
    </interface>
  2. Prepare an XML file for interface update:
    # cat br1.xml
    <interface type='bridge'>
      <mac address='52:54:00:4a:c9:5e'/>
      <source bridge='virbr1'/>
      <model type='virtio'/>
    </interface>
  3. Start the guest, confirm the guest's network functionality, and check that the guest's vnetX is connected to the bridge you indicated.
    # brctl show
    bridge name     bridge id           STP enabled     interfaces
    virbr0          8000.5254007da9f2   yes             virbr0-nic
                                                        vnet0
    virbr1          8000.525400682996   yes             virbr1-nic
  4. Update the guest's network with the new interface parameters with the following command:
    # virsh update-device test1 br1.xml
    Device updated successfully
  5. On the guest, run service network restart. The guest gets a new IP address for virbr1. Check that the guest's vnet0 is connected to the new bridge (virbr1):
    # brctl show
    bridge name     bridge id           STP enabled     interfaces
    virbr0          8000.5254007da9f2   yes             virbr0-nic
    virbr1          8000.525400682996   yes             virbr1-nic
                                                        vnet0

Chapter 10. Storage concepts

This chapter introduces the concepts used for describing and managing storage devices. Terms such as Storage Pools and Volumes are explained in the sections that follow.

10.1. Storage pools

A storage pool is a file, directory, or storage device managed by libvirt for the purpose of providing storage to guests. The storage pool can be local or it can be shared over a network.
libvirt uses a directory-based storage pool, the /var/lib/libvirt/images/ directory, as the default storage pool. The default storage pool can be changed to another storage pool.
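For example, the pools defined on a host, including the default pool, can be listed and inspected with virsh (a quick sketch):
# virsh pool-list --all
# virsh pool-info default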
  • Local storage pools - Local storage pools are directly attached to the host server. Local storage pools include: local directories, directly attached disks, physical partitions, and LVM volume groups. These storage volumes store guest images or are attached to guests as additional storage. As local storage pools are directly attached to the host server, they are useful for development, testing and small deployments that do not require migration or large numbers of guests. Local storage pools are not suitable for many production environments as local storage pools do not support live migration.
  • Networked (shared) storage pools - Networked storage pools include storage devices shared over a network using standard protocols. Networked storage is required when migrating virtual machines between hosts with virt-manager, but is optional when migrating with virsh. Networked storage pools are managed by libvirt. Supported protocols for networked storage pools include:
    • Fibre Channel-based LUNs
    • iSCSI
    • NFS
    • GFS2
    • SCSI RDMA Protocol (SRP), the block export protocol used in InfiniBand and 10GbE iWARP adapters.

10.2.  Volumes

Storage pools are divided into storage volumes. Storage volumes are an abstraction of physical partitions, LVM logical volumes, file-based disk images and other storage types handled by libvirt. Storage volumes are presented to guests as local storage devices regardless of the underlying hardware.
Referencing volumes
To reference a specific volume, three approaches are possible:
The name of the volume and the storage pool
A volume may be referred to by name, along with an identifier for the storage pool it belongs in. On the virsh command line, this takes the form --pool storage_pool volume_name.
For example, a volume named firstimage in the guest_images pool.
# virsh vol-info --pool guest_images firstimage
Name:           firstimage
Type:           block
Capacity:       20.00 GB
Allocation:     20.00 GB
The full path to the storage on the host system
A volume may also be referred to by its full path on the file system. When using this approach, a pool identifier does not need to be included.
For example, a volume named secondimage.img, visible to the host system as /images/secondimage.img. The image can be referred to as /images/secondimage.img.
# virsh vol-info /images/secondimage.img
Name:           secondimage.img
Type:           file
Capacity:       20.00 GB
Allocation:     136.00 kB
The unique volume key
When a volume is first created in the virtualization system, a unique identifier is generated and assigned to it. The unique identifier is termed the volume key. The format of this volume key varies depending on the storage used.
When used with block based storage such as LVM, the volume key may follow this format:
c3pKz4-qPVc-Xf7M-7WNM-WJc8-qSiz-mtvpGn
When used with file based storage, the volume key may instead be a copy of the full path to the volume storage.
/images/secondimage.img
For example, a volume with the volume key of Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr:
# virsh vol-info Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
Name:           firstimage
Type:           block
Capacity:       20.00 GB
Allocation:     20.00 GB
virsh provides commands for converting between a volume name, volume path, or volume key:
vol-name
Returns the volume name when provided with a volume path or volume key.
# virsh vol-name /dev/guest_images/firstimage
firstimage
# virsh vol-name Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
firstimage
vol-path
Returns the volume path when provided with a volume key, or a storage pool identifier and volume name.
# virsh vol-path Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
/dev/guest_images/firstimage
# virsh vol-path --pool guest_images firstimage
/dev/guest_images/firstimage
vol-key
Returns the volume key when provided with a volume path, or a storage pool identifier and volume name.
# virsh vol-key /dev/guest_images/firstimage
Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr
# virsh vol-key --pool guest_images firstimage
Wlvnf7-a4a3-Tlje-lJDa-9eak-PZBv-LoZuUr