QEMU and Block Devices

The most frequent Ceph Block Device use case involves providing block device images to virtual machines. For example, a user may create a “golden” image with an OS and any relevant software in an ideal configuration. Then, the user takes a snapshot of the image. Finally, the user clones the snapshot (usually many times). See Snapshots for details. The ability to make copy-on-write clones of a snapshot means that Ceph can provision block device images to virtual machines quickly, because the client doesn’t have to download an entire image each time it spins up a new virtual machine.


Ceph Block Devices can integrate with the QEMU virtual machine. For details on QEMU, see QEMU Open Source Processor Emulator. For QEMU documentation, see QEMU Manual. For installation details, see Installation.

Important

To use Ceph Block Devices with QEMU, you must have access to a running Ceph cluster.

Usage

The QEMU command line expects you to specify the pool name and image name. You may also specify a snapshot name.

QEMU will assume that the Ceph configuration file resides in the default location (e.g., /etc/ceph/$cluster.conf) and that you are executing commands as the default client.admin user, unless you expressly specify another Ceph configuration file path or another user. When specifying a user, QEMU uses the ID rather than the full TYPE:ID. See User Management - User for details. Do not prepend the client type (i.e., client.) to the beginning of the user ID, or you will receive an authentication error. You should have the key for the admin user, or the key of another user you specify with the :id={user} option, in a keyring file stored in the default path (i.e., /etc/ceph) or in the local directory, with appropriate file ownership and permissions. Usage takes the following form:

  qemu-img {command} [options] rbd:{pool-name}/{image-name}[@snapshot-name][:option1=value1][:option2=value2...]

For example, specifying the id and conf options might look like the following:

  qemu-img {command} [options] rbd:glance-pool/maipo:id=glance:conf=/etc/ceph/ceph.conf

Tip

Configuration values containing :, @, or = can be escaped with a leading \ character.
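The filename form and escaping rule above can be captured in a small shell sketch. The helper names rbd_escape and rbd_spec are hypothetical and are not part of QEMU or Ceph; they only assemble the string and never contact a cluster. Only option values are escaped, and a literal backslash inside a value is not handled.

```shell
# Hypothetical helper (not part of QEMU or Ceph): prefix every ':', '@'
# and '=' in a value with a backslash, as the rbd: syntax expects.
rbd_escape() {
  printf '%s' "$1" | sed 's/[:@=]/\\&/g'
}

# Hypothetical helper: print rbd:{pool}/{image}[@{snapshot}][:key=value...].
# Usage: rbd_spec POOL IMAGE [SNAPSHOT [KEY=VALUE ...]]
# Pass "" as SNAPSHOT to add options without selecting a snapshot.
rbd_spec() {
  if [ $# -lt 2 ]; then
    echo "usage: rbd_spec POOL IMAGE [SNAPSHOT [KEY=VALUE ...]]" >&2
    return 2
  fi
  local pool="$1" image="$2" snap="${3-}" spec kv
  shift 2
  if [ $# -gt 0 ]; then shift; fi   # drop the SNAPSHOT argument
  spec="rbd:${pool}/${image}"
  if [ -n "$snap" ]; then spec="${spec}@${snap}"; fi
  for kv in "$@"; do
    # The key is copied as-is; only the value after the first '=' is escaped.
    spec="${spec}:${kv%%=*}=$(rbd_escape "${kv#*=}")"
  done
  printf '%s\n' "$spec"
}
```

For example, rbd_spec glance-pool maipo "" id=glance conf=/etc/ceph/ceph.conf prints the same filename used in the earlier example, which can then be passed to qemu-img as a single quoted argument.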

Creating Images with QEMU

You can create a block device image from QEMU. You must specify rbd, the pool name, and the name of the image you wish to create. You must also specify the size of the image.

  qemu-img create -f raw rbd:{pool-name}/{image-name} {size}

For example:

  qemu-img create -f raw rbd:data/foo 10G

Important

The raw data format is the only sensible format to use with RBD. Technically, you could use other QEMU-supported formats (such as qcow2 or vmdk), but doing so adds overhead and also makes the volume unsafe for virtual machine live migration when caching (see below) is enabled.
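One way to avoid creating a non-raw RBD image by accident is to wrap the create command so the format is always pinned to raw. This is a minimal sketch: rbd_create and the DRY_RUN environment variable are hypothetical, and by default the wrapper only prints the qemu-img command instead of running it.

```shell
# Hypothetical wrapper: always create RBD-backed images in raw format.
# It only prints the command unless the hypothetical DRY_RUN=0 is set.
rbd_create() {
  if [ $# -ne 3 ]; then
    echo "usage: rbd_create POOL IMAGE SIZE" >&2
    return 2
  fi
  # Replace the positional parameters with the full command line.
  set -- qemu-img create -f raw "rbd:$1/$2" "$3"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running rbd_create data foo 10G prints the create command from the example above; DRY_RUN=0 rbd_create data foo 10G would execute it against the cluster.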

Resizing Images with QEMU

You can resize a block device image from QEMU. You must specify rbd, the pool name, and the name of the image you wish to resize. You must also specify the new size of the image.

  qemu-img resize rbd:{pool-name}/{image-name} {size}

For example:

  qemu-img resize rbd:data/foo 10G
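The example above sets an absolute size. qemu-img resize also accepts a size prefixed with + to grow an image relative to its current size. The sketch below only grows images, because shrinking discards data beyond the new size (recent QEMU releases also ask for an explicit --shrink flag for that). The function name rbd_grow and the DRY_RUN variable are hypothetical, and the size check is deliberately rough.

```shell
# Hypothetical wrapper: grow an RBD image by a relative amount such as 5G.
# Never shrinks; prints the command unless the hypothetical DRY_RUN=0 is set.
rbd_grow() {
  if [ $# -ne 3 ]; then
    echo "usage: rbd_grow POOL IMAGE INCREMENT" >&2
    return 2
  fi
  # Rough check: must start with a digit and use only digits and K/M/G/T.
  case $3 in
    ''|[!0-9]*|*[!0-9KMGT]*)
      echo "rbd_grow: bad increment '$3'" >&2
      return 2 ;;
  esac
  set -- qemu-img resize "rbd:$1/$2" "+$3"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```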

Retrieving Image Info with QEMU

You can retrieve block device image information from QEMU. You must specify rbd, the pool name, and the name of the image.

  qemu-img info rbd:{pool-name}/{image-name}

For example:

  qemu-img info rbd:data/foo
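Because only raw images are safe for live migration with caching enabled, it can be useful to check an image's format in a script. qemu-img info supports --output=json, whose output includes a "format" key. The hypothetical is_raw_image function below reads that output on stdin; it is a rough text match rather than a JSON parser, so a real JSON tool is preferable for anything beyond a quick check.

```shell
# Hypothetical check: read `qemu-img info --output=json` output on stdin and
# succeed if a "format" key with the value "raw" appears in it.
# This is a text match, not a JSON parser.
is_raw_image() {
  grep -Eq '"format"[[:space:]]*:[[:space:]]*"raw"'
}
```

For example: qemu-img info --output=json rbd:data/foo | is_raw_image && echo "rbd:data/foo is raw".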

Running QEMU with RBD

QEMU can pass a block device from the host on to a guest, but since QEMU 0.15, there’s no need to map an image as a block device on the host. Instead, QEMU can access an image as a virtual block device directly via librbd. This performs better because it avoids an additional context switch, and can take advantage of RBD caching.

You can use qemu-img to convert existing virtual machine images to Ceph block device images. For example, if you have a qcow2 image, you could run:

  qemu-img convert -f qcow2 -O raw debian_squeeze.qcow2 rbd:data/squeeze

To run a virtual machine booting from that image, you could run:

  qemu -m 1024 -drive format=raw,file=rbd:data/squeeze
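When several qcow2 images need to be imported, the convert step can be repeated in a loop. This sketch names each RBD image after its file (without the .qcow2 suffix); the function name import_qcow2 and the DRY_RUN variable are hypothetical, and by default it only prints the commands.

```shell
# Hypothetical helper: convert each qcow2 file into a raw RBD image in POOL,
# naming the image after the file. Prints the commands unless DRY_RUN=0.
import_qcow2() {
  if [ $# -lt 1 ]; then
    echo "usage: import_qcow2 POOL FILE.qcow2 ..." >&2
    return 2
  fi
  local pool="$1" f name
  shift
  for f in "$@"; do
    name=$(basename "$f" .qcow2)
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf 'qemu-img convert -f qcow2 -O raw %s rbd:%s/%s\n' "$f" "$pool" "$name"
    else
      # Stop at the first failed conversion.
      qemu-img convert -f qcow2 -O raw "$f" "rbd:$pool/$name" || return
    fi
  done
}
```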

RBD caching can significantly improve performance. Since QEMU 1.2, QEMU’s cache options control librbd caching:

  qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,cache=writeback

If you have an older version of QEMU, you can set the librbd cache configuration (like any Ceph configuration option) as part of the ‘file’ parameter:

  qemu -m 1024 -drive format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback

Important

If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, file systems on top of rbd can be corrupted.
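The rule above can be enforced when building the -drive argument in a script. The hypothetical rbd_drive function below produces the same -drive values as the two examples above and refuses to combine rbd_cache=true with any cache mode other than writeback. It only prints the value; it does not start QEMU.

```shell
# Hypothetical helper: print a -drive value for an RBD image.
# $1 = POOL/IMAGE, $2 = QEMU cache mode, $3 = optional librbd option such as
# rbd_cache=true (only needed for QEMU releases older than 1.2).
rbd_drive() {
  if [ $# -lt 2 ]; then
    echo "usage: rbd_drive POOL/IMAGE CACHE_MODE [rbd_cache=true|false]" >&2
    return 2
  fi
  local spec="rbd:$1"
  if [ -n "${3-}" ]; then
    # rbd_cache=true without cache=writeback risks data loss.
    if [ "$3" = rbd_cache=true ] && [ "$2" != writeback ]; then
      echo "rbd_drive: rbd_cache=true requires cache=writeback" >&2
      return 1
    fi
    spec="$spec:$3"
  fi
  printf 'format=raw,file=%s,cache=%s\n' "$spec" "$2"
}
```

For example: qemu -m 1024 -drive "$(rbd_drive data/squeeze writeback)".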

Enabling Discard/TRIM

Since Ceph version 0.46 and QEMU version 1.1, Ceph Block Devices support the discard operation. This means that a guest can send TRIM requests to let a Ceph block device reclaim unused space. This can be enabled in the guest by mounting ext4 or XFS with the discard option.

For this to be available to the guest, it must be explicitly enabled for the block device. To do this, you must specify a discard_granularity associated with the drive:

  qemu -m 1024 -drive format=raw,file=rbd:data/squeeze,id=drive1,if=none \
      -device driver=ide-hd,drive=drive1,discard_granularity=512

Note that this example uses the IDE driver, because the virtio-blk driver in older QEMU releases does not support discard.
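On the guest side, the file system must be mounted with the discard option, as described above. The sketch below only prints the mount command for ext4 or XFS; the function name discard_mount_cmd and the example device and mount point are hypothetical and depend on how the disk appears inside the guest.

```shell
# Hypothetical helper for use inside the guest: print a mount command that
# enables discard. Only ext4 and XFS are covered by the text above.
# $1 = guest block device, $2 = mount point, $3 = ext4 | xfs
discard_mount_cmd() {
  case ${3-} in
    ext4|xfs)
      printf 'mount -t %s -o discard %s %s\n' "$3" "$1" "$2" ;;
    *)
      echo "discard_mount_cmd: unsupported filesystem '${3-}'" >&2
      return 1 ;;
  esac
}
```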

If using libvirt, edit your libvirt domain’s configuration file using virsh edit to include the xmlns:qemu value. Then, add a qemu:commandline block as a child of that domain. The following example shows how to set two devices with qemu id= to different discard_granularity values.

  <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    <qemu:commandline>
      <qemu:arg value='-set'/>
      <qemu:arg value='block.scsi0-0-0.discard_granularity=4096'/>
      <qemu:arg value='-set'/>
      <qemu:arg value='block.scsi0-0-1.discard_granularity=65536'/>
    </qemu:commandline>
  </domain>
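With many disks, the qemu:commandline block can be generated rather than written by hand. The hypothetical discard_cmdline_xml function below emits the same -set argument pairs used in the example above for each DEVICE_ID=GRANULARITY argument. The output still has to be placed inside a domain element that declares the xmlns:qemu namespace.

```shell
# Hypothetical helper: emit a <qemu:commandline> block that sets
# discard_granularity for each DEVICE_ID=GRANULARITY argument,
# using the same -set form as the libvirt example above.
discard_cmdline_xml() {
  local pair
  echo "<qemu:commandline>"
  for pair in "$@"; do
    printf '%s\n' "  <qemu:arg value='-set'/>"
    printf "  <qemu:arg value='block.%s.discard_granularity=%s'/>\n" \
      "${pair%%=*}" "${pair#*=}"
  done
  echo "</qemu:commandline>"
}
```

For example, discard_cmdline_xml scsi0-0-0=4096 scsi0-0-1=65536 reproduces the inner block of the example above.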

QEMU Cache Options

QEMU’s cache options correspond to the following Ceph RBD Cache settings.

Writeback:

  rbd_cache = true

Writethrough:

  rbd_cache = true
  rbd_cache_max_dirty = 0

None:

  rbd_cache = false
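The mapping listed above can be expressed as a small lookup, for example to document or audit what a given QEMU cache mode means for librbd. This is a sketch: the function name rbd_cache_settings is hypothetical, and QEMU cache modes not listed above are rejected rather than guessed.

```shell
# Hypothetical lookup: print the librbd settings that correspond to a QEMU
# cache mode, following the Writeback / Writethrough / None mapping above.
rbd_cache_settings() {
  case ${1-} in
    writeback)    printf 'rbd_cache = true\n' ;;
    writethrough) printf 'rbd_cache = true\nrbd_cache_max_dirty = 0\n' ;;
    none)         printf 'rbd_cache = false\n' ;;
    *)
      echo "rbd_cache_settings: mode '${1-}' is not in the mapping" >&2
      return 1 ;;
  esac
}
```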

QEMU’s cache settings override Ceph’s cache settings (including settings that are explicitly set in the Ceph configuration file).

Note

Prior to QEMU v2.4.0, if you explicitly set RBD Cache settings in the Ceph configuration file, your Ceph settings override the QEMU cache settings.