Using Direct I/O

The operating system normally caches file I/O operations in memory because memory access is much faster than disk access. The application writes to a block of memory that is later flushed to the storage device, which is usually a RAID controller in a Greenplum Database system. Whenever the application accesses a block that is still resident in memory, a device access is avoided. Direct I/O allows you to bypass the cache so that the application writes directly to the storage device. This reduces CPU consumption and eliminates a data copy operation. Direct I/O is efficient for operations like backups where file blocks are only handled once.

Note: Direct I/O is supported only on Red Hat, CentOS, and SUSE.

Turning on Direct I/O

Set the gp_backup_directIO system configuration parameter to on to enable direct I/O for backups:

  $ gpconfig -c gp_backup_directIO -v on

To see if direct I/O is enabled, use this command:

  $ gpconfig -s gp_backup_directIO
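
The two commands above can be combined so that the setting is displayed immediately after it is changed. The following is a minimal sketch, not part of the Greenplum tooling: the function name enable_backup_direct_io is hypothetical, and it uses only the gpconfig invocations shown above.

```shell
# Hypothetical helper: enable direct I/O for backups, then show the setting.
# Uses only the gpconfig invocations documented in this topic.
enable_backup_direct_io() {
  # Stop if the parameter could not be set.
  gpconfig -c gp_backup_directIO -v on || return 1
  # Display the value that gpconfig now reports.
  gpconfig -s gp_backup_directIO
}
```

After sourcing the file that defines it, run enable_backup_direct_io as the user who normally runs gpconfig.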

Decrease network data chunks sent to dump when the database is busy

The gp_backup_directIO_read_chunk_mb configuration parameter sets the size, in MB, of the I/O chunk when direct I/O is enabled. The default chunk size of 20 MB has been tested and found to be optimal: decreasing it increases the backup time, and increasing it changes the backup time very little.

To find the current direct I/O chunk size, enter this command:

  $ gpconfig -s gp_backup_directIO_read_chunk_mb

The following example changes the chunk size from the default to 10 MB:

  $ gpconfig -c gp_backup_directIO_read_chunk_mb -v 10
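
Because the parameter takes a size in MB, it can help to reject an obviously invalid value before calling gpconfig. The sketch below is illustrative only: the function name set_direct_io_chunk_mb is hypothetical, and the check (a positive whole number) is an assumption, since this topic does not state the range of values the parameter accepts.

```shell
# Hypothetical helper: set the direct I/O chunk size (in MB), then show it.
# Assumption: the value must be a positive whole number; the real accepted
# range is not stated in this topic.
set_direct_io_chunk_mb() {
  size_mb="$1"
  case "$size_mb" in
    ''|*[!0-9]*)
      echo "chunk size must be a whole number of MB" >&2
      return 1
      ;;
  esac
  if [ "$size_mb" -eq 0 ]; then
    echo "chunk size must be greater than 0" >&2
    return 1
  fi
  # Same gpconfig invocations as shown above, with the validated size.
  gpconfig -c gp_backup_directIO_read_chunk_mb -v "$size_mb" || return 1
  gpconfig -s gp_backup_directIO_read_chunk_mb
}
```

For example, set_direct_io_chunk_mb 10 issues the same change as the command above, while set_direct_io_chunk_mb ten is rejected without calling gpconfig.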
