NOTE: This topic’s examples are for a CentOS 7 platform. For a full list of supported platforms, see the Supported Platforms topic.

With this option, you create a base virtual machine from an existing CentOS 7 virtual machine, use Terraform from the jumpbox virtual machine to generate the copies of the base virtual machine that will make up the Greenplum Database cluster, and then deploy Greenplum Database on that cluster.

Creating the Base Virtual Machine

In this section, you clone a virtual machine from an existing CentOS 7 virtual machine, perform a series of configuration changes, and create a base virtual machine from it. Finally, you verify that it was configured correctly.

Preparing the Virtual Machine

Create a base virtual machine from an existing virtual machine. You must have a running CentOS 7 virtual machine in the datastore and cluster where you deploy the Greenplum environment.

  1. Log in to vCenter and navigate to Hosts and Clusters.
  2. Right click your existing CentOS 7 virtual machine.
  3. Select Clone -> Clone to Virtual Machine.
  4. Enter greenplum-db-base-vm as the virtual machine name, then click Next.
  5. Select your cluster, then click Next.
  6. Select the vSAN datastore and select Keep existing VM storage policies for VM Storage Policy, then click Next.
  7. Under Select clone options, check the boxes Power on virtual machine after creation and Customize this virtual machine’s hardware and click Next.
  8. Under Customize hardware, check the number of hard disks configured for this virtual machine. If there is only one, add a second one by clicking Add new device -> Hard Disk.
  9. Edit the existing network adapter New Network so it connects to the gp-virtual-external port group.
    1. If you are using DHCP, a new IP address will be assigned to this interface. If you are using static IP assignment, you must manually set up the IP address in a later step.
  10. Review your configuration, then click Finish.
  11. Once the virtual machine is powered on, launch the Web Console and log in as root. Check the virtual machine IP address by running ip a. If you are using static IP assignment, you must manually set it up:

    1. Edit the file /etc/sysconfig/network-scripts/ifcfg-<interface-name>.
    2. Enter the network information provided by your network administrator for the gp-virtual-external network. For example:

      BOOTPROTO=none
      IPADDR=10.202.89.10
      NETMASK=255.255.255.0
      GATEWAY=10.202.89.1
      DNS1=1.0.0.1
      DNS2=1.1.1.1
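
      After saving the file, you can apply and confirm the change; a minimal check, assuming the external interface is named ens192 (your interface name may differ):

        $ systemctl restart network
        $ ip a show ens192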

Performing System Configuration

Configure the newly cloned virtual machine in order to support a Greenplum Database system.

  1. Log in to the cloned virtual machine greenplum-db-base-vm as user root.

  2. Verify that VMware Tools is installed. Refer to Installing VMware Tools for instructions.

  3. Disable the following services:

    1. Disable SELinux by editing the /etc/selinux/config file. Change the value of the SELINUX parameter in the configuration file as follows:

      SELINUX=disabled
    2. Check that the System Security Services Daemon (SSSD) is installed:

      1. $ yum list sssd | grep -i "Installed Packages"

      If SSSD is not installed, skip this step. If SSSD is installed, edit the SSSD configuration file /etc/sssd/sssd.conf and set the selinux_provider parameter to none; this prevents SELinux-related SSH authentication denials, which can occur even when SELinux is disabled. Add the following line:

      selinux_provider=none
    3. Disable the Firewall service:

      1. $ systemctl stop firewalld
      2. $ systemctl disable firewalld
      3. $ systemctl mask --now firewalld
    4. Disable the Tuned daemon:

      1. $ systemctl stop tuned
      2. $ systemctl disable tuned
      3. $ systemctl mask --now tuned
    5. Disable Chrony:

      1. $ systemctl stop chronyd
      2. $ systemctl disable chronyd
      3. $ systemctl mask --now chronyd
  4. Back up the boot files:

    1. $ cp /etc/default/grub /etc/default/grub-backup
    2. $ cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg-backup
  5. Add the following boot parameters:

    1. Disable Transparent Huge Page (THP):

      1. $ grubby --update-kernel=ALL --args="transparent_hugepage=never"
    2. Add the parameter elevator=deadline:

      1. $ grubby --update-kernel=ALL --args="elevator=deadline"
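
    Before rebooting, you can optionally confirm that both arguments were added to every kernel entry:

      $ grubby --info=ALL | grep -E "transparent_hugepage|elevator"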
  6. Install and enable the ntp daemon:

    1. $ yum install -y ntp
    2. $ systemctl enable ntpd
  7. Configure the NTP servers:

    1. Remove all unwanted servers from /etc/ntp.conf. For example:

      ...
      # Use public servers from the pool.ntp.org project.
      # Please consider joining the pool (http://www.pool.ntp.org/join.html).
      server 0.centos.pool.ntp.org iburst
      ...
    2. Add an entry for each server to /etc/ntp.conf:

      server <data center's NTP time server 1>
      server <data center's NTP time server 2>
      ...
      server <data center's NTP time server N>
    3. Add the Greenplum master and standby master to the list of servers, after the data center NTP servers, in /etc/ntp.conf:

      server <data center's NTP time server N>
      ...
      server mdw
      server smdw
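
    Optionally, restart ntpd and confirm that it can reach the configured time servers (the server list will reflect your environment):

      $ systemctl restart ntpd
      $ ntpq -pn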
  8. Configure kernel settings so the system is optimized for Greenplum Database.

    1. Create the configuration file /etc/sysctl.d/10-gpdb.conf and paste in the following kernel optimization parameters:

      kernel.msgmax = 65536
      kernel.msgmnb = 65536
      kernel.msgmni = 2048
      kernel.sem = 500 2048000 200 40960
      kernel.shmmni = 1024
      kernel.sysrq = 1
      net.core.netdev_max_backlog = 2000
      net.core.rmem_max = 4194304
      net.core.wmem_max = 4194304
      net.core.rmem_default = 4194304
      net.core.wmem_default = 4194304
      net.ipv4.tcp_rmem = 4096 4224000 16777216
      net.ipv4.tcp_wmem = 4096 4224000 16777216
      net.core.optmem_max = 4194304
      net.core.somaxconn = 10000
      net.ipv4.ip_forward = 0
      net.ipv4.tcp_congestion_control = cubic
      net.ipv4.tcp_tw_recycle = 0
      net.core.default_qdisc = fq_codel
      net.ipv4.tcp_mtu_probing = 0
      net.ipv4.conf.all.arp_filter = 1
      net.ipv4.conf.default.accept_source_route = 0
      net.ipv4.ip_local_port_range = 10000 65535
      net.ipv4.tcp_max_syn_backlog = 4096
      net.ipv4.tcp_syncookies = 1
      vm.overcommit_memory = 2
      vm.overcommit_ratio = 95
      vm.swappiness = 10
      vm.dirty_expire_centisecs = 500
      vm.dirty_writeback_centisecs = 100
      vm.zone_reclaim_mode = 0
    2. Add the following parameters; some of the values depend on the virtual machine settings calculated in the Sizing section.

      1. Determine the value of the RAM in bytes by creating the variable $RAM_IN_BYTES. For example, for a 30GB RAM virtual machine, run the following:

        1. $ RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
      2. Define the following parameters that depend on the variable $RAM_IN_BYTES that you just created, and append them to the file /etc/sysctl.d/10-gpdb.conf by running the following commands:

        1. $ echo "vm.min_free_kbytes = $(($RAM_IN_BYTES * 3 / 100 / 1024))" >> /etc/sysctl.d/10-gpdb.conf
        2. $ echo "kernel.shmall = $(($RAM_IN_BYTES / 2 / 4096))" >> /etc/sysctl.d/10-gpdb.conf
        3. $ echo "kernel.shmmax = $(($RAM_IN_BYTES / 2))" >> /etc/sysctl.d/10-gpdb.conf
      3. If your virtual machine RAM is less than or equal to 64 GB, run the following commands:

        1. $ echo "vm.dirty_background_ratio = 3" >> /etc/sysctl.d/10-gpdb.conf
        2. $ echo "vm.dirty_ratio = 10" >> /etc/sysctl.d/10-gpdb.conf
      4. If your virtual machine RAM is greater than 64 GB, run the following commands:

        1. $ echo "vm.dirty_background_ratio = 0" >> /etc/sysctl.d/10-gpdb.conf
        2. $ echo "vm.dirty_ratio = 0" >> /etc/sysctl.d/10-gpdb.conf
        3. $ echo "vm.dirty_background_bytes = 1610612736 # 1.5GB" >> /etc/sysctl.d/10-gpdb.conf
        4. $ echo "vm.dirty_bytes = 4294967296 # 4GB" >> /etc/sysctl.d/10-gpdb.conf
  9. Configure ssh to allow password-less login.

    1. Edit the /etc/ssh/sshd_config file and update the following options:

      PasswordAuthentication yes
      ChallengeResponseAuthentication yes
      UsePAM yes
      MaxStartups 100
      MaxSessions 100
    2. Create ssh keys to allow passwordless login with root by running the following commands:

      # make sure to generate ssh keys without password. Press Enter for defaults
      $ ssh-keygen
      $ chmod 700 /root/.ssh
      # copy public key to authorized_keys
      $ cd /root/.ssh/
      $ cat id_rsa.pub > authorized_keys
      $ chmod 600 authorized_keys
      # it will add host signature to known_hosts
      $ ssh-keyscan -t rsa localhost > known_hosts
      # duplicate host signature for all hosts in the cluster
      $ key=$(cat known_hosts)
      # Replace `32` with your number of total segment virtual machines as necessary.
      $ for i in mdw $(seq -f "sdw%g" 1 32); do
          echo ${key}| sed -e "s/localhost/${i}/" >> known_hosts
        done
      $ chmod 644 known_hosts
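
    You can optionally confirm that passwordless login works for root before the virtual machine is cloned; BatchMode makes ssh fail instead of prompting if key authentication is not in place:

      $ ssh -o BatchMode=yes localhost hostname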
  10. Configure the system resource limits to control the amount of resources used by Greenplum by creating the file /etc/security/limits.d/20-nproc.conf.

    1. Ensure that the directory exists before creating the file:

      1. $ mkdir -p /etc/security/limits.d
    2. Append the following contents to the end of /etc/security/limits.d/20-nproc.conf:

      * soft nofile 524288
      * hard nofile 524288
      * soft nproc 131072
      * hard nproc 131072
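
    The new limits apply to new login sessions; an optional quick check that the nofile and nproc values above are picked up:

      $ su - gpadmin -c "ulimit -n"
      $ su - gpadmin -c "ulimit -u"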
  11. Create the base mount point /gpdata for the virtual machine data drive:

    1. $ mkdir -p /gpdata
    2. $ mkfs.xfs /dev/sdb
    3. $ mount -t xfs -o rw,noatime,nodev,inode64 /dev/sdb /gpdata/
    4. $ df -kh
    5. $ echo /dev/sdb /gpdata/ xfs rw,nodev,noatime,inode64 0 0 >> /etc/fstab
    6. $ mkdir -p /gpdata/primary
    7. $ mkdir -p /gpdata/master
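
    You can optionally confirm that the data disk is mounted with the intended options:

      $ mount | grep /gpdata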
  12. Configure the file /etc/rc.d/rc.local to make the following settings persistent:

    1. Update the file content:

      # Configure readahead for the `/dev/sdb` to 16384 512-byte sectors, i.e. 8MiB
      /sbin/blockdev --setra 16384 /dev/sdb
      # Configure gp-virtual-internal network settings with MTU 9000
      /sbin/ip link set ens192 mtu 9000
      # Configure jumbo frame RX ring buffer to 4096
      /sbin/ethtool --set-ring ens192 rx-jumbo 4096
    2. Make the file executable:

      1. $ chmod +x /etc/rc.d/rc.local
  13. Create the group and user gpadmin:gpadmin required by the Greenplum Database.

    1. Execute the following steps in order to create the user gpadmin in the group gpadmin:

      1. $ groupadd gpadmin
      2. $ useradd -g gpadmin -m gpadmin
      3. $ passwd gpadmin
      4. # Enter the desired password at the prompt
    2. (Optional) Change the root password to a preferred password:

      1. $ passwd root
      2. # Enter the desired password at the prompt
    3. Create the file /home/gpadmin/.bashrc for gpadmin with the following content:

      ### .bashrc

      ### Source global definitions
      if [ -f /etc/bashrc ]; then
        . /etc/bashrc
      fi

      ### User specific aliases and functions

      ### If Greenplum has been installed, then add Greenplum-specific commands to the path
      if [ -f /usr/local/greenplum-db/greenplum_path.sh ]; then
        source /usr/local/greenplum-db/greenplum_path.sh
      fi
    4. Change the ownership of /home/gpadmin/.bashrc to gpadmin:gpadmin:

      1. $ chown gpadmin:gpadmin /home/gpadmin/.bashrc
    5. Change the ownership of the /gpdata directory to gpadmin:gpadmin:

      1. $ chown -R gpadmin:gpadmin /gpdata
    6. Create ssh keys for passwordless login as gpadmin user:

      $ su - gpadmin
      # make sure to generate ssh keys without password. Press Enter for defaults
      $ ssh-keygen
      $ chmod 700 /home/gpadmin/.ssh
      # copy public key to authorized_keys
      $ cd /home/gpadmin/.ssh/
      $ cat id_rsa.pub > authorized_keys
      $ chmod 600 authorized_keys
      # it will add host signature to known_hosts
      $ ssh-keyscan -t rsa localhost > known_hosts
      # duplicate host signature for all hosts in the cluster
      $ key=$(cat known_hosts)
      # Replace `32` with your number of total segment virtual machines as necessary.
      $ for i in mdw $(seq -f "sdw%g" 1 32); do
          echo ${key}| sed -e "s/localhost/${i}/" >> known_hosts
        done
      $ chmod 644 known_hosts
    7. Log out of gpadmin to go back to root before you proceed to the next step.

  14. Configure cgroups for Greenplum.

    For security and resource management, Greenplum Database makes use of the Linux cgroups.

    1. Install the cgroup configuration package:

      1. $ yum install -y libcgroup-tools
    2. Verify that the directory /etc/cgconfig.d exists:

      1. $ mkdir -p /etc/cgconfig.d
    3. Create the cgroups configuration file /etc/cgconfig.d/10-gpdb.conf for Greenplum:

      group gpdb {
        perm {
          task {
            uid = gpadmin;
            gid = gpadmin;
          }
          admin {
            uid = gpadmin;
            gid = gpadmin;
          }
        }
        cpu {
        }
        cpuacct {
        }
        cpuset {
        }
        memory {
        }
      }
    4. Prepare the configuration file and enable cgconfig via systemctl:

      1. $ cgconfigparser -l /etc/cgconfig.d/10-gpdb.conf
      2. $ systemctl enable cgconfig.service
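
    As an optional quick check (the full verification appears later in Validating the Base Virtual Machine), confirm that the gpdb group was created under the cgroup mount point, assuming the default /sys/fs/cgroup:

      $ ls -ld /sys/fs/cgroup/cpu/gpdb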
  15. Update the /etc/hosts file with all of the IP addresses and hostnames in the network gp-virtual-internal.

    1. Verify that you have the following parameters defined:

      • The total number of segment virtual machines you wish to deploy; the default is 32.
      • The final octet of the master virtual machine IP address in the gp-virtual-internal port group; the default is 250.
      • The leading octets for the gp-virtual-internal network IP range; the default is 192.168.1.
      • The segment IP addresses will start from 192.168.1.2.
      • The master IP address will be 192.168.1.250.
    2. Create the file /root/update-etc-hosts.sh and insert the following commands:

      if [ $# -ne 2 ] ; then
        echo "Usage: $0 internal_cidr segment_count"
        exit 1
      fi

      if [ ! -f /etc/hosts.bak ]; then
        cp /etc/hosts /etc/hosts.bak
      else
        cp /etc/hosts.bak /etc/hosts
      fi

      internal_ip_cidr=${1}
      segment_host_count=${2}
      internal_network_ip=$(echo ${internal_ip_cidr} | cut -d"/" -f1)
      internal_netmask=$(echo ${internal_ip_cidr} | cut -d"/" -f2)

      if [ ${internal_netmask} -lt 20 ] || [ ${internal_netmask} -gt 24 ]; then
        echo "The CIDR should contain a netmask between 20 and 24."
        exit 1
      fi

      max_segment_hosts=$(( 2**(32 - internal_netmask) - 8 ))
      if [ ${max_segment_hosts} -lt ${segment_host_count} ]; then
        echo "ERROR: The CIDR does not have enough IPs available (${max_segment_hosts}) to meet the VM count (${segment_host_count})."
        exit 1
      fi

      octet3=$(echo ${internal_ip_cidr} | cut -d"." -f3)
      ip_prefix=$(echo ${internal_ip_cidr} | cut -d"." -f1-2)
      octet3_mask=$(( 256-2**(24 - internal_netmask) ))
      octet3_base=$(( octet3_mask&octet3 ))
      master_octet3=$(( octet3_base + 2**(24 - internal_netmask) - 1 ))
      master_ip="${ip_prefix}.${master_octet3}.250"
      standby_ip="${ip_prefix}.${master_octet3}.251"
      printf "\n${master_ip}\tmdw\n${standby_ip}\tsmdw\n" >> /etc/hosts

      i=2
      for hostname in $(seq -f "sdw%g" 1 ${segment_host_count}); do
        segment_internal_ip="${ip_prefix}.$(( octet3_base + i / 256 )).$(( i % 256 ))"
        printf "${segment_internal_ip}\t${hostname}\n" >> /etc/hosts
        let i=i+1
      done
    3. Run the script passing in two parameters, internal CIDR and segment host count. For example: bash /root/update-etc-hosts.sh 192.168.1.1/24 32
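
      With the example values above (192.168.1.1/24 and 32 segment hosts), you can optionally confirm the generated entries; expect mdw at 192.168.1.250, smdw at 192.168.1.251, and segment hosts starting at 192.168.1.2:

        $ grep -E "mdw|sdw" /etc/hosts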

  16. Create two files hosts-all and hosts-segments under /home/gpadmin. Replace 32 with your number of primary segment virtual machines as necessary.

    $ echo mdw > /home/gpadmin/hosts-all
    $ > /home/gpadmin/hosts-segments
    $ for i in {1..32}; do
        echo "sdw${i}" >> /home/gpadmin/hosts-all
        echo "sdw${i}" >> /home/gpadmin/hosts-segments
      done
    $ chown gpadmin:gpadmin /home/gpadmin/hosts*
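
    Optionally, confirm the host counts; for a 32-segment cluster, hosts-all should contain 33 lines and hosts-segments 32:

      $ wc -l /home/gpadmin/hosts-all /home/gpadmin/hosts-segments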

Adding Greenplum Database Service

  1. Create the directory /etc/gpv:

    1. $ mkdir -p /etc/gpv
  2. Create the service log directory:

    1. $ mkdir -p /var/log/gpv
    2. $ chmod a+rwx /var/log/gpv
  3. Create a service file /etc/gpv/gpdb-service and paste in the following contents:

    #!/bin/bash
    set -e

    if [ -d /gpdata/master/gpseg* ]; then
      POSTMASTER_FILE_PATH=$(ls -d /gpdata/master/gpseg*)
      printf -v PGCTL_OPTION ' -D %s -w -t 120 -o " %s " ' ${POSTMASTER_FILE_PATH} "-E"
    else
      POSTMASTER_FILE_PATH=$(ls -d /gpdata/primary/gpseg*)
      printf -v PGCTL_OPTION ' -D %s -w -t 120 ' ${POSTMASTER_FILE_PATH}
    fi

    echo POSTMASTER_FILE_PATH is ${POSTMASTER_FILE_PATH}
    echo PGCTL_OPTION is ${PGCTL_OPTION}
    echo about to $1 ...

    case "$1" in
      start)
        if [ ! -z "$(ps -ef | grep postgres | grep gpseg | cut -d ' ' -f 3)" ]; then
          echo there is an existing postmaster running by somebody else, stop it
          /usr/local/greenplum-db/bin/pg_ctl -w -D ${POSTMASTER_FILE_PATH} --mode=fast stop
        fi
        echo clean-up left-over files if any
        rm -f /tmp/.s*
        rm -f ${POSTMASTER_FILE_PATH}/postmaster.pid
        echo starting new postmaster ...
        eval /usr/local/greenplum-db/bin/pg_ctl ${PGCTL_OPTION} start
        echo postmaster is started
        echo extracting postmaster pid...
        touch /home/gpadmin/.gpv.postmaster.pid
        POSTMASTER_PID=$(head -1 ${POSTMASTER_FILE_PATH}/postmaster.pid)
        echo ${POSTMASTER_PID} > /home/gpadmin/.gpv.postmaster.pid
        echo $(date) >> /home/gpadmin/.gpv.postmaster.pid
        echo remembered the postmaster pid as ${POSTMASTER_PID}
        ;;
      stop)
        echo stopping postmaster with pid $(cat /home/gpadmin/.gpv.postmaster.pid) ...
        /usr/local/greenplum-db/bin/pg_ctl -w -D ${POSTMASTER_FILE_PATH} --mode=fast stop
        echo postmaster is stopped
        ;;
      *)
        echo "Usage: $0 {start|stop}"
    esac

    echo all done
    exit 0
  4. Make the file executable:

    1. $ chmod +x /etc/gpv/gpdb-service
  5. Create a service file /etc/systemd/system/gpdb.service and paste in the following contents:

    [Unit]
    Description=Greenplum Service

    [Service]
    Type=forking
    User=gpadmin
    LimitNOFILE=524288
    LimitNPROC=131072
    ExecStart=/bin/bash -l -c "/etc/gpv/gpdb-service start | tee -a /var/log/gpv/gpdb-service.log"
    ExecStop=/bin/bash -l -c "/etc/gpv/gpdb-service stop | tee -a /var/log/gpv/gpdb-service.log"
    TimeoutStartSec=120
    Restart=on-failure
    PIDFile=/home/gpadmin/.gpv.postmaster.pid
    RestartSec=1s

    [Install]
    WantedBy=multi-user.target
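
    After creating or changing unit files, it is good practice to reload systemd so that it picks up the new gpdb.service definition (the service itself is enabled and started later, after the cluster is initialized):

      $ systemctl daemon-reload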

Installing the Greenplum Database Software

  1. Download the latest version of the Greenplum Database Server 6 for RHEL 7 from VMware Tanzu Network.

  2. Copy the downloaded package to the virtual machine and install Greenplum:

    1. $ scp greenplum-db-6.*.rpm root@greenplum-db-base-vm:/tmp
    2. $ ssh root@greenplum-db-base-vm
    3. $ yum install -y /tmp/greenplum-db-6.*.rpm
  3. Install the following yum packages for better supportability:

    • dstat to monitor system statistics, like network and I/O performance.
    • sos to generate an sosreport, a best practice to collect system information for support purposes.
    • tree to visualize folder structure.
    • wget to easily get artifacts from the Internet.
    1. $ yum install -y dstat
    2. $ yum install -y sos
    3. $ yum install -y tree
    4. $ yum install -y wget
  4. Power down the virtual machine:

    1. $ shutdown now
  5. Enable vApp options in vCenter:

    • Select the VM greenplum-db-base-vm.
    • In the VM view, click the Configure tab at the top of the page.
    • If vApp Options is disabled, click EDIT…
      • Click Enable vApp options.
      • Click OK.
  6. Add vApp option guestinfo.primary_segment_count:

    • Select Settings -> vApp Options
    • Under Properties, click ADD
    • In the General tab, enter the following:
      • For Category, enter Greenplum
      • For Label, enter Number of Primary Segments
      • For Key ID, enter guestinfo.primary_segment_count
    • In the Type tab, enter the following:
      • For Type, select Integer
      • For Range, enter 1-1000
    • Click on Save
    • Select the new property
    • Click Set Value, and enter an appropriate value, for example: 32
  7. Add vApp option guestinfo.internal_ip_cidr:

    • Under Properties, click ADD again
    • In the General tab, enter the following:
      • For Category, enter Internal Network
      • For Label, enter Internal Network CIDR (with netmask /24)
      • For Key ID, enter guestinfo.internal_ip_cidr
    • In the Type tab, enter the following:
      • For Type, select String
      • For Length, enter 12-18
    • Click on Save
    • Select the new property
    • Click Set Value, and enter an appropriate value: for example: 192.168.10.1/24
  8. Add vApp option guestinfo.deployment_type:

    • Under Properties, click ADD again
    • In the General tab, enter the following:
      • For Category, enter Greenplum
      • For Label, enter Deployment type
      • For Key ID, enter guestinfo.deployment_type
    • In the Type tab, enter the following:
      • For Type, select String
    • Click on Save
    • Select the new property
    • Click Set Value, and enter mirrorless

Validating the Base Virtual Machine

Validate that the newly created base virtual machine is configured correctly.

Verifying the Base Virtual Machine Settings

  1. Reboot the base virtual machine.
  2. Log in to the virtual machine as root.
  3. Verify that the following services are disabled:

    1. SELinux

      1. $ sestatus
      2. SELinux status: disabled
    2. Firewall

      1. $ systemctl status firewalld
      2. firewalld.service
      3. Loaded: masked (/dev/null; bad)
      4. Active: inactive (dead)
    3. Tuned

      1. $ systemctl status tuned
      2. tuned.service
      3. Loaded: masked (/dev/null; bad)
      4. Active: inactive (dead)
    4. Chrony

      1. $ systemctl status chronyd
      2. chronyd.service
      3. Loaded: masked (/dev/null; bad)
      4. Active: inactive (dead)
  4. Verify that ntpd is installed and enabled:

    1. $ systemctl status ntpd
    2. ntpd.service - Network Time Service
    3. Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
    4. Active: active (running) since Tue 2021-05-04 18:47:25 EDT; 4s ago
  5. Verify that the NTP servers are configured correctly and the remote servers are ordered properly:

    1. $ ntpq -pn
    2. remote refid st t when poll reach delay offset jitter
    3. =================================================================================
    4. -xx.xxx.xxx.xxx xx.xxx.xxx.xxx 3 u 246 256 377 0.186 2.700 0.993
    5. +xx.xxx.xxx.xxx xx.xxx.xxx.xxx 3 u 223 256 377 26.508 0.247 0.397
  6. Verify that the filesystem configuration is correct:

    1. $ lsblk /dev/sdb
    2. NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    3. sdb 8:16 0 250G 0 disk /gpdata/
    4. $ grep sdb /etc/fstab
    5. /dev/sdb /gpdata/ xfs rw,nodev,noatime,inode64 0 0
    6. $ df -Th | grep sdb
    7. /dev/sdb xfs 250G 167M 250G 1% /gpdata
    8. $ ls -l /gpdata
    9. total 0
    10. drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 master
    11. drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 mirror
    12. drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 primary
  7. Verify that the parameters transparent_hugepage=never and elevator=deadline exist:

    1. $ cat /proc/cmdline
    2. BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 transparent_hugepage=never elevator=deadline
  8. Verify that the ulimit settings match your specification by running the following command:

    1. $ ulimit -a
    2. core file size (blocks, -c) 0
    3. data seg size (kbytes, -d) unlimited
    4. scheduling priority (-e) 0
    5. file size (blocks, -f) unlimited
    6. pending signals (-i) 119889
    7. max locked memory (kbytes, -l) 64
    8. max memory size (kbytes, -m) unlimited
    9. open files (-n) 524288
    10. pipe size (512 bytes, -p) 8
    11. POSIX message queues (bytes, -q) 819200
    12. real-time priority (-r) 0
    13. stack size (kbytes, -s) 8192
    14. cpu time (seconds, -t) unlimited
    15. max user processes (-u) 131072
    16. virtual memory (kbytes, -v) unlimited
    17. file locks (-x) unlimited
  9. Verify that the necessary yum packages are installed, by running rpm -qa:

    1. $ rpm -qa | grep apr
    2. $ rpm -qa | grep apr-util
    3. $ rpm -qa | grep dstat
    4. $ rpm -qa | grep greenplum-db-6
    5. $ rpm -qa | grep krb5-devel
    6. $ rpm -qa | grep libcgroup-tools
    7. $ rpm -qa | grep libevent
    8. $ rpm -qa | grep libyaml
    9. $ rpm -qa | grep net-tools
    10. $ rpm -qa | grep ntp
    11. $ rpm -qa | grep perl
    12. $ rpm -qa | grep rsync
    13. $ rpm -qa | grep sos
    14. $ rpm -qa | grep tree
    15. $ rpm -qa | grep wget
    16. $ rpm -qa | grep which
    17. $ rpm -qa | grep zip
  10. Verify that you configured the Greenplum Database cgroups correctly by running the commands below.

    1. Identify the cgroup directory mount point:

      1. $ grep cgroup /proc/mounts

      The first line from the above output identifies the cgroup mount point. For example, /sys/fs/cgroup.

    2. Run the following commands, replacing <cgroup_mount_point> with the mount point which you identified in the previous step:

      1. $ ls -l <cgroup_mount_point>/cpu/gpdb
      2. $ ls -l <cgroup_mount_point>/cpuacct/gpdb
      3. $ ls -l <cgroup_mount_point>/cpuset/gpdb
      4. $ ls -l <cgroup_mount_point>/memory/gpdb

      The above directories must exist and must be owned by gpadmin:gpadmin.

    3. Verify that the cgconfig service is running by executing the following command:

      1. $ systemctl status cgconfig.service
  11. Verify that the sysctl settings have been applied correctly based on your virtual machine settings.

    1. First define the variable $RAM_IN_BYTES again on this virtual machine. For example, for a 30 GB RAM:

      1. $ RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
    2. Retrieve the values listed below by running sysctl <kernel setting> and confirm that the values match the verifier specified for each setting.

      Kernel Setting                  Value
      vm.min_free_kbytes              $(($RAM_IN_BYTES * 3 / 100 / 1024))
      vm.overcommit_memory            2
      vm.overcommit_ratio             95
      net.ipv4.ip_local_port_range    10000 65535
      kernel.shmall                   $(($RAM_IN_BYTES / 2 / 4096))
      kernel.shmmax                   $(($RAM_IN_BYTES / 2))
    3. For a virtual machine with 64 GB of RAM or less:

      Kernel Setting                  Value
      vm.dirty_background_ratio       3
      vm.dirty_ratio                  10
    4. For a virtual machine with more than 64 GB of RAM:

      Kernel Setting                  Value
      vm.dirty_background_ratio       0
      vm.dirty_ratio                  0
      vm.dirty_background_bytes       1610612736
      vm.dirty_bytes                  4294967296
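
    As a convenience, you can check the common settings in one pass instead of one sysctl call at a time; a minimal sketch, assuming $RAM_IN_BYTES is still defined in the current shell:

      $ for s in vm.min_free_kbytes vm.overcommit_memory vm.overcommit_ratio net.ipv4.ip_local_port_range kernel.shmall kernel.shmmax; do
          sysctl ${s}
        done
      $ echo "expected vm.min_free_kbytes: $(($RAM_IN_BYTES * 3 / 100 / 1024))"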
  12. Verify that ssh command allows passwordless login as gpadmin user without prompting for a password:

    1. $ su - gpadmin
    2. $ ssh localhost
    3. $ exit
    4. $ exit
  13. Verify the readahead value:

    1. $ /sbin/blockdev --getra /dev/sdb
    2. 16384
  14. Verify the RX Jumbo buffer ring setting:

    1. $ /sbin/ethtool -g ens192 | grep Jumbo
    2. RX Jumbo: 4096
    3. RX Jumbo: 4096
  15. Verify the MTU size:

    1. $ /sbin/ip a | grep 9000
    2. 2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
  16. Power off the VM.

Allocating the Virtual Machines with Terraform

Provisioning the Virtual Machines

Use the Terraform software you installed in Creating the Jumpbox Virtual Machine to generate copies of the base virtual machine you just created. Next, configure the copies based on the number of virtual machines in your environment, IP address ranges, and other settings you specify in the installation script.

  1. Create a file named main.tf and copy in the contents described below:

    • For deployment on a vSAN datastore (vSAN storage), copy the contents from OVA Script.

      or

    • For deployment on a Datastore Cluster (PowerFlex or any other storage provisioner), copy the contents from Datastore Cluster OVA Script.

      • Note: We suggest turning on vSphere Storage DRS on the Datastore Cluster and setting the cluster automation level to No Automation (Manual Mode).
  2. Log in to the jumpbox virtual machine as root.
  3. Update the following variables under the Terraform variables section of the main.tf script with the correct values for your environment. You collected the required information in the Prerequisites section.

    Variable                              Description
    vsphere_user                          Name of the VMware vSphere administrator level user.
    vsphere_password                      Password of the VMware vSphere administrator level user.
    vsphere_server                        The IP address or, preferably, the Fully Qualified Domain Name (FQDN) of your vCenter server.
    vsphere_datacenter                    The name of the data center for Greenplum in your vCenter environment.
    vsphere_compute_cluster               The name of the compute cluster for Greenplum in your data center.
    vsphere_datastore                     The name of the vSAN datastore which will contain your Greenplum data; for example: vSAN.
    vsphere_datastore_cluster             The name of the PowerFlex datastore cluster which will contain your Greenplum data; for example: PowerFlex.
    vsphere_storage_policy                The name of the storage policy defined during Setting Up VMware vSphere Storage or Setting Up VMware vSphere Encryption; for example: vSAN.
    prefix                                A customizable prefix for the resource pool, Greenplum VMs, and DRS affinity rules created by Terraform.
    gp_virtual_external_ipv4_addresses    The routable IP addresses for mdw and smdw, in that order; for example: ["10.0.0.111", "10.0.0.112"].
    gp_virtual_external_ipv4_netmask      The number of bits in the netmask for gp-virtual-external; for example: 24.
    gp_virtual_external_gateway           The gateway IP address for the gp-virtual-external network.
    dns_servers                           The DNS servers for the gp-virtual-external network, listed as an array; for example: ["8.8.8.8", "8.8.4.4"].
    gp_virtual_etl_bar_ipv4_cidr          The CIDR of the non-routable ETL, backup, and restore network gp-virtual-etl-bar; for example: 192.168.2.0/24.
  4. Initialize Terraform:

    1. $ terraform init

    You should get the following output:

    1. Terraform has been successfully initialized!
    2. You may now begin working with Terraform. Try running "terraform plan" to see
    3. any changes that are required for your infrastructure. All Terraform commands
    4. should now work.
    5. If you ever set or change modules or backend configuration for Terraform,
    6. re-run this command to reinitialize your working directory. If you forget, other
    7. commands will detect it and remind you to do so if necessary.
  5. Verify that your Terraform configuration is correct by running the following command:

    1. $ terraform plan
  6. Deploy the cluster:

    1. $ terraform apply

    Answer Yes to the following prompt:

    1. Do you want to perform these actions?
    2. Terraform will perform the actions described above.
    3. Only 'yes' will be accepted to approve.
    4. Enter a value: yes

The virtual machines will be created and configured to deploy your Greenplum cluster. You can check the progress under the Recent Tasks panel on your VMware vSphere client.

Once Terraform has completed, it generates a file named terraform.tfstate. This file must not be deleted, as it keeps a record of all the virtual machines and their states. Terraform also uses this file when modifying any virtual machines. We also recommend that you retain a snapshot of the jumpbox virtual machine.

Terraform timeout

Occasionally, Terraform may time out when deploying the virtual machines. If a virtual machine cannot be cloned within the timeout value, by default 30 minutes, Terraform will fail and the cluster setup will be incomplete. Terraform will report the following error:

  1. error cloning virtual machine: timeout waiting for clone to complete

You must review the root cause of the issue, which resides within the vCenter environment: check host and storage performance to find out why a virtual machine is taking more than 30 minutes to clone. There are two ways of working around this issue by editing the Terraform settings:

  1. Reduce the parallelism of Terraform from 10 to 5 and redeploy the cluster by running the following command:

    1. terraform apply --parallelism 5
  2. Increase the Terraform timeout property, set in minutes. See more about this property in the Terraform documentation.

    Modify the main.tf script in two places, one for the segment_hosts resource and another for the master_hosts resource, adding the timeout property under the clone section:

    ...
    resource "vsphere_virtual_machine" "segment_hosts" {
      ...
      clone {
        ...
        timeout = 40
        ...
      }
    }

    resource "vsphere_virtual_machine" "master_hosts" {
      ...
      clone {
        ...
        timeout = 40
        ...
      }
    }

    After saving the changes, rerun terraform apply to redeploy the cluster.

Validating the Deployment

Once Terraform has provisioned the virtual machines, perform the following validation steps:

  1. Validate the Resource Pool for the Greenplum cluster.

    1. Log in to vCenter and navigate to Hosts and Clusters.
    2. Select the newly created resource pool and review its Resource Settings.

      Note that the Worst Case Allocation fields will differ depending on what is currently running in your environment.

    3. Click the expand arrow next to the resource pool name; you should see all the newly created virtual machines: gp-1-mdw, gp-1-sdw1, and so on.

  2. Validate that the gp-virtual-internal network is working.

    1. Log in to the master node as root.
    2. Switch to gpadmin user.

      1. $ su - gpadmin
    3. Make sure that the file /home/gpadmin/hosts-all exists.

    4. Use the gpssh command to verify connectivity to all nodes in the gp-virtual-internal network.

      1. $ gpssh -f hosts-all -e hostname
  3. Validate the MTU settings on all virtual machines.

    1. Log in to the master node as root.
    2. Use the gpssh command to verify the value of the MTU.

      1. $ source /usr/local/greenplum-db/greenplum_path.sh
      2. $ gpssh -f /home/gpadmin/hosts-all -e "ifconfig ens192 | grep -i mtu"
  4. Clean Up the Temporary VMware vSphere Admin Account

If you created a temporary VMware vSphere administrator level user such as greenplum, it is safe to remove it now.

Deploying Greenplum

You are now ready to deploy Greenplum Database on the newly deployed cluster. Perform the steps below from the Greenplum master node.

Deploying a Greenplum Database Cluster

  1. Initialize the Greenplum cluster.

    1. Log in to the Greenplum master node as gpadmin user.

    2. Create the Greenplum GUC (global user configuration) file gp_guc_config and paste in the following contents:

      ### Interconnect Settings
      gp_interconnect_queue_depth=16
      gp_interconnect_snd_queue_depth=16
      # Since you have one segment per VM and less competing workloads per VM,
      # you can set the memory limit for resource group higher than the default
      gp_resource_group_memory_limit=0.85
      # This value should be 5% of the total RAM on the VM
      statement_mem=1536MB
      # This value should be set to 25% of the total RAM on the VM
      max_statement_mem=7680MB
      # This value should be set to 85% of the total RAM on the VM
      gp_vmem_protect_limit=26112
      # Since you have less I/O bandwidth, you can turn this parameter on
      gp_workfile_compression=on
      # Mirrorless GUCs
      wal_level=minimal
      max_wal_senders=0
      wal_keep_segments=0
      max_replication_slots=0
      gp_dispatch_keepalives_idle=20
      gp_dispatch_keepalives_interval=20
      gp_dispatch_keepalives_count=44
    3. Create the Greenplum configuration script create_gpinitsystem_config.sh and paste in the following contents:

      #!/bin/bash
      # setup the gpinitsystem config

      primary_array() {
        num_primary_segments=$1
        array=""
        newline=$'\n'
        # master has db_id 0, primary starts with db_id 1, primaries are always odd
        for i in $( seq 0 $(( num_primary_segments - 1 )) ); do
          content_id=${i}
          db_id=$(( i + 1 ))
          array+="sdw${db_id}~sdw${db_id}~6000~/gpdata/primary/gpseg${content_id}~${db_id}~${content_id}${newline}"
        done
        echo "${array}"
      }

      create_gpinitsystem_config() {
        num_primary_segments=$1
        echo "Generate gpinitsystem"
        cat <<EOF> ./gpinitsystem_config
      ARRAY_NAME="Greenplum Data Platform"
      TRUSTED_SHELL=ssh
      CHECK_POINT_SEGMENTS=8
      ENCODING=UNICODE
      SEG_PREFIX=gpseg
      HEAP_CHECKSUM=on
      HBA_HOSTNAMES=0
      QD_PRIMARY_ARRAY=mdw~mdw~5432~/gpdata/master/gpseg-1~0~-1
      declare -a PRIMARY_ARRAY=(
      $( primary_array ${num_primary_segments} )
      )
      EOF
      }

      num_primary_segments=$1
      if [ -z "$num_primary_segments" ]; then
        echo "Usage: bash create_gpinitsystem_config.sh <num_primary_segments>"
      else
        create_gpinitsystem_config ${num_primary_segments}
      fi
    4. Run the script to generate the configuration file for gpinitsystem. Replace 32 with the number of primary segments as necessary.

      1. $ bash create_gpinitsystem_config.sh 32

      You should now see a file called gpinitsystem_config.
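
      For reference, with 2 primary segments the generated gpinitsystem_config would look like this (the PRIMARY_ARRAY section grows with the segment count you pass in):

        ARRAY_NAME="Greenplum Data Platform"
        TRUSTED_SHELL=ssh
        CHECK_POINT_SEGMENTS=8
        ENCODING=UNICODE
        SEG_PREFIX=gpseg
        HEAP_CHECKSUM=on
        HBA_HOSTNAMES=0
        QD_PRIMARY_ARRAY=mdw~mdw~5432~/gpdata/master/gpseg-1~0~-1
        declare -a PRIMARY_ARRAY=(
        sdw1~sdw1~6000~/gpdata/primary/gpseg0~1~0
        sdw2~sdw2~6000~/gpdata/primary/gpseg1~2~1
        )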

    5. Run the following command to initialize the Greenplum Database:

      1. $ gpinitsystem -a -I gpinitsystem_config -p gp_guc_config
    6. Enable and start the gpdb.service

      1. $ gpssh -f /home/gpadmin/hosts-all "sudo systemctl enable gpdb.service"
      2. $ gpssh -f /home/gpadmin/hosts-all "sudo systemctl start gpdb.service"
      3. $ gpssh -f /home/gpadmin/hosts-all "systemctl status gpdb.service"
  2. Configure the Greenplum master and standby master environment variables, and load the master variables:

    1. $ echo export MASTER_DATA_DIRECTORY=/gpdata/master/gpseg-1 >> ~/.bashrc
    2. $ source ~/.bashrc
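
    You can optionally confirm that the cluster is up before moving on by checking its state with the standard Greenplum utility:

      $ gpstate -s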

Next Steps

Now that the Greenplum Database has been deployed, follow the steps provided in Validating the Greenplum Installation to ensure Greenplum Database has been installed correctly.