This topic provides guidance on how to size your VMware vSphere environment based on the total size of your Greenplum Database. Criteria such as the RAID type, number of deployed virtual machines, and total amount of available resources required will vary depending on the database size.

NOTE: The information in this topic assumes that you are planning for a Greenplum on VMware vSphere deployment on a Dell EMC VxRail architecture.

The minimum supported configuration is 4 ESXi hosts, which provides a Greenplum Database with 64.12 TB of usable storage when no compression is used. The maximum is 16 ESXi hosts, which provides 397.32 TB of uncompressed usable storage for the database.

Calculating the Greenplum Database Size and ESXi Hosts

Determine the number of ESXi hosts you need based on your Greenplum Database cluster requirements. The table below shows how many hosts are required based on the size of the Greenplum Database.

ESXi Hosts | Usable Storage (TB) (No Compression) | Raw Storage (TB)
4          | 64.12                                | 512
5          | 122.04                               | 640
6          | 147.05                               | 768
7          | 172.07                               | 896
8          | 197.09                               | 1024
9          | 222.11                               | 1152
10         | 247.13                               | 1280
11         | 272.15                               | 1408
12         | 297.18                               | 1536
13         | 322.20                               | 1664
14         | 347.22                               | 1792
15         | 372.25                               | 1920
16         | 397.32                               | 2048

Usable storage is the total size of user data after subtracting the associated storage overhead. This overhead is driven mainly by the vSAN storage policy (which requires a 30% reservation) and the RAID configuration, and it also accounts for temporary data, the high availability (HA) reservation, and free space.
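As a quick sizing aid, the table above can be wrapped in a small lookup helper. The following Python sketch is illustrative only; the function name and structure are ours, not part of any product tooling:

```python
# Sizing helper: pick the minimum number of ESXi hosts whose usable
# (uncompressed) capacity covers a required Greenplum Database size.
# The values below are copied from the sizing table in this topic.

USABLE_TB = {
    4: 64.12, 5: 122.04, 6: 147.05, 7: 172.07, 8: 197.09,
    9: 222.11, 10: 247.13, 11: 272.15, 12: 297.18, 13: 322.20,
    14: 347.22, 15: 372.25, 16: 397.32,
}

def hosts_for_database(required_tb: float) -> int:
    """Return the smallest supported host count with enough usable storage."""
    for hosts in sorted(USABLE_TB):
        if USABLE_TB[hosts] >= required_tb:
            return hosts
    raise ValueError(
        f"{required_tb} TB exceeds the 16-host maximum of {USABLE_TB[16]} TB"
    )
```

For example, a 100 TB uncompressed database requires 5 ESXi hosts, because 4 hosts provide only 64.12 TB of usable storage.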

Determining the RAID Type

The following table provides recommendations to consider when choosing between RAID1 (mirroring) and RAID5 (erasure coding) for the VMware vSphere storage policies that you configure in a later step. Note that these parameters have been calculated to ensure Greenplum fault tolerance against host failure.

RAID Type              | ESXi Hosts | Space Overhead | Performance                                                | Recovery Behavior
RAID1 (mirroring)      | 4          | 2x             | WRITE 2.2 GiB/s per ESXi host; READ 10 GiB/s per ESXi host | In case of disk failure, a 250 GB vmdk requires reading 250 GB of data to rebuild
RAID5 (erasure coding) | 5 or more  | 1.33x          | WRITE 2.2 GiB/s per ESXi host; READ 10 GiB/s per ESXi host | In case of disk failure, a 250 GB vmdk requires reading 750 GB of data to rebuild

NOTE: The performance numbers above were measured in a test environment. Your actual performance numbers may vary.
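The overhead and rebuild figures in the table follow from how each policy stores data. Assuming the RAID5 policy here is 3+1 erasure coding (which matches both the 1.33x overhead, 4/3, and the 750 GB rebuild read), a sketch of the arithmetic:

```python
# Storage-policy arithmetic sketch (illustrative; assumes RAID5 = 3+1
# erasure coding). RAID1 keeps two full mirror copies, so rebuilding a
# failed component reads the one surviving copy; RAID5 must read the
# three surviving components to reconstruct the fourth.

def space_overhead(raid: str) -> float:
    """Raw capacity consumed per unit of usable data."""
    return {"RAID1": 2.0, "RAID5": 4 / 3}[raid]

def rebuild_read_gb(raid: str, vmdk_gb: float) -> float:
    """Data read to rebuild one failed component of a vmdk."""
    if raid == "RAID1":
        return vmdk_gb          # read the surviving mirror copy
    if raid == "RAID5":
        return 3 * vmdk_gb      # read the 3 surviving components
    raise ValueError(raid)
```

This is why RAID5 trades lower space overhead for a heavier rebuild: a 250 GB vmdk reads 750 GB during recovery instead of 250 GB.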

Determining the Number of Virtual Machines

The number of Greenplum Database virtual machines you deploy depends on the number of hosts in your environment. You must plan for 8 primary segments and 8 mirror segments running on each ESXi host, as well as the Greenplum master and standby master virtual machines.

  total_number_of_greenplum_vms = 16 * (number_of_hosts) + 2

Note that the maximum number of Greenplum virtual machines that can be deployed in a VMware Tanzu Greenplum on vSphere environment is 1000.

For example, in a configuration of four ESXi hosts, you deploy a total of 66 Greenplum virtual machines:

  • one master virtual machine
  • one master standby virtual machine
  • 32 primary segment virtual machines
  • 32 mirror segment virtual machines
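The formula and example above can be sketched as a small calculator. The function and names below are illustrative only:

```python
# VM-count sketch: 8 primary and 8 mirror segment VMs per ESXi host,
# plus one master and one standby master VM for the whole cluster.

SEGMENTS_PER_HOST = 8      # primary segments per host (mirrors match 1:1)
MAX_GREENPLUM_VMS = 1000   # deployment ceiling noted in this topic

def greenplum_vm_breakdown(hosts: int) -> dict:
    """Return the Greenplum VM counts for a cluster of `hosts` ESXi hosts."""
    primaries = SEGMENTS_PER_HOST * hosts
    total = 2 * primaries + 2          # primaries + mirrors + master + standby
    if total > MAX_GREENPLUM_VMS:
        raise ValueError("exceeds the 1000-VM maximum")
    return {"master": 1, "standby": 1,
            "primaries": primaries, "mirrors": primaries, "total": total}
```

For four hosts this returns 32 primaries, 32 mirrors, and a total of 66 virtual machines, matching the breakdown above.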

The following table lists the number of Greenplum virtual machines required for each number of ESXi hosts:

Number of ESXi Hosts | Total Number of Greenplum Virtual Machines
4                    | 66
5                    | 82
6                    | 98
7                    | 114
8                    | 130
9                    | 146
10                   | 162
11                   | 178
12                   | 194
13                   | 210
14                   | 226
15                   | 242
16                   | 258¹

¹ In order to support 258 Greenplum VMs, you must use CIDR addressing with a network mask of 23 or lower.

Note: you will additionally deploy a jumpbox virtual machine, which you will use to deploy and manage the Greenplum virtual machines. You must take it into account when allocating IP addresses in the Prerequisites topic.
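You can sanity-check a subnet mask against the VM count with the standard IPv4 address arithmetic. The sketch below is illustrative; it counts only the Greenplum VMs plus the jumpbox, so gateways and other infrastructure addresses would reduce the headroom further:

```python
# Address-count sketch: a subnet of prefix length `mask` has
# 2**(32 - mask) - 2 usable IPv4 addresses (network and broadcast
# addresses excluded).

def usable_addresses(mask: int) -> int:
    return 2 ** (32 - mask) - 2

def mask_fits(mask: int, hosts: int) -> bool:
    """True if the subnet can hold all Greenplum VMs plus the jumpbox."""
    greenplum_vms = 16 * hosts + 2
    return usable_addresses(mask) >= greenplum_vms + 1   # +1 for the jumpbox
```

A /24 yields 254 usable addresses, too few for the 258 VMs of a 16-host cluster, while a /23 yields 510, which is why the footnote above requires a mask of 23 or lower at that scale.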

Sizing the Greenplum Virtual Machine

The virtual machines running the Greenplum Database software must be correctly sized to handle the database workloads depending on the cluster specifications.

All virtual machines must be configured with the following resources regardless of the environment:

vCPUs | RAM (GB) | Root Volume (GB)
8     | 30       | 50

The following table displays the required Data volume size per virtual machine depending on the number of ESXi hosts. The provided values ensure that there is sufficient capacity left in the cluster if a host fails. Note that the storage policies are configured with thick provisioning.

ESXi Hosts | Data Volume (GB)
4          | 2500
5 or more  | 4000
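Because the storage policies use thick provisioning, each virtual machine claims its full root and data volumes up front. The following sketch combines the two tables above; the function names are illustrative, and the result deliberately excludes vSAN policy overhead (RAID and reservations), which is already accounted for in the usable-storage table:

```python
# Thick-provisioning sketch: raw capacity each Greenplum segment VM
# claims up front (root volume plus data volume), per the tables above.

ROOT_GB = 50

def data_volume_gb(hosts: int) -> int:
    """Data volume size per VM for the given cluster size."""
    return 2500 if hosts == 4 else 4000   # 5 or more hosts use 4000 GB

def provisioned_per_vm_gb(hosts: int) -> int:
    return ROOT_GB + data_volume_gb(hosts)
```

For example, in a four-host cluster each segment VM provisions 2550 GB; in larger clusters, 4050 GB.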

Putting It All Together

You must ensure that your VMware Tanzu Greenplum on vSphere environment has enough resources to accommodate the usable space required, the choice of RAID, the number of virtual machines required, the resources needed for each virtual machine, high availability to allow for component failures, and optimal performance.

ESXi Hosts | Usable Storage (TB) | Storage Policy | Virtual Machines | Total Primary Segment vCPUs | Total Primary Segment RAM (GB)
4          | 64.12               | RAID1          | 66               | 256                         | 960
5          | 122.04              | RAID5          | 82               | 320                         | 1,200
6          | 147.05              | RAID5          | 98               | 384                         | 1,440
7          | 172.07              | RAID5          | 114              | 448                         | 1,680
8          | 197.09              | RAID5          | 130              | 512                         | 1,920
9          | 222.11              | RAID5          | 146              | 576                         | 2,160
10         | 247.13              | RAID5          | 162              | 640                         | 2,400
11         | 272.15              | RAID5          | 178              | 704                         | 2,640
12         | 297.18              | RAID5          | 194              | 768                         | 2,880
13         | 322.20              | RAID5          | 210              | 832                         | 3,120
14         | 347.22              | RAID5          | 226              | 896                         | 3,360
15         | 372.25              | RAID5          | 242              | 960                         | 3,600
16         | 397.32              | RAID5          | 258              | 1,024                       | 3,840
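The vCPU and RAM columns follow directly from the per-VM sizing above: each host runs 8 primary segment VMs, each with 8 vCPUs and 30 GB of RAM, so the primary totals scale linearly with the host count. A quick cross-check sketch (illustrative; master and standby resources are not included in these columns):

```python
# Cross-check sketch for the summary table's primary-segment totals.

VCPUS_PER_VM = 8
RAM_GB_PER_VM = 30
PRIMARIES_PER_HOST = 8

def primary_totals(hosts: int) -> tuple:
    """Return (total primary vCPUs, total primary RAM in GB)."""
    primaries = PRIMARIES_PER_HOST * hosts
    return primaries * VCPUS_PER_VM, primaries * RAM_GB_PER_VM
```

For four hosts this gives 32 primaries, 256 vCPUs, and 960 GB of RAM, matching the first row of the table.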

Dell EMC VxRail Reference Architecture

This reference architecture uses Dell EMC VxRail version 7.0.200. See more details on page 22 of the Dell EMC VxRail Documentation.

Description      | Configuration
Model            | P570F
CPU              | Intel(R) Xeon(R) Gold 6248R CPU @ 3.00 GHz
Physical Cores   | 48
Logical Cores    | 96 (Hyper-Threading)
NICs             | 2x PCIe 100 GbE dual port
RAM              | 768 GB
Cache Storage    | 4x Dell Express Flash NVMe ColdStream P4800x 750 GB PCIe U.2 SSD
Capacity Storage | 20x Dell Express Flash NVMe PM1725B (MU) 6.4 TB PCIe U.2 SSD

There are 4 vSAN disk groups, each composed of one cache drive and 5 capacity drives.

Next Steps

Once you have confirmed your version and model of Dell EMC VxRail, determined the number of ESXi hosts based on your Greenplum Database capacity, and calculated the number of virtual machines and the resources available in your VMware vSphere environment, proceed to Prerequisites to prepare for the installation.