The Greenplum gpload utility loads data by using readable external tables and the Greenplum parallel file server (gpfdist or gpfdists). It automates the setup of parallel, file-based external tables, and it lets you define the data format, the external table definition, and the gpfdist or gpfdists setup in a single configuration file.

Note

gpfdist and gpload are compatible only with the Greenplum Database major version in which they are shipped. For example, a gpfdist utility that is installed with Greenplum Database 4.x cannot be used with Greenplum Database 5.x or 6.x.

MERGE and UPDATE operations are not supported if a target table column name is a reserved keyword, contains uppercase letters, or includes any character that requires the column name to be enclosed in double quotes (" ").
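
To catch this restriction before you write a control file that uses MERGE or UPDATE, you could screen the target column names first. The following is a minimal sketch under stated assumptions: it treats a name as safe only if it is a lowercase ASCII identifier (letter or underscore first, then letters, digits, underscores, or dollar signs), so it may flag some names that the database would actually accept. `SAMPLE_RESERVED` is a hypothetical, deliberately incomplete keyword set, not Greenplum's real list, and the function name is illustrative.

```python
import re

# Hypothetical, incomplete sample of reserved SQL keywords.
# Replace it with the full reserved-keyword list for your database version.
SAMPLE_RESERVED = {"select", "table", "order", "group", "user", "from", "where"}

# ASCII-only approximation of an identifier that needs no double quotes.
_PLAIN_IDENT = re.compile(r"[a-z_][a-z0-9_$]*")


def blocks_merge_or_update(column: str) -> bool:
    """Return True if this column name would need double quotes."""
    if not _PLAIN_IDENT.fullmatch(column):
        # Uppercase letters, spaces, hyphens, leading digits, and so on.
        return True
    return column in SAMPLE_RESERVED


if __name__ == "__main__":
    for col in ["amount", "Amount", "order", "unit price", "descr"]:
        print(col, blocks_merge_or_update(col))
```

Any column flagged by a check like this would have to be renamed before the table can be a MERGE or UPDATE target for gpload.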

To use gpload

  1. Ensure that your environment is set up to run gpload. gpload requires some files from your Greenplum Database installation, such as gpfdist and Python, and it requires network access to the Greenplum segment hosts.

    See the Greenplum Database Reference Guide for details.

  2. Create your load control file. This is a YAML-formatted file that specifies the Greenplum Database connection information, gpfdist configuration information, external table options, and data format.

    See the Greenplum Database Reference Guide for details.

    For example:

    ---
    VERSION: 1.0.0.1
    DATABASE: ops
    USER: gpadmin
    HOST: mdw-1
    PORT: 5432
    GPLOAD:
      INPUT:
        - SOURCE:
            LOCAL_HOSTNAME:
              - etl1-1
              - etl1-2
              - etl1-3
              - etl1-4
            PORT: 8081
            FILE:
              - /var/load/data/*
        - COLUMNS:
            - name: text
            - amount: float4
            - category: text
            - descr: text
            - date: date
        - FORMAT: text
        - DELIMITER: '|'
        - ERROR_LIMIT: 25
        - LOG_ERRORS: true
      OUTPUT:
        - TABLE: payables.expenses
        - MODE: INSERT
      PRELOAD:
        - REUSE_TABLES: true
      SQL:
        - BEFORE: "INSERT INTO audit VALUES('start', current_timestamp)"
        - AFTER: "INSERT INTO audit VALUES('end', current_timestamp)"

  3. Run gpload, passing in the load control file. For example:

    gpload -f my_load.yml
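
If you maintain many similar loads, you might generate the control file from step 2 instead of editing it by hand. The following is a minimal sketch that uses plain string formatting (no YAML library) to reproduce a subset of the example control file: the connection settings, the INPUT items SOURCE, COLUMNS, FORMAT, and DELIMITER, and OUTPUT. It omits ERROR_LIMIT, LOG_ERRORS, PRELOAD, and SQL. The function and parameter names are hypothetical, and values are inserted verbatim, so they must not contain YAML special characters.

```python
def render_control_file(database, user, host, port, local_hostnames,
                        gpfdist_port, file_patterns, columns, table,
                        mode="INSERT", delimiter="|"):
    """Return control-file text with the same nesting as the example.

    `columns` is a list of (name, type) pairs. Values are pasted in
    verbatim, so none of them may contain YAML special characters.
    """
    lines = [
        "---",
        "VERSION: 1.0.0.1",
        f"DATABASE: {database}",
        f"USER: {user}",
        f"HOST: {host}",
        f"PORT: {port}",
        "GPLOAD:",
        "  INPUT:",
        "    - SOURCE:",
        "        LOCAL_HOSTNAME:",
    ]
    # One sequence entry per name listed under LOCAL_HOSTNAME.
    lines += [f"          - {h}" for h in local_hostnames]
    lines += [f"        PORT: {gpfdist_port}", "        FILE:"]
    lines += [f"          - {p}" for p in file_patterns]
    # COLUMNS is its own INPUT item holding name: type pairs.
    lines.append("    - COLUMNS:")
    lines += [f"        - {name}: {ctype}" for name, ctype in columns]
    lines += [
        "    - FORMAT: text",
        f"    - DELIMITER: '{delimiter}'",
        "  OUTPUT:",
        f"    - TABLE: {table}",
        f"    - MODE: {mode}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_control_file(
        "ops", "gpadmin", "mdw-1", 5432,
        ["etl1-1", "etl1-2"], 8081, ["/var/load/data/*"],
        [("name", "text"), ("amount", "float4")],
        "payables.expenses",
    ))
```

Building the text line by line keeps the indentation explicit, which matters because YAML nesting is determined entirely by indentation.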

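Steps 1 and 3 can also be wrapped in a small script that checks for the required executables before it calls gpload. This is a hedged sketch: the helper names are hypothetical, the check looks only for `gpload` and `gpfdist` on PATH (it does not verify Python or network access to the segment hosts), and the `-l` option (log file location) is not described in this topic, so confirm it in the gpload reference before relying on it.

```python
import shutil
import subprocess

# Hypothetical pre-flight list: only executables on PATH are checked.
# Python and network access to the segment hosts are NOT verified here.
REQUIRED_TOOLS = ("gpload", "gpfdist")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def gpload_command(control_file, log_file=None):
    """Build the argv list for gpload; -l is added only if log_file is set."""
    cmd = ["gpload", "-f", control_file]
    if log_file:
        cmd += ["-l", log_file]
    return cmd


def run_gpload(control_file, log_file=None):
    """Run gpload after the PATH check and return its exit status."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("not found on PATH: " + ", ".join(missing))
    # check=False (the default): the caller decides how to treat the status.
    return subprocess.run(gpload_command(control_file, log_file)).returncode
```

For example, `status = run_gpload("my_load.yml")` runs the same command as step 3. The sketch does not assign a meaning to individual exit codes; it simply treats any nonzero status as a reason to inspect the gpload log.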