Auditing

Describes Greenplum Database events that are logged and should be monitored to detect security threats.

Greenplum Database is capable of auditing a variety of events, including startup and shutdown of the system, segment database failures, SQL statements that result in an error, and all connection attempts and disconnections. Greenplum Database also logs SQL statements and information regarding SQL statements, and can be configured in a variety of ways to record audit information with more or less detail. The log_error_verbosity configuration parameter controls the amount of detail written in the server log for each message that is logged. Similarly, the log_min_error_statement parameter allows administrators to configure the level of detail recorded specifically for SQL statements, and the log_statement parameter determines the kind of SQL statements that are audited. Greenplum Database records the username for all auditable events, when the event is initiated by a subject outside the Greenplum Database.

Greenplum Database prevents unauthorized modification and deletion of audit records by only allowing administrators with an appropriate role to perform any operations on log files. Logs are stored in a proprietary format using comma-separated values (CSV). Each segment and the master stores its own log files, although these can be accessed remotely by an administrator. Greenplum Database also authorizes overwriting of old log files via the log_truncate_on_rotation parameter. This is a local parameter and must be set on each segment and master configuration file.

Greenplum provides an administrative schema called gp_toolkit that you can use to query log files, as well as system catalogs and operating environment for system status information. For more information, including usage, refer to The gp_tookit Administrative Schema appendix in the Greenplum Database Reference Guide.

Viewing the Database Server Log Files

Every database instance in Greenplum Database (master and segments) is a running PostgreSQL database server with its own server log file. Daily log files are created in the pg_log directory of the master and each segment data directory.

The server log files are written in comma-separated values (CSV) format. Not all log entries will have values for all of the log fields. For example, only log entries associated with a query worker process will have the slice_id populated. Related log entries of a particular query can be identified by its session identifier (gp_session_id) and command identifier (gp_command_count).

#Field NameData TypeDescription
1event_timetimestamp with time zoneTime that the log entry was written to the log
2user_namevarchar(100)The database user name
3database_namevarchar(100)The database name
4process_idvarchar(10)The system process id (prefixed with “p”)
5thread_idvarchar(50)The thread count (prefixed with “th”)
6remote_hostvarchar(100)On the master, the hostname/address of the client machine. On the segment, the hostname/address of the master.
7remote_portvarchar(10)The segment or master port number
8session_start_timetimestamp with time zoneTime session connection was opened
9transaction_idintTop-level transaction ID on the master. This ID is the parent of any subtransactions.
10gp_session_idtextSession identifier number (prefixed with “con”)
11gp_command_counttextThe command number within a session (prefixed with “cmd”)
12gp_segmenttextThe segment content identifier (prefixed with “seg” for primaries or “mir” for mirrors). The master always has a content id of -1.
13slice_idtextThe slice id (portion of the query plan being run)
14distr_tranx_id textDistributed transaction ID
15local_tranx_idtextLocal transaction ID
16sub_tranx_idtextSubtransaction ID
17event_severityvarchar(10)Values include: LOG, ERROR, FATAL, PANIC, DEBUG1, DEBUG2
18sql_state_codevarchar(10)SQL state code associated with the log message
19event_messagetextLog or error message text
20event_detailtextDetail message text associated with an error or warning message
21event_hinttextHint message text associated with an error or warning message
22internal_querytextThe internally-generated query text
23internal_query_posintThe cursor index into the internally-generated query text
24event_contexttextThe context in which this message gets generated
25debug_query_stringtextUser-supplied query string with full detail for debugging. This string can be modified for internal use.
26error_cursor_posintThe cursor index into the query string
27func_nametextThe function in which this message is generated
28file_nametextThe internal code file where the message originated
29file_lineintThe line of the code file where the message originated
30stack_tracetextStack trace text associated with this message

Greenplum provides a utility called gplogfilter that can be used to search through a Greenplum Database log file for entries matching the specified criteria. By default, this utility searches through the Greenplum master log file in the default logging location. For example, to display the last three lines of the master log file:

  1. $ gplogfilter -n 3

You can also use gplogfilter to search through all segment log files at once by running it through the gpssh utility. For example, to display the last three lines of each segment log file:

  1. $ gpssh -f seg_host_file
  2. => source /usr/local/greenplum-db/greenplum_path.sh
  3. => gplogfilter -n 3 /data*/*/gp*/pg_log/gpdb*.csv

The following are the Greenplum security-related audit (or logging) server configuration parameters that are set in the postgresql.conf configuration file:

Field NameValue RangeDefaultDescription
log_connectionsBooleanoffThis outputs a line to the server log detailing each successful connection. Some client programs, like psql, attempt to connect twice while determining if a password is required, so duplicate “connection received” messages do not always indicate a problem.
log_disconnectionsBooleanoffThis outputs a line in the server log at termination of a client session, and includes the duration of the session.
log_statementNONE
DDL
MOD
ALL
ALLControls which SQL statements are logged. DDL logs all data definition commands like CREATE, ALTER, and DROP commands. MOD logs all DDL statements, plus INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type.
log_hostnameBooleanoffBy default, connection log messages only show the IP address of the connecting host. Turning on this option causes logging of the host name as well. Note that depending on your host name resolution setup this might impose a non-negligible performance penalty.
log_durationBooleanoffCauses the duration of every completed statement which satisfies log_statement to be logged.
log_error_verbosityTERSE
DEFAULT
VERBOSE
DEFAULTControls the amount of detail written in the server log for each message that is logged.
log_min_duration_statementnumber of milliseconds, 0, -1-1Logs the statement and its duration on a single log line if its duration is greater than or equal to the specified number of milliseconds. Setting this to 0 will print all statements and their durations. -1 deactivates the feature. For example, if you set it to 250 then all SQL statements that run 250ms or longer will be logged. Enabling this option can be useful in tracking down unoptimized queries in your applications.
log_min_messagesDEBUG5
DEBUG4
DEBUG3
DEBUG2
DEBUG1
INFO
NOTICE
WARNING
ERROR
LOG
FATAL
PANIC
NOTICEControls which message levels are written to the server log. Each level includes all the levels that follow it. The later the level, the fewer messages are sent to the log.
log_rotation_size0 - INT_MAX/1024 kilobytes1048576When greater than 0, a new log file is created when this number of kilobytes have been written to the log. Set to zero to deactivate size-based creation of new log files.
log_rotation_ageAny valid time expression (number and unit)1dDetermines the lifetime of an individual log file. When this amount of time has elapsed since the current log file was created, a new log file will be created. Set to zero to deactivate time-based creation of new log files.
log_statement_statsBooleanoffFor each query, write total performance statistics of the query parser, planner, and executor to the server log. This is a crude profiling instrument.
log_truncate_on_rotationBooleanoffTruncates (overwrites), rather than appends to, any existing log file of the same name. Truncation will occur only when a new file is being opened due to time-based rotation. For example, using this setting in combination with a log_filename such as gpseg#-%H.log would result in generating twenty-four hourly log files and then cyclically overwriting them. When off, pre-existing files will be appended to in all cases.

Parent topic: Greenplum Database Security Configuration Guide