Using Functions and Operators

Description of user-defined and built-in functions and operators in Greenplum Database.


Using Functions in Greenplum Database

When you invoke a function in Greenplum Database, function attributes control the execution of the function. The volatility attributes (IMMUTABLE, STABLE, VOLATILE) and the EXECUTE ON attributes control two different aspects of function execution. In general, volatility indicates when the function is executed, and EXECUTE ON indicates where it is executed. The volatility attributes come from PostgreSQL; the EXECUTE ON attributes are specific to Greenplum Database.

For example, a function defined with the IMMUTABLE attribute can be executed at query planning time, while a function with the VOLATILE attribute must be executed for every row in the query. A function with the EXECUTE ON MASTER attribute executes only on the master instance, and a function with the EXECUTE ON ALL SEGMENTS attribute executes on all primary segment instances (not the master).
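As a minimal sketch of how these attributes appear in a function definition, the following statements declare one function of each kind. The function names add_tax and roll_die are hypothetical and are not part of Greenplum Database; only the attribute keywords come from the description above.

```sql
-- Hypothetical function: depends only on its argument, so it can be
-- declared IMMUTABLE and may be evaluated at planning time when it is
-- called with a constant.
CREATE FUNCTION add_tax(numeric) RETURNS numeric AS $$
    SELECT $1 * 1.08;
$$ LANGUAGE sql IMMUTABLE EXECUTE ON ANY;

-- Hypothetical function: the result can change on every call, so it
-- keeps the default VOLATILE attribute. EXECUTE ON MASTER pins it to
-- the master instance.
CREATE FUNCTION roll_die() RETURNS integer AS $$
    SELECT (floor(random() * 6) + 1)::integer;
$$ LANGUAGE sql VOLATILE EXECUTE ON MASTER;

-- Neither call references a table, so the statement runs on the master.
SELECT add_tax(100), roll_die();
```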

These tables summarize what Greenplum Database assumes about function execution based on the attribute.

Table 1. Function Volatility Attributes in Greenplum Database

| Function Attribute | Greenplum Support | Description | Comments |
|---|---|---|---|
| IMMUTABLE | Yes | Relies only on information directly in its argument list. Given the same argument values, always returns the same result. | |
| STABLE | Yes, in most cases | Within a single table scan, returns the same result for same argument values, but results change across SQL statements. | Results depend on database lookups or parameter values. current_timestamp family of functions is STABLE; values do not change within an execution. |
| VOLATILE | Restricted | Function values can change within a single table scan. For example: random(), timeofday(). This is the default attribute. | Any function with side effects is volatile, even if its result is predictable. For example: setval(). |

Table 2. Function EXECUTE ON attributes in Greenplum Database

| Function Attribute | Description | Comments |
|---|---|---|
| EXECUTE ON ANY | Indicates that the function can be executed on the master, or any segment instance, and it returns the same result regardless of where it executes. This is the default attribute. | Greenplum Database determines where the function executes. |
| EXECUTE ON MASTER | Indicates that the function must be executed on the master instance. | Specify this attribute if the user-defined function executes queries to access tables. |
| EXECUTE ON ALL SEGMENTS | Indicates that for each invocation, the function must be executed on all primary segment instances, but not the master. | |
| EXECUTE ON INITPLAN | Indicates that the function contains an SQL command that dispatches queries to the segment instances and requires special processing on the master instance by Greenplum Database when possible. | |

You can display the function volatility and EXECUTE ON attribute information with the psql \df+ function command.
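The same attributes can also be read from the pg_proc system catalog. The query below is a sketch that assumes the provolatile and proexeclocation columns as found in Greenplum Database 6; the column names and single-letter codes are assumptions and may differ in other releases.

```sql
-- Inspect a few built-in functions.
SELECT proname,
       provolatile,      -- i = IMMUTABLE, s = STABLE, v = VOLATILE
       proexeclocation   -- assumed codes: a = ANY, m = MASTER,
                         --                s = ALL SEGMENTS, i = INITPLAN
FROM pg_catalog.pg_proc
WHERE proname IN ('random', 'now', 'abs')
ORDER BY proname;
```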

Refer to the PostgreSQL Function Volatility Categories documentation for additional information about the Greenplum Database function volatility classifications.

For more information about EXECUTE ON attributes, see CREATE FUNCTION.

In Greenplum Database, data is divided up across segments — each segment is a distinct PostgreSQL database. To prevent inconsistent or unexpected results, do not execute functions classified as VOLATILE at the segment level if they contain SQL commands or modify the database in any way. For example, functions such as setval() are not allowed to execute on distributed data in Greenplum Database because they can cause inconsistent data between segment instances.

A function can execute read-only queries on replicated tables (DISTRIBUTED REPLICATED) on the segments, but any SQL command that modifies data must execute on the master instance.

Note: The hidden system columns (ctid, cmin, cmax, xmin, xmax, and gp_segment_id) cannot be referenced in user queries on replicated tables because they have no single, unambiguous value. Greenplum Database returns a column does not exist error for the query.
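The replicated-table case described above might look like the following sketch. The tables country_codes and sales and the function country_name are hypothetical names chosen for illustration.

```sql
-- Hypothetical lookup table: a full copy is stored on every segment.
CREATE TABLE country_codes (code text, name text) DISTRIBUTED REPLICATED;

-- Hypothetical fact table: rows are hash-distributed across segments.
CREATE TABLE sales (id int, country text, amount numeric) DISTRIBUTED BY (id);

-- Read-only lookup that touches only the replicated table. It is STABLE
-- because its result depends on table contents.
CREATE FUNCTION country_name(text) RETURNS text AS $$
    SELECT name FROM country_codes WHERE code = $1;
$$ LANGUAGE sql STABLE;

-- The function can be evaluated on the segments while sales is scanned.
SELECT id, country_name(country) AS country, amount FROM sales;
```

Because the function only reads a replicated table, each segment sees the same lookup data. A function that inserted into or updated country_codes would have to run on the master instead.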

To ensure data consistency, you can safely use VOLATILE and STABLE functions in statements that are evaluated on and run from the master. For example, the following statements run on the master (statements without a FROM clause):

  SELECT setval('myseq', 201);
  SELECT foo();

If a statement has a FROM clause containing a distributed table and the function in the FROM clause returns a set of rows, the statement can run on the segments:

  SELECT * FROM foo();

Greenplum Database does not support functions that return a table reference (rangeFuncs) or functions that use the refCursor data type.

Function Volatility and Plan Caching

There is relatively little difference between the STABLE and IMMUTABLE function volatility categories for simple interactive queries that are planned and immediately executed. It does not matter much whether a function is executed once during planning or once during query execution startup. But there is a big difference when you save the plan and reuse it later. If you mislabel a function IMMUTABLE, Greenplum Database may prematurely fold it to a constant during planning, possibly reusing a stale value during subsequent execution of the plan. You may run into this hazard when using PREPAREd statements, or when using languages such as PL/pgSQL that cache plans.
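The hazard can be sketched with a deliberately mislabeled function. The names session_tz_bad and show_tz are hypothetical, and whether the stale value actually appears depends on when the plan is cached and reused, so treat this as an illustration rather than a guaranteed reproduction.

```sql
-- Hypothetical function that reads a session setting. It should be
-- STABLE; it is declared IMMUTABLE here to show the problem.
CREATE FUNCTION session_tz_bad() RETURNS text AS $$
    SELECT current_setting('TimeZone');
$$ LANGUAGE sql IMMUTABLE;

-- The planner is allowed to fold the call into a constant when the
-- statement is planned.
PREPARE show_tz AS SELECT session_tz_bad();
EXECUTE show_tz;

-- Change the setting the function depends on.
SET TimeZone = 'UTC';

-- If the saved plan is reused, this may still return the earlier value.
EXECUTE show_tz;

DEALLOCATE show_tz;
```

Declaring the function STABLE instead tells the planner that the value must be obtained when the statement executes.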

User-Defined Functions

Greenplum Database supports user-defined functions. See Extending SQL in the PostgreSQL documentation for more information.

Use the CREATE FUNCTION statement to register user-defined functions that are used as described in Using Functions in Greenplum Database. By default, user-defined functions are declared as VOLATILE, so if your user-defined function is IMMUTABLE or STABLE, you must specify the correct volatility level when you register your function.

By default, user-defined functions are declared as EXECUTE ON ANY. A function that executes queries to access tables is supported only when the function executes on the master instance, except that a function can execute SELECT commands that access only replicated tables on the segment instances. A function that accesses hash-distributed or randomly distributed tables must be defined with the EXECUTE ON MASTER attribute. Otherwise, the function might return incorrect results when the function is used in a complicated query. Without the attribute, planner optimization might determine it would be beneficial to push the function invocation to segment instances.
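A minimal sketch of a function that reads a hash-distributed table, and therefore declares EXECUTE ON MASTER, could look like this. The table orders and the function order_total are hypothetical names.

```sql
-- Hypothetical hash-distributed table.
CREATE TABLE orders (order_id int, amount numeric) DISTRIBUTED BY (order_id);

-- The function queries a distributed table, so it is pinned to the
-- master instance. STABLE is declared explicitly because the default
-- would be VOLATILE.
CREATE FUNCTION order_total() RETURNS numeric AS $$
    SELECT coalesce(sum(amount), 0) FROM orders;
$$ LANGUAGE sql STABLE EXECUTE ON MASTER;

SELECT order_total();
```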

When you create user-defined functions, avoid using fatal errors or destructive calls. Greenplum Database may respond to such errors with a sudden shutdown or restart.

In Greenplum Database, the shared library files for user-created functions must reside in the same library path location on every host in the Greenplum Database array (masters, segments, and mirrors).

You can also create and execute anonymous code blocks that are written in a Greenplum Database procedural language such as PL/pgSQL. The anonymous blocks run as transient anonymous functions. For information about creating and executing anonymous blocks, see the DO command.
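For illustration, an anonymous PL/pgSQL block might look like the following sketch. The message text is arbitrary.

```sql
-- Runs once as a transient anonymous function; nothing is registered
-- in the database.
DO $$
BEGIN
    RAISE NOTICE 'Connected to database: %', current_database();
END
$$ LANGUAGE plpgsql;
```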

Built-in Functions and Operators

The following table lists the categories of built-in functions and operators supported by PostgreSQL. All functions and operators are supported in Greenplum Database as in PostgreSQL with the exception of STABLE and VOLATILE functions, which are subject to the restrictions noted in Using Functions in Greenplum Database. See the Functions and Operators section of the PostgreSQL documentation for more information about these built-in functions and operators.

Greenplum Database includes JSON processing functions that manipulate values of the json data type. For information about JSON data, see Working with JSON Data.

Table 3. Built-in functions and operators

| Operator/Function Category | VOLATILE Functions | STABLE Functions | Restrictions |
|---|---|---|---|
| Logical Operators | | | |
| Comparison Operators | | | |
| Mathematical Functions and Operators | random, setseed | | |
| String Functions and Operators | All built-in conversion functions | convert, pg_client_encoding | |
| Binary String Functions and Operators | | | |
| Bit String Functions and Operators | | | |
| Pattern Matching | | | |
| Data Type Formatting Functions | | to_char, to_timestamp | |
| Date/Time Functions and Operators | timeofday | age, current_date, current_time, current_timestamp, localtime, localtimestamp, now | |
| Enum Support Functions | | | |
| Geometric Functions and Operators | | | |
| Network Address Functions and Operators | | | |
| Sequence Manipulation Functions | nextval(), setval() | | |
| Conditional Expressions | | | |
| Array Functions and Operators | | All array functions | |
| Aggregate Functions | | | |
| Subquery Expressions | | | |
| Row and Array Comparisons | | | |
| Set Returning Functions | generate_series | | |
| System Information Functions | | All session information functions, All access privilege inquiry functions, All schema visibility inquiry functions, All system catalog information functions, All comment information functions, All transaction ids and snapshots | |
| System Administration Functions | set_config, pg_cancel_backend, pg_terminate_backend, pg_reload_conf, pg_rotate_logfile, pg_start_backup, pg_stop_backup, pg_size_pretty, pg_ls_dir, pg_read_file, pg_stat_file | current_setting, All database object size functions | Note: The function pg_column_size displays bytes required to store the value, possibly with TOAST compression. |
| XML Functions and function-like expressions | | See the list that follows this table. | |

STABLE functions in the XML Functions and function-like expressions category:

  cursor_to_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, targetns text)
  cursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, targetns text)
  database_to_xml(nulls boolean, tableforest boolean, targetns text)
  database_to_xmlschema(nulls boolean, tableforest boolean, targetns text)
  database_to_xml_and_xmlschema(nulls boolean, tableforest boolean, targetns text)
  query_to_xml(query text, nulls boolean, tableforest boolean, targetns text)
  query_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text)
  query_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text)
  schema_to_xml(schema name, nulls boolean, tableforest boolean, targetns text)
  schema_to_xmlschema(schema name, nulls boolean, tableforest boolean, targetns text)
  schema_to_xml_and_xmlschema(schema name, nulls boolean, tableforest boolean, targetns text)
  table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text)
  table_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text)
  table_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text)
  xmlagg(xml)
  xmlconcat(xml[, …])
  xmlelement(name name [, xmlattributes(value [AS attname] [, … ])] [, content, …])
  xmlexists(text, xml)
  xmlforest(content [AS name] [, …])
  xml_is_well_formed(text)
  xml_is_well_formed_document(text)
  xml_is_well_formed_content(text)
  xmlparse ( { DOCUMENT | CONTENT } value)
  xpath(text, xml)
  xpath(text, xml, text[])
  xpath_exists(text, xml)
  xpath_exists(text, xml, text[])
  xmlpi(name target [, content])
  xmlroot(xml, version text | no value [, standalone yes|no|no value])
  xmlserialize ( { DOCUMENT | CONTENT } value AS type )
  xml(text)
  text(xml)
  xmlcomment(xml)
  xmlconcat2(xml, xml)

Window Functions

The following built-in window functions are Greenplum extensions to the PostgreSQL database. All window functions are immutable. For more information about window functions, see Window Expressions.

Table 4. Window functions

cume_dist()
  Return type: double precision
  Full syntax: CUME_DIST() OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.

dense_rank()
  Return type: bigint
  Full syntax: DENSE_RANK() OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Computes the rank of a row in an ordered group of rows without skipping rank values. Rows with equal values are given the same rank value.

first_value(expr)
  Return type: same as input expr type
  Full syntax: FIRST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] )
  Description: Returns the first value in an ordered set of values.

lag(expr [,offset] [,default])
  Return type: same as input expr type
  Full syntax: LAG(expr [, offset] [, default]) OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. The default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.

last_value(expr)
  Return type: same as input expr type
  Full syntax: LAST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr [ROWS|RANGE frame_expr] )
  Description: Returns the last value in an ordered set of values.

lead(expr [,offset] [,default])
  Return type: same as input expr type
  Full syntax: LEAD(expr [, offset] [, default]) OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Provides access to more than one row of the same table without doing a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset after that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.

ntile(expr)
  Return type: bigint
  Full syntax: NTILE(expr) OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Divides an ordered data set into a number of buckets (as defined by expr) and assigns a bucket number to each row.

percent_rank()
  Return type: double precision
  Full syntax: PERCENT_RANK() OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Calculates the rank of a hypothetical row R minus 1, divided by 1 less than the number of rows being evaluated (within a window partition).

rank()
  Return type: bigint
  Full syntax: RANK() OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Calculates the rank of a row in an ordered group of values. Rows with equal values for the ranking criteria receive the same rank. The number of tied rows is added to the rank number to calculate the next rank value. Ranks may not be consecutive numbers in this case.

row_number()
  Return type: bigint
  Full syntax: ROW_NUMBER() OVER ( [PARTITION BY expr] ORDER BY expr )
  Description: Assigns a unique number to each row to which it is applied (either each row in a window partition or each row of the query).
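As a brief sketch of how several of these window functions combine in one query, the statement below ranks salaries within each department. The employees table and its columns are hypothetical and are created here only so that the example is self-contained.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE employees (employee_id int, department_id int, salary numeric)
DISTRIBUTED BY (employee_id);

SELECT department_id,
       employee_id,
       salary,
       -- Ties share a rank, and the following rank is skipped.
       rank()       OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank,
       -- Always unique within the partition, even for ties.
       row_number() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_in_dept,
       -- Salary from the preceding row in the partition; NULL for the first row.
       lag(salary)  OVER (PARTITION BY department_id ORDER BY salary DESC) AS previous_salary
FROM employees;
```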

Advanced Aggregate Functions

The following built-in advanced aggregate functions are Greenplum extensions of the PostgreSQL database. These functions are immutable.

Note: The Greenplum MADlib Extension for Analytics provides additional advanced functions to perform statistical analysis and machine learning with Greenplum Database data. See Greenplum MADlib Extension for Analytics in the Greenplum Database Reference Guide.

Table 5. Advanced Aggregate Functions

MEDIAN (expr)
  Return type: timestamp, timestamptz, interval, float
  Full syntax: MEDIAN (expression)
  Example:

    SELECT department_id, MEDIAN(salary)
    FROM employees
    GROUP BY department_id;

  Description: Can take a two-dimensional array as input. Treats such arrays as matrices.

sum(array[])
  Return type: smallint[], int[], bigint[], float[]
  Full syntax: sum(array[[1,2],[3,4]])
  Example:

    CREATE TABLE mymatrix (myvalue int[]);
    INSERT INTO mymatrix
    VALUES (array[[1,2],[3,4]]);
    INSERT INTO mymatrix
    VALUES (array[[0,1],[1,0]]);
    SELECT sum(myvalue) FROM mymatrix;
          sum
    ---------------
     {{1,3},{4,4}}

  Description: Performs matrix summation. Can take as input a two-dimensional array that is treated as a matrix.

pivot_sum (label[], label, expr)
  Return type: int[], bigint[], float[]
  Full syntax: pivot_sum( array['A1','A2'], attr, value)
  Description: A pivot aggregation using sum to resolve duplicate entries.

unnest (array[])
  Return type: set of anyelement
  Full syntax: unnest( array['one', 'row', 'per', 'item'])
  Description: Transforms a one dimensional array into rows. Returns a set of anyelement, a polymorphic pseudo-type in PostgreSQL.
