R Data Science Library Package

R packages are modules that contain R functions and data sets. Greenplum Database provides a collection of data science-related R libraries that can be used with the Greenplum Database PL/R language. You can download these libraries in .gppkg format from Tanzu Network.

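This chapter contains the following information:

  • R Data Science Libraries

  • Installing the R Data Science Library Package

  • Uninstalling the R Data Science Library Package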

For information about the Greenplum Database PL/R Language, see Greenplum PL/R Language Extension.

R Data Science Libraries

Libraries provided in the R Data Science package include:

  • abind

  • adabag

  • arm

  • assertthat

  • BH

  • bitops

  • car

  • caret

  • caTools

  • coda

  • colorspace

  • compHclust

  • curl

  • data.table

  • DBI

  • dichromat

  • digest

  • dplyr

  • e1071

  • flashClust

  • forecast

  • foreign

  • gdata

  • ggplot2

  • glmnet

  • gplots

  • gtable

  • gtools

  • hms

  • hybridHclust

  • igraph

  • labeling

  • lattice

  • lazyeval

  • lme4

  • lmtest

  • magrittr

  • MASS

  • Matrix

  • MCMCpack

  • minqa

  • MTS

  • munsell

  • neuralnet

  • nloptr

  • nnet

  • pbkrtest

  • plyr

  • quantreg

  • R2jags

  • R6

  • randomForest

  • RColorBrewer

  • Rcpp

  • RcppEigen

  • readr

  • reshape2

  • rjags

  • RobustRankAggreg

  • ROCR

  • rpart

  • RPostgreSQL

  • sandwich

  • scales

  • SparseM

  • stringi

  • stringr

  • survival

  • tibble

  • tseries

  • zoo

Installing the R Data Science Library Package

Before you install the R Data Science Library package, make sure that your Greenplum Database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME environment variables are set.
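The environment prerequisites above can be checked with a small shell function before you run gppkg. This is a minimal POSIX sh sketch. The function name check_gp_env is hypothetical. It only confirms that GPHOME and MASTER_DATA_DIRECTORY are set and that $GPHOME/greenplum_path.sh exists. It does not confirm that the cluster is running.

```shell
# check_gp_env (hypothetical helper): verify the environment variables that
# the R Data Science Library package installation expects.
# Returns 0 when all checks pass, 1 otherwise; problems are reported on stderr.
check_gp_env() {
  status=0
  if [ -z "${GPHOME:-}" ]; then
    echo "error: GPHOME is not set (have you sourced greenplum_path.sh?)" >&2
    status=1
  elif [ ! -f "$GPHOME/greenplum_path.sh" ]; then
    echo "error: $GPHOME/greenplum_path.sh not found" >&2
    status=1
  fi
  if [ -z "${MASTER_DATA_DIRECTORY:-}" ]; then
    echo "error: MASTER_DATA_DIRECTORY is not set" >&2
    status=1
  fi
  return "$status"
}
```

For example, run check_gp_env && echo ready before you start the installation. To confirm that the cluster itself is up, run a utility such as gpstate separately.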

  1. Locate the R Data Science library package that you built or downloaded.

    The file name format of the package is DataScienceR-<version>-rhel<N>-x86_64.gppkg.

  2. Copy the package to the Greenplum Database master host.

  3. Follow the instructions in Verifying the Greenplum Database Software Download to verify the integrity of the Greenplum Procedural Languages R Data Science Package software.

  4. Use the gppkg command to install the package. For example:

    $ gppkg -i DataScienceR-<version>-rhel<N>-x86_64.gppkg

    gppkg installs the R Data Science libraries on all nodes in your Greenplum Database cluster. The command also sets the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file.

  5. Restart Greenplum Database. You must re-source greenplum_path.sh before restarting your Greenplum cluster:

    $ source /usr/local/greenplum-db/greenplum_path.sh
    $ gpstop -r
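gppkg records R_LIBS_USER in greenplum_path.sh. After the restart, you can confirm that the file references the variable and that your re-sourced shell has it set. The function below is a hypothetical sketch. It only looks for the variable name in the file and does not validate the path the variable points to.

```shell
# verify_r_libs_user (hypothetical helper): check that the given
# greenplum_path.sh file mentions R_LIBS_USER and that the variable is set
# in the current shell. Prints the value on success; returns 1 otherwise.
verify_r_libs_user() {
  path_file=$1
  if ! grep -q 'R_LIBS_USER' "$path_file"; then
    echo "R_LIBS_USER not found in $path_file" >&2
    return 1
  fi
  if [ -z "${R_LIBS_USER:-}" ]; then
    echo "R_LIBS_USER is not set in this shell; re-source $path_file" >&2
    return 1
  fi
  echo "R_LIBS_USER=$R_LIBS_USER"
}
```

For example: verify_r_libs_user "$GPHOME/greenplum_path.sh". After you uninstall the package, the grep check in this function should fail, because gppkg removes R_LIBS_USER from the file.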

The Greenplum Database R Data Science Modules are installed in the following directory:

  $GPHOME/ext/DataScienceR/library
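You can spot-check that particular libraries from the list above are installed in this directory by looking for each package's subdirectory. In a standard R library tree, every installed package has its own subdirectory containing a DESCRIPTION file. The helper name check_r_packages is hypothetical, and it checks only the host you run it on.

```shell
# check_r_packages (hypothetical helper): usage check_r_packages LIBDIR PKG...
# Reports each package as found or missing, based on LIBDIR/PKG/DESCRIPTION.
# Returns 1 if any package is missing.
check_r_packages() {
  libdir=$1
  shift
  rc=0
  for pkg in "$@"; do
    if [ -f "$libdir/$pkg/DESCRIPTION" ]; then
      echo "found: $pkg"
    else
      echo "missing: $pkg"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_r_packages "$GPHOME/ext/DataScienceR/library" caret glmnet randomForest.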

Note: rjags libraries are installed in the $GPHOME/ext/DataScienceR/extlib/lib directory. If you want to use rjags and your $GPHOME is not /usr/local/greenplum-db, you must perform an additional configuration step: on each host in your Greenplum Database cluster, create a symbolic link named /usr/local/greenplum-db that points to $GPHOME. For example:

  $ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
  $ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'
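Rerunning ln -s when /usr/local/greenplum-db already exists does not simply fail. If that path is a directory, or a symlink to one, ln creates an extra link inside it. The sketch below wraps the same step in a guard so that it is safe to repeat:

  • It skips the link when GPHOME already is that path.
  • It leaves an existing symlink alone.
  • It refuses to touch a real directory.

The function name ensure_gp_link and its optional second argument, which lets you try the logic against a scratch path, are hypothetical.

```shell
# ensure_gp_link (hypothetical helper): usage ensure_gp_link GPHOME [LINK_PATH]
# Creates LINK_PATH (default /usr/local/greenplum-db) as a symlink to GPHOME,
# but only if GPHOME is a different path and LINK_PATH does not exist yet.
ensure_gp_link() {
  gphome=$1
  link_path=${2:-/usr/local/greenplum-db}
  if [ "$gphome" = "$link_path" ]; then
    echo "GPHOME is already $link_path; no link needed"
    return 0
  fi
  # Test -L before -e: -e follows symlinks, -L does not.
  if [ -L "$link_path" ]; then
    echo "link exists: $link_path -> $(readlink "$link_path")"
    return 0
  fi
  if [ -e "$link_path" ]; then
    echo "error: $link_path exists and is not a symlink" >&2
    return 1
  fi
  ln -s "$gphome" "$link_path"
}
```

The function exists only in your local shell. To run it on every host, copy it into a script on each host or keep using the gpssh commands shown above. In either case, follow it with the chown -h step.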

Uninstalling the R Data Science Library Package

Use the gppkg utility to uninstall the R Data Science Library package. You must include the version number in the package name you provide to gppkg.

To determine your R Data Science Library package version number and remove this package:

  $ gppkg -q --all | grep DataScienceR
  DataScienceR-<version>
  $ gppkg -r DataScienceR-<version>

The command removes the R Data Science libraries from your Greenplum Database cluster. It also removes the R_LIBS_USER environment variable setting from your greenplum_path.sh file and restores the PATH and LD_LIBRARY_PATH settings in that file to their pre-installation values.
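You can combine the query-and-remove steps above in a script. This sketch assumes that the installed package name appears in the gppkg -q --all output as a string that starts with DataScienceR- and contains no spaces. The helper names find_datasciencer_pkg and remove_datasciencer are hypothetical.

```shell
# find_datasciencer_pkg (hypothetical helper): read `gppkg -q --all` output
# on stdin and print the first match of "DataScienceR-" followed by
# non-space characters. Prints nothing if no such package is listed.
find_datasciencer_pkg() {
  grep -o 'DataScienceR-[^[:space:]]*' | head -n 1
}

# remove_datasciencer (hypothetical helper): look up the installed package
# name and pass it, including its version, to gppkg -r.
remove_datasciencer() {
  pkg=$(gppkg -q --all | find_datasciencer_pkg)
  if [ -z "$pkg" ]; then
    echo "no DataScienceR package is installed" >&2
    return 1
  fi
  gppkg -r "$pkg"
}
```

Looking up the name first satisfies the requirement that gppkg -r receive the full versioned package name.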

Re-source greenplum_path.sh and restart Greenplum Database after you remove the R Data Science Library package:

  $ . /usr/local/greenplum-db/greenplum_path.sh
  $ gpstop -r

Note: After you uninstall the R Data Science Library package from your Greenplum Database cluster, any UDFs you created that use R libraries installed with this package return an error.