14.4 Modeling the floristic gradient

To predict the floristic gradient spatially, we will make use of a random forest model (Hengl et al. 2018). Random forest models are frequently used in environmental and ecological modeling, and often provide the best results in terms of predictive performance (Schratz et al. 2018). Here, we briefly introduce decision trees and bagging, since they form the basis of random forests. We refer the reader to James et al. (2013) for a more detailed description of random forests and related techniques.

To introduce decision trees by example, we first construct a response-predictor matrix by joining the rotated NMDS scores to the field observations (random_points). We will also use the resulting data frame for the mlr modeling later on.

  # construct response-predictor matrix
  # id- and response variable
  rp = data.frame(id = as.numeric(rownames(sc)), sc = sc[, 1])
  # join the predictors (dem, ndvi and terrain attributes)
  rp = inner_join(random_points, rp, by = "id")

Decision trees split the predictor space into a number of regions. To illustrate this, we apply a decision tree to our data using the scores of the first NMDS axis as the response (sc) and altitude (dem) as the only predictor.

  library("tree")
  tree_mo = tree(sc ~ dem, data = rp)
  plot(tree_mo)
  text(tree_mo, pretty = 0)

Figure 14.4: Simple example of a decision tree with three internal nodes and four terminal nodes.

The resulting tree consists of three internal nodes and four terminal nodes (Figure 14.4). The first internal node at the top of the tree assigns all observations which are below 328.5 to the left and all other observations to the right branch. The observations falling into the left branch have a mean NMDS score of -1.198. Overall, we can interpret the tree as follows: the higher the elevation, the higher the NMDS score becomes. Decision trees have a tendency to overfit, that is, they mirror the input data including its noise too closely, which in turn leads to poor predictive performance (Section 11.4; James et al. 2013). Bootstrap aggregation (bagging) is an ensemble technique that helps to overcome this problem. Ensemble techniques simply combine the predictions of multiple models. Thus, bagging takes repeated samples from the same input data and averages the predictions. This reduces the variance and the overfitting, with the result of a much better predictive accuracy compared to single decision trees. Finally, random forests extend and improve bagging by decorrelating the trees, which is desirable since averaging the predictions of highly correlated trees shows a higher variance and thus lower reliability than averaging predictions of decorrelated trees (James et al. 2013). To achieve this, random forests use bagging, but in contrast to traditional bagging, where each tree is allowed to use all available predictors, random forests only consider a random sample of the available predictors at each split.
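To make the difference between bagging and a random forest concrete, the following sketch fits both with the ranger package on the data prepared above. It is only an illustration under the assumption that dropping the id, spri and geometry columns leaves the response sc plus the numeric predictors; the exact column set may differ in your workflow.

  library("ranger")
  # keep only the response and the predictors (drop id, spri and the geometry)
  rf_input = dplyr::select(rp, -id, -spri) %>%
    sf::st_drop_geometry()
  n_pred = ncol(rf_input) - 1  # number of predictor columns
  # bagging: each tree may choose from all predictors (mtry = n_pred)
  bagging_mod = ranger(sc ~ ., data = rf_input, mtry = n_pred, seed = 1)
  # random forest: each split only considers a random subset of the predictors
  # (ranger's default mtry is roughly the square root of n_pred)
  rf_mod = ranger(sc ~ ., data = rf_input, seed = 1)
  # compare the out-of-bag mean squared errors of the two models
  c(bagging = bagging_mod$prediction.error, rf = rf_mod$prediction.error)

With only a handful of predictors the difference may be small, but in higher-dimensional settings the decorrelation effect typically becomes more pronounced.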

14.4.1 mlr building blocks

The code in this section largely follows the steps we have introduced in Section 11.5.2. The only differences are the following:

  • The response variable is numeric, hence a regression task will replace the classification task of Section 11.5.2.
  • Instead of the AUROC, which can only be used for categorical response variables, we will use the root mean squared error (RMSE) as the performance measure.
  • We use a random forest model instead of a support vector machine, which naturally goes along with different hyperparameters.
  • We are leaving the assessment of a bias-reduced performance measure as an exercise to the reader (see Exercises). Instead we show how to tune hyperparameters for (spatial) predictions.
    Remember that 125,500 models were necessary to retrieve bias-reduced performance estimates when using 100-repeated 5-fold spatial cross-validation and a random search of 50 iterations (see Section 11.5.2); the short calculation after this list breaks this number down. In the hyperparameter tuning level, we found the best hyperparameter combination which in turn was used in the outer performance level for predicting the test data of a specific spatial partition (see also Figure 11.6). This was done for five spatial partitions, and repeated 100 times, yielding in total 500 optimal hyperparameter combinations. Which one should we use for making spatial predictions? The answer is simple: none at all. Remember, the tuning was done to retrieve a bias-reduced performance estimate, not to do the best possible spatial prediction. For the latter, one estimates the best hyperparameter combination from the complete dataset. This means that the inner hyperparameter tuning level is no longer needed, which makes perfect sense since we are applying our model to new data (unvisited field observations) for which the true outcomes are unavailable, hence testing is impossible in any case. Therefore, we tune the hyperparameters for a good spatial prediction on the complete dataset via a 5-fold spatial CV with one repetition.
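As a plausibility check of the 125,500 figure, the following back-of-the-envelope calculation multiplies the resampling levels involved, assuming (as in Section 11.5.2) 100 repetitions, 5 outer folds, and within each outer fold 50 random-search iterations evaluated with a 5-fold tuning CV plus one final fit with the best hyperparameters:

  # 100 repetitions x 5 outer folds; within each outer fold:
  # 50 random-search iterations x 5 tuning folds + 1 fit with the best combination
  100 * 5 * (50 * 5 + 1)
  #> [1] 125500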

Preparing the data for the mlr modeling includes constructing a response-predictor matrix containing only the variables that should be used in the modeling, and a separate coordinate data frame.

  # extract the coordinates into a separate data frame
  coords = sf::st_coordinates(rp) %>%
    as.data.frame() %>%
    rename(x = X, y = Y)
  # only keep response and predictors which should be used for the modeling
  rp = dplyr::select(rp, -id, -spri) %>%
    st_drop_geometry()

Having constructed the input variables, we are all set for specifying the mlr building blocks (task, learner, and resampling). We will use a regression task since the response variable is numeric. The learner is a random forest model implementation from the ranger package.

  # create task
  task = makeRegrTask(data = rp, target = "sc", coordinates = coords)
  # learner
  lrn_rf = makeLearner(cl = "regr.ranger", predict.type = "response")

As opposed to, for example, support vector machines (see Section 11.5.2), random forests often already show good performances when used with the default values of their hyperparameters (which may be one reason for their popularity). Still, tuning often moderately improves model results, and thus is worth the effort (Probst, Wright, and Boulesteix 2018). Since we deal with geographic data, we will again make use of spatial cross-validation to tune the hyperparameters (see Sections 11.4 and 11.5). Specifically, we will use a five-fold spatial partitioning with only one repetition (makeResampleDesc()). In each of these spatial partitions, we run 50 models (makeTuneControlRandom()) to find the optimal hyperparameter combination.

  # spatial partitioning
  perf_level = makeResampleDesc("SpCV", iters = 5)
  # specifying random search
  ctrl = makeTuneControlRandom(maxit = 50L)

In random forests, the hyperparameters mtry, min.node.size and sample.fraction determine the degree of randomness, and should be tuned (Probst, Wright, and Boulesteix 2018). mtry indicates how many predictor variables should be used in each tree. If all predictors are used, then this corresponds in fact to bagging (see the beginning of Section 14.4). The sample.fraction parameter specifies the fraction of observations to be used in each tree. Smaller fractions lead to greater diversity, and thus less correlated trees, which often is desirable (see above). The min.node.size parameter indicates the number of observations a terminal node should at least have (see also Figure 14.4). Naturally, the lower min.node.size is, the larger the trees grow and the longer the computing time becomes.
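If you want to see which hyperparameters the ranger regression learner exposes, together with their types, defaults and feasible ranges, you can inspect its parameter set; this is only a quick look-up and not required for the tuning below:

  # list all hyperparameters of the ranger regression learner, including
  # mtry, sample.fraction and min.node.size
  getParamSet("regr.ranger")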

Hyperparameter combinations will be selected randomly but should fall inside specific tuning limits (makeParamSet()). mtry should range between 1 and the number of predictors (4), sample.fraction should range between 0.2 and 0.9, and min.node.size should range between 1 and 10.

  # specifying the search space
  ps = makeParamSet(
    makeIntegerParam("mtry", lower = 1, upper = ncol(rp) - 1),
    makeNumericParam("sample.fraction", lower = 0.2, upper = 0.9),
    makeIntegerParam("min.node.size", lower = 1, upper = 10)
  )

Finally, tuneParams() runs the hyperparameter tuning, and will find the optimal hyperparameter combination for the specified parameters. The performance measure is the root mean squared error (RMSE).

  # hyperparameter tuning
  set.seed(02082018)
  tune = tuneParams(learner = lrn_rf,
                    task = task,
                    resampling = perf_level,
                    par.set = ps,
                    control = ctrl,
                    measures = mlr::rmse)
  #> ...
  #> [Tune-x] 49: mtry=3; sample.fraction=0.533; min.node.size=5
  #> [Tune-y] 49: rmse.test.rmse=0.5636692; time: 0.0 min
  #> [Tune-x] 50: mtry=1; sample.fraction=0.68; min.node.size=5
  #> [Tune-y] 50: rmse.test.rmse=0.6314249; time: 0.0 min
  #> [Tune] Result: mtry=4; sample.fraction=0.887; min.node.size=10 :
  #> rmse.test.rmse=0.5104918

An mtry of 4, a sample.fraction of 0.887, and a min.node.size of 10 represent the best hyperparameter combination. An RMSE of 0.51 is relatively good when considering the range of the response variable, which is 3.04 (diff(range(rp$sc))).
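To express this relation explicitly, the ratio of the tuned RMSE to the response range can be computed from the objects created above; the numbers in the comments are only approximate and depend on the random seed:

  # range of the response (first NMDS axis scores), roughly 3.04
  resp_range = diff(range(rp$sc))
  # tuned RMSE (tune$y) relative to the response range, roughly 0.17,
  # i.e., the average prediction error is about 17% of the response range
  tune$y / resp_range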

14.4.2 Predictive mapping

The tuned hyperparameters can now be used for the prediction. We simply have to modify our learner using the result of the hyperparameter tuning, and run the corresponding model.

  # learning using the best hyperparameter combination
  lrn_rf = makeLearner(cl = "regr.ranger",
                       predict.type = "response",
                       mtry = tune$x$mtry,
                       sample.fraction = tune$x$sample.fraction,
                       min.node.size = tune$x$min.node.size)
  # doing the same more elegantly using setHyperPars()
  # lrn_rf = setHyperPars(makeLearner("regr.ranger", predict.type = "response"),
  #                       par.vals = tune$x)
  # train model
  model_rf = train(lrn_rf, task)
  # to retrieve the ranger output, run:
  # mlr::getLearnerModel(model_rf)
  # which corresponds to:
  # ranger(sc ~ ., data = rp,
  #        mtry = tune$x$mtry,
  #        sample.fraction = tune$x$sample.fraction,
  #        min.node.size = tune$x$min.node.size)

The last step is to apply the model to the spatially available predictors, i.e., to the raster stack. So far, raster::predict() does not support the output of ranger models, hence we have to program the prediction ourselves. First, we convert ep into a prediction data frame which, secondly, serves as input for the predict() function. Thirdly, we put the predicted values back into a RasterLayer (see Section 3.3.1 and Figure 14.5).

  # convert raster stack into a data frame
  new_data = as.data.frame(as.matrix(ep))
  # apply the model to the data frame
  pred_rf = predict(model_rf, newdata = new_data)
  # put the predicted values into a raster
  pred = dem
  # replace altitudinal values by rf-prediction values
  pred[] = pred_rf$data$response
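A quick way to inspect the result before building a polished map is raster's plot() method; the palette below is only a rough stand-in and does not exactly reproduce the color scheme of Figure 14.5:

  # visualize the predicted floristic gradient; the palette is an assumption
  plot(pred, col = hcl.colors(50, palette = "Zissou 1"))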

Figure 14.5: Predictive mapping of the floristic gradient clearly revealing distinct vegetation belts.

The predictive mapping clearly reveals distinct vegetation belts (Figure 14.5). Please refer to Muenchow, Hauenstein, et al. (2013) for a detailed description of vegetation belts on lomas mountains. The blue color tones represent the so-called Tillandsia-belt. Tillandsia is a highly adapted genus, found especially in high quantities at the sandy and quite desertic foot of lomas mountains. The yellow color tones refer to a herbaceous vegetation belt with a much higher plant cover compared to the Tillandsia-belt. The orange colors represent the bromeliad belt, which features the highest species richness and plant cover. It can be found directly beneath the temperature inversion (ca. 750-850 m asl) where humidity due to fog is highest. Water availability naturally decreases above the temperature inversion, and the landscape becomes desertic again with only a few succulent species (succulent belt; red colors). The spatial prediction also reveals that the bromeliad belt is interrupted, an interesting finding we would not have detected without the predictive mapping.