{mlr3spatial} adds [mlr3::DataBackend]s for spatial classes ([terra::SpatRaster], [raster::brick], [stars::stars]). The package is capable of making predictions to objects of these classes via predict_raster()
. The return is a spatial object in the class specified in argument format
.
Essentially, {mlr3spatial} takes of the burden of converting spatial objects into a plain data.table
and then coercing the predicted values back into the spatial object while making sure to not loose the spatial reference.
There are some more goodies in the bag though:
predict_raster()
can be executed in chunks (argument chunksize
), making it possible to execute predictions on large raster files on consumer grade machines. This chunked execution comes with some overhead compared to execution in one block since the prediction heuristic is executed multiple times and the results need to be merged in the end.In the following, we showcase a step-by-step example how to handle a multi-layer raster object from package {stars}.
library("mlr3")
library("mlr3spatial")
First, the TIFF files is read via stars::read_stars()
and put into a DataBackendRaster
. The DataBackend is then used to create a regression task with the response being band1
.
system.file("tif/L7_ETMs.tif", package = "stars")
tif = stars::read_stars(tif)
stack =
as_data_backend(stack)
backend = as_task_regr(backend, target = "band1")
task =
print(task)
#> <TaskRegr:backend> (122848 x 6)
#> * Target: band1
#> * Properties: -
#> * Features (5):
#> - dbl (5): band2, band3, band4, band5, band6
For large raster files with millions of values it helps to predict in parallel. To enable this, set learner$parallel_predict = TRUE
and initiate a parallel plan via {future}. Since this is only an example, parallelization is not enabled here. Here we will use a simple regression tree as an example learner. In practice you might want to use a different learner - you can find an overview of available learners here.
lrn("regr.rpart")
learner =set.seed(42)
sample(1:task$nrow, 500)
row_ids =$train(task, row_ids = row_ids)
learner#> INFO [19:35:32.382] 253
#> INFO [19:35:32.431] 253
print(learner)
#> <LearnerRegrRpart:regr.rpart>: Regression Tree
#> * Model: rpart
#> * Parameters: xval=0
#> * Packages: mlr3, rpart
#> * Predict Type: response
#> * Feature types: logical, integer, numeric, factor, ordered
#> * Properties: importance, missings, selected_features, weights
For prediction predict_spatial()
is used. It will return a raster file which contains the predictions. Users can select which R spatial format the returned raster should have.
In the following, we will compare the way to conduct the prediction using {mlr3spatial} with the “native” way of fitting an e1071::svm()
model and predicting with terra::predict()
.
predict_spatial(task, learner, format = "stars")
ras =#> INFO [19:35:32.729] Start raster prediction
#> INFO [19:35:32.734] Prediction is executed with a chunksize of 200, 1 chunk(s) in total, 122848 values per chunk
#> INFO [19:35:32.791] 220
#> INFO [19:35:32.989] 219
#> INFO [19:35:32.995] Chunk 1 of 1 finished
#> INFO [19:35:33.006] Finished raster prediction in 0 seconds
names(ras) = "cadmium"
print(ras)
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> cadmium 62.3629 70.30233 77.01695 79.05135 89.2809 118.1429
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 349 288776 28.5 SIRGAS 2000 / UTM zone 25S FALSE NULL [x]
#> y 1 352 9120761 -28.5 SIRGAS 2000 / UTM zone 25S FALSE NULL [y]
Since the layers are merged in a {stars} object, one first need to split them up and convert them into a regular data.table. Next, the column names need to be adjusted to match the ones of the training data. Afterwards, the data.frame
generic of predict()
can be called. Finally, the predictions need to be injected into a stars object again.
(All of these steps are happening internally in {mlr3spatial}).
rpart::rpart(band1 ~ ., data = task$data(rows = row_ids))
rpart_learner =#> INFO [19:35:33.171] 216
as.data.table(split(stack, "band"))
stars_stack =c("x", "y", "X1")] = NULL
stars_stack[, colnames(stars_stack) = task$feature_names
predict(rpart_learner, stars_stack)
stars_pred =
# subset stars object to one band only
stack[, , , 1]
stars_pred_ras =# rename the layer name
names(stars_pred_ras) = "pred"
# assign predictions
$pred = stars_pred
stars_pred_ras
print(stars_pred_ras)
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> pred 62.3629 70.30233 77.01695 79.05135 89.2809 118.1429
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 349 288776 28.5 SIRGAS 2000 / UTM zone 25S FALSE NULL [x]
#> y 1 352 9120761 -28.5 SIRGAS 2000 / UTM zone 25S FALSE NULL [y]
#> band 1 1 NA NA NA NA NULL
Now that we have executed two predictions, we would like to verify that these are actually identical.
all.equal(as.numeric(stars_pred_ras$pred), as.numeric(ras$cadmium))
#> [1] TRUE
Finally we can plot the predictions. The color vector is extract from the viridis color palette via dput(viridis::viridis_pal()(5))
.
plot(ras, col = c("#440154FF", "#443A83FF", "#31688EFF", "#21908CFF", "#35B779FF", "#8FD744FF", "#FDE725FF"))