The following example follows the tutorial presented in Phillips et al. (2017), “FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees,” available online at https://journal.sjdm.org/17/17217/jdm17217.pdf
You can install FFTrees from CRAN using install.packages() (you only need to do this once):
# Install the package from CRAN
install.packages("FFTrees")
To use the package, you first need to load it into your current R session. You can load the package using library():
# Load the package
library(FFTrees)
The package contains several guides (like this one). To open the main guide, run FFTrees.guide():
# Open the main package guide
FFTrees.guide()
In this example, we will create FFTs from a heart disease data set. The training data are in an object called heart.train, and the testing data are in an object called heart.test. For these data, we will predict diagnosis, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high risk or low risk).
To create the FFTrees object, we’ll use the function FFTrees() with two main arguments: formula, a formula indicating the binary criterion as a function of one or more predictors to be considered for the tree (the shorthand formula = diagnosis ~ . means to include all predictors), and data, the training data.
# Create an FFTrees object
heart.fft <- FFTrees(formula = diagnosis ~ .,                       # Criterion and (all) predictors
                     data = heart.train,                            # Training data
                     data.test = heart.test,                        # Testing data
                     main = "Heart Disease",                        # General label
                     decision.labels = c("Low-Risk", "High-Risk"))  # Labels for decisions
The resulting trees, decisions, and accuracy statistics are now stored in the FFTrees object called heart.fft.
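FFTrees() also accepts optional arguments, including algorithm, max.levels, goal.chase, and goal, which this guide describes. As a rough sketch of how they might be combined in one call, the variation below rebuilds the trees with a few of them set. The object name heart.dfan.fft and the particular values are illustrative assumptions, not recommendations.

```r
library(FFTrees)

# A variation on the call above: same data, with some optional arguments set.
# heart.dfan.fft is a hypothetical name chosen for this sketch.
heart.dfan.fft <- FFTrees(formula = diagnosis ~ .,
                          data = heart.train,
                          data.test = heart.test,
                          algorithm = "dfan",   # Build trees with the "dfan" algorithm
                          max.levels = 3,       # Allow at most 3 levels per tree
                          goal.chase = "bacc",  # Statistic maximized during construction
                          goal = "bacc")        # Statistic used to select among trees
```

Printing or plotting heart.dfan.fft then works the same way as for heart.fft.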
algorithm: There are two different algorithms available to build FFTs: “ifan” (Phillips et al. 2017) and “dfan” (Phillips et al. 2017). (“max” (Martignon, Katsikopoulos, and Woike 2008) and “zigzag” (Martignon, Katsikopoulos, and Woike 2008) are no longer supported.)
max.levels: Changes the maximum number of levels allowed in the tree.
The following arguments apply to the “ifan” and “dfan” algorithms only:
goal.chase: The goal.chase argument changes which statistic is maximized during tree construction (for the “ifan” and “dfan” algorithms only). Possible arguments include “acc,” “bacc,” “wacc,” “dprime,” and “cost.” The default is “wacc” with a sensitivity weight of 0.50 (which is identical to “bacc”).
goal: The goal argument changes which statistic is maximized when selecting trees after construction (for the “ifan” and “dfan” algorithms only). Possible arguments include “acc,” “bacc,” “wacc,” “dprime,” and “cost.”
my.tree: You can define a tree verbally as a sentence using the my.tree argument. See Defining an FFT verbally for examples.
Now we can inspect and summarize the trees. We will start by printing the object to return basic information to the console:
# Print the object
heart.fft
## Heart Disease
## FFTrees
## - Trees: 7 fast-and-frugal trees predicting diagnosis
## - Outcome costs: [hi = 0, mi = 1, fa = 1, cr = 0]
##
## FFT #1: Definition
## [1] If thal = {rd,fd}, decide High-Risk.
## [2] If cp != {a}, decide Low-Risk.
## [3] If ca <= 0, decide Low-Risk, otherwise, decide High-Risk.
##
## FFT #1: Prediction Accuracy
## Prediction Data: N = 153, Pos (+) = 73 (48%)
##
## | | True + | True - |
## |---------|--------|--------|
## |Decide + | hi 64 | fa 19 | 83
## |Decide - | mi 9 | cr 61 | 70
## |---------|--------|--------|
## 73 80 N = 153
##
## acc = 81.7% ppv = 77.1% npv = 87.1%
## bacc = 82.0% sens = 87.7% spec = 76.2%
## E(cost) = 0.183
##
## FFT #1: Prediction Speed and Frugality
## mcu = 1.73, pci = 0.87
The output tells us several things. The tree with the highest wacc with a sensitivity weight of 0.5 is selected as the best tree. The best tree, FFT #1, uses three cues: thal, cp, and ca.
All statistics can be derived from a 2 x 2 confusion table like the one in the printed output above. For definitions of all accuracy statistics, look at the accuracy statistic definitions vignette.
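To make the link between the confusion table and the printed statistics concrete, here is a minimal base-R sketch that recomputes them from the four cell counts shown for FFT #1. The formulas are the standard definitions of these measures, and the wacc formula is the usual sensitivity-weighted average; we assume these match the package's own definitions, which the accuracy statistics vignette spells out.

```r
# Cell counts copied from the printed confusion table for FFT #1 (test data)
hi <- 64  # hits: decide +, true +
fa <- 19  # false alarms: decide +, true -
mi <- 9   # misses: decide -, true +
cr <- 61  # correct rejections: decide -, true -
n  <- hi + fa + mi + cr  # total number of cases (153)

sens <- hi / (hi + mi)     # sensitivity
spec <- cr / (cr + fa)     # specificity
acc  <- (hi + cr) / n      # overall accuracy
ppv  <- hi / (hi + fa)     # positive predictive value
npv  <- cr / (cr + mi)     # negative predictive value
bacc <- (sens + spec) / 2  # balanced accuracy

# Weighted accuracy, where w is the sensitivity weight (assumed formula)
w    <- 0.5
wacc <- w * sens + (1 - w) * spec

# Expected cost per case under the printed outcome costs
# (hi = 0, mi = 1, fa = 1, cr = 0)
cost <- (hi * 0 + mi * 1 + fa * 1 + cr * 0) / n

print(round(c(acc = acc, ppv = ppv, npv = npv,
              bacc = bacc, sens = sens, spec = spec), 3))
print(round(cost, 3))
```

With w = 0.5, wacc and bacc coincide, which is why the default goal of “wacc” with a sensitivity weight of 0.50 is described as identical to “bacc.”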
To visualize a tree, use plot():
# Plot the best FFT when applied to the test data
plot(heart.fft, # An FFTrees object
data = "test") # Which data to plot? "train" or "test"
tree: Which tree in the object should be plotted? To plot a tree other than the best fitting tree (FFT #1), just specify another tree as an integer (e.g., plot(heart.fft, tree = 2)).
data: For which dataset should statistics be shown? Either data = "train" (the default) or data = "test".
stats: Should accuracy statistics be shown with the tree? To show only the tree, without any performance statistics, include the argument stats = FALSE:
# Plot only the tree without accuracy statistics
plot(heart.fft,
stats = FALSE)
comp: Should statistics from competing algorithms be shown in the ROC curve? To remove the performance statistics of competing algorithms (e.g., regression, random forests), include the argument comp = FALSE.
what: To show individual cue accuracies in ROC space, include the argument what = "cues":
# Show marginal cue accuracies in ROC space
plot(heart.fft,
what = "cues")
An FFTrees object contains many different outputs. To see them all, run names():
# Show the names of all of the outputs in heart.fft
names(heart.fft)
## [1] "criterion_name" "cue_names" "formula" "trees"
## [5] "data" "params" "competition" "cues"
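Individual outputs can then be pulled out by name. The sketch below assumes the object can be indexed with $ like an ordinary named R list, as the names() output suggests, and that the element names match those printed above. Other package versions may use different names.

```r
library(FFTrees)

# Rebuild the object so that this sketch is self-contained
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test)

# Access two of the elements listed by names(heart.fft)
heart.fft$criterion_name  # Name of the predicted criterion
heart.fft$cue_names       # Names of the candidate cues
```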
To predict classifications for a new dataset, use the standard predict() function. For example, here’s how to predict the classifications for data in the heartdisease object (which actually is just a combination of heart.train and heart.test):
# Predict classifications for a new dataset
predict(heart.fft,
data = heartdisease)
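Conceptually, classifying a case with FFT #1 means walking through the three nodes of its printed definition until one of them triggers a decision. The base-R sketch below hand-codes those rules to illustrate the logic. It is not the package's implementation. The function name apply_fft1 and the four example patients are made up for illustration, and the cue values "normal" and "np" are assumed to be valid levels.

```r
# Hand-coded version of the printed definition of FFT #1:
#   [1] If thal = {rd,fd}, decide High-Risk.
#   [2] If cp != {a}, decide Low-Risk.
#   [3] If ca <= 0, decide Low-Risk, otherwise, decide High-Risk.
# apply_fft1 is a hypothetical helper, not part of FFTrees.
apply_fft1 <- function(thal, cp, ca) {
  ifelse(thal %in% c("rd", "fd"), "High-Risk",   # Node 1: exit on High-Risk
  ifelse(cp != "a", "Low-Risk",                  # Node 2: exit on Low-Risk
  ifelse(ca <= 0, "Low-Risk", "High-Risk")))     # Node 3: final node, two exits
}

# Four made-up patients, one for each possible exit of the tree
patients <- data.frame(
  thal = c("rd", "normal", "normal", "normal"),
  cp   = c("a", "np", "a", "a"),
  ca   = c(0, 2, 0, 1),
  stringsAsFactors = FALSE
)

patients$decision <- apply_fft1(patients$thal, patients$cp, patients$ca)
print(patients)
```

The first patient is classified at node 1 and the second at node 2, so later cues are never consulted for them. Only the last two patients reach the final node.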
If you want to define a specific FFT and apply that tree to data, you can define it using the my.tree argument.
# Create an FFT manually
my.heart.fft <- FFTrees(formula = diagnosis ~ .,
                        data = heart.train,
                        data.test = heart.test,
                        main = "Custom Heart FFT",
                        my.tree = "If chol > 350, predict True.
                                   If cp != {a}, predict False.
                                   If age <= 35, predict False. Otherwise, predict True")
Here is the result (it’s actually not too bad, although the first node is pretty worthless):
plot(my.heart.fft)