There are two ways to define fast-and-frugal trees manually when using the FFTrees()
function, either as a sentence using the my.tree
argument (the easier way), or as a dataframe using the tree.definitions
argument (the harder way). Both of these methods will bypass the tree construction algorithms built into FFTrees
.
my.tree
The first method is to use the my.tree
argument, where my.tree
is a sentence describing a (single) FFT. When this argument is specified in FFTrees()
, the function (specifically wordstoFFT()
will try to extract the specified FFT from the argument.
For example, let’s look at the columns sex, age and thal in the heartdisease data:
head(heartdisease[c("sex", "age", "thal")])
## # A tibble: 6 × 3
## sex age thal
## <dbl> <dbl> <chr>
## 1 1 63 fd
## 2 1 67 normal
## 3 1 67 rd
## 4 1 37 normal
## 5 0 41 normal
## 6 1 56 normal
Here’s how we could specify an FFT using these cues as a sentence:
= "If sex = 1, predict True.
my.tree If age < 45, predict False.
If thal = {fd, normal}, predict True. Otherwise, predict False"
Here are some notes on specifying trees manually:
If <CUE> <DIRECTION> <THRESHOLD>, predict <EXIT>
.sex = {male}
. For factors with sets of values, values within a threshold should be separated by commas like eyecolor = {blue,brown}
=
, !=
, <
, >=
(etc.) are valid. For numeric cues, only use >
, >=
, <
, <=
. For factors, only use =
and !=
.True
, while negative exits are specified by False
. The final node will be forced to have a bidirectional exit. The text Otherwise, predict EXIT
I’ve included in the example above is actually not necessary.Now, let’s pass the my.tree
argument to FFTrees()
to force apply our FFT to the heartdisease data:
# Pass a verbally defined FFT to FFTrees with the my.tree argument
<- FFTrees(diagnosis ~.,
my.heart.fft data = heartdisease,
my.tree = "If sex = 1, predict True.
If age < 45, predict False.
If thal = {fd, normal}, predict True.
Otherwise, predict False")
Let’s see how well our FFT did:
# Plot
plot(my.heart.fft)
As you can see, this FFT is pretty terrible – it has a high sensitivity, but a terrible specificity.
Let’s see if we can come up with a better one using the cues thal
, cp
, and ca
# Specify an FFt verbally with the my.tree argument
<- FFTrees(diagnosis ~.,
my.heart.fft data = heartdisease,
my.tree = "If thal = {rd,fd}, predict True.
If cp != {a}, predict False.
If ca > 1, predict True.
Otherwise, predict False")
# Plot
plot(my.heart.fft)
This one looks much better!