validate.tree {Design}    R Documentation
Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. This function is a modification of cv.tree, which should be consulted for details. There are print and plot methods for objects created by validate.tree.
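The two accuracy measures can be illustrated in base R. The function below, accuracy_sketch, is a hypothetical helper written for this page; it is not part of Design and is not how somers2 or rcorr.cens are implemented. It computes the mean squared error and Somers' Dxy for one set of predictions, counting pairs tied on the prediction as neither concordant nor discordant, so that for a 0/1 response Dxy = 2(C - 0.5), where C is the concordance index.

```r
# Hypothetical helper (not part of Design): the two accuracy measures
# reported by validate.tree, for predictions `pred` and responses `y`.
accuracy_sketch <- function(pred, y) {
  # Mean squared error; equals the Brier score when y is 0/1
  mse <- mean((pred - y)^2)
  # Somers' Dxy over all pairs with different observed responses:
  # (concordant - discordant) / (number of such pairs). Pairs tied on
  # the prediction contribute 0 here, i.e. 1/2 in the C index.
  dy <- outer(y, y, "-")
  dp <- outer(pred, pred, "-")
  use <- dy != 0
  dxy <- sum(sign(dy * dp)[use]) / sum(use)
  c(mse = mse, dxy = dxy)
}

pred <- c(0.10, 0.40, 0.35, 0.80)
y    <- c(0, 0, 1, 1)
accuracy_sketch(pred, y)  # mse = 0.158125, dxy = 0.5 (C index 0.75)
```

Forming all pairs with outer() needs O(n^2) memory, which is fine for a small illustration but is not how a rank-based production implementation would compute Dxy.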
# f <- tree(formula=y ~ x1 + x2 + ...)   # or rpart
## S3 method for class 'tree':
validate(fit, method, B, bw, rule, type, sls, aics, pr=TRUE,
         k, rand, xval=10, FUN, ...)
## S3 method for class 'rpart':
validate(fit, ...)
## S3 method for class 'validate.tree':
print(x, ...)
## S3 method for class 'validate.tree':
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
fit | an object created by tree or rpart, or having the same attributes as one created by tree. If it was created by rpart, you must have specified the model=TRUE argument to rpart. |
method, B, bw, rule, type, sls, aics | present only for consistency with the generic validate function; these arguments are ignored. |
x | the result of validate.tree |
k | a sequence of cost/complexity values. By default these are obtained by calling FUN with no optional arguments (for fits from tree) or from the cptable component of the original rpart fit object. You may also specify a scalar or a vector. |
rand | see cv.tree |
xval | number of cross-validation splits (folds); the default is 10 |
FUN | the name of a function that produces a sequence of trees, such as prune.tree, shrink.tree, or prune.rpart. The default is prune.tree for fits from tree and prune.rpart for fits from rpart. |
... | additional arguments to FUN (ignored by print and plot). For validate.rpart, ... can contain the same arguments used in validate.tree. |
pr | set to FALSE to prevent intermediate results for each k from being printed |
what | a vector specifying what to plot. By default, two plots are drawn: one for mse and one for Dxy. |
legendloc | either a function that is evaluated with a single argument equal to 1 to generate a list with components x and y specifying the coordinates of the upper left corner of the legend, or a 2-vector. In the latter case, legendloc specifies the relative fractions of the plot at which to center the legend. |
a list of class "validate.tree" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, and xval. size is the number of nodes, dxy refers to Somers' Dxy, mse refers to the mean squared error of prediction, app means apparent accuracy on training samples, val means validated accuracy on test samples, and binary is a logical variable indicating whether or not the response variable was binary (a logical or 0/1 variable is binary). size will not be present if the user specifies k.
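The meaning of the app and val components can be sketched with rpart directly. cv_rpart_sketch is a hypothetical function, not the Design implementation: it grows an unpruned tree (cp = 0), prunes it at each value in k with prune(), and, as an assumption of this sketch, pools the out-of-fold predictions from all xval folds before computing mse.val and dxy.val. It handles a numeric (for example 0/1) response fitted with rpart's default anova method and omits the size component.

```r
library(rpart)

# Hypothetical sketch, NOT the Design implementation of validate.tree:
# xval-fold cross-validation of rpart trees pruned at each
# cost/complexity value in k, for a numeric (e.g. 0/1) response.
cv_rpart_sketch <- function(formula, data, k, xval = 10, seed = 1) {
  # Somers' Dxy between predictions and observed responses
  dxy <- function(pred, y) {
    dy <- outer(y, y, "-")
    dp <- outer(pred, pred, "-")
    use <- dy != 0
    sum(sign(dy * dp)[use]) / sum(use)
  }
  y <- data[[all.vars(formula)[1]]]
  n <- nrow(data)
  nk <- length(k)

  # Apparent accuracy: grow one large tree on all the data and
  # evaluate each pruned subtree on the same data
  full <- rpart(formula, data = data, cp = 0)
  pred.app <- vapply(k, function(cp) unname(predict(prune(full, cp = cp), data)),
                     numeric(n))

  # Validated accuracy: each observation is predicted by trees grown
  # without it (note: set.seed changes the global RNG state)
  set.seed(seed)
  fold <- sample(rep(seq_len(xval), length.out = n))
  pred.val <- matrix(NA_real_, n, nk)
  for (f in seq_len(xval)) {
    test <- fold == f
    tr <- rpart(formula, data = data[!test, ], cp = 0)
    for (j in seq_len(nk))
      pred.val[test, j] <- predict(prune(tr, cp = k[j]), data[test, ])
  }

  list(k       = k,
       dxy.app = apply(pred.app, 2, dxy, y = y),
       dxy.val = apply(pred.val, 2, dxy, y = y),
       mse.app = colMeans((pred.app - y)^2),
       mse.val = colMeans((pred.val - y)^2),
       binary  = all(y %in% c(0, 1)),
       xval    = xval)
}
```

For an existing rpart fit f, a sequence like the default described for k could be taken from f$cptable[, "CP"], which lists the complexity values in decreasing order.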
Prints intermediate results for each k if pr=TRUE.
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
rpart, somers2, rcorr.cens, cv.tree, locator, legend
## Not run:
library(Design)   # provides validate() and the validate.tree methods
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- 1*(x1+x2+rnorm(n) > 1)
table(y)
library(rpart)
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v     # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))
## End(Not run)