| validate.tree {Design} | R Documentation |
Uses xval-fold cross-validation of a sequence of trees to derive
estimates of the mean squared error and Somers' Dxy rank correlation
between predicted and observed responses. For a binary response
variable, the mean squared error is the Brier score.
This function is a modification of cv.tree, which should be
consulted for details. There are print and plot methods for
objects created by validate.tree.
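For a 0/1 response both accuracy measures reduce to simple formulas. The Brier score is the mean of (p - y)^2, and Somers' Dxy equals 2(C - 0.5), where C is the concordance probability (the ROC area), which is the relationship reported by somers2. The base-R sketch below computes both on a tiny made-up input. brier and somers_dxy are hypothetical helper names, not functions in Design.

```r
# Brier score: mean squared difference between predicted probability and 0/1 outcome
brier <- function(p, y) mean((p - y)^2)

# Somers' Dxy for a binary response, via Dxy = 2 * (C - 0.5), where C is the
# probability that a random 1 has a higher prediction than a random 0.
# rank() uses average ranks, so tied predictions count as half-concordant.
somers_dxy <- function(p, y) {
  y <- as.logical(y)
  n1 <- sum(y)
  n0 <- sum(!y)
  C <- (sum(rank(p)[y]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (C - 0.5)
}

p <- c(0.1, 0.4, 0.35, 0.8)  # predicted probabilities (illustrative)
y <- c(0, 0, 1, 1)           # observed binary responses (illustrative)
brier(p, y)       # 0.158125
somers_dxy(p, y)  # 0.5: 3 of the 4 (1, 0) pairs are concordant
```

Constant predictions give C = 0.5 and therefore Dxy = 0, which is why a tree pruned back to its root has no discrimination.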
# f <- tree(formula=y ~ x1 + x2 + ...) # or rpart
## S3 method for class 'tree':
validate(fit, method, B, bw, rule, type, sls, aics, pr=TRUE,
k, rand, xval=10, FUN, ...)
## S3 method for class 'rpart':
validate(fit, ...)
## S3 method for class 'validate.tree':
print(x, ...)
## S3 method for class 'validate.tree':
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
fit |
an object created by tree or rpart, or having the same
attributes as one created by tree. If it was created by
rpart, you must have specified the model=TRUE argument to
rpart. |
method,B,bw,rule,type,sls,aics |
are present only for consistency with the generic validate
function and are ignored |
x |
the result of validate.tree |
k |
a sequence of cost-complexity values. By default these are obtained
by calling FUN with no optional arguments (for tree fits) or
from the cptable component of the original rpart fit object.
You may also specify a scalar or a vector. |
rand |
an optional integer vector assigning each observation to a
cross-validation group; see cv.tree |
xval |
number of cross-validation groups (folds); the default is 10 |
FUN |
the name of a function that produces a sequence of trees, such
as prune.tree, shrink.tree, or prune.rpart. The default is
prune.tree for fits from tree and prune.rpart for fits from rpart. |
... |
additional arguments to FUN (ignored by print and plot). For
validate.rpart, ... can contain the same arguments used in
validate.tree. |
pr |
set to FALSE to suppress printing of intermediate results for each k |
what |
a vector of things to plot: "mse", "dxy", or both. By default two
plots are produced, one for mse and one for Dxy. |
legendloc |
either a function that is evaluated with a single argument equal to 1
to generate a list with components x and y giving the coordinates of the
upper left corner of the legend, or a numeric 2-vector. In the latter
case, legendloc gives the position, as relative fractions of the plot,
at which to center the legend. |
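For rpart fits, the default k sequence comes from the cptable of the original fit, and the default FUN, prune.rpart, turns each k into a pruned tree. The sketch below illustrates that idea under stated assumptions: it uses simulated data like that in the Examples, takes the CP column of cptable as k, and counts every node (each row of the pruned fit's frame) as size. It is an illustration, not the exact code that validate.rpart runs.

```r
library(rpart)  # recommended package shipped with R

# Simulated data in the style of the Examples section (illustrative only)
set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)
f <- rpart(y ~ x1 + x2 + x3, data = d, model = TRUE)

# Candidate k values: the CP column of the fit's cptable (decreasing order)
k <- f$cptable[, "CP"]

# Prune the original fit at each k; a larger k removes more splits, giving a
# nested sequence of trees. Count all nodes (rows of the frame) as the size.
sizes <- sapply(k, function(cp) nrow(prune(f, cp = cp)$frame))
data.frame(k = k, size = sizes)
```

Because pruning at a larger cost-complexity value removes a superset of the splits removed at a smaller one, size can only grow as k decreases.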
a list of class "validate.tree" with components named k, size, dxy.app,
dxy.val, mse.app, mse.val, binary, and xval. size is the number of nodes;
dxy refers to Somers' Dxy and mse to the mean squared error of prediction;
the suffix app denotes apparent accuracy on the training samples and val
denotes validated accuracy on the test samples; binary is a logical value
indicating whether the response variable was binary (a logical or 0/1
variable counts as binary). size is not present if the user specifies k.
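How the app and val components differ can be illustrated with a rough re-creation of the xval-fold procedure. This is a hypothetical sketch, not the Design source: cv_tree_seq and somers_dxy are names introduced here, it handles only rpart fits of a numeric (for example 0/1) response, it grows each training-fold tree with cp = min(k) and prunes it at every k, and it assigns balanced random folds, which may differ from how cv.tree or validate.tree assign groups.

```r
library(rpart)  # recommended package shipped with R

# Somers' Dxy for a 0/1 response: Dxy = 2 * (C - 0.5), where C is the
# concordance probability; average ranks count tied predictions as 1/2
somers_dxy <- function(p, y) {
  y <- as.logical(y)
  n1 <- sum(y)
  n0 <- sum(!y)
  if (n1 == 0 || n0 == 0) return(NA_real_)  # undefined without both classes
  C <- (sum(rank(p)[y]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (C - 0.5)
}

# Hypothetical helper (not part of Design): apparent and cross-validated
# MSE and Dxy for an rpart tree pruned at each cost-complexity value in k
cv_tree_seq <- function(formula, data, k, xval = 10) {
  y <- model.response(model.frame(formula, data))
  n <- nrow(data)
  rand <- sample(rep(seq_len(xval), length.out = n))  # fold label per row

  # Apparent accuracy: grow on all the data, prune at each k, predict the same data
  full <- rpart(formula, data = data, cp = min(k))
  pred_app <- sapply(k, function(cp) predict(prune(full, cp = cp), data))

  # Validated accuracy: each row is predicted by trees grown without its fold
  pred_val <- matrix(NA_real_, n, length(k))
  for (g in seq_len(xval)) {
    test <- rand == g
    fit <- rpart(formula, data = data[!test, ], cp = min(k))
    for (j in seq_along(k)) {
      pred_val[test, j] <- predict(prune(fit, cp = k[j]), data[test, ])
    }
  }

  list(k = k,
       mse.app = colMeans((pred_app - y)^2),
       mse.val = colMeans((pred_val - y)^2),
       dxy.app = apply(pred_app, 2, somers_dxy, y = y),
       dxy.val = apply(pred_val, 2, somers_dxy, y = y),
       xval = xval)
}

# Usage on simulated data like that in the Examples section
set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)
k <- rpart(y ~ x1 + x2 + x3, data = d)$cptable[, "CP"]
res <- cv_tree_seq(y ~ x1 + x2 + x3, d, k)
round(sapply(res[c("k", "mse.app", "mse.val", "dxy.app", "dxy.val")], c), 3)
res$k[which.min(res$mse.val)]  # k with the smallest validated MSE
```

Apparent MSE can only fall as the tree grows, while validated MSE typically bottoms out and then rises; comparing the two is what exposes overfitting.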
Intermediate results are printed for each k if pr=TRUE.
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
rpart, somers2,
rcorr.cens, cv.tree,
locator, legend
## Not run:
library(Design)  # provides the validate generic and its tree/rpart methods
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- 1*(x1+x2+rnorm(n) > 1)
table(y)
library(rpart)
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v  # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))
## End(Not run)