validate {Design} | R Documentation |
The validate function, when used on an object created by one of the Design series, does resampling validation of a regression model, with or without backward step-down variable deletion.
# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
validate(fit, method="boot", B=40, bw=FALSE, rule="aic", type="residual",
         sls=0.05, aics=0, pr=FALSE, ...)
fit |
a fit derived by e.g. lrm, cph, psm, ols. The options x=TRUE and y=TRUE
must have been specified.
|
method |
may be "crossvalidation", "boot" (the default), ".632", or
"randomization".
See predab.resample for details. Can be abbreviated, e.g.
"cross", "b", ".6".
|
B |
number of repetitions. For method="crossvalidation", this is the
number of groups of omitted observations.
|
bw |
TRUE to do fast step-down using the fastbw function,
for both the overall model and for each repetition. fastbw
keeps parameters together that represent the same factor.
|
rule |
Applies if bw=TRUE. "aic" to use Akaike's information criterion as a
stopping rule (i.e., a factor is deleted if the chi-square falls below
twice its degrees of freedom), or "p" to use P-values.
|
type |
"residual" or "individual". The stopping rule is based on the residual
chi-square for all variables deleted ("residual") or on individual
factors ("individual").
|
sls |
significance level for a factor to be kept in a model, or for judging the residual chi-square. |
aics |
cutoff on AIC when rule="aic".
|
pr |
TRUE to print results of each repetition
|
... |
parameters for each specific validate function, and parameters to
pass to predab.resample (note especially the group,
cluster, and subset parameters).
For psm, you can pass the maxiter parameter here (passed to
survreg.control, default is 15 iterations) as well as a tol parameter
for judging matrix singularity in solvet (default is 1e-12)
and a rel.tolerance parameter that is passed to
survreg.control (default is 1e-5).
|
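The "aic" stopping rule described under rule can be illustrated with a small sketch in base R. The helper aic_delete is hypothetical and not part of Design; it only restates the criterion that a factor is deleted when its chi-square falls below twice its degrees of freedom. Treating aics as the cutoff on chi-square minus 2 * d.f. is an assumption made for this illustration.

```r
# Hypothetical helper (not part of Design): TRUE if a factor would be
# deleted under the "aic" rule, i.e. chi-square - 2 * df < cutoff.
# Assumption: 'aics' is the cutoff on (chi-square - 2 * df), 0 by default.
aic_delete <- function(chisq, df, aics = 0) {
  (chisq - 2 * df) < aics
}

del_small <- aic_delete(chisq = 1.5, df = 1)  # chi-square below 2 * df
del_large <- aic_delete(chisq = 7.0, df = 2)  # chi-square above 2 * df

# For a 1 d.f. factor, the default cutoff corresponds to this P-value:
p_equiv <- 1 - pchisq(2, df = 1)
```

For a single-parameter factor the default rule is therefore more lenient than a conventional 0.05 significance level, which is worth keeping in mind when comparing rule="aic" with rule="p" and sls=0.05.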
It provides bias-corrected indexes that are specific to each type of model. For validate.cph and validate.psm, see validate.lrm, which is similar. For validate.cph and validate.psm, there is an extra argument dxy, which if TRUE causes the rcorr.cens function to be invoked to compute the Somers' Dxy rank correlation at each resample (this takes a bit longer than the likelihood-based statistics). The values corresponding to the row Dxy are equal to 2 * (C - 0.5), where C is the C-index or concordance probability.
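The relation Dxy = 2 * (C - 0.5) can be checked by hand on a toy data set. The sketch below uses only base R and a hypothetical helper c_index written for this illustration; it assumes an uncensored outcome and counts ties in the predictor as one half, so it is not a substitute for rcorr.cens, which handles censoring.

```r
# Hypothetical helper: concordance probability C for an uncensored
# outcome y and a predictor x, counting ties in x as 1/2.
c_index <- function(x, y) {
  conc <- 0; ties <- 0; pairs <- 0
  n <- length(y)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next                 # pair is not usable
      pairs <- pairs + 1
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) conc <- conc + 1            # concordant pair
      else if (s == 0) ties <- ties + 1      # tied on the predictor
    }
  }
  (conc + 0.5 * ties) / pairs
}

x <- c(1, 2, 3, 4, 5)    # predictor
y <- c(2, 1, 4, 3, 5)    # outcome; 8 of the 10 pairs are concordant
C <- c_index(x, y)
Dxy <- 2 * (C - 0.5)     # rescales C from [0, 1] to [-1, 1]
```

A C of 0.5 (no discrimination) maps to Dxy = 0, and perfect concordance maps to Dxy = 1.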
For validate.cph with dxy=TRUE, you must specify an argument u if the model is stratified, since survival curves can then cross and X beta is not 1-1 with predicted survival. There is also a validate method for tree, which only does cross-validation and which has a different list of arguments.
a matrix with rows corresponding to the statistical indexes and columns for the original index, resample estimates, indexes applied to the whole or omitted sample using the model derived from the resample, average optimism, corrected index, and number of successful resamples.
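How these columns relate to each other can be sketched with a hand-rolled bootstrap in base R. This is an illustration under stated assumptions, not the code used by validate or predab.resample: the model is an ordinary lm fit, the index is R-squared, the helper rsq is hypothetical, and the element names in res are chosen for this example only.

```r
set.seed(1)
n <- 100
d <- data.frame(x = rnorm(n))
d$y <- 0.5 * d$x + rnorm(n)

# Hypothetical index function: R-squared of a fit's predictions on 'data'
rsq <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

fit <- lm(y ~ x, data = d)
index.orig <- rsq(fit, d)              # apparent (original) index

B <- 40
training <- test <- numeric(B)
for (b in seq_len(B)) {
  boot <- d[sample(n, replace = TRUE), ]
  fb <- lm(y ~ x, data = boot)         # refit on the resample
  training[b] <- rsq(fb, boot)         # index on the resample itself
  test[b] <- rsq(fb, d)                # resample model applied to whole sample
}

optimism <- mean(training - test)      # average optimism
index.corrected <- index.orig - optimism
res <- c(index.orig = index.orig, training = mean(training),
         test = mean(test), optimism = optimism,
         index.corrected = index.corrected, n = B)
```

The corrected index is simply the original index minus the average optimism, so a large gap between the resample and whole-sample estimates signals overfitting.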
prints a summary, and optionally statistics for each re-fit
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
validate.ols, validate.cph, validate.lrm, validate.tree, predab.resample, fastbw, Design, Design.trans, calibrate
# See examples for validate.cph, validate.lrm, validate.ols
# Example of validating a parametric survival model:

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)

f <- psm(S ~ age*sex, x=TRUE, y=TRUE)   # Weibull model
# Validate full model fit
validate(f, B=10)                       # usually B=150

# Validate stepwise model with typical (not so good) stopping rule
# bw=TRUE does not preserve hierarchy of terms at present
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")