validate.lrm {Design}        R Documentation
The validate function, when used on an object created by lrm, does
resampling validation of a logistic regression model, with or without
backward step-down variable deletion. It provides bias-corrected Somers'
D_{xy} rank correlation, the R-squared index, the intercept and slope of
an overall logistic calibration equation, the maximum absolute difference
in predicted and calibrated probabilities E_{max}, the discrimination
index D = (model L.R. chi-square - 1)/n, the unreliability index
U = (difference in -2 log likelihood between uncalibrated X beta and
X beta with overall intercept and slope calibrated to the test sample)/n,
the overall quality index (logarithmic probability score) Q = D - U,
and the Brier or quadratic probability score B (the last three, U, Q, and B,
are not computed for ordinal models). The corrected slope can be thought
of as a shrinkage factor that takes overfitting into account.
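The index definitions above can be made concrete with a minimal base-R sketch. It uses glm as a stand-in for lrm and simulated data; the variable names are illustrative assumptions, and this is not the code used inside the Design package. U follows the definition as worded in the text. On the original (training) sample the calibration intercept and slope come out as 0 and 1, so U is essentially 0 and Q equals D; the quantities become informative only when evaluated on resampled or new data.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))

fit <- glm(y ~ x, family = binomial)   # base-R stand-in for lrm (assumption)
lp  <- predict(fit, type = "link")     # linear predictor X beta
p   <- plogis(lp)                      # predicted probabilities

# Discrimination index D = (model L.R. chi-square - 1) / n
lr.chisq <- fit$null.deviance - fit$deviance
D <- (lr.chisq - 1) / n

# Brier (quadratic probability) score
B <- mean((p - y)^2)

# Somers' Dxy = 2 * (C - 0.5), with C computed from the rank-sum formula
r   <- rank(p)
n1  <- sum(y == 1)
n0  <- n - n1
C   <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
Dxy <- 2 * (C - 0.5)

# Overall calibration equation: logistic regression of y on the linear predictor
cal       <- glm(y ~ lp, family = binomial)
intercept <- unname(coef(cal)[1])
slope     <- unname(coef(cal)[2])

# Unreliability index U as worded in the text:
# (-2 log lik. with intercept 0, slope 1) minus (-2 log lik. after recalibration), over n
dev.uncal <- -2 * sum(y * lp - log1p(exp(lp)))
U <- (dev.uncal - cal$deviance) / n

# Overall quality index
Q <- D - U

round(c(Dxy = Dxy, Intercept = intercept, Slope = slope, D = D, U = U, Q = Q, B = B), 4)
```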
# fit <- lrm(formula=response ~ terms, x=TRUE, y=TRUE)
## S3 method for class 'lrm':
validate(fit, method="boot", B=40,
         bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
         pr=FALSE, kint, Dxy.method=if(k==1) 'somers2' else 'lrm',
         emax.lim=c(0,1), ...)
fit: a fit derived by lrm. The options x=TRUE and y=TRUE must have been specified.

method, B, bw, rule, type, sls, aics, pr: see validate and predab.resample.

kint: In the case of an ordinal model, specify which intercept to validate. Default is the middle intercept.

Dxy.method: "lrm" to use lrm's computation of D_{xy} correlation, which rounds predicted probabilities to the nearest .002. Use Dxy.method="somers2" (the default for binary models) to instead use the more accurate but slower somers2 function. This will matter most when the model is extremely predictive. The default is "lrm" for ordinal models, since somers2 only handles binary response variables.

emax.lim: range of predicted probabilities over which to compute the maximum error. Default is the entire range.

...: other arguments to pass to lrm.fit (now only maxit and tol are allowed) and to predab.resample (note especially the group, cluster, and subset parameters).
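To illustrate what E_{max} and the emax.lim argument refer to, here is a minimal sketch. It assumes the calibrated probability for a predicted probability p is plogis(intercept + slope * qlogis(p)), which is what an overall logistic calibration equation implies, and takes the maximum absolute difference over a grid of p restricted to emax.lim. The helper emax and its n.grid argument are hypothetical names introduced for this example; they are not part of the Design package.

```r
# Hypothetical helper (not a Design function): maximum absolute difference
# between predicted and calibrated probabilities over a range of predictions
emax <- function(intercept, slope, emax.lim = c(0, 1), n.grid = 999) {
  # evenly spaced grid strictly inside (0, 1), so that qlogis() stays finite
  p <- seq(0, 1, length.out = n.grid + 2)[-c(1, n.grid + 2)]
  # keep only predictions inside the requested range
  p <- p[p >= emax.lim[1] & p <= emax.lim[2]]
  calibrated <- plogis(intercept + slope * qlogis(p))
  max(abs(p - calibrated))
}

emax(0, 1)                             # perfect calibration: essentially zero
emax(0, 0.8)                           # slope < 1, error over the entire range
emax(0, 0.8, emax.lim = c(0.3, 0.7))   # error restricted to mid-range predictions
```

Restricting emax.lim can only lower (or leave unchanged) the reported maximum, since the maximum is taken over a subset of the same grid.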
If the original fit was created using penalized maximum likelihood estimation,
the same penalty.matrix used with the original fit is used during validation.
The value is a matrix with rows corresponding to D_{xy}, R^2, Intercept,
Slope, E_{max}, D, U, Q, and B, and columns for the original index,
resample estimates, indexes applied to the whole or omitted sample using
the model derived from the resample, average optimism, corrected index,
and number of successful resamples.
For ordinal models, U, Q, and B do not appear.
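The columns of the returned matrix follow the usual bootstrap optimism correction, which can be sketched in base R as below. This is an illustration under stated assumptions, not the predab.resample implementation: glm stands in for lrm, the Brier score is the only index tracked, the data are simulated, and the helper brier and the names in res are hypothetical. For each resample the model is refit, the index is computed on the resample (training) and on the original sample (test), and the average difference is subtracted from the original index.

```r
set.seed(2)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1))

# Hypothetical helper: Brier score of a fitted model evaluated on a data set
brier <- function(fit, data) {
  mean((predict(fit, newdata = data, type = "response") - data$y)^2)
}

# Index for the model fitted to, and evaluated on, the original sample
full       <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
index.orig <- brier(full, d)

B <- 40                                  # number of resamples, as in validate's default
training <- test <- numeric(B)
for (i in seq_len(B)) {
  boot <- d[sample(n, replace = TRUE), ]                      # bootstrap resample
  f    <- glm(y ~ x1 + x2 + x3, family = binomial, data = boot)
  training[i] <- brier(f, boot)          # index on the resample itself
  test[i]     <- brier(f, d)             # same model applied to the whole sample
}

optimism        <- mean(training - test) # average optimism
index.corrected <- index.orig - optimism # corrected index

res <- c(index.orig = index.orig, training = mean(training), test = mean(test),
         optimism = optimism, index.corrected = index.corrected, n = B)
round(res, 4)
```

Because a lower Brier score is better, optimism for this index is typically negative, so the corrected value is usually larger (worse) than the original one.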
As a side effect, validate prints a summary and, optionally, statistics for each re-fit.
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
Miller ME, Hui SL, Tierney WM (1991): Validation techniques for logistic regression models. Stat in Med 10:1213–1226.
Harrell FE, Lee KL (1985): A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, ed. PK Sen. New York: North-Holland, p. 333–343.
predab.resample, fastbw, lrm, Design, Design.trans, calibrate, somers2, cr.setup
n <- 1000    # define sample size
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n, TRUE))

# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

f <- lrm(y ~ sex*rcs(cholesterol)+pol(age,2)+blood.pressure, x=TRUE, y=TRUE)
# Validate full model fit
validate(f, B=10)              # normally B=150
# Two-sample validation: make resamples have same numbers of
# successes and failures as original sample
validate(f, B=10, group=y)

# Validate stepwise model with typical (not so good) stopping rule
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")

## Not run:
# Fit a continuation ratio model and validate it for the predicted
# probability that y=0
u <- cr.setup(y)
Y <- u$y
cohort <- u$cohort
attach(mydataframe[u$subs,])
# x=TRUE, y=TRUE are required for validate
f <- lrm(Y ~ cohort+rcs(age,4)*sex, penalty=list(interaction=2),
         x=TRUE, y=TRUE)
# See predab.resample for cluster and subset
validate(f, cluster=u$subs, subset=cohort=='all')

## End(Not run)