bj {Design}R Documentation

Buckley-James Multiple Regression Model

Description

bj fits the Buckley-James distribution-free least squares multiple regression model to a possibly right-censored response variable. This model reduces to ordinary least squares if there is no censoring. By default, model fitting is done after taking logs of the response variable. bj uses the Design class for automatic anova, fastbw, validate, Function, nomogram, summary, plot, bootcov, and other functions. The bootcov function may be worth using with bj fits, as the properties of the Buckley-James covariance matrix estimator are not fully known for strange censoring patterns.

The residuals.bj function exists mainly to compute residuals and to censor them (i.e., return them as Surv objects) just as the original failure time variable was censored. These residuals are useful for checking to see if the model also satisfies certain distributional assumptions. To get these residuals, the fit must have specified y=TRUE.

The bjplot function is a special plotting function for objects created by bj with x=TRUE, y=TRUE in effect. It produces three scatterplots for every covariate in the model: the first plots the original situation, where censored data are distingushed from non-censored data by a different plotting symbol. In the second plot, called a renovated plot, vertical lines show how censored data were changed by the procedure, and the third is equal to the second, but without vertical lines. Imputed data are again distinguished from the non-censored by a different symbol.

The validate method for bj validates the Somers' Dxy rank correlation between predicted and observed responses, accounting for censoring.

The primary fitting function for bj is bj.fit, which does not allow missing data and expects a full design matrix as input.

Usage

bj(formula=formula(data), data, subset, na.action=na.delete,
   link="log", control, method='fit', x=FALSE, y=FALSE, 
   time.inc)

## S3 method for class 'bj':
print(x, digits=4, long=FALSE, ...)

## S3 method for class 'bj':
residuals(object, type=c("censored","censored.normalized"),...)

bjplot(fit, which=1:dim(X)[[2]])

## S3 method for class 'bj':
validate(fit, method="boot", B=40,
         bw=FALSE,rule="aic",type="residual",sls=.05,aics=0,pr=FALSE,
                 dxy=TRUE, tol=1e-7, rel.tolerance=1e-3, maxiter=15, ...)

bj.fit(x, y, control)

Arguments

formula an S statistical model formula. Interactions up to third order are supported. The left hand side must be a Surv object.
data
subset
na.action the usual statistical model fitting arguments
fit a fit created by bj, required for all functions except bj.
x a design matrix with or without a first column of ones, to pass to bj.fit. All models will have an intercept. For print.bj is a result of bj. For bj, set x=TRUE to include the design matrix in the fit object.
y a Surv object to pass to bj.fit as the two-column response variable. Only right censoring is allowed, and there need not be any censoring. For bj, set y to TRUE to include the two-column response matrix, with the event/censoring indicator in the second column. The first column will be transformed according to link, and depending on na.action, rows with missing data in the predictors or the response will be deleted.
link set to, for example, "log" (the default) to model the log of the response, or "identity" to model the untransformed response.
control a list containing any or all of the following components: iter.max (maximum number of iterations allowed, default is 20), eps (convergence criterion: concergence is assumed when the ratio of sum of squared errors from one iteration to the next is between 1-eps and 1+eps), trace (set to TRUE to monitor iterations), tol (matrix singularity criterion, default is 1e-7), and 'max.cycle' (in case of nonconvergence the program looks for a cycle that repeats itself, default is 30).
method set to "model.frame" or "model.matrix" to return one of those objects rather than the model fit.
dxy set to FALSE to prevent Somers' D_{xy} from being computed by validate (saves time for very large datasets)
time.inc setting for default time spacing. Default is 30 if time variable has units="Day", 1 otherwise, unless maximum follow-up time < 1. Then max time/10 is used as time.inc. If time.inc is not given and max time/default time.inc is > 25, time.inc is increased.
digits number of significant digits to print if not 4.
long set to TRUE to print the correlation matrix for parameter estimates
object the result of bj
type type of residual desired. Default is censored unnormalized residuals, defined as link(Y) - linear.predictors, where the link function was usually the log function. You can specify type="censored.normalized" to divide the residuals by the estimate of sigma.
which vector of integers or character strings naming elements of the design matrix (the names of the original predictors if they entered the model linearly) for which to have bjplot make plots of only the variables listed in which (names or numbers).
B
bw
rule
sls
aics
pr
tol
rel.tolerance
maxiter see predab.resample
... ignored for print; passed through to predab.resample for validate

Details

The program implements the algorithm as described in the original article by Buckley & James. Also, we have used the original Buckley & James prescription for computing variance/covariance estimator. This is based on non-censored observations only and does not have any theoretical justification, but has been shown in simulation studies to behave well. Our experience confirms this view. Convergence is rather slow with this method, so you may want to increase the number of iterations. Our experience shows that often, in particular with high censoring, 100 iterations is not too many. Sometimes the method will not converge, but will instead enter a loop of repeating values (this is due to the discrete nature of Kaplan and Meier estimator and usually happens with small sample sizes). The program will look for such a loop and return the average betas. It will also issue a warning message and give the size of the cycle (usually less than 6).

Value

bj returns a fit object with similar information to what survreg, psm, cph would store as well as what Design stores and units and time.inc. residuals.bj returns a Surv object. One of the components of the fit object produced by bj (and bj.fit) is a vector called stats which contains the following names elements: "Obs", "Events", "d.f.","error d.f.","sigma". Here sigma is the estimate of the residual standard deviation.

Author(s)

Janez Stare
Department of Biomedical Informatics
Ljubljana University
Ljubljana, Slovenia
janez.stare@mf.uni-lj.si

Harald Heinzl
Department of Medical Computer Sciences
Vienna University
Vienna, Austria
harald.heinzl@akh-wien.ac.at

Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu

References

Buckley JJ, James IR. Linear regression with censored data. Biometrika 1979; 66:429–36.

Miller RG, Halpern J. Regression with censored data. Biometrika 1982; 69: 521–31.

James IR, Smith PJ. Consistency results for linear regression with censored data. Ann Statist 1984; 12: 590–600.

Lai TL, Ying Z. Large sample theory of a modified Buckley-James estimator for regression analysis with censored data. Ann Statist 1991; 19: 1370–402.

Hillis SL. Residual plots for the censored data linear regression model. Stat in Med 1995; 14: 2023–2036.

See Also

Design, psm, survreg, cph, Surv, na.delete, na.detail.response, datadist, rcorr.cens.

Examples

set.seed(1)
ftime  <- 10*rexp(200)
stroke <- ifelse(ftime > 10, 0, 1)
ftime  <- pmin(ftime, 10)
units(ftime) <- "Month"
age <- rnorm(200, 70, 10)
hospital <- factor(sample(c('a','b'),200,TRUE))
dd <- datadist(age, hospital)
options(datadist="dd")

f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
# add link="identity" to use a censored normal regression model instead
# of a lognormal one
anova(f)
fastbw(f)
validate(f, B=15)
plot(f, age=NA, hospital=NA)  # needs datadist since no explicit age,hosp.
coef(f)               # look at regression coefficients
coef(psm(Surv(ftime, stroke) ~ rcs(age,5) + hospital, dist='lognormal'))
                      # compare with coefficients from likelihood-based
                      # log-normal regression model
                      # use dist='gau' not under R 

r <- resid(f, 'censored.normalized')
survplot(survfit(r), conf='none') 
                      # plot Kaplan-Meier estimate of 
                      # survival function of standardized residuals
survplot(survfit(r ~ cut2(age, g=2)), conf='none')  
                      # may desire both strata to be n(0,1)
options(datadist=NULL)

[Package Design version 2.0-12 Index]