anova.Design {Design}R Documentation

Analysis of Variance (Wald and F Statistics)

Description

The anova function automatically tests most meaningful hypotheses in a design. For example, suppose that age and cholesterol are predictors, and that a general interaction is modeled using a restricted spline surface. anova prints Wald statistics (F statistics for an ols fit) for testing linearity of age, linearity of cholesterol, age effect (age + age by cholesterol interaction), cholesterol effect (cholesterol + age by cholesterol interaction), linearity of the age by cholesterol interaction (i.e., adequacy of the simple age * cholesterol 1 d.f. product), linearity of the interaction in age alone, and linearity of the interaction in cholesterol alone. Joint tests of all interaction terms in the model and all nonlinear terms in the model are also performed. For any multiple d.f. effects for continuous variables that were not modeled through rcs, pol, lsp, etc., tests of linearity will be omitted. This applies to matrix predictors produced by e.g. poly or ns. print.anova.Design is the printing method. text.anova.Design is the text method for inserting anova tables on graphs. plot.anova.Design draws dot charts depicting the importance of variables in the model, as measured by Wald chi-square, chi-square minus d.f., AIC, P-values, partial R^2, R^2 for the whole model after deleting the effects in question, or proportion of overall model R^2 that is due to each predictor. latex.anova.Design is the latex method. It substitutes Greek/math symbols in column headings, uses boldface for TOTAL lines, and constructs a caption. Then it passes the result to latex.default for conversion to LaTeX.

Usage

## S3 method for class 'Design':
anova(object, ..., main.effect=FALSE, tol=1e-9, 
      test=c('F','Chisq'), ss=TRUE)

## S3 method for class 'anova.Design':
print(x, which=c('none','subscripts','names','dots'), ...)

## S3 method for class 'anova.Design':
plot(x, 
     what=c("chisqminusdf","chisq","aic","P","partial R2","remaining R2",
            "proportion R2"), 
     xlab=NULL, pch=16, 
     rm.totals=TRUE, rm.ia=FALSE, rm.other=NULL, newnames,
     sort=c("descending","ascending","none"), pl=TRUE, ...)

## S3 method for class 'anova.Design':
text(x, at, cex=.5, font=2, ...)

## S3 method for class 'anova.Design':
latex(object, title, psmall=TRUE, 
      dec.chisq=2, dec.F=2, dec.ss=NA, dec.ms=NA, dec.P=4, ...)

Arguments

object a Design fit object. object must allow Varcov to return the variance-covariance matrix. For latex, is the result of anova.
... If omitted, all variables are tested, yielding tests for individual factors and for pooled effects. Specify a subset of the variables to obtain tests for only those factors, with a pooled Wald tests for the combined effects of all factors listed. Names may be abbreviated. For example, specify anova(fit,age,cholesterol) to get a Wald statistic for testing the joint importance of age, cholesterol, and any factor interacting with them.
Can be optional graphical parameters to send to text or dotchart2, or other parameters to send to latex.default. Ignored for print.
main.effect Set to TRUE to print the (usually meaningless) main effect tests even when the factor is involved in an interaction. The default is FALSE, to print only the effect of the main effect combined with all interactions involving that factor.
tol singularity criterion for use in matrix inversion
test For an ols fit, set test="Chisq" to use Wald chi^2 tests rather than F-tests.
ss For an ols fit, set ss=FALSE to suppress printing partial sums of squares, mean squares, and the Error SS and MS.
x for print,plot,text is the result of anova.
which If which is not "none" (the default), print.anova.Design will add to the rightmost column of the output the list of parameters being tested by the hypothesis being tested in the current row. Specifying which="subscripts" causes the subscripts of the regression coefficients being tested to be printed (with a subscript of one for the first non-intercept term). which="names" prints the names of the terms being tested, and which="dots" prints dots for terms being tested and blanks for those just being adjusted for.
at for text is a list containing the x- and y-coordinates for the upper left corner of the anova table to be drawn on an existing plot, e.g. at=locator(1)
cex character expansion size for text.anova.Design
font font for text.anova.Design. Default is 2 (usually Courier).
what what type of statistic to plot. The default is the Wald chi-square statistic for each factor (adding in the effect of higher-ordered factors containing that factor) minus its degrees of freedom. The last three choice for what only apply to ols models.
xlab x-axis label, default is constructed according to what. plotmath symbols are used for R, by default.
pch character for plotting dots in dot charts. Default is 16 (solid dot).
rm.totals set to FALSE to keep total chi-squares (overall, nonlinear, interaction totals) in the chart.
rm.ia set to TRUE to omit any effect that has "*" in its name
rm.other a list of other predictor names to omit from the chart
newnames a list of substitute predictor names to use, after omitting any.
sort default is to sort bars in descending order of the summary statistic
pl set to FALSE to suppress plotting. This is useful when you only wish to analyze the vector of statistics returned.
title title to pass to latex, default is name of fit object passed to anova prefixed with "anova.". For Windows, the default is "ano" followed by the first 5 letters of the name of the fit object.
psmall The default is psmall=TRUE, which causes P<0.00005 to print as <0.0001. Set to FALSE to print as 0.0000.
dec.chisq number of places to the right of the decimal place for typesetting chi-square values (default is 2). Use zero for integer, NA for floating point.
dec.F digits to the right for F statistics (default is 2)
dec.ss digits to the right for sums of squares (default is NA, indicating floating point)
dec.ms digits to the right for mean squares (default is NA)
dec.P digits to the right for P-values

Details

If the statistics being plotted with plot.anova.Design are few in number and one of them is negative or zero, plot.anova.Design will quit because of an error in dotchart2.

Value

anova.Design returns a matrix of class anova.Design containing factors as rows and chi-square, d.f., and P-values as columns (or d.f., partial SS, MS, F, P). plot.anova.Design invisibly returns the vector of quantities plotted. This vector has a names attribute describing the terms for which the statistics in the vector are calculated.

Side Effects

print prints, text uses tempfile to get a temporary Unix file name, sink, and unix (to remove the temporary file). latex creates a file with a name of the form "title.tex" (see the title argument above).

Author(s)

Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu

See Also

Design, Design.Misc, lrtest, Design.trans, summary.Design, solvet, text, locator, dotchart2, latex, Dotplot, anova.lm, contrast.Design

Examples

n <- 1000    # define sample size
set.seed(17) # so can reproduce the results
treat <- factor(sample(c('a','b','c'), n,TRUE))
num.diseases <- sample(0:4, n,TRUE)
age <- rnorm(n, 50, 10)
cholesterol <- rnorm(n, 200, 25)
weight <- rnorm(n, 150, 20)
sex <- factor(sample(c('female','male'), n,TRUE))
label(age) <- 'Age'      # label is in Hmisc
label(num.diseases) <- 'Number of Comorbid Diseases'
label(cholesterol) <- 'Total Cholesterol'
label(weight) <- 'Weight, lbs.'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl'   # uses units.default in Hmisc

# Specify population model for log odds that Y=1
L <- .1*(num.diseases-2) + .045*(age-50) +
     (log(cholesterol - 10)-5.2)*(-2*(treat=='a') +
     3.5*(treat=='b')+2*(treat=='c'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

fit <- lrm(y ~ treat + scored(num.diseases) + rcs(age) +
               log(cholesterol+10) + treat:log(cholesterol+10))
anova(fit)                       # Test all factors
anova(fit, treat, cholesterol)   # Test these 2 by themselves
                                 # to get their pooled effects
g <- lrm(y ~ treat*rcs(age))
dd <- datadist(treat, num.diseases, age, cholesterol)
options(datadist='dd')
plot(g, age=NA, treat="b")
s <- anova(g)
print(s)
#p <- locator(1)                  # click mouse at upper left corner of table
p <- list(x=32,y=2.1)
text(s, at=p)                    # add anova table to regression plot
plot(s)                          # new plot - dot chart of chisq-d.f.
# latex(s)                       # nice printout - creates anova.g.tex
options(datdist=NULL)


# Simulate data with from a given model, and display exactly which
# hypotheses are being tested

set.seed(123)
age <- rnorm(500, 50, 15)
treat <- factor(sample(c('a','b','c'), 500,TRUE))
bp  <- rnorm(500, 120, 10)
y   <- ifelse(treat=='a', (age-50)*.05, abs(age-50)*.08) + 3*(treat=='c') +
       pmax(bp, 100)*.09 + rnorm(500)
f   <- ols(y ~ treat*lsp(age,50) + rcs(bp,4))
print(names(coef(f)), quote=FALSE)
specs(f)
anova(f)
an <- anova(f)
options(digits=3)
print(an, 'subscripts')
print(an, 'dots')

an <- anova(f, test='Chisq', ss=FALSE)
plot(0:1)                        # make some plot
text(an, at=list(x=1.5,y=.6))    # add anova table to plot
plot(an)                         # new plot - dot chart of chisq-d.f.
# latex(an)                      # nice printout - creates anova.f.tex

# Suppose that a researcher wants to make a big deal about a variable 
# because it has the highest adjusted chi-square.  We use the
# bootstrap to derive 0.95 confidence intervals for the ranks of all
# the effects in the model.  We use the plot method for anova, with
# pl=FALSE to suppress actual plotting of chi-square - d.f. for each
# bootstrap repetition.  We rank the negative of the adjusted
# chi-squares so that a rank of 1 is assigned to the highest.
# It is important to tell plot.anova.Design not to sort the results,
# or every bootstrap replication would have ranks of 1,2,3 for the stats.

mydata <- data.frame(x1=runif(200), x2=runif(200),
                     sex=factor(sample(c('female','male'),200,TRUE)))
set.seed(9)  # so can reproduce example
mydata$y <- ifelse(runif(200)<=plogis(mydata$x1-.5 + .5*(mydata$x2-.5) + 
                   .5*(mydata$sex=='male')),1,0)

if(.R.) {
library(boot)
b <- boot(mydata, function(data, i, ...) rank(-plot(anova(
                lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,data,subset=i)), 
                sort='none', pl=FALSE)),
                R=25)  # should really do R=500 but will take a while
Rank <- b$t0
lim <- t(apply(b$t, 2, quantile, probs=c(.025,.975)))
} else {
b <- bootstrap(mydata, rank(-plot(anova(
                lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,mydata)), sort='none', pl=FALSE)),
               B=25)  # should really do B=500 but will take a while
Rank <- b$observed
lim <- limits.emp(b)[,c(1,4)]  # get 0.025 and 0.975 quantiles
}

# Use the Hmisc Dotplot function to display ranks and their confidence
# intervals.  Sort the categories by descending adj. chi-square, for ranks
original.chisq <- plot(anova(lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,data=mydata)),
                       sort='none', pl=FALSE)
predictor <- as.factor(names(original.chisq))
predictor <- reorder.factor(predictor, -original.chisq)

Dotplot(predictor ~ Cbind(Rank, lim), pch=3, xlab='Rank', 
                main=if(.R.) expression(paste(
'Ranks and 0.95 Confidence Limits for ',chi^2,' - d.f.')) else
'Ranks and 0.95 Confidence Limits for Chi-square - d.f.')

[Package Design version 2.0-12 Index]