plot.Design {Design}R Documentation

Plot Effects of Variables

Description

Plots the effect of one or two predictors on the linear predictor or X beta scale, or on some transformation of that scale. The predictor is always plotted in its original coding on the x or y-axis. perimeter is a function used to generate the boundary of data to plot when a 3-d plot is made. It finds the area where there are sufficient data to generate believable interaction fits.

Legend is a generic function for adding legends to an existing graph according to the specific plot made by plot.Design. The specific Legend method for plot.Design is Legend.plot.Design. It handles legends for image plots. For other plots with one or more curves, make legends using the label.curves parameter.

datadensity is a function for showing the data density (raw data) on each curve generated for curve-type plots. This is a rug plot showing the location/density of data values for the x-axis variable. If there was a second variable specified to plot that generated separate curves, the data density specific to each class of points is shown. This assumes that the second variable was categorical. The rug plots are drawn by scat1d.

To plot effects instead of estimates (e.g., treatment differences as a function of interacting factors) see contrast.Design and summary.Design.

Usage

perimeter(x, y, xinc=diff(range(x))/10, 
          n=10, lowess.=TRUE)

## S3 method for class 'Design':
plot(x, ..., xlim, ylim, fun, xlab, ylab, 
     conf.int=.95, conf.type=c('mean','individual'),
     add=FALSE, label.curves=TRUE,
     eye, theta=0, phi=15, perspArgs=NULL,
     lty, col=1, lwd=par('lwd'), lwd.conf=1, pch=1, 
     adj.zero=FALSE, ref.zero=FALSE, adj.subtitle, cex.adj,
     non.slopes, time=NULL, loglog=FALSE, val.lev=FALSE,
     digits=4, log="", perim,
     method=c("persp","contour","image","dotchart","default"),
     sortdot=c('neither','ascending','descending'),
     nlevels=10, name, zlim=range(zmat,na.rm=TRUE),
     vnames=c('labels','names'), abbrev=FALSE)
# Just say plot(fit, ...)

## S3 method for class 'plot.Design':
print(x, ...)

## S3 method for class 'perimeter':
lines(x, ...)

Legend(object, ...)
## S3 method for class 'plot.Design':
Legend(object, x, y, size=c(1,1), horizontal=TRUE,
       nint=50, fun, at, zlab, ...)

## S3 method for class 'plot.Design':
datadensity(object, x1, x2, ...)

Arguments

fit a fit object created with Design() in effect. options(datadist="d") must have been specified (where d was created by datadist), or it must have been in effect with fit was created.
... The first variable in this list is displayed on the x-axis. Specify x=NA to use the default display range, or any range you choose (e.g. seq(0,100,by=2),c(2,3,7,14)). The default list of values for which predictions are made is taken as the list of unique values of the variable if they number fewer than 11. For variables with >10 unique values, 100 equally spaced values in the range are used for plotting if the range is not specified. If there is a second variable listed, and its range is NA or a single value, that variable is displayed on the y-axis. If the second variable's range has fewer than 40 levels, separate curves are generated for each value of the variable. Otherwise, a three dimensional perspective plot is drawn using 40 equally-spaced values of y. Names may be abbreviated. plot senses that a variable is not to be displayed by checking if the list of values for the variable is a scalar instead of a vector. Variables not specified are set to the default adjustment value limits[2], i.e. the median for continuous variables and a reference category for non-continuous ones. Due to a bug in S, the first variable mentioned may not be named x. This would cause the general scatterplot function plot to be invoked by mistake. Variables after the first or second specified to plot define adjustment settings. For categorical variables, specify the class labels in quotes when specifying variable values. If the levels of a categorical variable are numeric, you may omit the quotes. For variables not described using datadist, you must specify explicit ranges and adjustment settings for predictors that were in the model. Note that you can omit variables entirely. In that case, all non-interaction effects will be plotted automatically as if you said plot(fit, age=NA); plot(fit, sex=NA); .... In this case you have no control over the settings of the variables for the x-axis, i.e., NA is always assumed.
For a plot made up of multiple curves, these are extra graphical arguments will be passed to key from Legend. For image plots, these arguments are passed to par and have temporary effect. For datadensity these extra arguments are passed along to scat1d.
x first variable of a pair of predictors forming a 3-d plot, to specify to perim. For Legend, is either a vector of 1 or 2 x-coordinates or a list with elements x and y each with 1 or 2 coordinates. For method="image" plots, 1 or 2 coordinates may be given, and for other plot types, 1 coordinate is given. A single coordinate represents the upper left corner of the legend box. For Legend, x and y are optional. If omitted, locator is used to position legends with the mouse. For lines.perimeter, x is the result of perimeter. For print.plot.Design, x is the result of plot.Design.
y second variable of the pair for perim, or y-coordinates for Legend. If omitted, x is assumed to be a list with both x and y components.
xinc increment in x over which to examine the density of y in perimeter
n within intervals of x for perimeter, takes the informative range of y to be the nth smallest to the nth largest values of y. If there aren't at least 2n y values in the x interval, no y ranges are used for that interval.
lowess. set to FALSE to not have lowess smooth the data perimeters
xlim This parameter is seldom used, as limits are usually controlled with the variables specifications. One reason to use xlim is to plot a factor variable on the x-axis that was created with the cut2 function with the levels.mean option, with val.lev=TRUE specified to plot.Design. In this case you may want the axis to have the range of the original variable values given to cut2 rather than the range of the means within quantile groups.
ylim Range for plotting on response variable axis. Computed by default.
fun Function used to transform X beta and its confidence interval before plotting. For example, to transform from a logit to a probability scale, use fun=function(x)1/(1+exp(-x)) or fun=plogis, and to take the anti-log, specify fun=exp. for Legend, fun is a function for transforming tick mark labels for color or gray scale legends for method="image". For example, if plot.Design is used to make an image plot of log odds ratios, specifying fun=plogis will cause the color legend to be labeled with probability values rather than log odds.
xlab Label for x-axis. Default is one given to asis, rcs, etc., which may have been the "label" attribute of the variable.
ylab Label for y-axis (z-axis if perspective plot). If fun is not given, default is "log Odds" for lrm, "log Relative Hazard" for cph, name of the response variable for ols, TRUE or log(TRUE) for psm, or "X * Beta" otherwise. If fun is given, the default is "". If time is given, the default is "(time) (units) Survival Probability" or "log[-log S(time)]" depending on the loglog parameter.
conf.int Default is .95. Specify FALSE to suppress confidence bands.
conf.type specifies the type of confidence interval. Default is for the mean. For ols fits there is the option of obtaining confidence limits for individual predicted values by specifying conf.type="individual".
add Set to TRUE to add to an existing plot without drawing new axes. Default is FALSE. See the warning note under sortdot.
label.curves Set to FALSE to suppress labeling of separate curves. Default is TRUE, which causes labcurve to be invoked to place labels at positions where the curves are most separated, labeling each curve with the full curve label. Set label.curves to a list to specify options to labcurve, e.g., label.curves= list(method="arrow", cex=.8). These option names may be abbreviated in the usual way arguments are abbreviated. Use for example label.curves=list(keys=letters[1:5]) to draw single lower case letters on 5 curves where they are most separated, and automatically position a legend in the most empty part of the plot. The col, lty, and lwd parameters are passed automatically to labcurve although they may be overridden here.
eye Argument to S persp function for defining perspective in 3-d plots. Default is (-6, -6, 9). This is for S-Plus only.
theta
phi
perspArgs a list containing other named arguments to be passed to persp
lty Vector of line types to use in plotting separate curves. Default is 1,2, ...
lwd Vector of line widths corresponding to separate curves, default is par("lwd").
lwd.conf scalar width of lines for confidence bands. Default is 1.
pch symbol to use when plotting unconnected points when a categorical variable is on the x-axis or when method="dotchart". Default is 1 (open circle). See points for other values, or use the show.pch function in Hmisc.
col S color number for displaying curves. Default is 1 (black). Specify a vector of integers to assign different colors to different curves.
adj.subtitle Set to FALSE to suppress subtitling the graph with the list of settings of non-graphed adjustment values. Default is TRUE if <= 6 non-plotted factors.
cex.adj cex parameter for size of adjustment settings in subtitles. Default is 0.75 times par("cex").
adj.zero Set to TRUE to adjust all non-plotted variables to 0 (or reference cell for categorical variables) and to omit intercept(s) from consideration. Default is FALSE.
ref.zero Subtract a constant from X beta before plotting so that the reference value of the x-variable yields y=0. This is done before applying function fun.
non.slopes This is only useful in a multiple intercept model such as the ordinal logistic model. There to use to second of three intercepts, for example, specify non.slopes=c(0,1,0). The default is non.slopes=rep(0,k) if adj.zero=TRUE, where k is the number of intercepts in the model. If adj.zero=FALSE, the default is (1,0,0,...,0).
time Specify a single time u to cause function survest to be invoked to plot the probability of surviving until time u when the fit is from cph or psm.
loglog Specify loglog=TRUE to plot log[-log(survival)] instead of survival, when time is given.
val.lev When plotting a categorical or strata factor with category labels that are strings of legal numeric values, set to TRUE to use these values in plotting. An ordinary axis with uniform spacing will be used rather than spacing dictated by the value labels. When val.lev=FALSE, category labels dictate how axis tick marks are made. val.lev is used typically when the variable being plotted is a categorical variable that was collapsed into intervals, with the value label for a category representing interval means or midpoints. Such variables are created for example by the cut2 function, specifying levels.mean=TRUE. For plotting a discrete numeric variable you can specify val.lev=TRUE to force plotting of the variable as if it were continuous.
digits Controls how ``adjust-to'' values are plotted. The default is 4 significant digits.
log Set log="x", "y" or "xy" to plot log scales on one or both axes.
perim names a matrix created by perimeter when used for 3-d plots of two continuous predictors. When the combination of variables is outside the range in perim, that section of the plot is suppressed. If perim is omitted, 3-d plotting will use the marginal distributions of the two predictors to determine the plotting region, when the grid is not specified explicitly in variables. When instead a series of curves is being plotted, perim specifies a function having two arguments. The first is the vector of values of the first variable that is about to be plotted on the x-axis. The second argument is the single value of the variable representing different curves, for the current curve being plotted. The function's returned value must be a logical vector whose length is the same as that of the first argument, with values TRUE if the corresponding point should be plotted for the current curve, FALSE otherwise. See one of the latter examples.
method For 3-d plots, use method="persp" for perspective plots (persp(), the default), method="contour" to use contour(), or method="image" to use image(). Specify method="dotchart" to make a horizontal dot chart to represent predicted values associated with categorical predictors. The log argument does not apply to these plot types. You can specify method="default" to get the default behaviour. For "dotchart", the dotchart2 function in the Hmisc library is used.
sortdot applies when method="dotchart". The default is to plot the points in the order requested for predictions. Specify method="ascending" or an abbreviation such as method="a" to sort in ascending order before plotting the dot chart. You may also specify method="descending". Unless method="neither", specifying add=TRUE may not work properly.
nlevels number of contour levels if method="contour"
name Instead of specifying the variable to plot on the x-axis in the variables list, you can specify one or more variables to plot by specifying a vector of character string variable names in the name argument. Using this mode you cannot specify a list of variable values to use; plotting is done as if you had said e.g. age=NA. Also, interacting factors can only be set to their reference values using this notation.
zlim If 'type="persp"' controls the range for plottin in the z-axis. Computed by default.
vnames applies when no x-variable is specified (i.e., when all predictors are being plotted). To override the default x-axis label in that case (variable "label" attributes) to instead use variable names, specify vnames="names".
object an object created by plot.Design
abbrev Set to TRUE to use the abbreviate function to abbreviate levels of categorical factors for labeling tick marks on the x-axis.
size
horizontal
nint see image.legend
at If fun is specified to Legend, at may be given. at is a vector of values at which to evaluate fun for drawing tick marks in the color legend. For example, if you want to show the median survival time for a log-normal survival model whereas the linear predictor (log median) was used in constructing the image plot, and if you want to place tick marks at nice median values, specify fun=exp, at=log(c(1,10,100,1000)).
zlab label for image color axis legend. Default is from the model (e.g., "Log Odds"), but zlab will often be specified if fun was specified to plot.Design or Legend.
x1 data vector for first variable in plot (x-axis variable)
x2 data vector for second variable in plot if it was not constant (curve-generating variable)

Details

When there are no intercepts in the fitted model, plot subtracts adjustment values from each factor while computing variances for confidence limits.

perimeter is a kind of generalization of datadist for 2 continuous variables. First, the n smallest and largest x values are determined. These form the lowest and highest possible xs to display. Then x is grouped into intervals bounded by these two numbers, with the interval widths defined by xinc. Within each interval, y is sorted and the nth smallest and largest y are taken as the interval containing sufficient data density to plot interaction surfaces. The interval is ignored when there are insufficient y values. When plot.Design readies the data for persp, it uses the approx function to do linear interpolation of the y-boundaries as a function of the x values actually used in forming the grid (the values of the first variable specified to plot). To make the perimeter smooth, specify lowess.=TRUE to perimeter.

Specifying time will not work for Cox models with time-dependent covariables. Use survest or survfit for that purpose.

Use ps.slide, win.slide, gs.slide to set up nice defaults for plotting. These also set a system option mgp.axis.labels to allow x and y-axes to have differing mgp graphical parameters (see par). This is important when labels for y-axis tick marks are to be written horizontally (par(las=1)), as a larger gap between the labels and the tick marks are needed. You can set the axis-specific 2nd components of mgp using mgp.axis.labels(c(xvalue,yvalue)).

Note that because the generic plot method has the variable x as its first argument, you cannot explicitly specify that you want to plot the effect of a predictor named x.

Value

perimeter returns a matrix of class perimeter. This outline can be conveniently plotted by lines.perimeter. Legend.plot.Design invisibly returns the position of the legend. plot.Design invisibly returns an invisible object of class "plot.Design" with the following components (use print.plot.Design to get a nice printout of the object):

x.xbeta data frame of values plotted. First column is sequence of x-values. If a second variable was plotted, second column is sequence of y-values. Next column is estimated X beta, followed by a column of lower confidence limits and upper confidence limits. If fun was specified, these last three columns are transformed by the specified function.
adjust character string of the form "sex=male age=50" containing settings of non-plotted factors.
curve.labels character vector containing values of y-variable if it determined different curves on the plot. E.g., c("female","male") or c("10","20","30"). This vector is useful as an argument to the S key function if label.curves=FALSE.
plot.type "curves" or "3d"
method from plot.Design call
lty vector of line types used for curves (if the plot used a few curves to represent a second variable plotted)
lwd vector of line widths used
col vector of color codes

Author(s)

Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu

See Also

datadist, predict.Design, contrast.Design, summary.Design, persp, Design, Design.trans, survest, survplot, Design.Misc, contour, image, labcurve, scat1d, dotchart2, mgp.axis.labels Overview, par, ps.slide, xYplot, smearingEst

Examples

n <- 1000    # define sample size
set.seed(17) # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))
label(age)            <- 'Age'      # label is in Hmisc
label(cholesterol)    <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex)            <- 'Sex'
units(cholesterol)    <- 'mg/dl'   # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'

# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')

fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
               x=TRUE, y=TRUE)

par(mfrow=c(2,2))
plot(fit)                # Plot effects of all 4 predictors
par(mfrow=c(1,2))
plot(fit, name=c('age','cholesterol'))   # Make 2 plots
par(mfrow=c(1,1))
plot(fit, age=seq(20,80,length=100), sex=NA, conf.int=FALSE)
                         # Plot relationship between age and log
                         # odds, separate curve for each sex,
                         # no C.I.
z <- plot(fit, age=NA, sex=NA, label.curves=FALSE)
                         # use label.curves=list(keys=c('a','b'))'
                         # to use 1-letter abbreviations
datadensity(z, age, sex) # rug plots (1-dimensional scatterplots)
                         # on each treatment curve, with treatment-
                         # specific density of age
plot(fit, age=seq(20,80,length=100), sex='male')  # works if datadist not used
plot(fit, age=NA, cholesterol=NA)# 3-dimensional perspective plot for age,
                         # cholesterol, and log odds using default
                         # ranges for both variables
boundaries <- perimeter(age, cholesterol, lowess=TRUE)
plot(age, cholesterol)   # show bivariate data density
lines(boundaries)        # and perimeter that will be used for 3-D plot
z <- plot(fit, age=NA, cholesterol=NA, perim=boundaries, method='image')
                         # draws image() plot
                         # don't show estimates where data are sparse
                         # doesn't make sense here since vars don't interact
if(!.R.)Legend(z, fun=plogis, at=qlogis(c(.01,.05,.1,.2,.3,.4,.5)),
               zlab='Probability')   # gray scale or color legend for prob.
plot(fit, age=NA, fun=function(x) 1/(1+exp(-x)) , # or fun=plogis
     ylab="Prob", conf.int=.9)    # Plot estimated probabilities instead of
                                  # log odds

# Plot the age effect as an odds ratio
# comparing the age shown on the x-axis to age=30 years

ddist$limits$age[2] <- 30    # make 30 the reference value for age
# Could also do: ddist$limits["Adjust to","age"] <- 30
fit <- update(fit)   # make new reference value take effect
plot(fit, age=NA, ref.zero=TRUE, fun=exp, ylab='Age=x:Age=30 Odds Ratio')
abline(h=1, lty=2, col=2); abline(v=30, lty=2, col=2)

# Make two curves, and plot the predicted curves as two trellis panels
w <- plot(fit, age=NA, sex=NA)   # Would be nice if a pl=FALSE option was avail.
z <- data.frame(w$x.xbeta)     # Makes variable names legal
if(.R.) library(lattice)
xyplot(log.odds ~ age | sex, data=z, type='l')
# To add confidence bands we need to use the Hmisc xYplot function in
# place of xyplot
xYplot(Cbind(log.odds,lower,upper) ~ age | sex, data=z, 
       method='bands', type='l')
# If non-displayed variables were in the model, add a subtitle to show
# their settings using title(sub=paste('Adjusted to',w$adjust),adj=0)
# See predict.Design for an example using predict and xYplot without plot()



# Plots for a parametric survival model
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, 
              rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
label(t) <- 'Follow-up Time'
e <- ifelse(t<=cens,1,0)
t <- pmin(t, cens)
units(t) <- "Year"
ddist <- datadist(age, sex)
Srv <- Surv(t,e)

# Fit log-normal survival model and plot median survival time vs. age
f <- psm(Surv(t, e) ~ rcs(age), dist=if(.R.)'lognormal' else 'gaussian')
med <- Quantile(f)       # Creates function to compute quantiles
                         # (median by default)
plot(f, age=NA, fun=function(x)med(lp=x), ylab="Median Survival Time")
# Note: This works because med() expects the linear predictor (X*beta)
#       as an argument.  Would not work if use 
#       plot(..., ref.zero=TRUE or adj.zero=TRUE)
# Also, confidence intervals from this method are approximate since
# they don't take into account estimation of scale parameter

# Fit an ols model to log(y) and plot the relationship between x1
# and the predicted mean(y) on the original scale without assuming
# normality of residuals; use the smearing estimator
set.seed(1)
x1 <- runif(300)
x2 <- runif(300)
ddist <- datadist(x1,x2)
y  <- exp(x1+x2-1+rnorm(300))
f <- ols(log(y) ~ pol(x1,2)+x2)
r <- resid(f)
smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean')
formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)])
#smean$res <- r[!is.na(r)]   # define default res argument to function
plot(f, x1=NA, fun=smean, ylab='Predicted Mean on y-scale')

options(datadist=NULL)

## Not run: 
# Example in which separate curves are shown for 4 income values
# For each curve the estimated percentage of voters voting for
# the democratic party is plotted against the percent of voters
# who graduated from college.  scat1d is used to indicate
# the income-interval-specific data density for college.  For
# this purpose show the distribution of percent in college for
# those having an income level within +/- the half-width of
# the income interval.  scat1d shows the rug plot superimposed
# on the estimated curve.  Data are county-level percents.
# This can't be done automatically using datadensity on the object
# returned by plot.Design, as the variable representing different
# curves (income) is a continuous variable.

incomes <- seq(22900, 32800, length=4)  
# equally spaced to outer quintiles
pl <- plot(f, college=NA, income=incomes, 
           conf.int=FALSE, xlim=c(0,35), ylim=c(30,55),
           lty=1, lwd=c(.25,1.5,3.5,6), col=c(1,1,2,2))
graph.points <- pl$x.xbeta
for(i in 1:4) {
  college.in.income.group <- college[abs(income-incomes[i]) < 1650]
  this.income <- graph.points[,'income']==incomes[i]
  scat1d(college.in.income.group,
         curve=list(x=graph.points[this.income,'college'],
           y=graph.points[this.income,'democrat']))
}

# Instead of showing a rug plot on each curve, erase end portions
# of each curve where there are fewer than 10 counties having
# % college graduates to the left of the x-coordinate being plotted,
# for the subset of counties having median family income with 1650
# of the target income for the curve

show.pts <- function(college.pts, income.pt) {
  s <- abs(income - income.pt) < 1650  #assumes income known to top frame
  x <- college[s]
  x <- sort(x[!is.na(x)])
  n <- length(x)
  low <- x[10]; high <- x[n-9]
  college.pts >= low & college.pts <= high
}

plot(f, college=NA, income=incomes,
     conf.int=FALSE, xlim=c(0,35), ylim=c(30,55),
     lty=1, lwd=c(.25,1.5,3.5,6), col=c(1,1,2,2),
     perim=show.pts)
## End(Not run)

[Package Design version 2.0-12 Index]