cluproxplot {cba}    R Documentation

Cluster Proximity Plot

Description

Visualizes cluster quality using shading of a rearranged proximity matrix (see Ling, 1973). Objects belonging to the same cluster are displayed in consecutive order. The placement of the clusters and the order of objects within each cluster are determined by seriation algorithms, which try to place large similarities close to the diagonal. Compact clusters are visible as dark squares (high similarity) on the diagonal of the plot.

Additionally, a silhouette plot (Rousseeuw, 1987) is added.

The visualization was also inspired by CLUSION (see Strehl and Ghosh, 2003).
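The rearrangement idea described above can be sketched in base R. This is only an illustration under simplifying assumptions, not the code used by cluproxplot: the toy data and the names pts, labels, and m_reordered are hypothetical, and objects are merely grouped by cluster label, without any seriation of the clusters or of the objects within them.

```r
# Toy data: six points on a line forming two well-separated groups,
# deliberately stored in interleaved order (hypothetical example data).
pts <- c(a = 0.0, b = 10.0, c = 0.5, d = 10.5, e = 1.0, f = 11.0)
labels <- c(1L, 2L, 1L, 2L, 1L, 2L)   # hypothetical cluster membership

# Full symmetric dissimilarity matrix.
m <- as.matrix(dist(pts))

# Place objects of the same cluster in consecutive positions.
# Ties in order() keep their original relative order, so the
# within-cluster order is simply the input order here.
ord <- order(labels)
m_reordered <- m[ord, ord]

# After reordering, the two clusters occupy the diagonal blocks
# (rows/columns 1-3 and 4-6); small values sit inside those blocks.
print(round(m_reordered, 1))
```

In the reordered matrix, the small within-cluster dissimilarities form square blocks along the diagonal, which is what the shaded plot makes visible.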

Usage

cluproxplot(x, labels = NULL, method = NULL, args = NULL, 
            plot = TRUE, plotOptions = NULL, ...)   

Arguments

x an object of class dist (distance) or a matrix.
labels NULL or an integer vector of the same length as rows/columns in x indicating the membership for each element in x as consecutive integers starting with one. The labels are used to reorder the matrix.
method a vector of character strings indicating the seriation algorithms to use. The first element indicates the inter-cluster and the second element the intra-cluster seriation method. See seriation.
args "list"; contains arguments passed on to the seriation algorithms.
plot "logical"; if FALSE, no plot is produced. The returned object can be plotted later using the function plot, which takes as its second argument a list of plotting options (see plotOptions below).
plotOptions "list"; options for plotting the matrix. The list can contain the following elements:
clusterLabels
"logical"; display cluster labels in the plot.
averages
"logical"; display in the lower triangle of the plot the average pair-wise dissimilarity between clusters instead of the individual dissimilarities.
lines
"logical"; draw lines to separate clusters.
silhouettes
"logical"; include a silhouette plot (see Rousseeuw, 1987).
threshold
"numeric"; only plot distances below the threshold.
main
title for the plot.
col
colors used for the image plot (default: 100 shades of gray using the hcl colorspace with hcl(h = 0, c = 0, l = seq(5, 95, len = 100))).
colorkey
place a color key under the plot.
linesCol
color used for the lines to separate clusters.
newpage
"logical"; start plot on a new page (see package grid).
pop
"logical"; should the viewports created be popped (see package grid)?
... further arguments; currently unused.
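The labels argument expects consecutive integers starting with one. If cluster memberships are available in some other form, one possible way to convert them is sketched below in base R. The input vector raw is hypothetical, and this conversion is not part of cluproxplot itself.

```r
# Hypothetical raw cluster labels, e.g. exported from another tool.
raw <- c("north", "south", "north", "east", "south", "east", "east")

# Map each distinct label to an integer code 1..k.
# factor() sorts its levels, so the codes follow that sorted order.
f <- factor(raw)
labels <- as.integer(f)

print(labels)      # integer codes, one per object
print(levels(f))   # code i corresponds to levels(f)[i]

# Sanity check: the codes are exactly 1..k with no gaps.
k <- nlevels(f)
stopifnot(identical(sort(unique(labels)), seq_len(k)))
```

Keeping levels(f) around makes it possible to translate the integer codes back to the original names after plotting.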

Value

An invisible object of class "cluProxMatrix" with the following elements:

order NULL or integer vector giving the order used to plot x.
method vector of character strings indicating the seriation methods used for plotting x.
k NULL or integer scalar giving the number of clusters generated.
description a data.frame containing information (label, size, average intra-cluster dissimilarity and the average silhouette) for the clusters as displayed in the plot (from top/left to bottom/right).
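To make the size and average intra-cluster dissimilarity columns concrete, the following base R sketch computes comparable summaries by hand for toy data. It is an assumption-laden illustration: the column names, the helper avg_within, and the handling of singleton clusters are hypothetical and may differ from what the package actually returns, and the average silhouette is omitted.

```r
# Toy data: five objects with hypothetical cluster labels 1..2.
pts <- c(0, 1, 2, 10, 12)
labels <- c(1L, 1L, 1L, 2L, 2L)
m <- as.matrix(dist(pts))

# Average pairwise dissimilarity within one cluster.
# Returns NA for singleton clusters (an assumption of this sketch).
avg_within <- function(idx) {
  if (length(idx) < 2) return(NA_real_)
  sub <- m[idx, idx]
  mean(sub[upper.tri(sub)])   # each pair is counted once
}

# Indices of the objects belonging to each cluster.
members <- split(seq_along(labels), labels)

description <- data.frame(
  label = as.integer(names(members)),
  size = unname(lengths(members)),
  avg_dissimilarity = unname(vapply(members, avg_within, numeric(1)))
)
print(description)
```

A small average intra-cluster dissimilarity corresponds to a compact cluster, i.e. a dark square on the diagonal of the plot.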

Author(s)

Michael Hahsler (hahsler@ai.wu-wien.ac.at)

References

Ling, R.F. A computer generated aid for cluster analysis. Comm. of the ACM, 16(6), 355-361, 1973.

Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65, 1987.

Strehl, A. and Ghosh, J. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2), 208-230, 2003.

See Also

dist (in package stats); package grid, seriation.

Examples

data("Votes")

### create dummy coding (with removed party affiliation)
x <- as.dummy(Votes[-17])

### calculate distance matrix
d <- dists(x, method = "binary")

### plot dissimilarity matrix unseriated
res <- cluproxplot(d, method = "No seriation", 
        plotOptions = list(main = "No seriation"))

### plot matrix seriated
res <- cluproxplot(d, plotOptions = list(main = "Seriation - (Murtagh, 1985)"))

### cluster with pam
library("cluster")
l <- pam(d, 8, cluster.only = TRUE)
res <- cluproxplot(d, l, plotOptions = list(main = "PAM + Seriation (Murtagh)"))

### now we use a different seriation algorithm (hclust + optimal leaf ordering)
### and just do the seriation and then use plot to produce the plot
res <- cluproxplot(d, l, method = c("Optimal", "Optimal"), plot = FALSE)
res

### use blue (hue is 260, with decreasing chroma and increasing luminance
### towards a distance of 1)
plot(res, plotOptions = list(main = "PAM + Seriation (Optimal Leaf ordering)", 
        col = hcl(h = 260, c = seq(75, 0, length.out = 5), l = seq(30, 95, length.out = 5))))

### the result contains more information, e.g., the order used for reordering
### the matrix
names(res)
res$order

[Package cba version 0.2-1 Index]