read.DIF {utils} | R Documentation |
Reads a file in Data Interchange Format (DIF) and creates a data frame from it. DIF is a format for data matrices such as single spreadsheets.
read.DIF(file, header = FALSE, dec = ".", row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, blank.lines.skip = TRUE, stringsAsFactors = default.stringsAsFactors())
file |
the name of the file which the data are to be read from,
or a connection, or a complete URL.
The name "clipboard" may also be used on Windows, in which
case read.DIF("clipboard") will look for a DIF format entry
in the Windows clipboard.
|
header |
a logical value indicating whether the spreadsheet contains the
names of the variables as its first line. If missing, the value is
determined from the file format: header is set to TRUE
if and only if the first row contains only character values and
the top left cell is empty. |
dec |
the character used in the file for decimal points. |
row.names |
a vector of row names. This can be a vector giving
the actual row names, or a single number giving the column of the
table which contains the row names, or a character string giving the
name of the table column containing the row names.
If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are
numbered.
Using row.names = NULL forces row numbering.
|
col.names |
a vector of optional names for the variables.
The default is to use "V" followed by the column number. |
as.is |
the default behavior of read.DIF is to convert
character variables (which are not converted to logical, numeric or
complex) to factors. The variable as.is controls the
conversion of columns not otherwise specified by colClasses.
Its value is either a vector of logicals (values are recycled if
necessary), or a vector of numeric or character indices which
specify which columns should not be converted to factors.
Note: to suppress all conversions including those of numeric columns, set colClasses = "character".
Note that as.is is specified per column (not per
variable) and so includes the column of row names (if any) and any
columns to be skipped.
|
na.strings |
a character vector of strings which are to be
interpreted as NA values. Blank fields are also
considered to be missing values in logical, integer, numeric and
complex fields. |
colClasses |
character. A vector of classes to be assumed for
the columns. Recycled as necessary, or if the character vector is
named, unspecified values are taken to be NA.
Possible values are NA (when type.convert is
used), "NULL" (when the column is skipped), one of the atomic
vector classes (logical, integer, numeric, complex, character, raw),
or "factor", "Date" or "POSIXct". Otherwise
there needs to be an as method (from package methods)
for conversion from "character" to the specified formal
class.
Note that colClasses is specified per column (not per
variable) and so includes the column of row names (if any).
|
nrows |
the maximum number of rows to read in. Negative values are ignored. |
skip |
the number of lines of the data file to skip before beginning to read data. |
check.names |
logical. If TRUE then the names of the
variables in the data frame are checked to ensure that they are
syntactically valid variable names. If necessary they are adjusted
(by make.names) so that they are, and also to ensure
that there are no duplicates. |
blank.lines.skip |
logical: if TRUE blank lines in the
input are ignored. |
stringsAsFactors |
logical: should character vectors be converted to factors? |
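As a small illustration of the check.names adjustment described above, the following sketch calls make.names directly on some made-up column names. The names are hypothetical, and unique = TRUE is used here to mimic the removal of duplicates; that this matches the internal call exactly is an assumption.

```r
# Hypothetical raw column names: one with a space, a duplicate,
# one starting with a digit, and one that is already valid
raw <- c("a b", "a b", "1x", "ok")

# make.names() makes each name syntactically valid;
# unique = TRUE additionally disambiguates duplicates
fixed <- make.names(raw, unique = TRUE)
print(fixed)
```

Setting check.names = FALSE would leave such names untouched, at the cost of needing backticks to refer to them.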
A data frame (data.frame) containing a representation of the data in the file. Empty input is an error unless col.names is specified, when a 0-row data frame is returned: similarly giving just a header line if header = TRUE results in a 0-row data frame.
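A minimal sketch of a typical call, assuming a hand-written DIF file: the file contents below are made up for illustration and follow the common DIF layout (a header with VECTORS and TUPLES counts, then BOT-delimited rows ending in EOD). Files produced by real spreadsheet programs may differ, for example in how they assign VECTORS and TUPLES.

```r
# Write a tiny DIF file by hand: 2 columns (VECTORS), 3 rows (TUPLES),
# where the first row holds the column names
dif_lines <- c(
  "TABLE",   "0,1", "\"\"",
  "VECTORS", "0,2", "\"\"",
  "TUPLES",  "0,3", "\"\"",
  "DATA",    "0,0", "\"\"",
  "-1,0", "BOT", "1,0", "\"name\"", "1,0", "\"value\"",
  "-1,0", "BOT", "1,0", "\"a\"",    "0,1.5", "V",
  "-1,0", "BOT", "1,0", "\"b\"",    "0,2",   "V",
  "-1,0", "EOD"
)
tmp <- tempfile(fileext = ".dif")
writeLines(dif_lines, tmp)

# Read it back; header = TRUE treats the first row as variable names,
# and stringsAsFactors = FALSE keeps the text column as character
df <- utils::read.DIF(tmp, header = TRUE, stringsAsFactors = FALSE)
str(df)

unlink(tmp)  # remove the temporary file
```

Passing header explicitly avoids relying on the automatic detection, which requires an empty top left cell.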
The columns referred to in as.is and colClasses include the column of row names (if any).
Less memory will be used if colClasses is specified as one of the six atomic vector classes.
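For columns whose colClasses entry is NA, the argument description says type.convert is used. The sketch below calls type.convert directly on made-up character vectors to show the kind of result to expect and how as.is changes it; it illustrates the conversion rule only and does not go through read.DIF itself.

```r
# Character input as it might look before any conversion
ints  <- type.convert(c("1", "2", "NA"), as.is = TRUE)  # whole numbers plus an NA string
nums  <- type.convert(c("1.5", "2"), as.is = TRUE)      # a decimal point is present
chars <- type.convert(c("a", "b"), as.is = TRUE)        # non-numeric text, kept as is
facs  <- type.convert(c("a", "b"), as.is = FALSE)       # non-numeric text, as.is off

# Show the class chosen for each vector
print(sapply(list(ints = ints, nums = nums, chars = chars, facs = facs), class))
```

Specifying an atomic class in colClasses skips this guessing step for that column.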
The DIF format specification can be found by searching on http://www.wotsit.org/; the optional header fields are ignored. See also http://en.wikipedia.org/wiki/Data_Interchange_Format.
The term is likely to lead to confusion: Windows will have a ‘Windows Data Interchange Format (DIF) data format’ as part of its WinFX system, which may or may not be compatible.
The R Data Import/Export manual.
scan, type.convert, read.fwf for reading fixed width formatted input; read.table; data.frame.