TermDocMatrix {tm}          R Documentation

Term-document matrix

Description

Constructs a term-document matrix.

Usage

## S4 method for signature 'TextDocCol':
TermDocMatrix(object, weighting = "tf", stemming = FALSE,
              language = "english", minWordLength = 3, minDocFreq = 1,
              stopwords = NULL)

Arguments

object          a text document collection.
weighting       the weighting mode for the term-document matrix. Possible settings are
  • tf Term frequency
  • tf-idf Term frequency-inverse document frequency
  • bin Binary frequency
  • logical Similar to binary frequency, but with Boolean values
stemming        if TRUE, words are stemmed before the term-document matrix is built.
language        the language that determines the stemming rules.
minWordLength   words shorter than this number of characters are discarded from the term-document matrix.
minDocFreq      words that appear in fewer documents than this number are discarded from the term-document matrix.
stopwords       a plain text file containing all stopwords.
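
The weighting modes and the two filtering arguments can be illustrated with a small base R sketch. It does not call the tm package. The toy documents, the tokenisation rule, the variable names, and in particular the tf-idf formula (raw count times log2 of the number of documents over the document frequency) are assumptions chosen for illustration, so the values TermDocMatrix itself produces may differ.

```r
# Toy documents (hypothetical data, not from the tm package).
docs <- c(d1 = "the cat sat on the mat",
          d2 = "the dog sat",
          d3 = "cats and dogs")

# Assumed tokenisation: lower-case, split on anything that is not a letter.
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"),
                 function(x) x[nzchar(x)])
names(tokens) <- names(docs)

# Analogue of minWordLength: drop words shorter than 3 characters.
min_word_length <- 3
tokens <- lapply(tokens, function(x) x[nchar(x) >= min_word_length])

# Term frequency ("tf"): rows are terms, columns are documents.
terms <- sort(unique(unlist(tokens)))
tf <- vapply(tokens,
             function(x) as.integer(table(factor(x, levels = terms))),
             integer(length(terms)))
rownames(tf) <- terms

# Analogue of minDocFreq: drop terms found in fewer than 1 document.
min_doc_freq <- 1
doc_freq <- rowSums(tf > 0)
keep <- doc_freq >= min_doc_freq
tf <- tf[keep, , drop = FALSE]
doc_freq <- doc_freq[keep]

# "tf-idf": one common variant, assumed here.
tfidf <- tf * log2(ncol(tf) / doc_freq)

# "bin": 0/1 indicators; "logical": the same information as TRUE/FALSE.
bin  <- (tf > 0) * 1L
logi <- tf > 0
```

In this sketch the word "on" is removed by the length filter, and raising min_doc_freq to 2 would keep only the terms that occur in at least two of the three documents.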

Value

An S4 object of class TermDocMatrix, which extends the class matrix and contains a term-document matrix. The following slots contain useful information:

Weighting The weighting mode applied to the term-document matrix
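
As a rough picture of what "an S4 class that extends matrix and carries a Weighting slot" means, the following sketch defines a stand-in class in base R. The class name ToyTermDocMatrix and the sample counts are hypothetical; this is not the definition used inside tm, only an assumed minimal equivalent.

```r
library(methods)

# Hypothetical stand-in class: a matrix plus a character slot "Weighting".
setClass("ToyTermDocMatrix",
         representation(Weighting = "character"),
         contains = "matrix")

# Toy counts: two terms (rows) in two documents (columns).
counts <- matrix(c(2L, 0L, 1L, 1L), nrow = 2,
                 dimnames = list(c("cat", "dog"), c("d1", "d2")))

tdm <- new("ToyTermDocMatrix", counts, Weighting = "tf")

tdm@Weighting   # slot access: the recorded weighting mode
dim(tdm)        # matrix functions apply to the object directly
tdm[1, 1]       # ordinary matrix indexing
```

Because the class extends matrix, functions written for plain matrices can be applied to such an object, while the slot records which weighting produced the values.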

Author(s)

Ingo Feinerer


[Package tm version 0.1-1 Index]