tml.vectorspace
Class TermWeighting
java.lang.Object
tml.vectorspace.TermWeighting
public class TermWeighting
- extends java.lang.Object
The TermWeighting filter transforms a basic Term/Document matrix into a
matrix under a different term-weighting scheme. At the moment we support a
combination of Local Weights and Global Weights. Local Weights can be:
- TF: The raw term frequency.
- TFn: Raw term frequency normalized within the document.
- LOGTF: Calculates the LogEntropy weight for all numeric values in the
  given dataset (apart from the class attribute, if set). The resulting
  values are the product of a local weight (1 + log(tf)) and a global weight
  1 - Sum_i ((tf/gf)*log(tf/gf))/log(N), with
  tf: the raw term frequency
  gf: the global term frequency, i.e. the number of times the term appears
  in the corpus (Sum_i tf)
  N: the number of documents (or parts) in the corpus
More details in "Dumais, Susan 1990 Enhancing Performance in Latent Semantic Indexing (LSI) Retrieval".
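The LogEntropy weighting described above can be sketched in plain Java. This is a minimal illustration, not the tml implementation: the class LogEntropySketch and its methods are hypothetical, the matrix is a plain double[][] with one row per term and one column per document, and the natural logarithm is assumed. The sketch reads the sum in the global weight as the term's entropy, so the global weight is computed as 1 + Sum_i (p_i * log(p_i)) / log(N) with p_i = tf/gf and lies between 0 and 1; this sign convention is an assumption. The local weight is taken as 0 when tf is 0, which is also an assumption.

```java
public class LogEntropySketch {

    /**
     * Weights a term/document count matrix (rows = terms, columns = documents).
     * Hypothetical helper for illustration; not part of tml.
     */
    public static double[][] logEntropy(double[][] termDoc) {
        int terms = termDoc.length;
        int docs = terms == 0 ? 0 : termDoc[0].length;
        double[][] weighted = new double[terms][docs];
        for (int i = 0; i < terms; i++) {
            double global = globalWeight(termDoc[i]);
            for (int j = 0; j < docs; j++) {
                double tf = termDoc[i][j];
                // Local weight 1 + log(tf); assumed to be 0 when the term is absent.
                double local = tf > 0 ? 1.0 + Math.log(tf) : 0.0;
                weighted[i][j] = local * global;
            }
        }
        return weighted;
    }

    /** Global weight of one term, given its counts across all N documents. */
    public static double globalWeight(double[] termRow) {
        int n = termRow.length;
        double gf = 0.0; // global frequency: Sum_i tf
        for (double tf : termRow) {
            gf += tf;
        }
        // Assumption: with fewer than two documents (log(N) = 0) or an unused
        // term, fall back to a neutral weight of 1.
        if (n < 2 || gf == 0.0) {
            return 1.0;
        }
        double sum = 0.0;
        for (double tf : termRow) {
            if (tf > 0) {
                double p = tf / gf;
                sum += p * Math.log(p); // each summand is <= 0
            }
        }
        return 1.0 + sum / Math.log(n);
    }

    public static void main(String[] args) {
        double[][] counts = {
            {2, 2, 2}, // term spread evenly over all documents
            {3, 0, 0}, // term concentrated in one document
        };
        for (double[] row : logEntropy(counts)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Under these assumptions, a term spread evenly over all documents receives a global weight close to 0, while a term that occurs in a single document receives a global weight of 1.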
- Author:
- Jorge Villalon
Method Summary
Jama.Matrix process(Jama.Matrix termdoc)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TermWeighting
public TermWeighting(Corpus corpus)
process
public Jama.Matrix process(Jama.Matrix termdoc)
throws TermWeightingException
- Throws:
TermWeightingException