tml.vectorspace
Class TermWeighting

java.lang.Object
  extended by tml.vectorspace.TermWeighting

public class TermWeighting
extends java.lang.Object

The TermWeighting filter transforms a basic term/document matrix into a matrix that uses a different term weighting scheme. At the moment it supports a combination of local weights and global weights.

Local weights can be:
  TF: the raw term frequency.
  TFn: the raw term frequency normalized within the document.
  LOGTF: calculates the LogEntropy weight for all numeric values in the given dataset (apart from the class attribute, if set).

The resulting LogEntropy values are the product of a local weight

  1 + log(tf)

and a global weight

  1 - Sum_i ((tf/gf) * log(tf/gf)) / log(N)

with
  tf: the raw term frequency
  gf: the global term frequency, i.e. the number of times the term appears in the corpus (Sum_i tf)
  N: the number of documents (or parts) in the corpus

More details in: Dumais, Susan (1990), "Enhancing Performance in Latent Semantic Indexing (LSI) Retrieval".
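
The LogEntropy weighting described above can be illustrated with a small, self-contained sketch. It is not the TermWeighting implementation and does not use the tml API. The class name LogEntropySketch and the method logEntropy are hypothetical, and the sketch makes several assumptions: terms are rows and documents are columns, logarithms are natural, a zero frequency maps to a zero weight, and the sum in the global weight is read as the term's entropy, so the global weight is 1 minus the normalized entropy and falls between 0 and 1. The actual class may use a different sign convention, log base, or matrix orientation.

```java
final class LogEntropySketch {

    // counts[t][d] = raw frequency of term t in document d.
    // Terms-as-rows is an assumption of this sketch.
    static double[][] logEntropy(double[][] counts) {
        int terms = counts.length;
        int docs = terms == 0 ? 0 : counts[0].length;
        double[][] weighted = new double[terms][docs];

        for (int t = 0; t < terms; t++) {
            // gf: total number of occurrences of the term in the corpus.
            double gf = 0.0;
            for (int d = 0; d < docs; d++) {
                gf += counts[t][d];
            }

            // Entropy of the term's distribution over documents,
            // H = -Sum_d (tf/gf) * log(tf/gf), skipping zero entries.
            double entropy = 0.0;
            for (int d = 0; d < docs; d++) {
                double tf = counts[t][d];
                if (tf > 0.0) {
                    double p = tf / gf;
                    entropy -= p * Math.log(p);
                }
            }

            // Assumed reading of the global weight: 1 - H / log(N).
            // With fewer than two documents log(N) is zero, so fall back to 1.
            double global = docs > 1 ? 1.0 - entropy / Math.log(docs) : 1.0;

            // Local weight 1 + log(tf) for tf > 0, and 0 otherwise (assumption).
            for (int d = 0; d < docs; d++) {
                double tf = counts[t][d];
                weighted[t][d] = tf > 0.0 ? (1.0 + Math.log(tf)) * global : 0.0;
            }
        }
        return weighted;
    }

    public static void main(String[] args) {
        // Term 0 occurs in a single document; term 1 is spread evenly.
        double[][] counts = { {2, 0, 0}, {1, 1, 1} };
        double[][] weighted = logEntropy(counts);
        for (double[] row : weighted) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Under this reading, a term concentrated in one document keeps its full local weight, while a term spread evenly over all documents is weighted down towards zero. To try the sketch on a Jama.Matrix, its backing array can be obtained with getArray().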

Author:
Jorge Villalon

Nested Class Summary
static class TermWeighting.GlobalWeight
          Implemented global weight functions
static class TermWeighting.LocalWeight
          Implemented local weight functions
 
Constructor Summary
TermWeighting(Corpus corpus)
           
 
Method Summary
 Jama.Matrix process(Jama.Matrix termdoc)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TermWeighting

public TermWeighting(Corpus corpus)

Method Detail

process

public Jama.Matrix process(Jama.Matrix termdoc)
                    throws TermWeightingException
Throws:
TermWeightingException