tml.corpus
Class Dictionary

java.lang.Object
  extended by tml.corpus.Dictionary

public class Dictionary
extends java.lang.Object

This class represents a set of Terms (words or symbols), usually obtained from a set of documents or text passages. It is the common vocabulary for a group of documents. The Dictionary can filter Terms based on a selection criterion and its threshold. By default, the TermSelection criterion is a minimum document frequency (DF): a Term must appear in at least the number of different TextPassages given by the threshold. A Dictionary also maintains the list of Terms inside each TextPassage. When the TermSelection criterion is applied, the Dictionary removes the Terms that fail it from the TextPassages that contain them.
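
The minimum-DF selection described above can be illustrated with a small standalone sketch. It does not use the tml classes: passages are modelled as plain lists of strings, and the names MinDfFilterSketch, documentFrequencies and filterByMinDf are hypothetical, chosen only for this illustration. The actual Dictionary implementation may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the tml library.
class MinDfFilterSketch {

    /** Counts, for each term, the number of distinct passages that contain it. */
    static Map<String, Integer> documentFrequencies(List<List<String>> passages) {
        Map<String, Integer> df = new LinkedHashMap<>();
        for (List<String> passage : passages) {
            // A set ensures a term repeated inside one passage is counted once.
            for (String term : new HashSet<>(passage)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Returns a copy of the passages without the terms whose DF is below minDf. */
    static List<List<String>> filterByMinDf(List<List<String>> passages, int minDf) {
        Map<String, Integer> df = documentFrequencies(passages);
        List<List<String>> filtered = new ArrayList<>();
        for (List<String> passage : passages) {
            List<String> kept = new ArrayList<>();
            for (String term : passage) {
                if (df.get(term) >= minDf) {
                    kept.add(term);
                }
            }
            filtered.add(kept);
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<List<String>> passages = List.of(
                List.of("latent", "semantic", "analysis"),
                List.of("semantic", "web", "semantic"),
                List.of("latent", "semantic", "space"));
        // With a threshold of 2, only "latent" (DF 2) and "semantic" (DF 3) survive.
        System.out.println(filterByMinDf(passages, 2));
    }
}
```

Note that the count is per passage, not per occurrence: "semantic" appears twice in the second passage but contributes only one to its DF.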

Author:
Jorge Villalon

Constructor Summary
Dictionary(Corpus corpus)
          Basic constructor of a Dictionary; initialises the list and index of Terms.
 
Method Summary
 void addTerms(java.lang.String[] newTerms, int[] termFreqs, TextPassage document)
          Adds an array of Terms to the Dictionary and their frequencies.
 Corpus getCorpus()
          Gets the Corpus to which the Dictionary belongs
 Term getTermByText(java.lang.String word)
          Returns the Term that represents a word, or null if the word is not in the Dictionary.
 java.util.Collection<Term> getTerms()
          Returns the collection of Terms in the Dictionary
 void removeTerms()
          Removes from the Dictionary the Terms that don't meet the TermSelection criterion according to the threshold.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Dictionary

public Dictionary(Corpus corpus)
Basic constructor of a Dictionary; initialises the list and index of Terms.

Parameters:
corpus - the Corpus to which the Dictionary belongs

Method Detail

addTerms

public void addTerms(java.lang.String[] newTerms,
                     int[] termFreqs,
                     TextPassage document)
Adds an array of Terms and their frequencies to the Dictionary. Both arrays must come from the same TextPassage.

Parameters:
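newTerms - the Terms to add
termFreqs - the frequencies of those Terms
document - the TextPassage from which the Terms and their frequencies come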

removeTerms

public void removeTerms()
Removes from the Dictionary the Terms that don't meet the TermSelection criterion according to the threshold.


getTerms

public java.util.Collection<Term> getTerms()
Returns the collection of Terms in the Dictionary

Returns:
a Collection of Terms

getCorpus

public Corpus getCorpus()
Gets the Corpus to which the Dictionary belongs

Returns:
a Corpus

getTermByText

public Term getTermByText(java.lang.String word)
Returns the Term that represents a word, or null if the word is not in the Dictionary.

Parameters:
word - the word to look for
Returns:
the Term
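
Because getTermByText returns null for an unknown word, callers need a null check before using the result. The sketch below shows one way to handle that contract. It is a hedged illustration: TermLookupSketch is a hypothetical stand-in backed by a plain map from word to frequency, not the tml Dictionary, and findTerm is an assumed helper that does not exist in the documented API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a dictionary index; not part of the tml library.
class TermLookupSketch {

    // Maps each word to its accumulated frequency (stand-in for a Term).
    private final Map<String, Integer> index = new HashMap<>();

    /** Adds a word, accumulating its frequency if it is already present. */
    void add(String word, int freq) {
        index.merge(word, freq, Integer::sum);
    }

    /** Mirrors the documented contract: returns null when the word is absent. */
    Integer getTermByText(String word) {
        return index.get(word);
    }

    /** Assumed convenience wrapper that turns the null result into an Optional. */
    Optional<Integer> findTerm(String word) {
        return Optional.ofNullable(getTermByText(word));
    }

    public static void main(String[] args) {
        TermLookupSketch dictionary = new TermLookupSketch();
        dictionary.add("semantic", 3);

        // Explicit null check, as the documented return value requires.
        Integer term = dictionary.getTermByText("web");
        if (term == null) {
            System.out.println("'web' is not in the dictionary");
        }

        // Optional-based alternative that avoids touching null directly.
        dictionary.findTerm("semantic")
                .ifPresent(freq -> System.out.println("semantic: " + freq));
    }
}
```

Either style works; the Optional wrapper mainly helps when the lookup result is passed further along a call chain.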