Class Summary | Description |
---|---|
Corpus | A Corpus is a set of TextPassages that are processed to build a SemanticSpace. |
CorpusParameters | Encapsulates all the parameters required to create a Corpus and its corresponding SemanticSpace. |
Dictionary | Represents a group of Terms (words or symbols), usually obtained from a set of documents or text passages. |
ParagraphCorpus | A Corpus that represents the paragraphs of a TextDocument. |
RepositoryCorpus | A corpus containing all the documents in the repository. |
SearchResultsCorpus | A general corpus built from search results, where any search criteria can be used. |
SentenceCorpus | A corpus formed from the sentences of a document. |
SimpleCorpus | A simple corpus containing a set of documents from a folder; it treats each document as a vector. |
Term | Represents a unique word within a Corpus. |
TextDocument | Represents a whole document, comprising its content, title and URL. |
TextPassage | Represents a text passage that is part of a Corpus. |

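
To illustrate how these classes relate (unique Terms collected into a Dictionary, TextPassages grouped into a Corpus that is processed into a SemanticSpace), the following minimal sketch builds a term dictionary and a terms-by-passages count matrix in plain Java. Such a matrix is the kind of structure a semantic space is typically built from. The class `TermPassageMatrixSketch` and its methods are hypothetical and are not part of this package's API. The tokenizer, which lower-cases text and splits on non-alphanumeric characters, is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names are illustrative, NOT this package's API.
class TermPassageMatrixSketch {

    /** Assumed tokenizer: lower-cases and splits on runs of non-letter, non-digit characters. */
    static List<String> tokenize(String passage) {
        List<String> tokens = new ArrayList<>();
        for (String t : passage.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Maps each unique term to a row index, in first-seen order (a minimal "dictionary"). */
    static Map<String, Integer> buildDictionary(List<String> passages) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (String passage : passages) {
            for (String term : tokenize(passage)) {
                dictionary.putIfAbsent(term, dictionary.size());
            }
        }
        return dictionary;
    }

    /** counts[i][j] = number of occurrences of term i in passage j. */
    static int[][] termPassageMatrix(List<String> passages, Map<String, Integer> dictionary) {
        int[][] counts = new int[dictionary.size()][passages.size()];
        for (int j = 0; j < passages.size(); j++) {
            for (String term : tokenize(passages.get(j))) {
                Integer row = dictionary.get(term);
                if (row != null) {
                    counts[row][j]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> passages = List.of("The cat sat.", "The dog sat on the cat.");
        Map<String, Integer> dictionary = buildDictionary(passages);
        int[][] counts = termPassageMatrix(passages, dictionary);
        // One line per term; the first line printed is "the [1, 2]".
        for (Map.Entry<String, Integer> e : dictionary.entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(counts[e.getValue()]));
        }
    }
}
```

Each column of the matrix is one passage's bag of words. Word order is discarded, and only term counts remain.
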
Enum Summary | Description |
---|---|
CorpusParameters.DimensionalityReduction | Criteria by which a SemanticSpace reduces (or does not reduce) the dimensions of the space. |
CorpusParameters.TermSelection | Criteria used to select the terms that are kept in the corpus. |

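
The TermSelection constants are not listed on this page. As an assumed example of one common criterion, the sketch below keeps only terms that occur in at least a minimum number of passages (a document-frequency threshold). `TermSelectionSketch`, `documentFrequency` and `selectTerms` are hypothetical names, and this filter is not necessarily one of the enum's criteria.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a document-frequency term filter; NOT this package's API.
class TermSelectionSketch {

    /** For each term, counts how many passages contain it at least once. */
    static Map<String, Integer> documentFrequency(List<String> passages) {
        Map<String, Integer> df = new TreeMap<>(); // sorted for deterministic output
        for (String passage : passages) {
            Set<String> seen = new HashSet<>(); // count each term once per passage
            for (String t : passage.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!t.isEmpty() && seen.add(t)) {
                    df.merge(t, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    /** Keeps only terms whose document frequency is at least minPassages. */
    static List<String> selectTerms(List<String> passages, int minPassages) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : documentFrequency(passages).entrySet()) {
            if (e.getValue() >= minPassages) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> passages = List.of("The cat sat.", "The dog sat on the cat.", "A bird flew.");
        System.out.println(selectTerms(passages, 2)); // prints [cat, sat, the]
    }
}
```
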
Implements all the classes required to manage corpora as bags of words; it also includes natural language processing (NLP) for sentences.
This package implements the bag-of-words approach for documents at three levels: document, paragraph and sentence. Because grammatical information is available at the sentence level, the package also includes the Penn Treebank parse tree of each sentence.
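
A minimal sketch of the three bag-of-words levels follows. It assumes that paragraphs are separated by blank lines and that sentences end with '.', '!' or '?'. `PassageLevelsSketch` and its methods are hypothetical. The regular-expression sentence splitter is a naive stand-in for the package's NLP and does not produce Penn Treebank parse trees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of document/paragraph/sentence bags of words; NOT this package's API.
class PassageLevelsSketch {

    /** Assumption: paragraphs are separated by one or more blank lines. */
    static List<String> paragraphs(String document) {
        List<String> result = new ArrayList<>();
        for (String p : document.split("\\n\\s*\\n")) {
            if (!p.isBlank()) {
                result.add(p.trim());
            }
        }
        return result;
    }

    /** Naive splitter: breaks after '.', '!' or '?' when followed by whitespace. */
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (!s.isBlank()) {
                result.add(s.trim());
            }
        }
        return result;
    }

    /** Bag of words: term -> frequency, ignoring word order. */
    static Map<String, Integer> bagOfWords(String passage) {
        Map<String, Integer> bag = new TreeMap<>();
        for (String t : passage.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                bag.merge(t, 1, Integer::sum);
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        String doc = "The cat sat. The dog barked.\n\nBirds sing!";
        System.out.println("document:  " + bagOfWords(doc));
        for (String p : paragraphs(doc)) {
            System.out.println("paragraph: " + bagOfWords(p));
        }
        for (String s : sentences(doc)) {
            System.out.println("sentence:  " + bagOfWords(s));
        }
    }
}
```

The counting step is the same at every level. Only the unit over which terms are counted changes: one bag for the whole document, one per paragraph and one per sentence.
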