AbstractIRIndex (furia-chan 1 API)

org.kit.furia.index
Class AbstractIRIndex<O extends org.ajmm.obsearch.OB>

java.lang.Object
  org.kit.furia.index.AbstractIRIndex<O>


Type Parameters:: O - The basic unit into which all the information is divided. For natural language documents, this would be a word.

All Implemented Interfaces:: IRIndex<O>

Direct Known Subclasses:: FIRIndexShort

public abstract class AbstractIRIndex<O extends org.ajmm.obsearch.OB>
extends java.lang.Object
implements IRIndex<O>

AbstractIRIndex holds the basic functionality for an information retrieval system that works on OB objects (see www.obsearch.net). Using a distance function d, each query is rewritten in terms of the closest elements in the database. Once this transformation is done, an information retrieval engine (Apache Lucene) performs the matching.

Since:: 0
Author:: Arnoldo Jose Muller Molina
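
The query transformation described above can be illustrated with a minimal, self-contained sketch. It is not the furia-chan implementation: the distance function (absolute difference on integers), the in-memory "database", and the names `QueryTransformSketch`, `closest`, and `normalize` are assumptions made for illustration only. Each query object is replaced by the position of its closest database element, which gives a word->count map of the kind a text retrieval engine could then match.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class QueryTransformSketch {

    // Hypothetical distance function d; real OB objects define their own.
    static int d(int a, int b) {
        return Math.abs(a - b);
    }

    // Returns the position of the database element closest to the query object.
    // Ties are resolved in favor of the earliest element.
    static int closest(List<Integer> database, int queryObject) {
        int best = 0;
        for (int i = 1; i < database.size(); i++) {
            if (d(database.get(i), queryObject) < d(database.get(best), queryObject)) {
                best = i;
            }
        }
        return best;
    }

    // Rewrites a query as a map: position of closest database element -> occurrences.
    static Map<Integer, Integer> normalize(List<Integer> database, List<Integer> query) {
        Map<Integer, Integer> normalized = new TreeMap<>();
        for (int q : query) {
            normalized.merge(closest(database, q), 1, Integer::sum);
        }
        return normalized;
    }

    public static void main(String[] args) {
        List<Integer> database = List.of(10, 20, 30);
        List<Integer> query = List.of(11, 12, 29);
        System.out.println(normalize(database, query)); // prints {0=2, 2=1}
    }
}
```

The resulting map has the same shape as the `normalizedQuery` parameter (`Map<Integer,Integer>`) accepted by the protected methods of this class, although how the real keys are assigned is not stated on this page.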

Nested Class Summary
`protected static class`	`AbstractIRIndex.FieldName` Lucene has the concept of fields in a document.
`protected class`	`AbstractIRIndex.Word` Represents an OB object.

Field Summary
`protected org.apache.lucene.index.IndexReader`	`indexReader` This object is used to read different data from the index.
`protected org.apache.lucene.index.IndexWriter`	`indexWriter` This object is used to add elements to the index.
`protected float`	`mSetScoreThreshold` At least the given naive mset score must be obtained to consider a term in the result.
`protected org.apache.lucene.search.Searcher`	`searcher` This object is used to search the index.
`protected float`	`setScoreThreshold` At least the given naive set score must be obtained to consider a term in the result.
`protected boolean`	`validationMode` Tells whether or not the index is in validation mode.

Constructor Summary
`AbstractIRIndex(java.io.File dbFolder)` Creates a new IR index if none is available in the given path.

Method Summary
`protected ResultCandidate`	`calculateSimilarity(org.apache.lucene.document.Document document, java.util.Map<java.lang.Integer,java.lang.Integer> normalizedQuery, float score)` Calculates the ResultCandidate between a normalized query and a Lucene document.
`void`	`close()` Closes the databases.
`protected java.util.PriorityQueue<AbstractIRIndex.Word>`	`createPriorityQueue(java.util.Map<java.lang.Integer,java.lang.Integer> words)` Creates a PriorityQueue from a word->tf (term frequency) map.
`int`	`delete(java.lang.String documentName)` Deletes the given string document from the database.
`void`	`freeze()` Freezes the index.
`float`	`getMSetScoreThreshold()` The M-set score threshold is the minimum naive score for multi-sets that the index will accept.
`float`	`getSetScoreThreshold()` The Set score threshold is the minimum naive score for Sets that the index will accept.
`int`	`getSize()` Returns the number of documents in this DB.
`void`	`insert(Document<O> document)` Inserts a new document into the database.
`boolean`	`isValidationMode()` Tells whether or not the index is in validation mode.
`protected java.util.List<ResultCandidate>`	`processQueryResults(java.util.Map<java.lang.Integer,java.lang.Integer> normalizedQuery, short n, Document query)`
`void`	`setMSetScoreThreshold(float setScoreThreshold)` The M-set score threshold is the minimum naive score for multi-sets that the index will accept.
`void`	`setSetScoreThreshold(float setScoreThreshold)` The Set score threshold is the minimum naive score for Sets that the index will accept.
`void`	`setValidationMode(boolean validationMode)` Sets whether or not the index is in validation mode.
`boolean`	`shouldSkipDoc(Document<O> x)` Returns true if the document corresponding to x's name exists in the DB.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.kit.furia.IRIndex
`getIndex, getWordsSize`

Field Detail

indexWriter

protected org.apache.lucene.index.IndexWriter indexWriter

This object is used to add elements to the index.

indexReader

protected org.apache.lucene.index.IndexReader indexReader

This object is used to read different data from the index.

searcher

protected org.apache.lucene.search.Searcher searcher

This object is used to search the index.

mSetScoreThreshold

protected float mSetScoreThreshold

At least the given naive mset score must be obtained to consider a term in the result.

setScoreThreshold

protected float setScoreThreshold

At least the given naive set score must be obtained to consider a term in the result.

validationMode

protected boolean validationMode

Tells whether or not the index is in validation mode.

Constructor Detail

AbstractIRIndex

public AbstractIRIndex(java.io.File dbFolder)
                throws java.io.IOException

Creates a new IR index if none is available in the given path.

Parameters:: dbFolder - The folder in which Lucene's files will be stored
Throws:: java.io.IOException - If the given directory does not exist or if some other IO error occurs

Method Detail

delete

public int delete(java.lang.String documentName)
           throws IRException

Description copied from interface: IRIndex

Deletes the given string document from the database. If more than one document has the same name, all of them are deleted.

Specified by:: delete in interface IRIndex<O extends org.ajmm.obsearch.OB>

Returns:: The number of documents deleted.
Throws:: IRException - If something goes wrong with the IR engine or with OBSearch.
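
The delete-by-name contract above (every document sharing the name is removed, and the count is returned) can be sketched with a plain in-memory list. The class `DeleteByNameSketch` and its list-backed store are hypothetical stand-ins; the real index delegates to Lucene and OBSearch and may throw `IRException`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class DeleteByNameSketch {

    // Hypothetical in-memory stand-in for the document store.
    private final List<String> documentNames = new ArrayList<>();

    void insert(String documentName) {
        documentNames.add(documentName);
    }

    // Removes every document with the given name and reports how many were removed.
    int delete(String documentName) {
        int deleted = 0;
        Iterator<String> it = documentNames.iterator();
        while (it.hasNext()) {
            if (it.next().equals(documentName)) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    int getSize() {
        return documentNames.size();
    }

    public static void main(String[] args) {
        DeleteByNameSketch index = new DeleteByNameSketch();
        index.insert("a.class");
        index.insert("a.class"); // same name stored twice
        index.insert("b.class");
        System.out.println(index.delete("a.class")); // both copies are removed
        System.out.println(index.getSize());
    }
}
```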

shouldSkipDoc

public boolean shouldSkipDoc(Document<O> x)
                      throws java.io.IOException

Returns true if the document corresponding to x's name exists in the DB. This method is intended to be used in validation mode only.

Specified by:: shouldSkipDoc in interface IRIndex<O extends org.ajmm.obsearch.OB>

Parameters:: x - The document whose name is looked up in the DB.
Returns:: true if the DB already contains a document with name x.getName()
Throws:: java.io.IOException
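
A loader that runs in validation mode might use this check to avoid inserting a document twice. The sketch below assumes, as the method description states, that the check returns true when a document with the same name is already stored. The class `SkipExistingSketch`, its name set, and `insertAll` are hypothetical and only illustrate the control flow.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SkipExistingSketch {

    // Hypothetical stand-in for the names of the documents already stored.
    private final Set<String> storedNames = new HashSet<>();

    // True if a document with this name is already stored.
    boolean shouldSkipDoc(String documentName) {
        return storedNames.contains(documentName);
    }

    // Inserts only the documents that are not stored yet; returns how many were inserted.
    int insertAll(List<String> documentNames) {
        int inserted = 0;
        for (String name : documentNames) {
            if (shouldSkipDoc(name)) {
                continue; // already present, nothing to do
            }
            storedNames.add(name);
            inserted++;
        }
        return inserted;
    }

    public static void main(String[] args) {
        SkipExistingSketch index = new SkipExistingSketch();
        // "a" appears twice, so only two documents are inserted.
        System.out.println(index.insertAll(List.of("a", "b", "a")));
    }
}
```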

calculateSimilarity

protected ResultCandidate calculateSimilarity(org.apache.lucene.document.Document document,
                                              java.util.Map<java.lang.Integer,java.lang.Integer> normalizedQuery,
                                              float score)

Calculates the ResultCandidate between a normalized query and a Lucene document.

Returns:: A result candidate for the given document and normalized query.

getSize

public int getSize()

Returns the number of documents in this DB.

Specified by:: getSize in interface IRIndex<O extends org.ajmm.obsearch.OB>

Returns:: The number of documents in this DB.

processQueryResults

protected java.util.List<ResultCandidate> processQueryResults(java.util.Map<java.lang.Integer,java.lang.Integer> normalizedQuery,
                                                              short n,
                                                              Document query)
                                                       throws IRException

Throws:: IRException

insert

public void insert(Document<O> document)
            throws IRException

Description copied from interface: IRIndex

Inserts a new document into the database.

Specified by:: insert in interface IRIndex<O extends org.ajmm.obsearch.OB>

Parameters:: document - The document to be inserted.
Throws:: IRException - If something goes wrong with the IR engine or with OBSearch.

freeze

public void freeze()
            throws IRException

Description copied from interface: IRIndex

Freezes the index. From this point on, data can be inserted, searched, and deleted. The index might deteriorate over time, so it is a good idea to rebuild it every once in a while. This method will also

Specified by:: freeze in interface IRIndex<O extends org.ajmm.obsearch.OB>

Throws:: IRException - If something goes wrong with the IR engine or with OBSearch.

close

public void close()
           throws IRException

Description copied from interface: IRIndex

Closes the databases. You *should* close the databases after using an IRIndex.

Specified by:: close in interface IRIndex<O extends org.ajmm.obsearch.OB>

Throws:: IRException - If something goes wrong with the IR engine or with OBSearch.
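
One common way to make sure the databases are always closed is a try/finally block. The sketch below uses a hypothetical `RecordingIndex` that only records the calls it receives; it is not the real `IRIndex`, and the failure on an empty name is invented purely to show that `close()` still runs after an error.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseAfterUseSketch {

    // Hypothetical stand-in that only records the calls it receives.
    static final class RecordingIndex {
        final List<String> calls = new ArrayList<>();

        void insert(String documentName) {
            if (documentName.isEmpty()) {
                throw new IllegalArgumentException("empty document name");
            }
            calls.add("insert:" + documentName);
        }

        void close() {
            calls.add("close");
        }
    }

    // Loads the documents and always closes the index, even if an insert fails.
    static List<String> load(RecordingIndex index, List<String> documentNames) {
        try {
            for (String name : documentNames) {
                index.insert(name);
            }
        } catch (IllegalArgumentException e) {
            index.calls.add("error");
        } finally {
            index.close(); // runs on success and on failure
        }
        return index.calls;
    }

    public static void main(String[] args) {
        System.out.println(load(new RecordingIndex(), List.of("a", "")));
    }
}
```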

createPriorityQueue

protected java.util.PriorityQueue<AbstractIRIndex.Word> createPriorityQueue(java.util.Map<java.lang.Integer,java.lang.Integer> words)
                                                                     throws java.io.IOException

Creates a PriorityQueue from a word->tf (term frequency) map. (This code was borrowed from lucene-contrib.)

Parameters:: words - A map keyed on the word (Integer) with the term frequency (Integer) as the value.
Returns:: A priority queue ordered by the most important word.
Throws:: java.io.IOException
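
How "most important" is computed is not specified on this page. The sketch below assumes a tf * idf score, in the spirit of the lucene-contrib code the description mentions, with idf = ln(N / (df + 1)) + 1. The class `ScoredWord`, the document-frequency map, and the `totalDocuments` parameter are hypothetical; the real method reads such statistics from the Lucene index.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

final class WordQueueSketch {

    // Hypothetical counterpart of AbstractIRIndex.Word: a word id and its score.
    static final class ScoredWord {
        final int id;
        final double score;

        ScoredWord(int id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Builds a queue whose head is the highest-scoring word (assumed score: tf * idf).
    static PriorityQueue<ScoredWord> createPriorityQueue(
            Map<Integer, Integer> termFrequencies,
            Map<Integer, Integer> documentFrequencies,
            int totalDocuments) {
        PriorityQueue<ScoredWord> queue = new PriorityQueue<>(
                Comparator.comparingDouble((ScoredWord w) -> w.score).reversed());
        for (Map.Entry<Integer, Integer> e : termFrequencies.entrySet()) {
            int df = documentFrequencies.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) totalDocuments / (df + 1)) + 1.0;
            queue.add(new ScoredWord(e.getKey(), e.getValue() * idf));
        }
        return queue;
    }

    public static void main(String[] args) {
        // Word 1 occurs in 9 of 10 documents, word 2 in none, so word 2 is rarer.
        PriorityQueue<ScoredWord> queue = createPriorityQueue(
                Map.of(1, 3, 2, 3), Map.of(1, 9, 2, 0), 10);
        System.out.println(queue.poll().id); // the rarer word comes out first
    }
}
```

With equal term frequencies, the word that occurs in fewer documents gets the higher score and is polled first.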

getMSetScoreThreshold

public float getMSetScoreThreshold()

Description copied from interface: IRIndex

The M-set score threshold is the minimum naive score for multi-sets that the index will accept.

Specified by:: getMSetScoreThreshold in interface IRIndex<O extends org.ajmm.obsearch.OB>

Returns:: The current M-set score threshold.

setMSetScoreThreshold

public void setMSetScoreThreshold(float setScoreThreshold)

Description copied from interface: IRIndex

The M-set score threshold is the minimum naive score for multi-sets that the index will accept.

Specified by:: setMSetScoreThreshold in interface IRIndex<O extends org.ajmm.obsearch.OB>

Parameters:: setScoreThreshold - the new threshold

getSetScoreThreshold

public float getSetScoreThreshold()

Description copied from interface: IRIndex

The Set score threshold is the minimum naive score for Sets that the index will accept.

Specified by:: getSetScoreThreshold in interface IRIndex<O extends org.ajmm.obsearch.OB>

Returns:: The current Set score threshold.

setSetScoreThreshold

public void setSetScoreThreshold(float setScoreThreshold)

Description copied from interface: IRIndex

The Set score threshold is the minimum naive score for Sets that the index will accept.

Specified by:: setSetScoreThreshold in interface IRIndex<O extends org.ajmm.obsearch.OB>

Parameters:: setScoreThreshold - the new threshold
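
This page does not define the naive Set and M-set scores, so the following sketch rests on assumptions: the set score is taken to be the fraction of distinct query words found in the document, and the multi-set score the fraction of query word occurrences matched by the document. The class `NaiveScoreSketch`, both formulas, and the rule that a candidate must pass both thresholds are hypothetical; they only show how two minimum-score thresholds could filter candidates.

```java
import java.util.Map;

final class NaiveScoreSketch {

    // Assumed set score: fraction of distinct query words that occur in the document.
    static float setScore(Map<Integer, Integer> query, Map<Integer, Integer> doc) {
        if (query.isEmpty()) {
            return 0f;
        }
        int shared = 0;
        for (Integer word : query.keySet()) {
            if (doc.containsKey(word)) {
                shared++;
            }
        }
        return (float) shared / query.size();
    }

    // Assumed multi-set score: matched occurrences divided by query occurrences.
    static float mSetScore(Map<Integer, Integer> query, Map<Integer, Integer> doc) {
        int matched = 0;
        int total = 0;
        for (Map.Entry<Integer, Integer> e : query.entrySet()) {
            matched += Math.min(e.getValue(), doc.getOrDefault(e.getKey(), 0));
            total += e.getValue();
        }
        return total == 0 ? 0f : (float) matched / total;
    }

    // Assumed acceptance rule: both scores must reach their thresholds.
    static boolean accept(Map<Integer, Integer> query, Map<Integer, Integer> doc,
                          float setScoreThreshold, float mSetScoreThreshold) {
        return setScore(query, doc) >= setScoreThreshold
                && mSetScore(query, doc) >= mSetScoreThreshold;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> query = Map.of(1, 2, 2, 2);
        Map<Integer, Integer> doc = Map.of(1, 1, 3, 5);
        System.out.println(setScore(query, doc));  // one of two distinct words is shared
        System.out.println(mSetScore(query, doc)); // one of four occurrences is matched
    }
}
```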

isValidationMode

public boolean isValidationMode()

Description copied from interface: IRIndex

Tells whether or not the index is in validation mode. In validation mode we assume that documents with the same name are equal. This helps us to add additional statistics on the performance of the scoring technique.

Specified by:: isValidationMode in interface IRIndex<O extends org.ajmm.obsearch.OB>

Returns:: true if this index is in validation mode.

setValidationMode

public void setValidationMode(boolean validationMode)

Description copied from interface: IRIndex

Sets whether or not the index is in validation mode. In validation mode we assume that documents with the same name are equal. This helps us to add additional statistics on the performance of the scoring technique.

Specified by:: setValidationMode in interface IRIndex<O extends org.ajmm.obsearch.OB>

Parameters:: validationMode - The new validation mode.
