org.kit.furia
Interface IRIndex<O extends org.ajmm.obsearch.OB>

All Known Subinterfaces:
IRIndexShort<O>
All Known Implementing Classes:
AbstractIRIndex, FIRIndexShort

public interface IRIndex<O extends org.ajmm.obsearch.OB>

IRIndex holds the basic functionality for an information retrieval system that works on OB objects (see obsearch.berlios.de). Using a distance function d, each query is rewritten in terms of the closest elements in the database; once this transformation is done, an information retrieval system performs the matching. Because documents are multi-sets, the distribution of OB objects inside a document is taken into account. So, instead of matching one huge syntax tree (of a piece of music, for example), we cut a song into pieces, match the pieces, and then match the overall fingerprint of the multi-set of OB objects.
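
The transformation described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: integers stand in for OB objects, |a - b| stands in for the distance function d, and overlapScore is a hypothetical scoring rule, not the score this library computes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only: integers stand in for OB objects and |a - b| for the distance d. */
final class MultiSetMatchSketch {

    /** Returns the database word closest to the piece under d(a, b) = |a - b|. */
    static int closestWord(int piece, List<Integer> words) {
        int best = words.get(0); // assumes at least one word is stored
        for (int w : words) {
            if (Math.abs(w - piece) < Math.abs(best - piece)) {
                best = w;
            }
        }
        return best;
    }

    /** Rewrites a document (a list of pieces) as a multi-set: closest word -> occurrence count. */
    static Map<Integer, Integer> toMultiSet(List<Integer> pieces, List<Integer> words) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int p : pieces) {
            counts.merge(closestWord(p, words), 1, Integer::sum);
        }
        return counts;
    }

    /** Hypothetical score: shared occurrences divided by the query's total occurrences. */
    static float overlapScore(Map<Integer, Integer> query, Map<Integer, Integer> doc) {
        int shared = 0;
        int total = 0;
        for (Map.Entry<Integer, Integer> e : query.entrySet()) {
            total += e.getValue();
            shared += Math.min(e.getValue(), doc.getOrDefault(e.getKey(), 0));
        }
        return total == 0 ? 0f : (float) shared / total;
    }

    public static void main(String[] args) {
        List<Integer> words = Arrays.asList(10, 20, 30);
        Map<Integer, Integer> query = toMultiSet(Arrays.asList(11, 12, 29), words);
        Map<Integer, Integer> doc = toMultiSet(Arrays.asList(9, 31, 31), words);
        System.out.println(overlapScore(query, doc));
    }
}
```

Because the counts are kept, a query that repeats one word many times is only partially matched by a document that contains the word once, which is the sense in which the distribution of objects matters.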

Since:
0
Author:
Arnoldo Jose Muller Molina

Method Summary
 void close()
          Closes the databases.
 int delete(java.lang.String documentName)
          Deletes the documents with the given name from the database.
 void freeze()
          Freezes the index.
 org.ajmm.obsearch.Index<O> getIndex()
          Returns the underlying OBSearch index.
 float getMSetScoreThreshold()
          The M-set score threshold is the minimum naive score for multi-sets that the index will accept.
 float getSetScoreThreshold()
          The Set score threshold is the minimum naive score for Sets that the index will accept.
 int getSize()
          Returns the number of documents stored in this index.
 int getWordsSize()
          Returns the number of distinct words used by the indexed documents.
 void insert(Document<O> document)
          Inserts a new document into the database.
 boolean isValidationMode()
          Tells whether or not the index is in validation mode.
 void setMSetScoreThreshold(float setScoreThreshold)
          The M-set score threshold is the minimum naive score for multi-sets that the index will accept.
 void setSetScoreThreshold(float setScoreThreshold)
          The Set score threshold is the minimum naive score for Sets that the index will accept.
 void setValidationMode(boolean validationMode)
          Sets whether or not the index is in validation mode.
 boolean shouldSkipDoc(Document<O> x)
          Returns true if the document corresponding to x's name exists in the DB.
 

Method Detail

insert

void insert(Document<O> document)
            throws IRException
Inserts a new document into the database.

Parameters:
document - The document to be inserted.
Throws:
IRException - If something goes wrong with the IR engine or with OBSearch.

delete

int delete(java.lang.String documentName)
           throws IRException
Deletes the documents with the given name from the database. If more than one document has the same name, all of them are deleted.

Returns:
The number of documents deleted.
Throws:
IRException - If something goes wrong with the IR engine or with OBSearch.
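
The delete-by-name contract (every document with a matching name is removed, and the count is returned) can be sketched on a plain in-memory list. The class and its storage are hypothetical stand-ins; the real index stores documents in its databases, not in a List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory stand-in that only mirrors the delete(String) contract. */
final class DeleteByNameSketch {

    /** Removes every stored name equal to documentName and returns how many were removed. */
    static int delete(List<String> storedNames, String documentName) {
        int before = storedNames.size();
        storedNames.removeIf(documentName::equals);
        return before - storedNames.size();
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>(Arrays.asList("song", "song", "other"));
        System.out.println(delete(stored, "song") + " removed, remaining: " + stored);
    }
}
```

A return value of 0 then simply means that no document carried that name.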

getIndex

org.ajmm.obsearch.Index<O> getIndex()
Returns the underlying OBSearch index.

Returns:
the underlying OBSearch index.

freeze

void freeze()
            throws IRException
Freezes the index. From this point on, data can be inserted, searched, and deleted. The index may deteriorate over time, so it is a good idea to rebuild it every once in a while. This method will also

Throws:
IRException - If something goes wrong with the IR engine or with OBSearch.

close

void close()
           throws IRException
Closes the databases. You *should* close the databases after using an IRIndex.

Throws:
IRException - If something goes wrong with the IR engine or with OBSearch.
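
One way to make sure close() is always reached is a try/finally block. The sketch below uses hypothetical stand-ins (ToyIndex, ToyIRException) that only mirror the insert/freeze/close signatures so that it compiles without the library; the insert-then-freeze order shown is an assumption, not something this page prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins only; they mirror the signatures above and index nothing. */
final class LifecycleSketch {

    /** Stand-in for IRException so the sketch compiles without the library. */
    static class ToyIRException extends Exception {
        ToyIRException(String message) {
            super(message);
        }
    }

    /** In-memory toy index that records calls instead of indexing anything. */
    static class ToyIndex {
        final List<String> calls = new ArrayList<>();
        private boolean closed;

        void insert(String documentName) throws ToyIRException {
            ensureOpen();
            calls.add("insert:" + documentName);
        }

        void freeze() throws ToyIRException {
            ensureOpen();
            calls.add("freeze");
        }

        void close() throws ToyIRException {
            ensureOpen();
            closed = true;
            calls.add("close");
        }

        private void ensureOpen() throws ToyIRException {
            if (closed) {
                throw new ToyIRException("index already closed");
            }
        }
    }

    /** Inserts the documents, freezes, and closes the index even if a step fails. */
    static List<String> load(ToyIndex index, List<String> names) throws ToyIRException {
        try {
            for (String name : names) {
                index.insert(name);
            }
            index.freeze();
        } finally {
            index.close();
        }
        return index.calls;
    }

    public static void main(String[] args) throws ToyIRException {
        System.out.println(load(new ToyIndex(), Arrays.asList("a", "b")));
    }
}
```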

getSize

int getSize()
Returns the number of documents stored in this index.

Returns:
the number of documents stored in this index.

shouldSkipDoc

boolean shouldSkipDoc(Document<O> x)
                      throws java.io.IOException
Returns true if the document corresponding to x's name exists in the DB. This method is intended to be used in validation mode only.

Parameters:
x - The document whose name is looked up.
Returns:
true if the DB does not contain a document with name x.getName()
Throws:
java.io.IOException

getMSetScoreThreshold

float getMSetScoreThreshold()
The M-set score threshold is the minimum naive score for multi-sets that the index will accept.

Returns:
The current M-set score threshold.

setMSetScoreThreshold

void setMSetScoreThreshold(float setScoreThreshold)
The M-set score threshold is the minimum naive score for multi-sets that the index will accept.

Parameters:
setScoreThreshold - the new threshold

getSetScoreThreshold

float getSetScoreThreshold()
The Set score threshold is the minimum naive score for Sets that the index will accept.

Returns:
The current Set score threshold.

setSetScoreThreshold

void setSetScoreThreshold(float setScoreThreshold)
The Set score threshold is the minimum naive score for Sets that the index will accept.

Parameters:
setScoreThreshold - the new threshold
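
The difference between a Set score and an M-set score can be illustrated with a toy sketch. Both scoring formulas below are assumptions chosen for illustration (this page does not define the naive score), and all names are hypothetical. The only behaviour taken from the text is that a threshold is a minimum, so a score equal to the threshold is accepted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy scores only; the formulas are illustrative assumptions, not the library's. */
final class ScoreThresholdSketch {

    /** Counts how often each word occurs. */
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    /** Set view: fraction of the query's distinct words that occur in the document. */
    static float setScore(Map<String, Integer> query, Map<String, Integer> doc) {
        if (query.isEmpty()) {
            return 0f;
        }
        int present = 0;
        for (String w : query.keySet()) {
            if (doc.containsKey(w)) {
                present++;
            }
        }
        return (float) present / query.size();
    }

    /** Multi-set view: shared occurrences divided by the query's total occurrences. */
    static float mSetScore(Map<String, Integer> query, Map<String, Integer> doc) {
        int shared = 0;
        int total = 0;
        for (Map.Entry<String, Integer> e : query.entrySet()) {
            total += e.getValue();
            shared += Math.min(e.getValue(), doc.getOrDefault(e.getKey(), 0));
        }
        return total == 0 ? 0f : (float) shared / total;
    }

    /** A threshold is a minimum, so a score equal to it is accepted. */
    static boolean accepted(float score, float threshold) {
        return score >= threshold;
    }

    public static void main(String[] args) {
        Map<String, Integer> query = count(Arrays.asList("a", "a", "a", "b"));
        Map<String, Integer> doc = count(Arrays.asList("a", "b"));
        System.out.println(accepted(setScore(query, doc), 0.6f));
        System.out.println(accepted(mSetScore(query, doc), 0.6f));
    }
}
```

With these toy formulas the same document can pass a Set threshold and fail an M-set threshold of the same value, because only the multi-set view penalizes missing repetitions.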

getWordsSize

int getWordsSize()
                 throws com.sleepycat.je.DatabaseException
Returns the number of distinct words used by the indexed documents.

Returns:
the number of distinct words used by the indexed documents.
Throws:
com.sleepycat.je.DatabaseException

isValidationMode

boolean isValidationMode()
Tells whether or not the index is in validation mode. In validation mode we assume that documents with the same name are equal. This helps us to add additional statistics on the performance of the scoring technique.

Returns:
true if this index is in validation mode.

setValidationMode

void setValidationMode(boolean validationMode)
Sets whether or not the index is in validation mode. In validation mode we assume that documents with the same name are equal. This helps us to add additional statistics on the performance of the scoring technique.

Parameters:
validationMode - The new validation mode.
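
Since same-named documents are assumed equal in validation mode, one statistic that becomes possible is the fraction of queries whose top-ranked result carries the query's own name. The sketch below is a guess at what such a statistic could look like; this page does not say which statistics the index collects, and every name in the sketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical statistic; not taken from the library. */
final class ValidationStatsSketch {

    /** Fraction of queries whose top-ranked result has the same name as the query. */
    static float topHitRate(List<String> queryNames, List<String> topResultNames) {
        if (queryNames.size() != topResultNames.size()) {
            throw new IllegalArgumentException("one top result is expected per query");
        }
        if (queryNames.isEmpty()) {
            return 0f;
        }
        int hits = 0;
        for (int i = 0; i < queryNames.size(); i++) {
            if (queryNames.get(i).equals(topResultNames.get(i))) {
                hits++;
            }
        }
        return (float) hits / queryNames.size();
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("a", "b", "c", "d");
        List<String> topResults = Arrays.asList("a", "x", "c", "d");
        System.out.println(topHitRate(queries, topResults));
    }
}
```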


Copyright © 2008 Arnoldo Jose Muller Molina. All Rights Reserved.