uk.ac.gla.dcs.renaissance.mg4j.scorers
Class RelevanceLMScorer

java.lang.Object
  extended by it.unimi.dsi.fastutil.ints.AbstractIntIterator
      extended by it.unimi.dsi.mg4j.search.score.AbstractScorer
          extended by it.unimi.dsi.mg4j.search.score.AbstractIndexScorer
              extended by it.unimi.dsi.mg4j.search.score.AbstractWeightedScorer
                  extended by uk.ac.gla.dcs.renaissance.mg4j.scorers.RelevanceLMScorer
All Implemented Interfaces:
it.unimi.dsi.fastutil.ints.IntIterator, it.unimi.dsi.lang.FlyweightPrototype<it.unimi.dsi.mg4j.search.score.Scorer>, it.unimi.dsi.mg4j.search.score.DelegatingScorer, it.unimi.dsi.mg4j.search.score.Scorer, Iterator<Integer>

public class RelevanceLMScorer
extends it.unimi.dsi.mg4j.search.score.AbstractWeightedScorer
implements it.unimi.dsi.mg4j.search.score.DelegatingScorer

A scorer implementing the relevance-based language modelling approach of Lavrenko and Croft. See: Victor Lavrenko and W. Bruce Croft. Relevance-Based Language Models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, edited by W. B. Croft, D. Harper, D. H. Kraft, and J. Zobel, 120-127. New York: ACM, 2001.

This scorer implements the conditional sampling approach described in the paper above.
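
For orientation, the conditional sampling estimate in the cited paper is P(w, q_1..q_k) = P(w) * prod_i sum_M P(q_i|M) P(M|w), with P(M|w) = P(w|M) P(M) / P(w) and P(w) = sum_M P(w|M) P(M). The following is a minimal sketch of that formula only, not the code of RelevanceLMScorer. The class name ConditionalSamplingSketch, the method joint, the map-based document models and the uniform prior P(M) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the uk.ac.gla.dcs.renaissance API.
final class ConditionalSamplingSketch {

    /**
     * Computes P(w, q_1..q_k) by conditional sampling.
     * Each map holds P(term | M) for one document model M.
     * Assumption: the prior P(M) is uniform over the given models.
     */
    static double joint(String w, String[] query, List<Map<String, Double>> models) {
        double prior = 1.0 / models.size();

        // P(w) = sum_M P(w|M) P(M)
        double pW = 0.0;
        for (Map<String, Double> m : models) {
            pW += m.getOrDefault(w, 0.0) * prior;
        }
        if (pW == 0.0) {
            return 0.0; // w is not generated by any model
        }

        double joint = pW;
        for (String q : query) {
            // P(q|w) = sum_M P(q|M) P(M|w), with P(M|w) = P(w|M) P(M) / P(w)
            double pQGivenW = 0.0;
            for (Map<String, Double> m : models) {
                double pMGivenW = m.getOrDefault(w, 0.0) * prior / pW;
                pQGivenW += m.getOrDefault(q, 0.0) * pMGivenW;
            }
            joint *= pQGivenW;
        }
        return joint;
    }
}
```

Normalising these joint values over all candidate terms w gives an estimate of P(w|R). In practice the per-document probabilities would be smoothed first, so that unseen query terms do not zero out the product.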

Author:
Ingo Frommholz

Field Summary
 
Fields inherited from class it.unimi.dsi.mg4j.search.score.AbstractWeightedScorer
currWeight, index2Weight
 
Fields inherited from class it.unimi.dsi.mg4j.search.score.AbstractIndexScorer
currIndex, n
 
Fields inherited from class it.unimi.dsi.mg4j.search.score.AbstractScorer
documentIterator
 
Constructor Summary
RelevanceLMScorer(double lambda, IndexConfiguration[] idxCfgs, it.unimi.dsi.mg4j.document.DocumentCollection collection, Collection<bpiwowar.utils.Pair<Integer,Float>> rfDocuments)
          Lambda is a smoothing constant used to calculate P(w|M_d).
RelevanceLMScorer(IndexConfiguration[] idxCfgs, it.unimi.dsi.mg4j.document.DocumentCollection collection, Collection<bpiwowar.utils.Pair<Integer,Float>> judgedDocuments)
          Uses a default lambda of 0.6.
 
Method Summary
 it.unimi.dsi.mg4j.search.score.Scorer copy()
           
 void init(it.unimi.dsi.mg4j.index.Index[] indexes, String[] queryTerms, int[] queryTermIndexNumbers)
          Initialises this scorer.
 void init(it.unimi.dsi.mg4j.index.Index[] indexes, String[] queryTerms, int[] queryTermIndexNumbers, Set<Integer> documentUniverse)
          Initialises this scorer.
 void init(it.unimi.dsi.mg4j.index.Index index, String[] queryTerms)
          Initialises this scorer for a single index. Use this simpler method when only one index is involved.
 void init(it.unimi.dsi.mg4j.index.Index index, String[] queryTerms, Set<Integer> documentUniverse)
          Initialises this scorer for a single index. Use this simpler method when only one index is involved.
 double score()
           
 double score(it.unimi.dsi.mg4j.index.Index index)
           
 double score(int documentID)
          Returns the score for the given document
 void setRFDocuments(Collection<bpiwowar.utils.Pair<Integer,Float>> judgedDocuments)
          Sets a new collection of relevance feedback documents, obtained from the base scorer, so that this scorer instance can process a new topic.
 boolean usesIntervals()
           
 void wrap(it.unimi.dsi.mg4j.search.DocumentIterator d)
           
 
Methods inherited from class it.unimi.dsi.mg4j.search.score.AbstractWeightedScorer
getWeights, setWeights
 
Methods inherited from class it.unimi.dsi.mg4j.search.score.AbstractScorer
hasNext, nextDocument, nextInt, skip
 
Methods inherited from class it.unimi.dsi.fastutil.ints.AbstractIntIterator
next, remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface it.unimi.dsi.mg4j.search.score.Scorer
getWeights, nextDocument, nextInt, setWeights
 
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator
skip
 
Methods inherited from interface java.util.Iterator
hasNext, next, remove
 

Constructor Detail

RelevanceLMScorer

public RelevanceLMScorer(IndexConfiguration[] idxCfgs,
                         it.unimi.dsi.mg4j.document.DocumentCollection collection,
                         Collection<bpiwowar.utils.Pair<Integer,Float>> judgedDocuments)
Uses a default lambda of 0.6.

Parameters:
idxCfgs - the index configurations, one per index, used to obtain required statistics. This array must be consistent with the term visitor used in wrap().
collection - the document collection (needed to extract certain statistics)
judgedDocuments - the documents used for building the relevance model

RelevanceLMScorer

public RelevanceLMScorer(double lambda,
                         IndexConfiguration[] idxCfgs,
                         it.unimi.dsi.mg4j.document.DocumentCollection collection,
                         Collection<bpiwowar.utils.Pair<Integer,Float>> rfDocuments)
Lambda is a smoothing constant used to calculate P(w|M_d). It can be any value between 0 and 1, inclusive.

Parameters:
lambda - smoothing constant
idxCfgs - used to get some needed statistics
collection - the document collection (needed to extract certain statistics)
rfDocuments - the documents used for building the relevance model
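
As an illustration of how a smoothing constant in [0, 1] is typically used to calculate P(w|M_d), here is a minimal sketch of linear interpolation (Jelinek-Mercer) smoothing. It is an assumption that lambda weights the document estimate and (1 - lambda) the collection estimate, as in the cited paper; this page does not state the exact formula. The class name JelinekMercerSketch and all parameter names are hypothetical.

```java
// Hypothetical illustration; not part of the uk.ac.gla.dcs.renaissance API.
final class JelinekMercerSketch {

    /**
     * Assumed form: P(w|M_d) = lambda * tf(w,d)/|d| + (1 - lambda) * cf(w)/|C|.
     *
     * @param lambda     smoothing constant, 0 <= lambda <= 1
     * @param termFreq   occurrences of w in document d
     * @param docLength  number of term occurrences in d
     * @param collFreq   occurrences of w in the whole collection
     * @param collLength number of term occurrences in the whole collection
     */
    static double smooth(double lambda, long termFreq, long docLength,
                         long collFreq, long collLength) {
        // The negated form also rejects NaN.
        if (!(lambda >= 0.0 && lambda <= 1.0)) {
            throw new IllegalArgumentException("lambda must be in [0, 1]: " + lambda);
        }
        if (docLength <= 0 || collLength <= 0) {
            throw new IllegalArgumentException("lengths must be positive");
        }
        double pDocument = (double) termFreq / docLength;
        double pCollection = (double) collFreq / collLength;
        return lambda * pDocument + (1.0 - lambda) * pCollection;
    }
}
```

Under this assumed form, lambda = 1 uses the document estimate alone, and lambda = 0 falls back entirely to the collection estimate.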
Method Detail

setRFDocuments

public void setRFDocuments(Collection<bpiwowar.utils.Pair<Integer,Float>> judgedDocuments)
Sets a new collection of relevance feedback documents, obtained from the base scorer, so that this scorer instance can process a new topic. The same index and collection must be used; if that is not possible, create a new scorer instance instead.

Parameters:
judgedDocuments - the documents that were judged

score

public double score(it.unimi.dsi.mg4j.index.Index index)
Specified by:
score in interface it.unimi.dsi.mg4j.search.score.Scorer

score

public double score()
             throws IOException
Specified by:
score in interface it.unimi.dsi.mg4j.search.score.Scorer
Overrides:
score in class it.unimi.dsi.mg4j.search.score.AbstractWeightedScorer
Throws:
IOException

score

public double score(int documentID)
             throws IOException
Returns the score for the given document

Parameters:
documentID - the document ID
Returns:
the score
Throws:
IOException

usesIntervals

public boolean usesIntervals()
Specified by:
usesIntervals in interface it.unimi.dsi.mg4j.search.score.Scorer

copy

public it.unimi.dsi.mg4j.search.score.Scorer copy()
Specified by:
copy in interface it.unimi.dsi.lang.FlyweightPrototype<it.unimi.dsi.mg4j.search.score.Scorer>
Specified by:
copy in interface it.unimi.dsi.mg4j.search.score.Scorer

wrap

public void wrap(it.unimi.dsi.mg4j.search.DocumentIterator d)
          throws IOException
Specified by:
wrap in interface it.unimi.dsi.mg4j.search.score.Scorer
Overrides:
wrap in class it.unimi.dsi.mg4j.search.score.AbstractWeightedScorer
Throws:
IOException

init

public void init(it.unimi.dsi.mg4j.index.Index index,
                 String[] queryTerms,
                 Set<Integer> documentUniverse)
          throws IOException
Initialises this scorer for a single index. Use this simpler method when only one index is involved.

Parameters:
index - the index
queryTerms - the query terms
documentUniverse - the document universe to consider
Throws:
IOException

init

public void init(it.unimi.dsi.mg4j.index.Index index,
                 String[] queryTerms)
          throws IOException
Initialises this scorer for a single index. Use this simpler method when only one index is involved.

Parameters:
index - the index
queryTerms - the query terms
Throws:
IOException

init

public void init(it.unimi.dsi.mg4j.index.Index[] indexes,
                 String[] queryTerms,
                 int[] queryTermIndexNumbers)
          throws IOException
Initialises this scorer. To be compatible with an MG4J scorer, which may serve several indexes, this method works with term-index pairs. The whole collection is considered here.

Parameters:
indexes - An array of indexes
queryTerms - the query terms (in MG4J the term part of a term-index pair)
queryTermIndexNumbers - the query term index numbers (in MG4J the index part of a term-index pair). Each must be a valid position in the indexes array.
Throws:
IOException
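
The term-index pairing described above can be pictured as two parallel arrays, where queryTermIndexNumbers[i] selects the entry of indexes that queryTerms[i] belongs to. The following minimal sketch shows only that pairing and the validity check; it uses plain index names in place of MG4J Index objects, and the class name TermIndexPairsSketch and the method pair are hypothetical.

```java
// Hypothetical illustration; not part of the uk.ac.gla.dcs.renaissance API.
final class TermIndexPairsSketch {

    /**
     * Pairs each query term with the name of the index it refers to.
     * indexNames stands in for the indexes array of the real method.
     */
    static String[] pair(String[] indexNames, String[] queryTerms,
                         int[] queryTermIndexNumbers) {
        if (queryTerms.length != queryTermIndexNumbers.length) {
            throw new IllegalArgumentException(
                    "queryTerms and queryTermIndexNumbers must have the same length");
        }
        String[] pairs = new String[queryTerms.length];
        for (int i = 0; i < queryTerms.length; i++) {
            int indexNumber = queryTermIndexNumbers[i];
            // Each index number must be a valid position in the index array.
            if (indexNumber < 0 || indexNumber >= indexNames.length) {
                throw new IllegalArgumentException(
                        "invalid index number " + indexNumber + " for term " + queryTerms[i]);
            }
            pairs[i] = indexNames[indexNumber] + ":" + queryTerms[i];
        }
        return pairs;
    }
}
```

For example, with index names {"title", "text"}, terms {"language", "model"} and index numbers {0, 1}, the first term is looked up in the title index and the second in the text index.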

init

public void init(it.unimi.dsi.mg4j.index.Index[] indexes,
                 String[] queryTerms,
                 int[] queryTermIndexNumbers,
                 Set<Integer> documentUniverse)
          throws IOException
Initialises this scorer. To be compatible with an MG4J scorer, which may serve several indexes, this method works with term-index pairs.

Parameters:
indexes - An array of indexes
queryTerms - the query terms (in MG4J the term part of a term-index pair)
queryTermIndexNumbers - the query term index numbers (in MG4J the index part of a term-index pair). Each must be a valid position in the indexes array.
documentUniverse - the document universe to consider. If set to null, all documents and terms in the index are considered.
Throws:
IOException


Copyright © 2011. All Rights Reserved.