uk.ac.gla.dcs.renaissance.mg4j
Class Utils

java.lang.Object
  extended by uk.ac.gla.dcs.renaissance.mg4j.Utils

public class Utils
extends Object

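Utility methods for computing term statistics (document terms, within-document frequencies, and summed document frequencies) from an MG4J document collection and index.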

Author:
Ingo Frommholz

Constructor Summary
Utils()
           
 
Method Summary
static Set<Integer> getDocumentTerms(it.unimi.dsi.mg4j.document.DocumentCollection collection, int docID, it.unimi.dsi.mg4j.index.TermProcessor processor, IndexConfiguration index)
          Get the set of terms of a document d
static int getSumDF(IndexConfiguration indexConf)
          Iterates over all terms of the given index and computes the sum of all terms' document frequencies (the number of documents a term appears in).
static int getWithinDocumentFrequencies(it.unimi.dsi.mg4j.document.DocumentCollection collection, int docID, it.unimi.dsi.mg4j.index.TermProcessor processor, IndexConfiguration index, it.unimi.dsi.fastutil.ints.AbstractInt2IntMap relFreq)
          Get the within-document frequencies tf(t,d) of all terms t in a document d and the total number of tokens counted in d.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Utils

public Utils()
Method Detail

getWithinDocumentFrequencies

public static int getWithinDocumentFrequencies(it.unimi.dsi.mg4j.document.DocumentCollection collection,
                                               int docID,
                                               it.unimi.dsi.mg4j.index.TermProcessor processor,
                                               IndexConfiguration index,
                                               it.unimi.dsi.fastutil.ints.AbstractInt2IntMap relFreq)
                                        throws IOException,
                                               bpiwowar.lang.RuntimeException
Get the within-document frequencies tf(t,d) of all terms t in a document d and the total number of tokens counted in d.

Parameters:
collection - the document collection
docID - the document ID
processor - the term processor used
index - the index configuration
relFreq - the Int2IntMap in which the frequencies are returned, with the term ID as key and the term's frequency within the document as value. A term that does not appear in the document has no entry in the map. Note that relFreq is cleared before the calculation starts.
Returns:
the number of tokens found in the document
Throws:
IOException
bpiwowar.lang.RuntimeException
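
The contract described above (clear the map, count every term occurrence, return the token total) can be illustrated with a minimal, self-contained sketch. It does not use MG4J and is not the implementation of this method: the class TfSketch, the method countTermFrequencies, and the use of java.util.Map in place of AbstractInt2IntMap are assumptions made for illustration, and the document is assumed to be already tokenized into term IDs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of uk.ac.gla.dcs.renaissance.mg4j.
final class TfSketch {

    // Fills freq with tf(t,d) for every term ID in termIds and returns the
    // number of tokens counted. Any previous content of freq is discarded.
    static int countTermFrequencies(int[] termIds, Map<Integer, Integer> freq) {
        freq.clear();
        for (int termId : termIds) {
            freq.merge(termId, 1, Integer::sum);
        }
        return termIds.length;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> freq = new HashMap<>();
        int tokens = countTermFrequencies(new int[] {3, 7, 3, 3}, freq);
        System.out.println(tokens + " tokens, frequencies " + freq);
    }
}
```

As in the documented method, a term absent from the document has no entry in the map, so callers should treat a missing key as frequency 0.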

getSumDF

public static int getSumDF(IndexConfiguration indexConf)
                    throws IOException,
                           UnsupportedOperationException
Iterates over all terms of the given index and computes the sum of all terms' document frequencies (the number of documents a term appears in).

Parameters:
indexConf - the configuration of the index under consideration
Returns:
the sum of all document frequencies
Throws:
IOException - if something went wrong while accessing the index
UnsupportedOperationException
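
To make the quantity concrete, the following minimal sketch sums document frequencies over a toy in-memory index. It is not the implementation of this method and does not touch MG4J: the class SumDfSketch, the method sumDocumentFrequencies, and the representation of the index as a map from term ID to the set of document IDs containing that term are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of uk.ac.gla.dcs.renaissance.mg4j.
final class SumDfSketch {

    // df(t) is the number of documents containing term t, i.e. the size of
    // the term's document set. The result is the sum of df(t) over all terms.
    static int sumDocumentFrequencies(Map<Integer, Set<Integer>> postings) {
        int sum = 0;
        for (Set<Integer> docs : postings.values()) {
            sum += docs.size();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> postings = new HashMap<>();
        postings.put(1, new HashSet<>(Arrays.asList(0, 1, 2))); // df = 3
        postings.put(2, new HashSet<>(Arrays.asList(1)));       // df = 1
        System.out.println(sumDocumentFrequencies(postings));
    }
}
```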

getDocumentTerms

public static Set<Integer> getDocumentTerms(it.unimi.dsi.mg4j.document.DocumentCollection collection,
                                            int docID,
                                            it.unimi.dsi.mg4j.index.TermProcessor processor,
                                            IndexConfiguration index)
                                     throws IOException,
                                            bpiwowar.lang.RuntimeException
Get the set of terms of a document d

Parameters:
collection - the document collection
docID - the document ID
processor - the term processor used
index - the index configuration
Returns:
the terms belonging to the document
Throws:
IOException
bpiwowar.lang.RuntimeException
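
The returned set holds each term of the document once, regardless of how often it occurs. A minimal sketch of that idea follows, under the assumptions that the Integer values are term IDs and that the document is already tokenized into such IDs; the class DocumentTermsSketch and the method distinctTerms are hypothetical names and are not taken from this library.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of uk.ac.gla.dcs.renaissance.mg4j.
final class DocumentTermsSketch {

    // Collects the distinct term IDs of a tokenized document. Repeated
    // occurrences of a term collapse into a single set element.
    static Set<Integer> distinctTerms(int[] termIds) {
        Set<Integer> terms = new TreeSet<>();
        for (int termId : termIds) {
            terms.add(termId);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(distinctTerms(new int[] {3, 7, 3, 3}));
    }
}
```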


Copyright © 2011. All Rights Reserved.