uk.ac.gla.dcs.renaissance.mg4j.trec
Class TRECDocumentFactory.TRECSegmentedDocument

java.lang.Object
  extended by it.unimi.dsi.mg4j.document.AbstractDocument
      extended by uk.ac.gla.dcs.renaissance.mg4j.trec.TRECDocumentFactory.TRECSegmentedDocument
All Implemented Interfaces:
it.unimi.dsi.io.SafelyCloseable, it.unimi.dsi.mg4j.document.Document, Closeable, MarkedUpDocument
Enclosing class:
TRECDocumentFactory

public class TRECDocumentFactory.TRECSegmentedDocument
extends it.unimi.dsi.mg4j.document.AbstractDocument
implements MarkedUpDocument

A TREC document. If a TITLE element is available, it will be used for title() instead of the default value.

The document may be segmented.

Parsing is deferred until it is actually needed, so operations such as retrieving the document URI do not trigger parsing.
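
The deferred parsing described above might look roughly like the following. This is a minimal sketch, not the actual implementation: `LazyDocumentSketch`, `isParsed()` and `ensureParsed()` are hypothetical names, the URI is assumed to be supplied up front as a plain string rather than through the metadata map, and "parsing" is reduced to decoding the raw bytes as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the real TRECSegmentedDocument.
class LazyDocumentSketch {
    private final InputStream rawContent; // left unread until the text is requested
    private final String uri;             // assumed to be known without parsing
    private String text;                  // cached after the first parse
    private boolean parsed;

    LazyDocumentSketch(InputStream rawContent, String uri) {
        this.rawContent = rawContent;
        this.uri = uri;
    }

    // Cheap accessor: never touches the raw stream.
    String uri() {
        return uri;
    }

    boolean isParsed() {
        return parsed;
    }

    // Triggers the deferred "parse" (here: just decoding the bytes as UTF-8).
    String getText() throws IOException {
        ensureParsed();
        return text;
    }

    private void ensureParsed() throws IOException {
        if (parsed) {
            return;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = rawContent.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        parsed = true;
    }
}
```

In this sketch, calling `uri()` leaves the stream untouched, while the first call to `getText()` reads it once and caches the result for later calls.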


Constructor Summary
TRECDocumentFactory.TRECSegmentedDocument(InputStream rawContent, it.unimi.dsi.fastutil.objects.Reference2ObjectMap<Enum<?>,Object> metadata)
           
 
Method Summary
 Object content(int field)
           
 it.unimi.dsi.lang.MutableString getText()
          Returns the text of this document.
 Iterator<TagPointer> tags(int field)
          Returns the tag pointers for the given field as an iterator.
 CharSequence title()
           
 String toString()
           
 CharSequence uri()
           
 it.unimi.dsi.io.WordReader wordReader(int field)
           
 
Methods inherited from class it.unimi.dsi.mg4j.document.AbstractDocument
close, finalize
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TRECDocumentFactory.TRECSegmentedDocument

public TRECDocumentFactory.TRECSegmentedDocument(InputStream rawContent,
                                                 it.unimi.dsi.fastutil.objects.Reference2ObjectMap<Enum<?>,Object> metadata)
Method Detail

tags

public Iterator<TagPointer> tags(int field)
Description copied from interface: MarkedUpDocument
Returns the tag pointers for the given field as an iterator. Should return null if the field doesn't have any markup.

Specified by:
tags in interface MarkedUpDocument
Parameters:
field - the index of the field
Returns:
iterator over tag pointers
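
Because tags may return null for a field without markup, callers need a null check before iterating. The following is a minimal sketch of that pattern; `TagIterationSketch` and `drain` are hypothetical helper names, and the element type is left generic so the example does not depend on `TagPointer`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration; not part of the documented API.
class TagIterationSketch {
    // Collects all elements of an iterator that may be null.
    // A null iterator is treated as "no markup" and yields an empty list.
    static <T> List<T> drain(Iterator<T> tags) {
        List<T> collected = new ArrayList<>();
        if (tags == null) {
            return collected;
        }
        while (tags.hasNext()) {
            collected.add(tags.next());
        }
        return collected;
    }
}
```

With a real document, the argument would be the result of `tags(field)`; the helper then returns an empty list for fields without markup instead of failing with a NullPointerException.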

title

public CharSequence title()
Specified by:
title in interface it.unimi.dsi.mg4j.document.Document

toString

public String toString()
Overrides:
toString in class it.unimi.dsi.mg4j.document.AbstractDocument

uri

public CharSequence uri()
Specified by:
uri in interface it.unimi.dsi.mg4j.document.Document

content

public Object content(int field)
               throws IOException
Specified by:
content in interface it.unimi.dsi.mg4j.document.Document
Throws:
IOException

getText

public it.unimi.dsi.lang.MutableString getText()
                                        throws IOException
Returns the text of this document.

Returns:
the text
Throws:
IOException

wordReader

public it.unimi.dsi.io.WordReader wordReader(int field)
Specified by:
wordReader in interface it.unimi.dsi.mg4j.document.Document


Copyright © 2011. All Rights Reserved.