uk.ac.gla.dcs.renaissance.mg4j.trec
Class TRECDocumentFactory.TRECSegmentedDocument
java.lang.Object
it.unimi.dsi.mg4j.document.AbstractDocument
uk.ac.gla.dcs.renaissance.mg4j.trec.TRECDocumentFactory.TRECSegmentedDocument
- All Implemented Interfaces:
- it.unimi.dsi.io.SafelyCloseable, it.unimi.dsi.mg4j.document.Document, Closeable, MarkedUpDocument
- Enclosing class:
- TRECDocumentFactory
public class TRECDocumentFactory.TRECSegmentedDocument
- extends it.unimi.dsi.mg4j.document.AbstractDocument
- implements MarkedUpDocument
A TREC document. If a TITLE element is available, it is used for title()
instead of the default value.
The document may be segmented.
Parsing is deferred until it is actually needed, so operations such as
getting the document URI do not require parsing.
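The deferred parsing described above can be illustrated with a minimal sketch. It is not the actual implementation of this class: LazyDocumentSketch, ensureParsed and parseCount are hypothetical names, and "parsing" is reduced to reading and trimming the stream. It only shows the general pattern in which uri() is answered from data supplied at construction time, while the raw content is consumed on the first call that needs it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a lazily parsed document; not the MG4J/TREC class. */
public class LazyDocumentSketch {
    private final InputStream rawContent;
    private final String uri;   // assumed to be known up front, e.g. from metadata
    private String text;        // stays null until the first access that needs it
    private int parseCount = 0; // only here to make the laziness observable

    public LazyDocumentSketch(InputStream rawContent, String uri) {
        this.rawContent = rawContent;
        this.uri = uri;
    }

    /** Cheap: never touches the raw stream. */
    public String uri() {
        return uri;
    }

    /** Expensive: reads the stream on the first call, then reuses the result. */
    public String getText() throws IOException {
        ensureParsed();
        return text;
    }

    public int parseCount() {
        return parseCount;
    }

    private void ensureParsed() throws IOException {
        if (text != null) {
            return; // already parsed
        }
        // Placeholder for real parsing: read everything and trim whitespace.
        text = new String(rawContent.readAllBytes(), StandardCharsets.UTF_8).trim();
        parseCount++;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(
                "  some body text  ".getBytes(StandardCharsets.UTF_8));
        LazyDocumentSketch doc = new LazyDocumentSketch(raw, "doc-1");
        System.out.println(doc.uri());        // answered without parsing
        System.out.println(doc.parseCount()); // still 0 at this point
        System.out.println(doc.getText());    // triggers the one and only parse
        doc.getText();                        // reuses the cached text
        System.out.println(doc.parseCount()); // 1
    }
}
```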
Methods inherited from class it.unimi.dsi.mg4j.document.AbstractDocument
close, finalize
TRECDocumentFactory.TRECSegmentedDocument
public TRECDocumentFactory.TRECSegmentedDocument(InputStream rawContent,
it.unimi.dsi.fastutil.objects.Reference2ObjectMap<Enum<?>,Object> metadata)
tags
public Iterator<TagPointer> tags(int field)
- Description copied from interface:
MarkedUpDocument
- Returns the tag pointers for the given field as an iterator. Should return null if the field doesn't have any markup.
- Specified by:
tags
in interface MarkedUpDocument
- Parameters:
field - the field
- Returns:
- iterator over tag pointers
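Because tags(int) may return null for a field without markup, callers need a null check before iterating. The following is a minimal sketch of one way to handle that. TagIterationSketch and collectTags are hypothetical names, and the helper is generic so that it does not assume anything about the TagPointer type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper for consuming an iterator that may be null. */
public class TagIterationSketch {

    /**
     * Copies the remaining elements of the iterator into a list.
     * A null iterator is treated as "no markup" and yields an empty list.
     */
    public static <T> List<T> collectTags(Iterator<T> tagsOrNull) {
        List<T> result = new ArrayList<>();
        if (tagsOrNull == null) {
            return result; // field has no markup
        }
        while (tagsOrNull.hasNext()) {
            result.add(tagsOrNull.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for tag pointers here.
        System.out.println(collectTags(null).size());                        // prints 0
        System.out.println(collectTags(Arrays.asList("b", "i").iterator())); // prints [b, i]
    }
}
```

With a real document, the argument would be the result of tags(field); the helper itself makes no assumption about how that iterator is produced.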
title
public CharSequence title()
- Specified by:
title
in interface it.unimi.dsi.mg4j.document.Document
toString
public String toString()
- Overrides:
toString
in class it.unimi.dsi.mg4j.document.AbstractDocument
uri
public CharSequence uri()
- Specified by:
uri
in interface it.unimi.dsi.mg4j.document.Document
content
public Object content(int field)
throws IOException
- Specified by:
content
in interface it.unimi.dsi.mg4j.document.Document
- Throws:
IOException
getText
public it.unimi.dsi.lang.MutableString getText()
throws IOException
- Returns the text of this document.
- Returns:
- the text
- Throws:
IOException
wordReader
public it.unimi.dsi.io.WordReader wordReader(int field)
- Specified by:
wordReader
in interface it.unimi.dsi.mg4j.document.Document
Copyright © 2011. All Rights Reserved.