uk.ac.gla.dcs.renaissance.mg4j.trec
Class TRECDocumentFactory
java.lang.Object
it.unimi.dsi.mg4j.document.AbstractDocumentFactory
it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory
uk.ac.gla.dcs.renaissance.mg4j.trec.TRECDocumentFactory
- All Implemented Interfaces:
- it.unimi.dsi.lang.FlyweightPrototype<it.unimi.dsi.mg4j.document.DocumentFactory>, it.unimi.dsi.mg4j.document.DocumentFactory, Serializable
public class TRECDocumentFactory
- extends it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory
A factory that provides fields for the body and title of HTML documents. It uses a BulletParser internally.
A default encoding can be provided using the property PropertyBasedDocumentFactory.MetadataKeys.ENCODING.
- See Also:
- Serialized Form
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory:
it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory.MetadataKeys
Nested classes/interfaces inherited from interface it.unimi.dsi.mg4j.document.DocumentFactory:
it.unimi.dsi.mg4j.document.DocumentFactory.FieldType
Fields inherited from class it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory:
defaultMetadata
Methods inherited from class it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory:
ensureJustOne, getInstance, getInstance, getInstance, getInstance, parseProperties, parseProperties, resolve, resolve, resolveNotNull, sameKey
Methods inherited from class it.unimi.dsi.mg4j.document.AbstractDocumentFactory:
ensureFieldIndex, toString
TRECDocumentFactory
public TRECDocumentFactory(it.unimi.dsi.util.Properties properties)
throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
TRECDocumentFactory
public TRECDocumentFactory(it.unimi.dsi.fastutil.objects.Reference2ObjectMap<Enum<?>,Object> defaultMetadata)
TRECDocumentFactory
public TRECDocumentFactory(String[] property)
throws org.apache.commons.configuration.ConfigurationException
- Throws:
org.apache.commons.configuration.ConfigurationException
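This page does not state the format of the strings accepted by the String[] constructor. The following is a minimal sketch, assuming each entry is a key=value pair, of how such an array could be scanned for a default encoding. PropertySketch, lookup, the lowercase key "encoding" and the ISO-8859-1 fallback are hypothetical and are not part of MG4J.

```java
// Hypothetical sketch: not the MG4J implementation.
final class PropertySketch {
    /** Splits a "key=value" string at the first '=' and returns {key, value}. */
    static String[] split(String property) {
        int eq = property.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected key=value, got: " + property);
        }
        return new String[] {
            property.substring(0, eq).trim(),
            property.substring(eq + 1).trim()
        };
    }

    /** Returns the value stored under key (case-insensitive), or fallback if absent. */
    static String lookup(String key, String fallback, String... properties) {
        for (String p : properties) {
            String[] kv = split(p);
            if (kv[0].equalsIgnoreCase(key)) {
                return kv[1];
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // prints UTF-8
        System.out.println(lookup("encoding", "ISO-8859-1", "encoding=UTF-8"));
    }
}
```

A malformed entry with no '=' is rejected here with an IllegalArgumentException. The real constructor signals configuration problems with ConfigurationException instead.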
TRECDocumentFactory
public TRECDocumentFactory()
parseProperty
protected boolean parseProperty(String key,
String[] values,
it.unimi.dsi.fastutil.objects.Reference2ObjectMap<Enum<?>,Object> metadata)
throws org.apache.commons.configuration.ConfigurationException
- Overrides:
parseProperty in class it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory
- Throws:
org.apache.commons.configuration.ConfigurationException
copy
public TRECDocumentFactory copy()
- Returns a copy of this document factory. A new parser is allocated for
the copy.
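The copy semantics described above (shared configuration, a separately allocated parser) can be illustrated with a small self-contained sketch. FactorySketch and ParserSketch are hypothetical stand-ins, not MG4J classes. The point is only that each copy owns its own mutable parser, so copies can be used from different threads without sharing parser state.

```java
// Hypothetical stand-ins; neither class is part of MG4J.
final class ParserSketch {
    // Mutable scratch state that must not be shared between threads.
    final StringBuilder buffer = new StringBuilder();
}

final class FactorySketch {
    // Immutable configuration, safe to share between copies.
    final String encoding;
    // Per-instance mutable state, allocated once for every factory.
    final ParserSketch parser = new ParserSketch();

    FactorySketch(String encoding) {
        this.encoding = encoding;
    }

    /** Returns a copy that shares the configuration but owns a new parser. */
    FactorySketch copy() {
        return new FactorySketch(encoding);
    }

    public static void main(String[] args) {
        FactorySketch original = new FactorySketch("UTF-8");
        FactorySketch duplicate = original.copy();
        // prints true: same configuration
        System.out.println(duplicate.encoding.equals(original.encoding));
        // prints true: distinct parser instances
        System.out.println(duplicate.parser != original.parser);
    }
}
```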
numberOfFields
public int numberOfFields()
fieldName
public String fieldName(int field)
fieldIndex
public int fieldIndex(String fieldName)
fieldType
public it.unimi.dsi.mg4j.document.DocumentFactory.FieldType fieldType(int field)
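The field accessors above are undocumented on this page. As a rough illustration of how numberOfFields, fieldName and fieldIndex typically relate to one another, here is a minimal sketch over a fixed two-field table. The names "title" and "body", their order, and the -1 result for an unknown name are assumptions made for the example. The class is hypothetical and does not reproduce the TRECDocumentFactory implementation.

```java
// Hypothetical sketch: not the TRECDocumentFactory implementation.
final class FieldTableSketch {
    // Assumed field names and order; the real factory may differ.
    private static final String[] FIELD_NAMES = { "title", "body" };

    static int numberOfFields() {
        return FIELD_NAMES.length;
    }

    /** Returns the symbolic name of the field with the given index. */
    static String fieldName(int field) {
        if (field < 0 || field >= FIELD_NAMES.length) {
            throw new IndexOutOfBoundsException("No such field: " + field);
        }
        return FIELD_NAMES[field];
    }

    /** Returns the index of the named field, or -1 if no field has that name. */
    static int fieldIndex(String fieldName) {
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            if (FIELD_NAMES[i].equals(fieldName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Round trip: every index maps to a name that maps back to the same index.
        for (int i = 0; i < numberOfFields(); i++) {
            System.out.println(i + " -> " + fieldName(i) + " -> " + fieldIndex(fieldName(i)));
        }
    }
}
```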
getDocument
public TRECDocumentFactory.TRECSegmentedDocument getDocument(InputStream rawContent,
it.unimi.dsi.fastutil.objects.Reference2ObjectMap<Enum<?>,Object> metadata)
throws IOException
- Throws:
IOException
getWordReader
public it.unimi.dsi.io.WordReader getWordReader()
setCollectionType
public void setCollectionType(TRECDocumentFactory.CollectionType t)
- Sets the type of the underlying collection (e.g. a standard TREC collection or a WARC collection).
- Parameters:
t - the collection type
Copyright © 2011. All Rights Reserved.