public class Any23IndexingFilter extends Object implements IndexingFilter
This implementation of IndexingFilter adds a field holding the extracted triple(s) to the NutchDocument.
Triples are extracted via Apache Any23.
See Also: org.apache.nutch.any23.Any23ParseFilter
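The idea of turning extracted triples into document field values can be sketched as follows. This is only an illustration under stated assumptions, not the plugin's source: it uses no Nutch or Any23 classes, a plain `Map` stands in for NutchDocument, the class `TripleFieldSketch`, the method `addTriples`, the keys `subject`, `predicate`, and `object`, and the field name passed in are all hypothetical, and the input is assumed to be one N-Triples-style statement per string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a plain Map stands in for NutchDocument.
final class TripleFieldSketch {

    // Assumed input shape: "<subject> <predicate> <object> ." per string.
    private static final Pattern TRIPLE =
        Pattern.compile("^(\\S+)\\s+(\\S+)\\s+(.+?)\\s*\\.\\s*$");

    // Adds one value per well-formed triple to the given field of the stand-in document.
    static Map<String, List<Object>> addTriples(Map<String, List<Object>> doc,
                                                String fieldName,
                                                List<String> triples) {
        for (String triple : triples) {
            Matcher m = TRIPLE.matcher(triple);
            if (!m.matches()) {
                continue; // skip strings that do not look like a triple
            }
            Map<String, String> value = new LinkedHashMap<>();
            value.put("subject", m.group(1));
            value.put("predicate", m.group(2));
            value.put("object", m.group(3));
            doc.computeIfAbsent(fieldName, k -> new ArrayList<>()).add(value);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        // The field name here is a placeholder; the value of STRUCTURED_DATA is not stated on this page.
        addTriples(doc, "structured_data", List.of(
            "<http://example.org/a> <http://schema.org/name> \"Example\" .",
            "not a triple"));
        System.out.println(doc);
    }
}
```

Keeping each triple as a small map rather than a single string lets an index backend address the parts separately, though whether the real filter does so is not stated here.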
Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG: Logging instance |
static String | STRUCTURED_DATA |
Fields inherited from interface IndexingFilter: X_POINT_ID
Constructor and Description |
---|
Any23IndexingFilter() |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse. |
Configuration | getConf(): Get the Configuration object |
void | setConf(Configuration conf): Set the Configuration object |
public static final org.slf4j.Logger LOG
public static final String STRUCTURED_DATA
public Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface Configurable
See Also: Configurable.getConf()
public void setConf(Configuration conf)
Set the Configuration object.
Specified by: setConf in interface Configurable
See Also: Configurable.setConf(org.apache.hadoop.conf.Configuration)
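The getConf/setConf pair follows the usual configurable-component pattern: the framework hands the component a configuration object once, and the component stores it and may read its settings at that point. A minimal sketch of that pattern, assuming `java.util.Properties` as a stand-in for Hadoop's Configuration; the names `ConfiguredFilterSketch`, `ConfigurableSketch`, and the property `sketch.field.name` are hypothetical and do not come from Nutch:

```java
import java.util.Properties;

// Hypothetical component; Properties plays the role of Configuration.
final class ConfiguredFilterSketch implements ConfigurableSketch {
    private Properties conf;
    private String fieldName = "structured_data"; // placeholder default

    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
        // Read settings once, when the configuration is injected.
        // "sketch.field.name" is an assumed key, used only for illustration.
        this.fieldName = conf.getProperty("sketch.field.name", fieldName);
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    String fieldName() {
        return fieldName;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sketch.field.name", "triples");
        ConfiguredFilterSketch filter = new ConfiguredFilterSketch();
        filter.setConf(props);
        System.out.println(filter.fieldName());
    }
}

// Stand-in for the Configurable interface, reduced to its two methods.
interface ConfigurableSketch {
    void setConf(Properties conf);
    Properties getConf();
}
```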
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
inlinks - page inlinks
Throws: IndexingException
See Also: IndexingFilter.filter(NutchDocument, Parse, Text, CrawlDatum, Inlinks)
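Because filter takes a document and returns a document, several filters can be applied one after another, each receiving the previous result. The sketch below illustrates that calling pattern only. It is built on assumptions: `SketchFilter` is a hypothetical interface reduced to a document and a URL, `SketchIndexingException` stands in for IndexingException, a `Map` stands in for NutchDocument, and treating a `null` return as "discard the document" is an assumption that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of applying several filters in sequence.
final class FilterChainSketch {

    // Stand-in for IndexingException (a checked exception).
    static class SketchIndexingException extends Exception {
        SketchIndexingException(String message) {
            super(message);
        }
    }

    // Stand-in for the filter contract, reduced to document and URL.
    interface SketchFilter {
        Map<String, List<String>> filter(Map<String, List<String>> doc, String url)
            throws SketchIndexingException;
    }

    // Passes the document through each filter in order.
    static Map<String, List<String>> runAll(List<SketchFilter> filters,
                                            Map<String, List<String>> doc,
                                            String url) throws SketchIndexingException {
        Map<String, List<String>> current = doc;
        for (SketchFilter filter : filters) {
            current = filter.filter(current, url);
            if (current == null) {
                return null; // assumption: null means the document is dropped
            }
        }
        return current;
    }

    public static void main(String[] args) throws SketchIndexingException {
        SketchFilter addUrl = (d, u) -> {
            d.computeIfAbsent("url", k -> new ArrayList<>()).add(u);
            return d;
        };
        SketchFilter httpOnly = (d, u) -> u.startsWith("http") ? d : null;

        Map<String, List<String>> doc = new LinkedHashMap<>();
        System.out.println(runAll(List.of(addUrl, httpOnly), doc, "http://example.org/"));
    }
}
```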
Copyright © 2021 The Apache Software Foundation