public class IndexerMapReduce extends Configured
This class is typically invoked from within IndexingJob and handles all MapReduce functionality required when undertaking indexing. This is a consequence of one or more indexing plugins being invoked which extend IndexWriter.
See org.apache.nutch.indexer.IndexerMapReduce#initMRJob(Path, Path, Collection, Job, boolean) for details on the specific data structures and parameters required for indexing.
Modifier and Type | Class and Description |
---|---|
static class | IndexerMapReduce.IndexerMapper |
static class | IndexerMapReduce.IndexerReducer |
Modifier and Type | Field and Description |
---|---|
static String | INDEXER_BINARY_AS_BASE64 |
static String | INDEXER_DELETE |
static String | INDEXER_DELETE_ROBOTS_NOINDEX |
static String | INDEXER_DELETE_SKIPPED |
static String | INDEXER_NO_COMMIT |
static String | INDEXER_PARAMS |
static String | INDEXER_SKIP_NOTMODIFIED |
static String | URL_FILTERING |
static String | URL_NORMALIZING |
Constructor and Description |
---|
IndexerMapReduce() |
Modifier and Type | Method and Description |
---|---|
static void | initMRJob(Path crawlDb, Path linkDb, Collection<Path> segments, Job job, boolean addBinaryContent) |
Methods inherited from class org.apache.hadoop.conf.Configured: getConf, setConf
public static final String INDEXER_PARAMS
public static final String INDEXER_DELETE
public static final String INDEXER_NO_COMMIT
public static final String INDEXER_DELETE_ROBOTS_NOINDEX
public static final String INDEXER_DELETE_SKIPPED
public static final String INDEXER_SKIP_NOTMODIFIED
public static final String URL_FILTERING
public static final String URL_NORMALIZING
public static final String INDEXER_BINARY_AS_BASE64
public static void initMRJob(Path crawlDb, Path linkDb, Collection<Path> segments, Job job, boolean addBinaryContent) throws IOException
Throws: IOException
Copyright © 2021 The Apache Software Foundation