public class Injector extends NutchTool implements Tool
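Note that some metadata keys are reserved: nutch.score sets a custom score for a specific URL, nutch.fetchInterval sets a custom fetchInterval for a specific URL, and a further key (see nutchFixedFetchIntervalMDName) sets a fixed custom fetchInterval for a specific URL.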
Example:
http://www.nutch.org/ \t nutch.score=10 \t nutch.fetchInterval=2592000 \t userType=open_source
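The seed line format shown in the example can be illustrated with a small standalone parser. This is a minimal sketch, not the code used by Injector.InjectMapper: the class `SeedLineParser`, the nested `Seed` type, and the `parse` method are hypothetical names introduced here, and the sketch assumes that `\t` in the example stands for a literal tab and that fields without a `key=value` shape are skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Nutch) that splits one seed line into URL and metadata. */
final class SeedLineParser {

    /** Holds the URL and the key=value metadata parsed from one seed line. */
    static final class Seed {
        final String url;
        final Map<String, String> metadata;

        Seed(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Parses a tab-separated seed line: the URL first, then optional key=value fields. */
    static Seed parse(String line) {
        String[] fields = line.split("\t");
        String url = fields[0].trim();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // assumption: ignore fields that have no key or no '='
            }
            metadata.put(field.substring(0, eq).trim(), field.substring(eq + 1).trim());
        }
        return new Seed(url, metadata);
    }

    public static void main(String[] args) {
        Seed seed = parse(
            "http://www.nutch.org/\tnutch.score=10\tnutch.fetchInterval=2592000\tuserType=open_source");
        System.out.println(seed.url);
        System.out.println(seed.metadata);
    }
}
```

A `LinkedHashMap` is used so that the metadata keeps the order in which it appears on the seed line, which makes the result easier to compare against the input.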
Modifier and Type | Class and Description |
---|---|
static class | Injector.InjectMapper: InjectMapper reads the CrawlDb that seeds are injected into, and the plain-text seed files, parsing each line into the URL and metadata. |
static class | Injector.InjectReducer: Combines multiple new entries for a URL. |
Modifier and Type | Field and Description |
---|---|
static String | nutchFetchIntervalMDName: metadata key reserved for setting a custom fetchInterval for a specific URL |
static String | nutchFixedFetchIntervalMDName: metadata key reserved for setting a fixed custom fetchInterval for a specific URL |
static String | nutchScoreMDName: metadata key reserved for setting a custom score for a specific URL |
static String | URL_FILTER_NORMALIZE_ALL: property to pass value of command-line option -filterNormalizeAll to mapper |
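The role of the reserved metadata keys can be sketched as follows: reserved entries adjust per-URL settings, while all other entries are kept as custom metadata. This is an illustrative sketch only and does not reproduce Nutch's implementation. The class `ReservedSeedMetadata`, the type `SeedSettings`, and the method `apply` are hypothetical; the key strings are taken from the seed example and are assumed to match the values of nutchScoreMDName and nutchFetchIntervalMDName; the fixed-interval key is left out because its string value is not shown in this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration (not part of Nutch) of how reserved seed metadata keys could be applied. */
final class ReservedSeedMetadata {

    // Assumption: these strings mirror the keys used in the seed example.
    static final String SCORE_KEY = "nutch.score";
    static final String FETCH_INTERVAL_KEY = "nutch.fetchInterval";

    /** Per-URL settings after the reserved keys have been separated from custom metadata. */
    static final class SeedSettings {
        final float score;
        final int fetchInterval;
        final Map<String, String> customMetadata;

        SeedSettings(float score, int fetchInterval, Map<String, String> customMetadata) {
            this.score = score;
            this.fetchInterval = fetchInterval;
            this.customMetadata = customMetadata;
        }
    }

    /** Reserved keys override the given defaults; every other entry is kept as custom metadata. */
    static SeedSettings apply(Map<String, String> metadata, float defaultScore, int defaultFetchInterval) {
        float score = defaultScore;
        int fetchInterval = defaultFetchInterval;
        Map<String, String> custom = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue();
            try {
                if (SCORE_KEY.equals(key)) {
                    score = Float.parseFloat(value);
                } else if (FETCH_INTERVAL_KEY.equals(key)) {
                    fetchInterval = Integer.parseInt(value);
                } else {
                    custom.put(key, value);
                }
            } catch (NumberFormatException e) {
                // assumption: a malformed reserved value leaves the default unchanged
            }
        }
        return new SeedSettings(score, fetchInterval, custom);
    }
}
```

With the metadata from the seed example, this sketch would yield a score of 10, a fetchInterval of 2592000, and a single custom entry, userType=open_source.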
Fields inherited from class NutchTool: currentJob, currentJobNum, numJobs, results, status
Constructor and Description |
---|
Injector() |
Injector(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
void | inject(Path crawlDb, Path urlDir) |
void | inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update) |
void | inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update, boolean normalize, boolean filter, boolean filterNormalizeAll) |
static void | main(String[] args) |
Map<String,Object> | run(Map<String,Object> args, String crawlId): Used by the Nutch REST service |
int | run(String[] args) |
void | usage() |
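Methods inherited from class NutchTool: getProgress, getStatus, killJob, stopJob
Methods inherited from class Configured: getConf, setConf
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Configurable: getConf, setConf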
public static final String URL_FILTER_NORMALIZE_ALL
public static String nutchScoreMDName
public static String nutchFetchIntervalMDName
public static String nutchFixedFetchIntervalMDName
public Injector()
public Injector(Configuration conf)
public void inject(Path crawlDb, Path urlDir) throws IOException, ClassNotFoundException, InterruptedException
public void inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update) throws IOException, ClassNotFoundException, InterruptedException
public void inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update, boolean normalize, boolean filter, boolean filterNormalizeAll) throws IOException, ClassNotFoundException, InterruptedException
public void usage()
Copyright © 2021 The Apache Software Foundation