public static class Fetcher.InputFormat extends SequenceFileInputFormat<Text,CrawlDatum>
Nested classes/interfaces inherited from class FileInputFormat: FileInputFormat.Counter
Fields inherited from class FileInputFormat: DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
Constructor and Description |
---|
InputFormat() |
Modifier and Type | Method and Description |
---|---|
List<InputSplit> | getSplits(JobContext job) Don't split inputs to keep things polite - a single fetch list must be processed in one fetcher task. |
Methods inherited from class SequenceFileInputFormat: createRecordReader, getFormatMinSplitSize, listStatus
Methods inherited from class FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
public List<InputSplit> getSplits(JobContext job) throws IOException
Overrides: getSplits in class FileInputFormat<Text,CrawlDatum>
Throws: IOException
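The no-split behaviour described for getSplits can be illustrated with a minimal, dependency-free sketch. It does not use the Hadoop or Nutch classes and is not the actual implementation: `InputFile`, `Split`, `wholeFileSplits`, `sizeBasedSplits`, and the `fetchlist-*` names are hypothetical stand-ins, and the size-based variant is a simplification of how a file input format might cut files into chunks. The point is only the contrast: one split per fetch list means each fetch list goes to exactly one task.

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFileSplitSketch {

    /** Hypothetical stand-in for an input file: a path and its length in bytes. */
    static final class InputFile {
        final String path;
        final long length;

        InputFile(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for an input split: a byte range within one file. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** One split per file, covering the whole file regardless of its size. */
    static List<Split> wholeFileSplits(List<InputFile> files) {
        List<Split> splits = new ArrayList<>();
        for (InputFile f : files) {
            splits.add(new Split(f.path, 0L, f.length));
        }
        return splits;
    }

    /** For contrast (simplified): cut each file into chunks of at most splitSize bytes. */
    static List<Split> sizeBasedSplits(List<InputFile> files, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (InputFile f : files) {
            for (long start = 0; start < f.length; start += splitSize) {
                splits.add(new Split(f.path, start, Math.min(splitSize, f.length - start)));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<InputFile> fetchLists = new ArrayList<>();
        fetchLists.add(new InputFile("fetchlist-0", 300L)); // hypothetical sizes
        fetchLists.add(new InputFile("fetchlist-1", 50L));

        // Whole-file policy: each fetch list maps to exactly one split (one task).
        System.out.println(wholeFileSplits(fetchLists).size());        // prints 2
        // Size-based policy: the 300-byte list is spread over three splits.
        System.out.println(sizeBasedSplits(fetchLists, 128L).size());  // prints 4
    }
}
```

With size-based splitting, the entries of one fetch list could be processed by several tasks at once, which is what the whole-file policy avoids.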
Copyright © 2021 The Apache Software Foundation