public static class Injector.InjectReducer extends Reducer<Text,CrawlDatum,Text,CrawlDatum>
Nested classes/interfaces inherited from class Reducer: Reducer.Context
Constructor and Description |
---|
InjectReducer() |
Modifier and Type | Method and Description |
---|---|
void | reduce(Text key, Iterable<CrawlDatum> values, Reducer.Context context) Merge the input records of one URL as per rules below: |
void | setup(Reducer.Context context) |
public void setup(Reducer.Context context)
Overrides: setup in class Reducer<Text,CrawlDatum,Text,CrawlDatum>
public void reduce(Text key, Iterable<CrawlDatum> values, Reducer.Context context) throws IOException, InterruptedException
Merge the input records of one URL as per the rules below:
1. If there is ONLY a new injected record ==> emit the injected record
2. If there is ONLY an old record ==> emit the existing record
3. If BOTH new and old records are present:
   (a) If 'overwrite' is true ==> emit the injected record
   (b) If 'overwrite' is false:
       (i) If 'update' is false ==> emit the existing record
       (ii) If 'update' is true ==> update the existing record and emit it
For more details, see NUTCH-1405.
Overrides: reduce in class Reducer<Text,CrawlDatum,Text,CrawlDatum>
Throws: IOException, InterruptedException
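The merge rules documented for reduce can be read as a small decision table. The following is a minimal standalone sketch of that table only, not the actual Nutch implementation: the class name InjectMergeSketch, the Outcome enum, and the decide method are hypothetical names introduced for illustration, and the boolean flags stand in for "an injected record is present", "an existing record is present", and the 'overwrite' and 'update' settings. What "update existing record" does to the record is not described here, so the sketch only labels that case. Rejecting the case where neither record is present is an assumption, since the documented rules do not cover it.

```java
// Standalone sketch of the decision table only; no Hadoop or Nutch classes are used.
final class InjectMergeSketch {

    /** Hypothetical labels for the record that would be emitted for one URL. */
    enum Outcome { EMIT_INJECTED, EMIT_EXISTING, EMIT_UPDATED_EXISTING }

    /**
     * Maps the documented rules to an outcome.
     *
     * @param hasInjected true if a newly injected record is among the values
     * @param hasExisting true if an old (existing) record is among the values
     * @param overwrite   the 'overwrite' setting
     * @param update      the 'update' setting
     */
    static Outcome decide(boolean hasInjected, boolean hasExisting,
                          boolean overwrite, boolean update) {
        if (!hasInjected && !hasExisting) {
            // Assumption: this case is not covered by the documented rules.
            throw new IllegalArgumentException("no record for this URL");
        }
        if (!hasExisting) {
            return Outcome.EMIT_INJECTED;          // rule 1: only a new record
        }
        if (!hasInjected) {
            return Outcome.EMIT_EXISTING;          // rule 2: only an old record
        }
        if (overwrite) {
            return Outcome.EMIT_INJECTED;          // rule 3(a)
        }
        return update
                ? Outcome.EMIT_UPDATED_EXISTING    // rule 3(b)(ii)
                : Outcome.EMIT_EXISTING;           // rule 3(b)(i)
    }

    public static void main(String[] args) {
        // Both records present, overwrite off, update on.
        System.out.println(decide(true, true, false, true)); // prints EMIT_UPDATED_EXISTING
    }
}
```

Note that in this reading 'overwrite' takes precedence over 'update': when both records are present and 'overwrite' is true, the value of 'update' does not affect the outcome.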
Copyright © 2021 The Apache Software Foundation