Packages that use ParseText
Package | Description |
---|---|
org.apache.nutch.parse | The Parse interface and related classes. |
org.apache.nutch.segment | A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links. |
Methods in org.apache.nutch.parse that return ParseText
Modifier and Type | Method and Description |
---|---|
static ParseText | ParseText.read(DataInput in) |
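ParseText.read(DataInput in) is a static factory that rebuilds a ParseText from a binary stream, the usual pattern for Hadoop Writable types. The sketch below illustrates that pattern with a hypothetical TextHolder class that uses only java.io. It is not Nutch's ParseText, and its length-prefixed UTF-8 wire format is an assumption, not ParseText's actual on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Writable-style text holder; NOT org.apache.nutch.parse.ParseText.
final class TextHolder {
    private String text;

    TextHolder() { this(""); }
    TextHolder(String text) { this.text = text; }

    String getText() { return text; }

    // Assumed wire format: a 4-byte length followed by the UTF-8 bytes.
    void write(DataOutput out) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Refill this instance from the stream (the Writable readFields idiom).
    void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        text = new String(bytes, StandardCharsets.UTF_8);
    }

    // Static factory in the style of ParseText.read(DataInput):
    // create an empty instance, then populate it from the stream.
    static TextHolder read(DataInput in) throws IOException {
        TextHolder holder = new TextHolder();
        holder.readFields(in);
        return holder;
    }

    // Serialize and immediately deserialize, for demonstration purposes.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new TextHolder(s).write(new DataOutputStream(buf));
            DataInput in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return read(in).getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("extracted page text"));
    }
}
```

Pairing the instance method readFields with a static read factory lets callers deserialize in one call without first constructing an empty object themselves.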
Methods in org.apache.nutch.parse with parameters of type ParseText
Modifier and Type | Method and Description |
---|---|
void | ParseResult.put(String key, ParseText text, ParseData data): Store a result of parsing. |
void | ParseResult.put(Text key, ParseText text, ParseData data): Store a result of parsing. |
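ParseResult.put stores one parse outcome (extracted text plus parse metadata) under a key, and the two overloads differ only in whether the key arrives as a String or as a Hadoop Text. The sketch below models that idea with hypothetical ParseResultSketch, ParsedEntry and TextKey classes, where TextKey stands in for org.apache.hadoop.io.Text. It assumes entries are keyed by URL and that the Text overload simply delegates through toString(); this is an illustration, not Nutch's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these classes are Nutch or Hadoop APIs.
final class ParseResultSketch {

    // Stand-in for org.apache.hadoop.io.Text: an immutable wrapper around a string.
    static final class TextKey {
        private final String value;
        TextKey(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    // One stored result: the extracted text plus its metadata (simplified to a title).
    static final class ParsedEntry {
        final String text;
        final String title;
        ParsedEntry(String text, String title) { this.text = text; this.title = title; }
    }

    // Insertion-ordered map from URL to its parse result.
    private final Map<String, ParsedEntry> results = new LinkedHashMap<>();

    // Store a result of parsing under a String key; a later put for the same key replaces it.
    void put(String key, String text, String title) {
        results.put(key, new ParsedEntry(text, title));
    }

    // Overload for the wrapper key type: delegate to the String version.
    void put(TextKey key, String text, String title) {
        put(key.toString(), text, title);
    }

    ParsedEntry get(String key) { return results.get(key); }
    int size() { return results.size(); }

    public static void main(String[] args) {
        ParseResultSketch result = new ParseResultSketch();
        result.put("http://example.com/", "Hello", "Home");
        result.put(new TextKey("http://example.com/"), "Hello again", "Home");
        System.out.println(result.size() + " " + result.get("http://example.com/").text);
    }
}
```

Because both overloads end up in the same String-keyed map, a put through either form for the same URL overwrites the earlier entry.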
Constructors in org.apache.nutch.parse with parameters of type ParseText
Constructor and Description |
---|
ParseImpl(ParseText text, ParseData data) |
ParseImpl(ParseText text, ParseData data, boolean isCanonical) |
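The two ParseImpl constructors combine a ParseText and a ParseData into a single parse, and the three-argument form also records whether that parse is canonical. A common way to offer such a pair is constructor chaining, sketched below with a hypothetical ParseSketch class that uses plain strings in place of ParseText and ParseData. That the two-argument form defaults to a canonical parse is an assumption of this sketch; check the ParseImpl source before relying on it.

```java
// Hypothetical sketch of constructor chaining; NOT org.apache.nutch.parse.ParseImpl.
final class ParseSketch {
    private final String text;      // stands in for ParseText
    private final String metadata;  // stands in for ParseData
    private final boolean canonical;

    // Two-argument form: ASSUMED here to default to a canonical parse.
    ParseSketch(String text, String metadata) {
        this(text, metadata, true);
    }

    // Three-argument form: the caller states explicitly whether the parse is canonical.
    ParseSketch(String text, String metadata, boolean canonical) {
        this.text = text;
        this.metadata = metadata;
        this.canonical = canonical;
    }

    String getText() { return text; }
    String getMetadata() { return metadata; }
    boolean isCanonical() { return canonical; }

    public static void main(String[] args) {
        ParseSketch primary = new ParseSketch("body text", "title=Home");
        ParseSketch derived = new ParseSketch("entry text", "title=Entry", false);
        System.out.println(primary.isCanonical() + " " + derived.isCanonical());
    }
}
```

Chaining keeps all field assignments in one constructor, so the two entry points cannot drift apart.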
Methods in org.apache.nutch.segment with parameters of type ParseText
Modifier and Type | Method and Description |
---|---|
boolean | SegmentMergeFilters.filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked): Iterates over all SegmentMergeFilter extensions and returns false if any of them returns false. |
boolean | SegmentMergeFilter.filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked): The filtering method, which receives all information being merged for a given key (URL). |
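SegmentMergeFilters.filter applies every SegmentMergeFilter extension to the data being merged for one URL and returns false if any of them returns false. The sketch below reproduces that all-must-pass rule with a hypothetical MergeFilter interface whose parameters are reduced to the URL key and the parse text; the real interface also receives the CrawlDatum records, Content, ParseData and outlinks. The EmptyTextFilter and PrefixFilter examples, the handling of a null parse text, and the early exit on the first rejection are choices made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified sketch; NOT the Nutch SegmentMergeFilter API.
final class MergeFilterSketch {

    // Reduced filter contract: return true to keep the record for this key (URL).
    interface MergeFilter {
        boolean filter(String key, String parseText);
    }

    // Example filter: drop records whose parse text is missing or blank.
    static final class EmptyTextFilter implements MergeFilter {
        @Override
        public boolean filter(String key, String parseText) {
            return parseText != null && !parseText.trim().isEmpty();
        }
    }

    // Example filter: drop URLs that start with a blocked prefix.
    static final class PrefixFilter implements MergeFilter {
        private final String blockedPrefix;
        PrefixFilter(String blockedPrefix) { this.blockedPrefix = blockedPrefix; }
        @Override
        public boolean filter(String key, String parseText) {
            return !key.startsWith(blockedPrefix);
        }
    }

    // Composite in the spirit of SegmentMergeFilters: every filter must accept.
    static final class AllFilters implements MergeFilter {
        private final List<MergeFilter> filters;
        AllFilters(List<MergeFilter> filters) { this.filters = filters; }
        @Override
        public boolean filter(String key, String parseText) {
            for (MergeFilter f : filters) {
                if (!f.filter(key, parseText)) {
                    return false; // this sketch stops at the first rejection
                }
            }
            return true; // no filter objected (also true for an empty list)
        }
    }

    public static void main(String[] args) {
        MergeFilter all = new AllFilters(Arrays.<MergeFilter>asList(
                new EmptyTextFilter(), new PrefixFilter("http://spam.example/")));
        System.out.println(all.filter("http://example.com/", "some text"));
        System.out.println(all.filter("http://example.com/", "   "));
        System.out.println(all.filter("http://spam.example/x", "some text"));
    }
}
```

Keeping each filter small and combining them in one composite mirrors the plugin design implied above: each extension judges one concern, and a record survives the merge only if none of them objects.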
Copyright © 2021 The Apache Software Foundation