Package | Description |
---|---|
org.apache.nutch.collection |
Subcollection is a subset of an index.
|
org.apache.nutch.hostdb | |
org.apache.nutch.indexer |
Index content, configure and run indexing and cleaning jobs to
add, update, and delete documents from an index.
|
org.apache.nutch.net |
Web-related interfaces: URL filters and normalizers. |
org.apache.nutch.net.urlnormalizer.ajax | |
org.apache.nutch.net.urlnormalizer.basic |
URL normalizer performing basic normalizations:
remove default ports (e.g., port 80 for http:// URLs);
remove needless slashes and dot segments in the path component;
remove anchors;
use percent-encoding (only) where needed.
E.g.,
https://www.example.org/a/../b//./select%2Dlang.php?lang=español#anchor |
org.apache.nutch.net.urlnormalizer.host |
URL normalizer renaming hosts to a canonical form listed in the
configuration file.
|
org.apache.nutch.net.urlnormalizer.pass |
Dummy URL normalizer which does not change URLs.
|
org.apache.nutch.net.urlnormalizer.protocol | |
org.apache.nutch.net.urlnormalizer.querystring |
URL normalizer which sorts the elements in the query part to avoid duplicates
caused by permutations.
|
org.apache.nutch.net.urlnormalizer.regex |
URL normalizer with configurable rules based on regular expressions
(Pattern). |
org.apache.nutch.net.urlnormalizer.slash | |
org.apache.nutch.parse |
The Parse interface and related classes. |
org.apache.nutch.urlfilter.api |
Generic URL filter library,
abstracting away from regular expression implementations. |
org.apache.nutch.urlfilter.automaton |
URL filter plugin based on dk.brics.automaton Finite-State
Automata for Java™.
|
org.apache.nutch.urlfilter.domain |
URL filter plugin to include only URLs which match an element in a given list of
domain suffixes, domain names, and/or host names.
|
org.apache.nutch.urlfilter.domaindenylist |
URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names.
|
org.apache.nutch.urlfilter.fast |
URL filter plugin that first does fast exact suffix matches on host/domain
names before applying regular expressions to the path component of a URL.
|
org.apache.nutch.urlfilter.ignoreexempt |
URL filter plugin which identifies exemptions to external URLs
when external URLs are set to be ignored.
|
org.apache.nutch.urlfilter.prefix |
URL filter plugin to include only URLs which match one of a given list of URL prefixes.
|
org.apache.nutch.urlfilter.regex |
URL filter plugin to include and/or exclude URLs matching Java regular expressions.
|
org.apache.nutch.urlfilter.suffix |
URL filter plugin to either exclude or include only URLs which match
one of the given (path) suffixes.
|
org.apache.nutch.urlfilter.validator |
URL filter plugin that validates given URLs.
|
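
The basic normalizations listed above for org.apache.nutch.net.urlnormalizer.basic can be approximated with the JDK alone. The following is a minimal sketch, not the plugin's code: the class `BasicNormalizeSketch` and its methods are hypothetical names, it assumes absolute http/https URLs with a host, and it leaves out the percent-encoding step.

```java
import java.net.URI;

public class BasicNormalizeSketch {

    /** Default port for the schemes this sketch knows about, else -1. */
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        return -1;
    }

    /** Removes the default port, dot segments, repeated slashes and the anchor. */
    static String normalize(String url) {
        // URI.normalize() resolves "." and ".." segments in the path.
        URI uri = URI.create(url).normalize();
        StringBuilder out = new StringBuilder();
        out.append(uri.getScheme()).append("://").append(uri.getHost());

        // Keep the port only if it is set and differs from the scheme default.
        int port = uri.getPort();
        if (port != -1 && port != defaultPort(uri.getScheme())) {
            out.append(':').append(port);
        }

        // Collapse runs of slashes in the path.
        String path = uri.getRawPath();
        if (path != null) {
            out.append(path.replaceAll("/{2,}", "/"));
        }

        // Keep the query as-is; the fragment (anchor) is dropped on purpose.
        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            normalize("http://www.example.org:80/a/../b//./c.html?x=1#anchor"));
    }
}
```

The result is assembled by hand from the raw components because the multi-argument `URI` constructors would quote `%` characters a second time.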
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
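
A URL filter limits which URLs enter the crawl by either passing a URL through or rejecting it. The sketch below illustrates that contract with a prefix check similar in spirit to org.apache.nutch.urlfilter.prefix. The nested `UrlFilter` interface is a hypothetical stand-in defined here so the example runs without Nutch on the classpath; it assumes the accept-or-null convention and is not the real plugin interface.

```java
import java.util.Arrays;
import java.util.List;

public class PrefixFilterSketch {

    /** Hypothetical stand-in: return the URL to accept it, null to reject it. */
    interface UrlFilter {
        String filter(String url);
    }

    /** Builds a filter that accepts only URLs starting with one of the prefixes. */
    static UrlFilter prefixFilter(List<String> prefixes) {
        return url -> {
            for (String prefix : prefixes) {
                if (url.startsWith(prefix)) {
                    return url;   // accepted: pass the URL through unchanged
                }
            }
            return null;          // rejected: no prefix matched
        };
    }

    public static void main(String[] args) {
        UrlFilter filter =
            prefixFilter(Arrays.asList("https://www.example.org/docs/"));
        System.out.println(filter.filter("https://www.example.org/docs/index.html"));
        System.out.println(filter.filter("https://www.example.org/blog/"));
    }
}
```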
Class and Description |
---|
URLFilters
Creates and caches
URLFilter implementing plugins. |
URLNormalizers
This class uses a "chained filter" pattern to run defined normalizers.
|
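
The "chained filter" pattern means that each normalizer receives the output of the previous one. The following is a minimal sketch of that idea, assuming plain string-to-string steps; `NormalizerChainSketch`, `stripFragment`, `sortQuery` and `defaultChain` are hypothetical names and do not reproduce how URLNormalizers loads or scopes its plugins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class NormalizerChainSketch {

    /** Removes the anchor, if any. */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    /** Sorts query elements so that permutations map to a single URL. */
    static String sortQuery(String url) {
        int q = url.indexOf('?');
        if (q < 0 || q == url.length() - 1) {
            return url;   // no query part to sort
        }
        String[] params = url.substring(q + 1).split("&");
        Arrays.sort(params);
        return url.substring(0, q + 1) + String.join("&", params);
    }

    /** An example chain; the order matters, the anchor must go first. */
    static List<UnaryOperator<String>> defaultChain() {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        chain.add(NormalizerChainSketch::stripFragment);
        chain.add(NormalizerChainSketch::sortQuery);
        return chain;
    }

    /** Runs every step in order, feeding each result into the next step. */
    static String normalize(String url, List<UnaryOperator<String>> chain) {
        String current = url;
        for (UnaryOperator<String> step : chain) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(
            normalize("http://example.org/p?b=2&a=1#top", defaultChain()));
    }
}
```

Running `sortQuery` before `stripFragment` would sort the anchor along with the last query element, which is why the order of the chain is part of its configuration.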
Class and Description |
---|
URLNormalizers
This class uses a "chained filter" pattern to run defined normalizers.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
URLFilterException |
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLNormalizer
Interface used to convert URLs to normal form and optionally perform
substitutions.
|
Class and Description |
---|
URLExemptionFilters
Creates and caches
URLExemptionFilter implementing plugins. |
URLFilters
Creates and caches
URLFilter implementing plugins. |
URLNormalizers
This class uses a "chained filter" pattern to run defined normalizers.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLExemptionFilter
Interface used to allow exemptions to external domain resources by overriding
db.ignore.external.links. |
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Class and Description |
---|
URLFilter
Interface used to limit which URLs enter Nutch.
|
Copyright © 2021 The Apache Software Foundation