Package | Description |
---|---|
org.apache.nutch.collection | Subcollection is a subset of an index. |
org.apache.nutch.net | Web-related interfaces: URL filters and normalizers. |
org.apache.nutch.urlfilter.api | Generic URL filter library, abstracting away from regular expression implementations. |
org.apache.nutch.urlfilter.automaton | URL filter plugin based on dk.brics.automaton Finite-State Automata for Java™. |
org.apache.nutch.urlfilter.domain | URL filter plugin to include only URLs which match an element in a given list of domain suffixes, domain names, and/or host names. |
org.apache.nutch.urlfilter.domaindenylist | URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names. |
org.apache.nutch.urlfilter.fast | URL filter plugin that first does fast exact suffix matches on host/domain names before applying regular expressions to the path component of a URL. |
org.apache.nutch.urlfilter.ignoreexempt | URL filter plugin which identifies exemptions for external URLs when external URLs are set to be ignored. |
org.apache.nutch.urlfilter.prefix | URL filter plugin to include only URLs which match one of a given list of URL prefixes. |
org.apache.nutch.urlfilter.regex | URL filter plugin to include and/or exclude URLs matching Java regular expressions. |
org.apache.nutch.urlfilter.suffix | URL filter plugin to either exclude or include only URLs which match one of the given (path) suffixes. |
org.apache.nutch.urlfilter.validator | URL filter plugin that validates given URLs. |
Modifier and Type | Class and Description |
---|---|
class | Subcollection: SubCollection represents a subset of the index; you can define URL patterns that indicate that a particular page (URL) is part of the SubCollection. |
Modifier and Type | Method and Description |
---|---|
URLFilter[] | URLFilters.getFilters() |
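Because `getFilters()` returns an array of filters, a caller typically runs a URL through each one in turn. The following is a minimal sketch of that idea, not the Nutch implementation. It assumes a filter returns the URL to accept it and `null` to reject it; `SimpleUrlFilter` and `applyAll` are hypothetical names introduced only for this illustration.

```java
public class FilterChainSketch {

    /** Hypothetical stand-in for a URL filter: returns the URL to keep it, or null to reject it. */
    interface SimpleUrlFilter {
        String filter(String url);
    }

    /** Runs the URL through each filter in order and stops at the first rejection. */
    static String applyAll(SimpleUrlFilter[] filters, String url) {
        for (SimpleUrlFilter f : filters) {
            if (url == null) {
                return null; // an earlier filter (or the caller) already rejected the URL
            }
            url = f.filter(url);
        }
        return url;
    }

    public static void main(String[] args) {
        // Two illustrative filters; neither corresponds to a real Nutch plugin.
        SimpleUrlFilter httpOnly =
            u -> (u.startsWith("http://") || u.startsWith("https://")) ? u : null;
        SimpleUrlFilter noImages =
            u -> (u.endsWith(".png") || u.endsWith(".jpg")) ? null : u;
        SimpleUrlFilter[] filters = { httpOnly, noImages };

        System.out.println(applyAll(filters, "https://example.org/index.html"));
        System.out.println(applyAll(filters, "https://example.org/logo.png"));
        System.out.println(applyAll(filters, "ftp://example.org/file.txt"));
    }
}
```

In this sketch a URL survives only if every filter accepts it, so the order of the filters affects cost but not the outcome.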
Modifier and Type | Class and Description |
---|---|
class | RegexURLFilterBase: Generic URL filter based on regular expressions. |
Modifier and Type | Class and Description |
---|---|
class | AutomatonURLFilter: RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™. |
Modifier and Type | Class and Description |
---|---|
class | DomainURLFilter: Filters URLs based on a file containing domain suffixes, domain names, and hostnames. |
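To illustrate matching a URL against domain suffixes, domain names, and hostnames, here is a minimal sketch. It is not the plugin's code: it assumes the allowed entries are already loaded into a set (the real filter reads them from a file), and it treats all three kinds of entry uniformly by testing the host and each of its shorter suffixes at label boundaries. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class DomainFilterSketch {

    /** Returns the URL if its host matches an allowed entry, otherwise null. */
    static String filter(Set<String> allowed, String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are rejected in this sketch
        }
        if (host == null) {
            return null;
        }
        // Try the full host name, then strip one leading label at a time:
        // nutch.apache.org -> apache.org -> org
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (allowed.contains(candidate)) {
                return url;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        // Illustrative entries: a domain name, a host name, and a domain suffix.
        Set<String> allowed = Set.of("apache.org", "www.example.com", "edu");
        System.out.println(filter(allowed, "https://nutch.apache.org/docs"));
        System.out.println(filter(allowed, "https://apache.org.evil.net/"));
    }
}
```

Matching only at label boundaries means an entry such as `apache.org` accepts `nutch.apache.org` but not `apache.org.evil.net` or `notapache.org`.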
Modifier and Type | Class and Description |
---|---|
class | DomainDenylistURLFilter: Excludes URLs that match an entry in a file containing domain suffixes, domain names, and hostnames. |
Modifier and Type | Class and Description |
---|---|
class | FastURLFilter: Filters URLs based on a file of regular expressions, matching on host/domain first. |
Modifier and Type | Class and Description |
---|---|
class | ExemptionUrlFilter: This implementation of URLExemptionFilter uses regex configuration to check whether a URL is eligible for exemption from 'db.ignore.external'. |
Modifier and Type | Class and Description |
---|---|
class | PrefixURLFilter: Filters URLs based on a file of URL prefixes. |
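Prefix filtering can be sketched in a few lines. The version below is an illustration under stated assumptions, not the plugin's code: the prefixes are passed in as a list rather than read from a file, and matching is a plain linear scan, whereas the real filter may use a more efficient matcher. `PrefixFilterSketch` is a hypothetical name.

```java
import java.util.List;

public class PrefixFilterSketch {

    /** Returns the URL if it starts with any of the given prefixes, otherwise null. */
    static String filter(List<String> prefixes, String url) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> prefixes = List.of("https://nutch.apache.org/", "http://example.org/docs/");
        System.out.println(filter(prefixes, "https://nutch.apache.org/downloads.html"));
        System.out.println(filter(prefixes, "http://example.org/blog/"));
    }
}
```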
Modifier and Type | Class and Description |
---|---|
class | RegexURLFilter: Filters URLs based on a file of regular expressions using the Java Regex implementation. |
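One way to include and exclude URLs with Java regular expressions is an ordered list of signed rules in which the first matching rule decides. The sketch below assumes exactly that policy, and that a URL matching no rule is rejected; the `Rule` type and the in-memory rule list are hypothetical stand-ins for whatever the plugin reads from its rules file.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexRuleSketch {

    /** Hypothetical rule: a compiled pattern plus whether a match accepts or rejects the URL. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    /** Applies the rules in order; the first rule whose pattern is found in the URL decides. */
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // assumption: no matching rule means the URL is rejected
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule(false, "\\.(gif|jpg|png)$"), // exclude common image suffixes
            new Rule(true, "^https?://"));        // include any other HTTP(S) URL
        System.out.println(filter(rules, "https://example.org/a.html"));
        System.out.println(filter(rules, "https://example.org/a.png"));
    }
}
```

Putting the exclusion rules first keeps a broad inclusion rule from accepting URLs that a later, narrower rule was meant to drop.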
Modifier and Type | Class and Description |
---|---|
class | SuffixURLFilter: Filters URLs based on a file of URL suffixes. |
Modifier and Type | Class and Description |
---|---|
class | UrlValidator: Validates URLs. |
Copyright © 2021 The Apache Software Foundation