Package | Description |
---|---|
org.apache.nutch.urlfilter.api | Generic URL filter library, abstracting away from regular expression implementations. |
org.apache.nutch.urlfilter.automaton | URL filter plugin based on dk.brics.automaton Finite-State Automata for Java™. |
org.apache.nutch.urlfilter.ignoreexempt | URL filter plugin which identifies exemptions to external URLs when external URLs are set to be ignored. |
org.apache.nutch.urlfilter.regex | URL filter plugin to include and/or exclude URLs matching Java regular expressions. |
Modifier and Type | Method and Description |
---|---|
static void | RegexURLFilterBase.main(RegexURLFilterBase filter, String[] args): Filter the standard input using a RegexURLFilterBase. |
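
The idea of filtering standard input can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: `StdinFilterSketch`, its `run` method, and the `+`/`-` output prefixes are assumptions made for illustration, and the filter is modelled as a plain function that returns the URL to accept it or `null` to reject it.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical stand-in for illustration; not part of Nutch.
final class StdinFilterSketch {

    // Applies the filter to each URL and prints "+url" when the filter
    // returns a non-null value (accepted) or "-url" when it returns null.
    static void run(UnaryOperator<String> filter, Stream<String> urls, PrintStream out) {
        urls.forEach(url -> {
            String result = filter.apply(url);
            out.println(result != null ? "+" + result : "-" + url);
        });
    }

    public static void main(String[] args) {
        // Example filter (an assumption): reject URLs ending in common image suffixes.
        Pattern images = Pattern.compile("\\.(gif|jpg|png)$");
        UnaryOperator<String> filter = url -> images.matcher(url).find() ? null : url;

        // Read one URL per line from standard input.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        run(filter, in.lines(), System.out);
    }
}
```

Piping a list of URLs, one per line, into this program echoes each URL with a prefix showing whether the example filter would keep it.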
Modifier and Type | Class and Description |
---|---|
class | AutomatonURLFilter: RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™. |
Modifier and Type | Class and Description |
---|---|
class | ExemptionUrlFilter: This implementation of URLExemptionFilter uses regex configuration to check whether a URL is eligible for exemption from 'db.ignore.external'. |
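
As a rough illustration of the exemption idea, the sketch below treats a link as external when its host differs from the host of the page it was found on, and lets a regular expression exempt selected external links from being ignored. The class `ExemptionSketch`, its method names, and the host comparison are assumptions made for this example; they are not the Nutch implementation.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not part of Nutch.
final class ExemptionSketch {

    // Assumption: a link is "external" when source and target hosts differ.
    static boolean isExternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        return fromHost == null || !fromHost.equalsIgnoreCase(toHost);
    }

    // With external links ignored, a link is followed only if it is
    // internal or its target URL matches the exemption pattern.
    static boolean follow(String fromUrl, String toUrl, Pattern exemptions) {
        return !isExternal(fromUrl, toUrl) || exemptions.matcher(toUrl).find();
    }

    public static void main(String[] args) {
        // Example exemption rule (an assumption): allow external images.
        Pattern exemptions = Pattern.compile("\\.(gif|jpg|png)$");
        String page = "http://example.com/index.html";
        System.out.println(follow(page, "http://example.com/about.html", exemptions));
        System.out.println(follow(page, "http://cdn.example.org/logo.png", exemptions));
        System.out.println(follow(page, "http://other.example.org/page.html", exemptions));
    }
}
```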
Modifier and Type | Class and Description |
---|---|
class | RegexURLFilter: Filters URLs based on a file of regular expressions using the Java Regex implementation. |
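
Filtering from a file of regular expressions can be sketched with `java.util.regex` alone. The example below assumes a rule format in which each line starts with `+` (accept) or `-` (reject) followed by a regular expression, `#` starts a comment, the first matching rule decides, and a URL that matches no rule is rejected. The class `RegexRuleSketch` and its methods are hypothetical names for this illustration; check the plugin's own configuration file for the exact syntax it supports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not part of Nutch.
final class RegexRuleSketch {

    // One rule: a compiled pattern plus whether a match accepts or rejects.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Parses rule lines, skipping blank lines and '#' comments.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', line.substring(1)));
        }
        return rules;
    }

    // Returns the URL if the first matching rule accepts it, otherwise null.
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // no rule matched: reject
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList(
                "# skip image suffixes",
                "-\\.(gif|jpg|png)$",
                "+^https?://"));
        System.out.println(filter(rules, "http://example.com/index.html"));
        System.out.println(filter(rules, "http://example.com/logo.png"));
    }
}
```

Because the first matching rule wins, reject rules for specific suffixes are placed before the broad accept rule in this example.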
Copyright © 2021 The Apache Software Foundation