Package | Description |
---|---|
org.apache.nutch.net.urlnormalizer.ajax | |
org.apache.nutch.net.urlnormalizer.basic | URL normalizer performing basic normalizations: remove default ports, e.g., port 80 for http:// URLs; remove needless slashes and dot segments in the path component; remove anchors; use percent-encoding (only) where needed. E.g., https://www.example.org/a/../b//./select%2Dlang.php?lang=español#anchor |
org.apache.nutch.net.urlnormalizer.host | URL normalizer renaming hosts to a canonical form listed in the configuration file. |
org.apache.nutch.net.urlnormalizer.pass | Dummy URL normalizer that does not change URLs. |
org.apache.nutch.net.urlnormalizer.protocol | |
org.apache.nutch.net.urlnormalizer.querystring | URL normalizer that sorts the elements of the query part to avoid duplicates caused by permutations. |
org.apache.nutch.net.urlnormalizer.regex | URL normalizer with configurable rules based on regular expressions (Pattern). |
org.apache.nutch.net.urlnormalizer.slash | |

Modifier and Type | Class and Description |
---|---|
class | AjaxURLNormalizer: URLNormalizer capable of dealing with AJAX URLs. |

Modifier and Type | Class and Description |
---|---|
class | BasicURLNormalizer: Converts URLs to a normal form: remove dot segments in the path (/./ or /../); remove default ports, e.g. port 80 for http:// URLs. |
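
The normalizations listed for BasicURLNormalizer can be approximated with java.net.URI alone. The following is a minimal sketch, not the plugin's code: the class name BasicNormalizeSketch is hypothetical, only the http and https default ports are handled, and the percent-encoding step ("use percent-encoding (only) where needed") is deliberately left out.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch of basic URL normalization; not Nutch's BasicURLNormalizer.
final class BasicNormalizeSketch {

    // Default ports known to this sketch; any other scheme keeps its explicit port.
    static int defaultPort(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
                return 80;
            case "https":
                return 443;
            default:
                return -1;
        }
    }

    static String normalize(String url) {
        try {
            URI uri = new URI(url);
            if (uri.getScheme() == null || uri.getHost() == null) {
                return url; // this sketch only handles absolute, hierarchical URLs
            }
            int port = uri.getPort();
            if (port == defaultPort(uri.getScheme())) {
                port = -1; // e.g. drop ":80" from an http:// URL
            }
            // Collapse runs of slashes in the path ("b//c" becomes "b/c").
            String path = uri.getRawPath().replaceAll("/{2,}", "/");

            // Rebuild from raw components so existing percent-escapes are kept as-is.
            StringBuilder sb = new StringBuilder();
            sb.append(uri.getScheme()).append("://");
            if (uri.getRawUserInfo() != null) {
                sb.append(uri.getRawUserInfo()).append('@');
            }
            sb.append(uri.getHost());
            if (port != -1) {
                sb.append(':').append(port);
            }
            sb.append(path);
            if (uri.getRawQuery() != null) {
                sb.append('?').append(uri.getRawQuery());
            }
            // The fragment (anchor) is deliberately not copied.

            // URI.normalize() removes "." segments and "segment/.." pairs.
            return new URI(sb.toString()).normalize().toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }
}
```

For example, normalize("http://www.example.org:80/a/../b//./c.html#top") returns "http://www.example.org/b/c.html". The URL is rebuilt by string concatenation rather than with URI's multi-argument constructors because those constructors always quote the '%' character, which would double-encode existing escapes.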

Modifier and Type | Class and Description |
---|---|
class | HostURLNormalizer: URL normalizer for mapping hosts to their desired form. |
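
Host renaming amounts to a lookup from an alias host to its canonical form. The format of the plugin's configuration file is not shown here, so this sketch assumes an in-memory Map in its place; HostRenameSketch and the example mapping are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of host renaming; not Nutch's HostURLNormalizer.
final class HostRenameSketch {
    // Stand-in for the configuration file: lower-case alias host -> canonical host.
    private final Map<String, String> canonicalHosts;

    HostRenameSketch(Map<String, String> canonicalHosts) {
        this.canonicalHosts = canonicalHosts;
    }

    String normalize(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // leave unparseable URLs untouched
        }
        String host = uri.getHost();
        if (host == null || uri.getScheme() == null) {
            return url; // no host to rename
        }
        String target = canonicalHosts.get(host.toLowerCase(Locale.ROOT));
        if (target == null) {
            return url; // host not listed: URL is unchanged
        }
        // Rebuild from raw components so existing percent-escapes are not re-encoded.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(target);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        sb.append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

With the hypothetical mapping example.org to www.example.org, "https://example.org/a?b=1" becomes "https://www.example.org/a?b=1", while URLs on unlisted hosts pass through unchanged.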

Modifier and Type | Class and Description |
---|---|
class | PassURLNormalizer: This URLNormalizer doesn't change URLs. |

Modifier and Type | Class and Description |
---|---|
class | ProtocolURLNormalizer |

Modifier and Type | Class and Description |
---|---|
class | QuerystringURLNormalizer: URL normalizer plugin for normalizing query strings by sorting the query string parameters. |
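
Sorting the query parameters maps every permutation of the same parameters to one string. This is a minimal sketch under stated assumptions: the class name QuerySortSketch is hypothetical, whole "name=value" pairs are sorted lexicographically, and empty pairs (from "&&") are dropped; the plugin's exact rules may differ.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of query-string sorting; not Nutch's QuerystringURLNormalizer.
final class QuerySortSketch {

    static String normalize(String url) {
        // Split off the fragment first so a '?' inside it is not taken for the query.
        int hash = url.indexOf('#');
        String base = hash < 0 ? url : url.substring(0, hash);
        String fragment = hash < 0 ? "" : url.substring(hash);

        int q = base.indexOf('?');
        if (q < 0) {
            return url; // no query string: nothing to sort
        }
        String query = base.substring(q + 1);
        // Sort whole "name=value" pairs; empty pairs are discarded.
        String sorted = Arrays.stream(query.split("&"))
                .filter(p -> !p.isEmpty())
                .sorted()
                .collect(Collectors.joining("&"));
        return base.substring(0, q + 1) + sorted + fragment;
    }
}
```

Here "http://www.example.org/p?b=2&a=1&c=3" becomes "http://www.example.org/p?a=1&b=2&c=3". Sorting complete pairs also gives repeated keys a deterministic order; the trade-off is that a site which treats parameter order as meaningful could see two distinct URLs merged.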

Modifier and Type | Class and Description |
---|---|
class | RegexURLNormalizer: Allows users to perform regex substitutions on any URL that is encountered, which is useful for stripping session IDs from URLs. |
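
Regex-based normalization can be modelled as an ordered list of (pattern, substitution) rules applied one after another. The sketch below assumes that model; RegexRulesSketch, its add method, and the session-ID rules in the usage note are hypothetical illustrations, and the plugin's actual rule-file format is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of ordered regex substitution rules; not Nutch's RegexURLNormalizer.
final class RegexRulesSketch {

    // One rule: a compiled pattern and its replacement (may use $1-style group references).
    static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Appends a rule; rules run in the order they were added.
    RegexRulesSketch add(String regex, String substitution) {
        rules.add(new Rule(regex, substitution));
        return this;
    }

    // Applies every rule in sequence, each to the output of the previous one.
    String normalize(String url) {
        String result = url;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }
}
```

As a usage example, the hypothetical rules `(?i);jsessionid=[^?#]*` → "", `([?&])sid=[^&#]*&?` → "$1" and `[?&]$` → "" turn "http://www.example.org/cart;JSESSIONID=ABC123?item=7" into "http://www.example.org/cart?item=7" and "http://www.example.org/p?x=1&sid=42" into "http://www.example.org/p?x=1". The last rule only tidies a dangling separator left by the second, which is why rule order matters.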

Modifier and Type | Class and Description |
---|---|
class | SlashURLNormalizer |

Copyright © 2021 The Apache Software Foundation