The basic URL normalizations are:
- remove default ports, e.g. for http:// URLs
- remove needless slashes and dot segments in the path component
- remove anchors
- use percent-encoding (only) where needed
E.g.,
https://www.example.org/a/../b//./select%2Dlang.php?lang=español#anchor
is normalized to https://www.example.org/b/select-lang.php?lang=espa%C3%B1ol
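The normalization steps listed above can be sketched with the JDK alone. This is a minimal illustration, not the plugin's actual implementation: the class `BasicNormalizationSketch` and its helper methods are hypothetical names, only http and https default ports are handled, and user info, IPv6 literals and other edge cases are ignored.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the code of BasicURLNormalizer.
final class BasicNormalizationSketch {

    private static final Pattern ESCAPE = Pattern.compile("%([0-9A-Fa-f]{2})");

    // Default ports for the two schemes covered by this sketch.
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        return -1;
    }

    // Decode escapes of unreserved ASCII characters (letters, digits, - . _ ~),
    // keep all other escapes but write their hex digits in upper case.
    static String decodeUnreserved(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            boolean unreserved = c < 128
                && (Character.isLetterOrDigit(c) || "-._~".indexOf(c) >= 0);
            String repl = unreserved ? String.valueOf(c) : m.group().toUpperCase();
            m.appendReplacement(sb, Matcher.quoteReplacement(repl));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Drop empty segments ("//") and "." segments; ".." removes the previous segment.
    static String removeDotSegments(String path) {
        Deque<String> out = new ArrayDeque<>();
        for (String seg : path.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) continue;
            if (seg.equals("..")) { out.pollLast(); continue; }
            out.addLast(seg);
        }
        String joined = "/" + String.join("/", out);
        if (path.endsWith("/") && joined.length() > 1) joined += "/";
        return joined;
    }

    static String normalize(String url) {
        URI uri = URI.create(url);
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getHost());
        int port = uri.getPort();
        if (port != -1 && port != defaultPort(uri.getScheme())) {
            sb.append(':').append(port);          // keep non-default ports only
        }
        sb.append(removeDotSegments(decodeUnreserved(uri.getRawPath())));
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        // The fragment (anchor) is intentionally not appended.
        // toASCIIString() percent-encodes non-ASCII characters as UTF-8.
        return URI.create(sb.toString()).toASCIIString();
    }

    public static void main(String[] args) {
        System.out.println(normalize(
            "https://www.example.org/a/../b//./select%2Dlang.php?lang=espa\u00F1ol#anchor"));
    }
}
```

Building the result as a string and re-parsing it avoids the multi-argument `URI` constructors, which would quote the `%` of escapes that are already present.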
Optional and configurable normalizations are:
- convert Internationalized Domain Names (IDNs) uniquely either to the
ASCII (Punycode) or Unicode representation, see property
urlnormalizer.basic.host.idn
- remove a trailing dot from host names, see property
urlnormalizer.basic.host.trim-trailing-dot
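The two optional host normalizations can be illustrated with the JDK's `java.net.IDN` class. This is only a sketch of the underlying conversions: the class `HostNormalizationSketch` and its method names are hypothetical, and how the plugin reads the two properties or which values they accept is not shown here.

```java
import java.net.IDN;

// Hypothetical sketch of the host conversions; NOT the plugin's code.
final class HostNormalizationSketch {

    // Unicode host name -> ASCII (Punycode) representation.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    // ASCII (Punycode) host name -> Unicode representation.
    static String toUnicodeHost(String host) {
        return IDN.toUnicode(host);
    }

    // Remove a single trailing dot, if present.
    static String trimTrailingDot(String host) {
        return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
    }

    public static void main(String[] args) {
        System.out.println(toAsciiHost("b\u00FCcher.example"));
        System.out.println(toUnicodeHost("xn--bcher-kva.example"));
        System.out.println(trimTrailingDot("www.example.org."));
    }
}
```

Converting every host to one of the two forms gives the same domain a single spelling, which is what makes the representation unique.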
Class | Description |
---|---|
BasicURLNormalizer | Converts URLs to a normal form: remove dot segments in path (/./ or /../); remove default ports |
Copyright © 2021 The Apache Software Foundation