Class | Description |
---|---|
AbstractChecker | Scaffolding class for the various Checker implementations. |
CommandRunner | |
CrawlCompletionStats | Extracts some simple crawl completion stats from the crawldb. Stats will be sorted by host/domain and will be of the form: `1 www.spitzer.caltech.edu FETCHED`, `50 www.spitzer.caltech.edu UNFETCHED` |
CrawlCompletionStats.CrawlCompletionStatsCombiner | |
DeflateUtils | A collection of utility methods for working on deflated data. |
DomUtil | |
DumpFileUtil | |
EncodingDetector | A simple class for detecting character encodings. |
FSUtils | Utility methods for common filesystem operations. |
GenericWritableConfigurable | A generic Writable wrapper that can inject Configuration to Configurables. |
GZIPUtils | A collection of utility methods for working on GZIPed data. |
HadoopFSUtil | |
JexlUtil | Utility methods for handling JEXL expressions. |
LockUtil | Utility methods for handling application-level locking. |
MimeUtil | This is a facade class to insulate Nutch from its underlying Mime Type substrate library, Apache Tika. |
NodeWalker | A utility class that allows the walking of any DOM tree using a stack instead of recursion. |
NutchConfiguration | Utility to create Hadoop Configurations that include Nutch-specific resources. |
NutchJob | A Job for Nutch jobs. |
NutchTool | |
ObjectCache | |
PrefixStringMatcher | A class for efficiently matching Strings against a set of prefixes. |
ProtocolStatusStatistics | Extracts protocol status code information from the crawl database. |
ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner | |
SegmentReaderUtil | |
SitemapProcessor | Performs Sitemap processing by fetching sitemap links, parsing the content and merging the URLs from Sitemap (with the metadata) with the existing crawldb. |
StringUtil | A collection of String processing utility methods. |
SuffixStringMatcher | A class for efficiently matching Strings against a set of suffixes. |
TableUtil | |
TimingUtil | |
TrieStringMatcher | TrieStringMatcher is a base class for simple tree-based string matching. |
URLUtil | Utility class for URL analysis. |
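
The NodeWalker entry above describes walking a DOM tree with a stack instead of recursion. The following is a minimal sketch of that general technique using only the JDK's `org.w3c.dom` API. It is not Nutch's NodeWalker implementation and does not use its API; the class `StackDomWalk` and its methods `elementNames` and `parse` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Nutch.
class StackDomWalk {

    /** Collects element names in document (pre-order) order without recursion. */
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.getNodeType() == Node.ELEMENT_NODE) {
                names.add(current.getNodeName());
            }
            // Push children in reverse so the first child is popped first.
            NodeList children = current.getChildNodes();
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return names;
    }

    /** Parses an XML string into a DOM Document; wraps checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>");
        System.out.println(elementNames(doc)); // prints [html, head, title, body, p, p]
    }
}
```

An explicit stack keeps the traversal depth off the Java call stack, which matters for deeply nested documents where a recursive walk could overflow.
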
Copyright © 2021 The Apache Software Foundation