Package | Description |
---|---|
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.any23 | This package uses the Apache Any23 library for parsing and extracting structured data in RDF format from a variety of Web documents. |
org.apache.nutch.indexer.anchor | An indexing plugin for inbound anchor text. |
org.apache.nutch.indexer.basic | A basic indexing plugin that adds basic fields: url, host, title, content, etc. |
org.apache.nutch.indexer.feed | Indexing filter to index metadata from RSS feeds. |
org.apache.nutch.indexer.filter | |
org.apache.nutch.indexer.geoip | This plugin implements an indexing filter which takes advantage of the GeoIP2-java API. |
org.apache.nutch.indexer.jexl | This plugin implements a dynamic indexing filter which uses JEXL expressions to allow filtering based on the page's metadata. |
org.apache.nutch.indexer.links | |
org.apache.nutch.indexer.metadata | Indexing filter to add document metadata to the index. |
org.apache.nutch.indexer.more | A "more" indexing plugin that adds "more" index fields: last modified date, MIME type, content length. |
org.apache.nutch.indexer.replace | Indexing filter to allow pattern replacements on metadata. |
org.apache.nutch.indexer.staticfield | A simple plugin called at indexing that adds fields with static data. |
org.apache.nutch.indexer.subcollection | Indexing filter to assign documents to subcollections. |
org.apache.nutch.indexer.tld | Top Level Domain indexing plugin. |
org.apache.nutch.indexer.urlmeta | URL Meta Tag indexing plugin. |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Modifier and Type | Class and Description |
---|---|
class | LanguageIndexingFilter: An IndexingFilter that adds a lang (language) field to the document. |
Modifier and Type | Class and Description |
---|---|
class | Any23IndexingFilter: This implementation of IndexingFilter adds a triple(s) field to the NutchDocument. |
Modifier and Type | Class and Description |
---|---|
class | AnchorIndexingFilter: Indexing filter that offers an option to either index all inbound anchor text for a document or deduplicate anchors. |
Modifier and Type | Class and Description |
---|---|
class | BasicIndexingFilter: Adds basic searchable fields to a document. |
Modifier and Type | Class and Description |
---|---|
class | FeedIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class | MimeTypeIndexingFilter: An IndexingFilter that allows filtering of documents based on the MIME type detected by Tika. |
Modifier and Type | Class and Description |
---|---|
class | GeoIPIndexingFilter: This plugin implements an indexing filter which takes advantage of the GeoIP2-java API. |
Modifier and Type | Class and Description |
---|---|
class | JexlIndexingFilter: An IndexingFilter that allows filtering of documents based on a JEXL expression. |
Modifier and Type | Class and Description |
---|---|
class | LinksIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class | MetadataIndexer: Indexer which can be configured to extract metadata from the crawldb, parse metadata or content metadata. |
Modifier and Type | Class and Description |
---|---|
class | MoreIndexingFilter: Adds (or resets) a few metadata properties as respective fields (if they are available), so that they can be accurately used within the search index. |
Modifier and Type | Class and Description |
---|---|
class | ReplaceIndexer: Performs pattern replacements on selected field contents prior to indexing. |
Modifier and Type | Class and Description |
---|---|
class | StaticFieldIndexer: A simple plugin called at indexing that adds fields with static data. |
Modifier and Type | Class and Description |
---|---|
class | SubcollectionIndexingFilter |
Modifier and Type | Class and Description |
---|---|
class | TLDIndexingFilter: Adds the top-level domain extensions to the index. |
Modifier and Type | Class and Description |
---|---|
class | URLMetaIndexingFilter: This is part of the URL Meta plugin. |
Modifier and Type | Class and Description |
---|---|
class | RelTagIndexingFilter: An IndexingFilter that adds tag field(s) to the document. |
Modifier and Type | Class and Description |
---|---|
class | CCIndexingFilter: Adds basic searchable fields to a document. |
Copyright © 2021 The Apache Software Foundation