Package | Description |
---|---|
org.apache.nutch.parse | The Parse interface and related classes. |
org.apache.nutch.parse.ext | Parse wrapper that runs an external command to do the parsing. |
org.apache.nutch.parse.feed | Parse RSS feeds. |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.js | Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets. |
org.apache.nutch.parse.swf | Parse Flash SWF files. |
org.apache.nutch.parse.tika | Parse various document formats with the help of Apache Tika. |
org.apache.nutch.parse.zip | Parse ZIP files: embedded files are recursively passed to appropriate parsers. |
Modifier and Type | Method and Description |
---|---|
Parser | ParserFactory.getParserById(String id): Returns the Parser instance whose extension ID matches the specified id. |
Parser[] | ParserFactory.getParsers(String contentType, String url): Returns an array of Parsers for the given content type. |
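
The two lookups listed above, one parser by extension ID and several parsers by content type, can be illustrated with a small self-contained sketch. It does not use the Nutch classes: `SketchParser`, `SketchRegistry`, the IDs `sketch-html` and `sketch-general`, and the use of `IllegalArgumentException` for a failed lookup are all hypothetical stand-ins chosen for illustration, and how the real ParserFactory discovers its parsers is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserLookupSketch {

    /** Hypothetical stand-in for a parser plugin; NOT the Nutch Parser interface. */
    interface SketchParser {
        String id();
        boolean accepts(String contentType);
    }

    /** Hypothetical registry modelling the two lookups; NOT the Nutch ParserFactory. */
    static class SketchRegistry {
        // Insertion order is kept so getParsers returns parsers in registration order.
        private final Map<String, SketchParser> byId = new LinkedHashMap<>();

        void register(SketchParser parser) {
            byId.put(parser.id(), parser);
        }

        /** Looks up exactly one parser by its ID and fails loudly if it is unknown. */
        SketchParser getParserById(String id) {
            SketchParser parser = byId.get(id);
            if (parser == null) {
                throw new IllegalArgumentException("No parser registered with id " + id);
            }
            return parser;
        }

        /** Returns every parser accepting the content type; url only appears in the error message. */
        SketchParser[] getParsers(String contentType, String url) {
            List<SketchParser> matches = new ArrayList<>();
            for (SketchParser parser : byId.values()) {
                if (parser.accepts(contentType)) {
                    matches.add(parser);
                }
            }
            if (matches.isEmpty()) {
                throw new IllegalArgumentException(
                    "No parser for content type " + contentType + " (" + url + ")");
            }
            return matches.toArray(new SketchParser[0]);
        }
    }

    /** Builds a parser that accepts a fixed list of content types. */
    static SketchParser parser(final String parserId, String... contentTypes) {
        final List<String> accepted = Arrays.asList(contentTypes);
        return new SketchParser() {
            @Override public String id() { return parserId; }
            @Override public boolean accepts(String contentType) {
                return accepted.contains(contentType);
            }
        };
    }

    /** Registers two hypothetical parsers that overlap on text/html. */
    static SketchRegistry demoRegistry() {
        SketchRegistry registry = new SketchRegistry();
        registry.register(parser("sketch-html", "text/html"));
        registry.register(parser("sketch-general", "text/html", "application/pdf"));
        return registry;
    }

    public static void main(String[] args) {
        SketchRegistry registry = demoRegistry();
        System.out.println(registry.getParserById("sketch-html").id());
        for (SketchParser p : registry.getParsers("text/html", "http://example.com/")) {
            System.out.println(p.id());
        }
    }
}
```

Returning an array from the content-type lookup lets a caller try candidates in order and fall back to the next one when a parser fails, whereas the ID lookup suits callers that already know exactly which parser they want.
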
Modifier and Type | Class and Description |
---|---|
class | ExtParser: A wrapper that invokes an external command to do the actual parsing. |
Modifier and Type | Class and Description |
---|---|
class | FeedParser |
Modifier and Type | Class and Description |
---|---|
class | HtmlParser |
Modifier and Type | Class and Description |
---|---|
class | JSParseFilter: A heuristic link extractor for JavaScript files and code snippets. |
Modifier and Type | Class and Description |
---|---|
class | SWFParser: Parser for Flash SWF files. |
Modifier and Type | Class and Description |
---|---|
class | TikaParser: Wrapper for Tika parsers. |
Modifier and Type | Class and Description |
---|---|
class | ZipParser: ZipParser class based on the MSPowerPointParser class by Stephan Strittmatter. |
Copyright © 2021 The Apache Software Foundation