public final class MimeUtil extends Object
Constructor and Description |
---|
MimeUtil(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
String | autoResolveContentType(String typeName, String url, byte[] data) A facade interface that tries all the possible mime type resolution strategies available within Tika. |
static String | cleanMimeType(String origType) Cleans a MimeType name by extracting the actual MimeType from a string that may include optional parameters. |
String | forName(String name) A facade interface to Tika's underlying MimeTypes.forName(String) method. |
String | getMimeType(File f) Facade interface to Tika's underlying MimeTypes.getMimeType(File) method. |
String | getMimeType(String url) Facade interface to Tika's underlying MimeTypes.getMimeType(String) method. |
static void | setPoolSize(int poolSize) |
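
The resolution order that autoResolveContentType(String, String, byte[]) is documented to follow (clean the declared type, look it up in a registry, fall back to the URL, then optionally consult magic bytes) can be illustrated with a small standalone sketch. This is not the Nutch or Tika implementation: the class MimeResolutionSketch, its three-entry registry, the extension-based URL lookup, the "%PDF" magic check, and the magicEnabled flag (standing in for the mime.type.magic setting) are all hypothetical, and the rule that a magic match overrides the earlier guess is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the documented resolution order; not Nutch code. */
class MimeResolutionSketch {

    // Toy registry keyed by file extension (assumption; Tika's registry is far larger).
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "html", "text/html",
        "pdf", "application/pdf",
        "txt", "text/plain");

    /** Drops optional parameters, e.g. "text/html; charset=UTF-8" becomes "text/html". */
    static String clean(String origType) {
        if (origType == null) {
            return null;
        }
        int semi = origType.indexOf(';');
        String bare = semi >= 0 ? origType.substring(0, semi) : origType;
        return bare.trim(); // trimming is an assumption of this sketch
    }

    /** Guesses a type from the URL's file extension, or returns null. */
    static String fromUrl(String url) {
        if (url == null) {
            return null;
        }
        int dot = url.lastIndexOf('.');
        if (dot < 0 || dot == url.length() - 1) {
            return null;
        }
        return BY_EXTENSION.get(url.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Guesses a type from the leading bytes ("magic"), or returns null. */
    static String fromMagic(byte[] data) {
        if (data == null || data.length < 4) {
            return null;
        }
        String head = new String(data, 0, 4, StandardCharsets.US_ASCII);
        return head.equals("%PDF") ? "application/pdf" : null;
    }

    /** Cleaned declared type first, then URL, then (if enabled) magic bytes. */
    static String resolve(String typeName, String url, byte[] data, boolean magicEnabled) {
        String cleaned = clean(typeName);
        boolean known = cleaned != null && BY_EXTENSION.containsValue(cleaned);
        String type = known ? cleaned : fromUrl(url);
        if (magicEnabled) {
            String magic = fromMagic(data);
            if (magic != null) {
                type = magic; // assumption: a magic match wins over the earlier guess
            }
        }
        return type;
    }
}
```

In this sketch, a declared type of "text/html; charset=UTF-8" resolves to "text/html" without the URL being consulted, while an unrecognized declared type falls through to the URL and, when magicEnabled is true, to the byte-prefix check.
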
public MimeUtil(Configuration conf)
public static void setPoolSize(int poolSize)
public static String cleanMimeType(String origType)
Cleans a MimeType name by extracting the actual MimeType from a string of the form:
<primary type>/<sub type> ; <optional params>
Parameters:
origType - The original mime type string to be cleaned.

public String autoResolveContentType(String typeName, String url, byte[] data)
A facade interface that tries all the possible mime type resolution strategies available within Tika. First, typeName is cleaned with cleanMimeType(String). Then the cleaned mime type is looked up in the underlying Tika MimeTypes registry by its cleaned name. If the MimeType is found, that mime type is used; otherwise, URL resolution is used to try to determine the mime type. However, if mime.type.magic is enabled in NutchConfiguration, then mime type magic resolution is used to try to obtain a better-than-the-default approximation of the MimeType.
Parameters:
typeName - The original mime type, returned from a ProtocolOutput.
url - The URL that Nutch was trying to crawl.
data - The byte data returned from the crawl, if any.
Returns:
The MimeType name.

public String getMimeType(String url)
Facade interface to Tika's underlying MimeTypes.getMimeType(String) method.
Parameters:
url - A string representation of the document URL to detect the MimeType for.
Returns:
The MimeType identified from the given document URL in string form.

public String forName(String name)
A facade interface to Tika's underlying MimeTypes.forName(String) method.
Parameters:
name - The name of a valid MimeType in the Tika mime registry.
Returns:
The MimeType, if it exists, or null otherwise.

Copyright © 2021 The Apache Software Foundation