public class FileDumper extends Object
The tool has a number of immediate uses:
Upon successful completion, the tool displays a JSON snippet listing each MIME type classification it encountered and the number of documents that fall into that classification. For example:
```
INFO: File Types:
TOTAL Stats:
[
{"mimeType":"application/xml","count":"19"}
{"mimeType":"image/png","count":"47"}
{"mimeType":"image/jpeg","count":"141"}
{"mimeType":"image/vnd.microsoft.icon","count":"4"}
{"mimeType":"text/plain","count":"89"}
{"mimeType":"video/quicktime","count":"2"}
{"mimeType":"image/gif","count":"63"}
{"mimeType":"application/xhtml+xml","count":"1670"}
{"mimeType":"application/octet-stream","count":"40"}
{"mimeType":"text/html","count":"1863"}
]
FILTER Stats:
[
{"mimeType":"image/png","count":"47"}
{"mimeType":"image/jpeg","count":"141"}
{"mimeType":"image/vnd.microsoft.icon","count":"4"}
{"mimeType":"video/quicktime","count":"2"}
{"mimeType":"image/gif","count":"63"}
]
```
In the example above, the tool was run with the flag `-mimeType image/png image/jpeg image/vnd.microsoft.icon video/quicktime image/gif`, which is why only those five types appear under FILTER Stats, while TOTAL Stats covers every type found.
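The relationship between TOTAL and FILTER stats can be illustrated with a small, self-contained Java sketch. It is not FileDumper's code: the class `MimeStats` and its `tally`/`render` methods are hypothetical, and the sketch assumes the MIME type of each document has already been detected. TOTAL counts every type seen; FILTER counts only the types supplied through `-mimeType`, and both are printed one entry per line in the style of the log output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not FileDumper's implementation: tallies per-MIME-type
 * document counts and prints them in the same line-per-entry style as the log.
 */
public class MimeStats {

    /** Counts documents per MIME type; a null filter counts every type (TOTAL). */
    static Map<String, Integer> tally(List<String> detectedTypes, Set<String> filter) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String type : detectedTypes) {
            if (filter == null || filter.contains(type)) {
                counts.merge(type, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Renders one {"mimeType":...,"count":...} line per entry, wrapped in brackets. */
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("[\n");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append("{\"mimeType\":\"").append(e.getKey())
              .append("\",\"count\":\"").append(e.getValue()).append("\"}\n");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // Made-up input: one already-detected MIME type per document.
        List<String> detected = List.of("text/html", "image/png", "text/html", "image/gif");
        // Assumed equivalent of the values passed after -mimeType.
        Set<String> wanted = Set.of("image/png", "image/gif");
        System.out.println("TOTAL Stats:\n" + render(tally(detected, null)));
        System.out.println("FILTER Stats:\n" + render(tally(detected, wanted)));
    }
}
```

A `LinkedHashMap` is used only so the printed order is stable; the real tool's ordering may differ.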
Constructor and Description |
---|
FileDumper() |
Modifier and Type | Method and Description |
---|---|
void | dump(File outputDir, File segmentRootDir, String[] mimeTypes, boolean flatDir, boolean mimeTypeStats, boolean reverseURLDump): Dumps the reverse-engineered raw content from the provided segment directories. Pass the parent directory if it contains more than one segment; otherwise, a single segment can be passed as the argument. |
static void | main(String[] args): Main method for invoking this tool. |
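The summary above allows `segmentRootDir` to be either a parent of several segments or a single segment. One way to make that decision is sketched below. The check for a `content` subdirectory is an assumption made for illustration (fetched Nutch segments typically contain one), and `SegmentResolver.resolveSegments` is a hypothetical helper, not part of FileDumper's API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: decides which directories to treat as segments.
 * Assumption: a directory that directly contains a "content" subdirectory is a
 * single segment; otherwise each readable child directory is treated as one.
 */
public class SegmentResolver {

    static List<File> resolveSegments(File segmentRootDir) {
        List<File> segments = new ArrayList<>();
        if (!segmentRootDir.isDirectory()) {
            return segments; // nothing to dump
        }
        if (new File(segmentRootDir, "content").isDirectory()) {
            segments.add(segmentRootDir); // a single segment was passed directly
            return segments;
        }
        // Parent directory: every readable child directory is assumed to be a segment.
        File[] children = segmentRootDir.listFiles(f -> f.isDirectory() && f.canRead());
        if (children != null) {
            Arrays.sort(children); // deterministic order
            segments.addAll(Arrays.asList(children));
        }
        return segments;
    }

    public static void main(String[] args) {
        // "crawl/segments" is an assumed default path, used only when no argument is given.
        File root = new File(args.length > 0 ? args[0] : "crawl/segments");
        for (File segment : resolveSegments(root)) {
            System.out.println("segment: " + segment.getPath());
        }
    }
}
```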
public void dump(File outputDir, File segmentRootDir, String[] mimeTypes, boolean flatDir, boolean mimeTypeStats, boolean reverseURLDump) throws Exception
Parameters:
outputDir - the directory to dump the raw content to. This directory will be created.
segmentRootDir - a directory containing one or more segments.
mimeTypes - an array of MIME types to dump; all other types are filtered out.
flatDir - a boolean flag specifying whether the output directory should contain only files, instead of the nested directories used to prevent naming conflicts.
mimeTypeStats - a flag indicating whether MIME type statistics should be displayed instead of dumping files.
Throws:
Exception
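The `flatDir` parameter trades a simpler layout for a risk of name collisions. The sketch below illustrates that trade-off under an assumed layout: in nested mode each file goes under two directory levels taken from an MD5 hash of its URL, so two pages both named `index.html` will almost always land in different directories, while in flat mode they resolve to the same path. `OutputLayout` and its methods are hypothetical, and FileDumper's actual directory scheme may differ.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the flatDir trade-off; not FileDumper's actual layout. */
public class OutputLayout {

    /** Lower-case hex MD5 of a string (UTF-8 encoded). */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Flat: outputDir/fileName. Nested (assumed scheme): outputDir/h[0..2]/h[2..4]/fileName. */
    static Path resolve(Path outputDir, String url, String fileName, boolean flatDir) {
        if (flatDir) {
            return outputDir.resolve(fileName);
        }
        String h = md5Hex(url);
        return outputDir.resolve(h.substring(0, 2)).resolve(h.substring(2, 4)).resolve(fileName);
    }

    public static void main(String[] args) {
        Path out = Paths.get("dump");
        String a = "http://example.com/a/index.html";
        String b = "http://example.com/b/index.html";
        System.out.println(resolve(out, a, "index.html", true));
        System.out.println(resolve(out, b, "index.html", true)); // same path: the files would collide
        System.out.println(resolve(out, a, "index.html", false));
        System.out.println(resolve(out, b, "index.html", false));
    }
}
```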
Copyright © 2021 The Apache Software Foundation