Package | Description |
---|---|
org.apache.nutch.scoring | The ScoringFilter interface. |
org.apache.nutch.scoring.depth | Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs). |
org.apache.nutch.scoring.link | Scoring filter used in conjunction with WebGraph. |
org.apache.nutch.scoring.opic | Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm. |
org.apache.nutch.scoring.orphan | Scoring filter to modify score or status of orphaned pages (no inlinks found for a configurable amount of time). |
org.apache.nutch.scoring.similarity | |
org.apache.nutch.scoring.tld | Top Level Domain Scoring plugin. |
org.apache.nutch.scoring.urlmeta | URL Meta Tag Scoring Plugin |
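The depth limit described for org.apache.nutch.scoring.depth can be pictured with a few lines of plain Java. This is only a sketch under stated assumptions: the class `DepthLimitSketch` and its methods are hypothetical stand-ins, not Nutch classes, and the convention that seed URLs start at depth 1 is an assumption made for the example.

```java
/** Hypothetical illustration of a hop-count limit; not part of Nutch. */
final class DepthLimitSketch {

    /** An outlink is one hop further from the seeds than the page it was found on. */
    static int outlinkDepth(int parentDepth) {
        return parentDepth + 1;
    }

    /** A page reached over several paths keeps the smallest depth seen so far. */
    static int mergedDepth(int knownDepth, int candidateDepth) {
        return Math.min(knownDepth, candidateDepth);
    }

    /** Outlinks are followed only while the page itself is below the limit. */
    static boolean shouldFollowOutlinks(int pageDepth, int maxDepth) {
        return pageDepth < maxDepth;
    }
}
```

With a limit of 2 and seeds at depth 1, the seeds' outlinks are followed, but pages found at depth 2 do not contribute further outlinks.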
Modifier and Type | Method and Description |
---|---|
CrawlDatum | ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) Distribute score value from the current page to all its outlinked pages. |
CrawlDatum | ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) |
CrawlDatum | AbstractScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) |
float | ScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation. |
float | ScoringFilters.generatorSortValue(Text url, CrawlDatum datum, float initSort) Calculate a sort value for Generate. |
float | AbstractScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) |
float | ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) This method calculates an indexed document score/boost. |
float | ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) |
float | AbstractScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) |
void | ScoringFilter.initialScore(Text url, CrawlDatum datum) Set an initial score for newly discovered pages. |
void | ScoringFilters.initialScore(Text url, CrawlDatum datum) Calculate a new initial score, used when adding newly discovered pages. |
void | AbstractScoringFilter.initialScore(Text url, CrawlDatum datum) |
void | ScoringFilter.injectedScore(Text url, CrawlDatum datum) Set an initial score for newly injected pages. |
void | ScoringFilters.injectedScore(Text url, CrawlDatum datum) Calculate a new initial score, used when injecting new pages. |
void | AbstractScoringFilter.injectedScore(Text url, CrawlDatum datum) |
default void | ScoringFilter.orphanedScore(Text url, CrawlDatum datum) This method may change the score or status of CrawlDatum during CrawlDb update, when the URL is neither fetched nor has any inlinks. |
void | ScoringFilters.orphanedScore(Text url, CrawlDatum datum) Calculate orphaned page score during CrawlDb.update(). |
void | ScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse) Currently a part of score distribution is performed using only data coming from the parsing process. |
void | ScoringFilters.passScoreAfterParsing(Text url, Content content, Parse parse) |
void | AbstractScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse) |
void | ScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) This method takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it into Content metadata. |
void | ScoringFilters.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) |
void | AbstractScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) |
void | ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) This method calculates a new score of CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum, and also score values contributed by inlinked pages. |
void | ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) Calculate updated page score during CrawlDb.update(). |
void | AbstractScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) |
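The role of generatorSortValue in the table above (a value used to sort pages and pick the top N for a fetchlist) can be illustrated without any Nutch types. The sketch below is hypothetical: `TopNSelectionSketch`, plain `String` URLs and a `Map` of scores stand in for the real data structures, and multiplying the stored score by `initSort` is an assumed example of a sort value, not a statement about what every filter returns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of top-N selection by sort value; not part of Nutch. */
final class TopNSelectionSketch {

    /** Assumed sort value: the stored score scaled by the initial sort value. */
    static float sortValue(float score, float initSort) {
        return score * initSort;
    }

    /** Returns up to topN URLs, highest sort value first. */
    static List<String> selectTopN(Map<String, Float> scoresByUrl, float initSort, int topN) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(scoresByUrl.entrySet());
        // Sort descending by the computed sort value.
        entries.sort(Comparator.comparingDouble(
            (Map.Entry<String, Float> e) -> sortValue(e.getValue(), initSort)).reversed());
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            selected.add(entries.get(i).getKey());
        }
        return selected;
    }
}
```

Because only the ordering matters for selection, a filter that wants certain pages fetched earlier only needs to return a larger value for them.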
Modifier and Type | Method and Description |
---|---|
CrawlDatum | DepthScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) |
float | DepthScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) |
float | DepthScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) |
void | DepthScoringFilter.initialScore(Text url, CrawlDatum datum) |
void | DepthScoringFilter.injectedScore(Text url, CrawlDatum datum) |
void | DepthScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse) |
void | DepthScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) |
void | DepthScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) |
Modifier and Type | Method and Description |
---|---|
CrawlDatum | LinkAnalysisScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) |
float | LinkAnalysisScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) |
float | LinkAnalysisScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) |
void | LinkAnalysisScoringFilter.initialScore(Text url, CrawlDatum datum) |
void | LinkAnalysisScoringFilter.injectedScore(Text url, CrawlDatum datum) |
void | LinkAnalysisScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse) |
void | LinkAnalysisScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) |
void | LinkAnalysisScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) |
Modifier and Type | Method and Description |
---|---|
CrawlDatum | OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply. |
float | OPICScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) |
float | OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) Dampen the boost value by scorePower. |
void | OPICScoringFilter.initialScore(Text url, CrawlDatum datum) Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level. |
void | OPICScoringFilter.injectedScore(Text url, CrawlDatum datum) |
void | OPICScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) Increase the score by a sum of inlinked scores. |
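The arithmetic behind the three OPICScoringFilter descriptions above (divide by the number of outlinks, add the sum of inlinked scores, dampen by scorePower) can be worked through in plain Java. This is a minimal sketch, not the filter's code: `OpicArithmeticSketch` and its method names are hypothetical, scores are bare floats rather than CrawlDatum objects, and treating "dampen by scorePower" as raising the stored score to that power is an assumption.

```java
import java.util.List;

/** Hypothetical stand-in for the OPIC-style arithmetic; not part of Nutch. */
final class OpicArithmeticSketch {

    /** Share passed to each outlink: the page score divided by the outlink count. */
    static float perOutlinkShare(float pageScore, int outlinkCount) {
        if (outlinkCount <= 0) {
            return 0.0f; // no outlinks, nothing to distribute
        }
        return pageScore / outlinkCount;
    }

    /** Score after an update: the old score plus the sum of inlink contributions. */
    static float updatedScore(float oldScore, List<Float> inlinkContributions) {
        float sum = 0.0f;
        for (float contribution : inlinkContributions) {
            sum += contribution;
        }
        return oldScore + sum;
    }

    /** Index-time boost: the stored score raised to scorePower (assumed), times initScore. */
    static float dampenedBoost(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }
}
```

For example, a page with score 1.0 and four outlinks passes 0.25 to each; a newly discovered page starting at 0.0f that receives two such contributions ends up at 0.5. A scorePower below 1 compresses large scores, so a few heavily linked pages do not dominate the index boost.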
Modifier and Type | Method and Description |
---|---|
void | OrphanScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinks) Used for orphan control. |
Modifier and Type | Method and Description |
---|---|
CrawlDatum | SimilarityScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) |
void | SimilarityScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse) |
Modifier and Type | Method and Description |
---|---|
CrawlDatum | TLDScoringFilter.distributeScoreToOutlink(Text fromUrl, Text toUrl, ParseData parseData, CrawlDatum target, CrawlDatum adjust, int allCount, int validCount) |
CrawlDatum | TLDScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) |
float | TLDScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) |
float | TLDScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) |
void | TLDScoringFilter.initialScore(Text url, CrawlDatum datum) |
void | TLDScoringFilter.injectedScore(Text url, CrawlDatum datum) |
void | TLDScoringFilter.passScoreAfterParsing(Text url, Content content, Parse parse) |
void | TLDScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) |
void | TLDScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) |
Modifier and Type | Method and Description |
---|---|
CrawlDatum | URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object. |
float | URLMetaScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) Boilerplate |
float | URLMetaScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) Boilerplate |
void | URLMetaScoringFilter.initialScore(Text url, CrawlDatum datum) Boilerplate |
void | URLMetaScoringFilter.injectedScore(Text url, CrawlDatum datum) Boilerplate |
void | URLMetaScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) Boilerplate |
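One way to picture the URLMetaScoringFilter.distributeScoreToOutlinks description above is a lookup of configured tag names in a page's parse metadata. The sketch below is hypothetical throughout: `UrlMetaPropagationSketch` is not a Nutch class, plain `Map<String, String>` objects stand in for the parse and outlink metadata, and copying each found value onto every outlink is an assumption about what happens after the lookup, since the description here only states that the tags are looked for.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of tag lookup and propagation; not part of Nutch. */
final class UrlMetaPropagationSketch {

    /**
     * For each configured tag present in the page's parse metadata, copy its
     * value into the metadata of every outlink target (assumed behavior).
     * Returns the number of tags that were found.
     */
    static int propagate(String[] configuredTags,
                         Map<String, String> parseMeta,
                         List<Map<String, String>> targets) {
        int found = 0;
        for (String tag : configuredTags) {
            String value = parseMeta.get(tag);
            if (value == null) {
                continue; // tag is configured but absent on this page
            }
            found++;
            for (Map<String, String> target : targets) {
                target.put(tag, value);
            }
        }
        return found;
    }
}
```

Tags that are configured but missing from the page are skipped, so outlinks only receive values the source page actually carries.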
Copyright © 2021 The Apache Software Foundation