public class SimilarityScoringFilter extends AbstractScoringFilter
Fields inherited from interface ScoringFilter: X_POINT_ID
Constructor and Description |
---|
SimilarityScoringFilter() |
Modifier and Type | Method and Description |
---|---|
CrawlDatum | distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) Distribute score value from the current page to all its outlinked pages. |
Configuration | getConf() |
void | passScoreAfterParsing(Text url, Content content, Parse parse) Currently a part of score distribution is performed using only data coming from the parsing process. |
void | setConf(Configuration conf) |
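Methods inherited from class AbstractScoringFilter: generatorSortValue, indexerScore, initialScore, injectedScore, passScoreBeforeParsing, updateDbScore
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ScoringFilter: orphanedScore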
public Configuration getConf()
Specified by: getConf in interface Configurable
Overrides: getConf in class AbstractScoringFilter
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
Overrides: setConf in class AbstractScoringFilter
public void passScoreAfterParsing(Text url, Content content, Parse parse) throws ScoringFilterException
Description copied from interface: ScoringFilter
Specified by: passScoreAfterParsing in interface ScoringFilter
Overrides: passScoreAfterParsing in class AbstractScoringFilter
Parameters:
url - page url
content - original content. NOTE: modifications to this value are not persisted.
parse - target instance to copy the score information to. Implementations may modify this in-place, primarily by setting some metadata properties.
Throws:
ScoringFilterException
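The contract above (score information is written into the parse target in place, while the content is left alone) can be illustrated with a minimal sketch. It does not use Nutch classes: a plain `Map<String, String>` stands in for the parse metadata, the key `sketch.score` is hypothetical, and the term-overlap ratio is only a placeholder, since this page does not describe how the filter computes its score.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AfterParsingSketch {

    // Hypothetical metadata key; not taken from Nutch.
    static final String SCORE_KEY = "sketch.score";

    // Placeholder score: fraction of (lower-case) goal terms found in the parsed text.
    static float overlapScore(String parsedText, Set<String> goalTerms) {
        if (goalTerms.isEmpty()) {
            return 0f;
        }
        Set<String> seen = new HashSet<>(Arrays.asList(parsedText.toLowerCase().split("\\W+")));
        int hits = 0;
        for (String term : goalTerms) {
            if (seen.contains(term)) {
                hits++;
            }
        }
        return (float) hits / goalTerms.size();
    }

    // Stand-in for passScoreAfterParsing: only the parse metadata is modified, in place.
    static void passScoreAfterParsing(String url, String parsedText,
            Set<String> goalTerms, Map<String, String> parseMeta) {
        parseMeta.put(SCORE_KEY, Float.toString(overlapScore(parsedText, goalTerms)));
    }

    public static void main(String[] args) {
        Set<String> goal = new HashSet<>(Arrays.asList("nutch", "crawler", "index", "search"));
        Map<String, String> parseMeta = new HashMap<>();
        passScoreAfterParsing("http://example.org/", "Apache Nutch is a web crawler", goal, parseMeta);
        System.out.println(parseMeta.get(SCORE_KEY));
    }
}
```

Storing the value as metadata on the parse target is what lets a later step read it back without re-examining the content.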
public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) throws ScoringFilterException
Description copied from interface: ScoringFilter
Specified by: distributeScoreToOutlinks in interface ScoringFilter
Overrides: distributeScoreToOutlinks in class AbstractScoringFilter
Parameters:
fromUrl - url of the source page
parseData - ParseData instance, which stores relevant score value(s) in its metadata. NOTE: filters may modify this in-place, all changes will be persisted.
targets - <url, CrawlDatum> pairs. NOTE: filters can modify this in-place, all changes will be persisted.
adjust - a CrawlDatum instance, initially null, which implementations may use to pass adjustment values to the original CrawlDatum. When creating this instance, set its status to CrawlDatum.STATUS_LINKED.
allCount - number of all collected outlinks from the source page
Returns:
a CrawlDatum instance with status CrawlDatum.STATUS_LINKED, which contains adjustments to be applied to the original CrawlDatum score(s) and metadata. This can be null if not needed.
Throws:
ScoringFilterException
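The parameter contract above can be sketched without Nutch on the classpath. Everything here is a stand-in: `Datum` is a hypothetical replacement for CrawlDatum, a `Map<String, String>` replaces the ParseData metadata, and `sketch.score` is an invented key. Splitting the page score evenly across `allCount` outlinks is an assumption made for illustration, not a description of what SimilarityScoringFilter does.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OutlinkScoreSketch {

    // Hypothetical stand-in for CrawlDatum: carries only a score.
    static final class Datum {
        float score;

        Datum(float score) {
            this.score = score;
        }
    }

    // Hypothetical metadata key; not taken from Nutch.
    static final String SCORE_KEY = "sketch.score";

    /**
     * Reads the source page's score from the parse metadata and assigns an
     * equal share to every target in place (assumed policy). Returns the
     * adjust argument unchanged, so a null input yields a null result,
     * meaning "no adjustment needed".
     */
    static Datum distributeScoreToOutlinks(Map<String, String> parseMeta,
            Collection<Map.Entry<String, Datum>> targets, Datum adjust, int allCount) {
        float pageScore = Float.parseFloat(parseMeta.getOrDefault(SCORE_KEY, "0"));
        float share = allCount > 0 ? pageScore / allCount : 0f;
        for (Map.Entry<String, Datum> target : targets) {
            target.getValue().score = share;
        }
        return adjust;
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put(SCORE_KEY, "1.0");

        List<Map.Entry<String, Datum>> targets = new ArrayList<>();
        targets.add(new AbstractMap.SimpleEntry<>("http://example.org/a", new Datum(0f)));
        targets.add(new AbstractMap.SimpleEntry<>("http://example.org/b", new Datum(0f)));

        // allCount may exceed targets.size() when some outlinks were filtered out earlier.
        Datum adjust = distributeScoreToOutlinks(parseMeta, targets, null, 4);
        for (Map.Entry<String, Datum> target : targets) {
            System.out.println(target.getKey() + " " + target.getValue().score);
        }
        System.out.println("adjust is null: " + (adjust == null));
    }
}
```

Because the targets are modified in place, the caller sees the new scores through the same collection it passed in; the return value is reserved for adjustments to the source page's own CrawlDatum.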
Copyright © 2021 The Apache Software Foundation