public class FtpRobotRulesParser extends RobotRulesParser
This class extends the generic RobotRulesParser class and contains the FTP-protocol-specific implementation for obtaining the robots file.

Fields inherited from class RobotRulesParser: agentNames, CACHE, conf, EMPTY_RULES, FORBID_ALL_RULES, whiteList
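As a rough illustration of where an FTP host's robots file might be looked up, the following sketch assumes the file sits at the server root (`/robots.txt`) and that parsed rules are cached per scheme and host. `FtpRobotsLocation`, `robotsUrl` and `cacheKey` are hypothetical names, not Nutch API. The keying scheme, which adds the port only when one is given explicitly, is a choice made for this sketch rather than documented behavior of FtpRobotRulesParser.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical helper, not part of Nutch: derives where an FTP server's robots
 * file is assumed to live and a per-host key under which parsed rules could be cached.
 */
final class FtpRobotsLocation {

    private FtpRobotsLocation() {}

    /** Resolves "/robots.txt" against the page URL, e.g. ftp://host/pub/a.txt -> ftp://host/robots.txt. */
    static String robotsUrl(String ftpUrl) {
        URI uri = URI.create(ftpUrl);
        if (!"ftp".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an FTP URL: " + ftpUrl);
        }
        // An absolute path keeps the scheme and authority (user, host, port) and replaces the path.
        return uri.resolve("/robots.txt").toString();
    }

    /** Assumed cache key: lowercased scheme and host, plus the port only when one is given explicitly. */
    static String cacheKey(String ftpUrl) {
        URI uri = URI.create(ftpUrl);
        String key = uri.getScheme().toLowerCase(Locale.ROOT) + ":"
                + uri.getHost().toLowerCase(Locale.ROOT);
        return uri.getPort() == -1 ? key : key + ":" + uri.getPort();
    }

    public static void main(String[] args) {
        String page = "ftp://Ftp.Example.org:2121/pub/readme.txt";
        System.out.println(robotsUrl(page)); // ftp://Ftp.Example.org:2121/robots.txt
        System.out.println(cacheKey(page));  // ftp:ftp.example.org:2121
    }
}
```

Keying on the host means every URL on the same FTP server shares one parsed rule set, which is what makes caching the rules object worthwhile.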
Constructor and Description |
---|
FtpRobotRulesParser(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
crawlercommons.robots.BaseRobotRules | getRobotRulesSet(Protocol ftp, URL url, List<Content> robotsTxtContent): For hosts whose robots rules have not yet been cached, sends an FTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later. |
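The caching behavior summarized in the table (fetch and parse once per host, then reuse the rules object) can be sketched with only the JDK. This is a hypothetical illustration, not Nutch's code: `SimpleRules` stands in for crawler-commons' `BaseRobotRules`, the `fetcher` function stands in for the FTP `Protocol`, and the tiny parser handles only `User-agent: *` groups with `Disallow` prefixes. Treating a missing file as "allow everything" is likewise an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for BaseRobotRules: a list of disallowed path prefixes. */
final class SimpleRules {
    private final List<String> disallowed;

    SimpleRules(List<String> disallowed) {
        this.disallowed = List.copyOf(disallowed);
    }

    /** A path is allowed unless it starts with a non-empty disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical sketch of "fetch and parse once per host, then serve from cache". */
final class CachingRobotsSketch {
    private final Map<String, SimpleRules> cache = new ConcurrentHashMap<>();
    // Stand-in for the FTP Protocol: host -> robots.txt bytes, or null if the file is missing.
    private final Function<String, byte[]> fetcher;

    CachingRobotsSketch(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    SimpleRules getRules(String host) {
        // computeIfAbsent runs the fetch-and-parse step only on the first request for a host.
        return cache.computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> parse(fetcher.apply(h)));
    }

    /** Minimal parser: the most recent User-agent line decides whether Disallow lines are kept. */
    static SimpleRules parse(byte[] content) {
        List<String> disallowed = new ArrayList<>();
        if (content == null) {
            return new SimpleRules(disallowed); // assumption: a missing file allows everything
        }
        boolean inStarGroup = false;
        for (String raw : new String(content, StandardCharsets.UTF_8).split("\r?\n")) {
            int hash = raw.indexOf('#'); // drop trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).strip();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).strip().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).strip();
            if (key.equals("user-agent")) {
                inStarGroup = value.equals("*");
            } else if (key.equals("disallow") && inStarGroup) {
                disallowed.add(value);
            }
        }
        return new SimpleRules(disallowed);
    }
}
```

A ConcurrentHashMap keeps the cache safe when several fetcher threads ask for the same host at once; the real parser also has to honor named agents and a whitelist, which this sketch omits.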
Methods inherited from class RobotRulesParser: getConf, getRobotRulesSet, isWhiteListed, main, parseRules, run, setConf
public FtpRobotRulesParser(Configuration conf)
public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol ftp, URL url, List<Content> robotsTxtContent)

For hosts whose robots rules have not yet been cached, sends an FTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later.

Specified by: getRobotRulesSet in class RobotRulesParser

Parameters:
ftp - the Protocol object
url - the URL
robotsTxtContent - container to store responses when fetching the robots.txt file, for debugging or archival purposes. Instead of a robots.txt file, it may hold redirects or an error page (404, etc.). The response Content is appended to the passed list; if null is passed, nothing is stored.

Returns:
BaseRobotRules object for the rules

Copyright © 2021 The Apache Software Foundation
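The robotsTxtContent contract for getRobotRulesSet (append every response, including redirects or error pages, when a list is passed; store nothing when it is null) can be sketched as follows. `RecordedResponse` is a hypothetical stand-in for Nutch's `Content`, the `transport` function stands in for the FTP protocol, and HTTP-style status codes (200, 404) are an assumption borrowed from the 404 example in the parameter description.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for Nutch's Content: the URL fetched, a status code and the raw body. */
record RecordedResponse(String url, int status, byte[] body) {}

/** Hypothetical sketch of the robotsTxtContent contract; not Nutch's implementation. */
final class RobotsFetchRecorder {
    private RobotsFetchRecorder() {}

    /**
     * Fetches robotsUrl through the given transport, appends whatever came back (robots file,
     * redirect or error page) to robotsTxtContent when that list is non-null, and returns the
     * body only for a successful (status 200) response, otherwise null.
     */
    static byte[] fetch(Function<String, RecordedResponse> transport, String robotsUrl,
                        List<RecordedResponse> robotsTxtContent) {
        RecordedResponse response = transport.apply(robotsUrl);
        if (robotsTxtContent != null) {
            robotsTxtContent.add(response); // record every response for debugging or archival
        }
        return response.status() == 200 ? response.body() : null;
    }

    public static void main(String[] args) {
        Function<String, RecordedResponse> notFound =
                url -> new RecordedResponse(url, 404, "Not Found".getBytes(StandardCharsets.UTF_8));
        List<RecordedResponse> log = new ArrayList<>();
        byte[] body = fetch(notFound, "ftp://ftp.example.org/robots.txt", log);
        System.out.println(body == null); // true: no robots file was obtained
        System.out.println(log.size());   // 1: the 404 response was still recorded
        fetch(notFound, "ftp://ftp.example.org/robots.txt", null); // null list: nothing is stored
    }
}
```

Recording the error page as well as successful fetches lets a caller see why a host ended up with permissive or restrictive rules, without changing what the method returns.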