Java library for reading and querying robots.txt files.
- Parse robots.txt (Kotlin):
val robotsTxt = RobotsTxtReader.read(inputStream)
- Query robotsTxt:
val grant = robotsTxt.query("GoogleBot", "/path")
val canAccess = grant.allowed
when (grant) {
    is MatchedGrant -> {
        val crawlDelay = grant.matchedRuleGroup.crawlDelay
    }
    is NonMatchedAllowedGrant -> {
        TODO("Not matched in robots.txt")
    }
}
- Parse robots.txt (Java):
RobotsTxt robotsTxt = RobotsTxtReader.read(inputStream);
- Query robotsTxt:
Grant grant = robotsTxt.query("GoogleBot", "/path");
boolean canAccess = grant.getAllowed();
if (grant instanceof MatchedGrant) {
    Duration crawlDelay = ((MatchedGrant) grant).getMatchedRuleGroup().getCrawlDelay();
}
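The snippets above take an `inputStream` and yield a `Duration` crawl delay, but do not show where either comes from or goes. The following is a minimal sketch using only the JDK: `toInputStream` and `nextAllowedRequest` are hypothetical helpers, not part of this library, and treating a `null` crawl delay as "no delay" is an assumption rather than documented behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;

public class RobotsTxtSketch {

    // Hypothetical helper: wraps in-memory robots.txt text in an InputStream,
    // which could then be passed to RobotsTxtReader.read(inputStream).
    static InputStream toInputStream(String robotsTxtContent) {
        return new ByteArrayInputStream(robotsTxtContent.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: earliest time the next request may be sent, given
    // the time of the previous request and a crawl delay.
    // Assumption: a null or negative delay means "no delay required".
    static Instant nextAllowedRequest(Instant lastRequest, Duration crawlDelay) {
        if (crawlDelay == null || crawlDelay.isNegative()) {
            return lastRequest;
        }
        return lastRequest.plus(crawlDelay);
    }

    public static void main(String[] args) throws Exception {
        String content = "User-agent: *\nDisallow: /private\nCrawl-delay: 5\n";

        try (InputStream inputStream = toInputStream(content)) {
            // With the library on the classpath, the calls shown above would go here:
            // RobotsTxt robotsTxt = RobotsTxtReader.read(inputStream);
            System.out.println(inputStream.available() + " bytes ready to parse");
        }

        // Example of scheduling the next request from a crawl delay value.
        Instant lastRequest = Instant.now();
        Instant next = nextAllowedRequest(lastRequest, Duration.ofSeconds(5));
        System.out.println("Next request allowed at: " + next);
    }
}
```

In a real crawler the stream would more likely come from an HTTP response for the site's `/robots.txt`; the in-memory string here just keeps the sketch self-contained.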
Add the JitPack repository to your pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
Add the following under your <dependencies>:
<dependencies>
    <dependency>
        <groupId>com.github.alturkovic</groupId>
        <artifactId>robots-txt</artifactId>
        <version>[insert latest version here]</version>
    </dependency>
</dependencies>