
Robots.txt

Java library for reading and querying robots.txt files.

Using the library in Kotlin

  1. Parse robots.txt:
val robotsTxt = RobotsTxtReader.read(inputStream)
  2. Query robotsTxt:
val grant = robotsTxt.query("GoogleBot", "/path")
val canAccess = grant.allowed
when(grant) {
    is MatchedGrant -> {
        val crawlDelay = grant.matchedRuleGroup.crawlDelay
    }
    is NonMatchedAllowedGrant -> {
        TODO("Not matched in robots.txt")
    }
}
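
For context on what `allowed` means: RFC 9309 (the Robots Exclusion Protocol) says that when several rules in a group match a path, the most specific one wins, an `Allow` wins a tie with an equivalent `Disallow`, and a path that matches no rule is allowed. The Java sketch below illustrates those semantics for the rules of a single user-agent group. `Rule` and `RuleMatcher` are hypothetical names. This is not this library's implementation, and whether `query` applies exactly these rules is an assumption. Specificity is measured here by the length of the rule's path pattern, and the sketch ignores percent-encoding.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rule type for this sketch only; not part of the library's API.
record Rule(boolean allow, String path) {}

// Hypothetical helper illustrating RFC 9309 precedence; not the library's implementation.
final class RuleMatcher {
    private RuleMatcher() {}

    // Converts a robots.txt path pattern into a regex: '*' matches any sequence,
    // a trailing '$' anchors the end, and without '$' the pattern is a prefix match.
    static Pattern toRegex(String pattern) {
        boolean anchored = pattern.endsWith("$");
        String body = anchored ? pattern.substring(0, pattern.length() - 1) : pattern;
        StringBuilder regex = new StringBuilder();
        for (char c : body.toCharArray()) {
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        if (!anchored) {
            regex.append(".*"); // prefix match: anything may follow the pattern
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true if the path is allowed by the rules of one user-agent group.
    static boolean isAllowed(List<Rule> rules, String path) {
        Rule best = null;
        for (Rule rule : rules) {
            // An empty path (e.g. "Disallow:") matches nothing in this sketch.
            if (rule.path().isEmpty() || !toRegex(rule.path()).matcher(path).matches()) {
                continue;
            }
            boolean longer = best == null || rule.path().length() > best.path().length();
            boolean allowWinsTie = best != null
                    && rule.path().length() == best.path().length() && rule.allow();
            if (longer || allowWinsTie) {
                best = rule;
            }
        }
        // No matching rule means the path is allowed.
        return best == null || best.allow();
    }
}
```

For example, with `Disallow: /private` and `Allow: /private/public`, the path `/private/public/page` is allowed because the `Allow` rule is longer. A full crawler would first pick the group whose `User-agent` matches, which is presumably what the first argument to `query` is for.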

Using the library in Java

  1. Parse robots.txt:
RobotsTxt robotsTxt = RobotsTxtReader.read(inputStream);
  2. Query robotsTxt:
Grant grant = robotsTxt.query("GoogleBot", "/path");
boolean canAccess = grant.getAllowed();
if (grant instanceof MatchedGrant) {
  Duration crawlDelay = ((MatchedGrant) grant).getMatchedRuleGroup().getCrawlDelay();
}
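
In the snippet above, `getCrawlDelay()` yields a `Duration`, assumed here to be `java.time.Duration`. One way a crawler might honor it is to remember when it last contacted a host and wait out whatever remains of the delay before the next request. `CrawlDelayGate` below is a hypothetical sketch, not part of the library. It treats a `null` delay as no delay because it makes no assumption about how the library represents a missing `Crawl-delay` directive, and it takes `Instant` arguments so the logic can be tested without sleeping.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for this sketch; not part of the library's API.
// Tracks the last request to one host and reports how long to wait before the next.
final class CrawlDelayGate {
    private final Duration crawlDelay;
    private Instant lastRequest; // null until the first request is recorded

    CrawlDelayGate(Duration crawlDelay) {
        // Assumption: a null delay means "no Crawl-delay", so never wait.
        this.crawlDelay = crawlDelay == null ? Duration.ZERO : crawlDelay;
    }

    // How long the caller should wait at time `now` before sending the next request.
    Duration waitTime(Instant now) {
        if (lastRequest == null) {
            return Duration.ZERO; // first request to this host goes out immediately
        }
        Duration remaining = crawlDelay.minus(Duration.between(lastRequest, now));
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    // Records that a request was sent at time `now`.
    void recordRequest(Instant now) {
        lastRequest = now;
    }
}
```

In a fetch loop, keeping one gate per host, you might call `Thread.sleep(gate.waitTime(Instant.now()).toMillis())`, send the request, then call `gate.recordRequest(Instant.now())`.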

Importing into your project

Maven

Add the JitPack repository to your pom.xml:

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

Then add the library to your <dependencies>:

<dependencies>
  <dependency>
    <groupId>com.github.alturkovic</groupId>
    <artifactId>robots-txt</artifactId>
    <version>[insert latest version here]</version>
  </dependency>
</dependencies>