misc: Update README.md.
vxern committed Jan 9, 2023
1 parent 6b8913a commit bf0d018
Showing 6 changed files with 106 additions and 13 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
## 2.1.0+1

- Updated README.md.

## 2.1.0

- Added a method `.validate()` for validating files.
105 changes: 97 additions & 8 deletions README.md
@@ -2,7 +2,7 @@

### Usage

The following code gets the `robots.txt` robot exclusion ruleset of a website.
You can obtain the robot exclusion rulesets for a particular website as follows:

```dart
// Get the contents of the `robots.txt` file.
@@ -11,13 +11,102 @@ final contents = /* Your method of obtaining the contents of a `robots.txt` file. */;
final robots = Robots.parse(contents);
```
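
The library itself does not prescribe how the contents of the `robots.txt` file are obtained. As a minimal sketch of one possibility, assuming the third-party `package:http` package and a hypothetical `fetchRobots()` helper (neither of which is part of this library):

```dart
import 'package:http/http.dart' as http;
import 'package:robots_txt/robots_txt.dart';

// Hypothetical helper: fetches `robots.txt` from the root of a host and
// parses it. `package:http` is an assumption here, not a requirement of
// this library.
Future<Robots> fetchRobots(Uri host) async {
  final response = await http.get(host.replace(path: '/robots.txt'));
  return Robots.parse(response.body);
}
```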

Now that the `robots.txt` file has been read, we can verify whether we can visit
a certain path or not:
Now that you have parsed the `robots.txt` file, you can perform checks to
establish whether or not a user-agent is allowed to visit a particular path:

```dart
final userAgent = /* Your user agent. */;
// False: it cannot.
print(robots.verifyCanAccess('/gist/', userAgent: userAgent));
// True: it can.
print(robots.verifyCanAccess('/wordcollector/robots_txt', userAgent: userAgent));
final userAgent = /* Your user-agent. */;
print(robots.verifyCanAccess('/gist/', userAgent: userAgent)); // False
print(robots.verifyCanAccess('/wordcollector/robots_txt/', userAgent: userAgent)); // True
```

If you are not concerned about rules pertaining to other user-agents and only
care about your own, you may instruct the parser to ignore the rest by
specifying only the user-agents that matter to you:

```dart
// Parse the contents, disregarding user-agents other than 'WordCollector'.
final robots = Robots.parse(contents, onlyApplicableTo: const {'WordCollector'});
```

The `Robots.parse()` function does not have any built-in structure validation.
It will not throw exceptions, and will fail silently wherever appropriate. If
the contents passed into it are not those of a valid `robots.txt` file, there
is no guarantee that it will produce useful data, nor that it will disallow a
bot wherever possible.
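
As a small sketch of this silent behaviour (the exact result of parsing malformed contents is not guaranteed):

```dart
// `Robots.parse()` does not throw, even for contents that do not form a
// valid `robots.txt` file; whatever cannot be interpreted is skipped.
final robots = Robots.parse('This is an obviously invalid robots.txt file.');
```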

If you wish to ensure before parsing that a particular file is valid, use the
`Robots.validate()` function. Unlike `Robots.parse()`, this one **will throw** a
`FormatException` if the file is not valid:

```dart
// Validating an invalid file will throw a `FormatException`.
try {
  Robots.validate('This is an obviously invalid robots.txt file.');
} on FormatException {
  print('As expected, this file is flagged as invalid.');
}

// Validating an already valid file will not throw anything.
try {
  Robots.validate('''
User-agent: *
Disallow: /
Allow: /file.txt
Sitemap: https://example.com/sitemap.xml
''');
  print('As expected also, this file is not flagged as invalid.');
} on FormatException {
  // Code to handle an invalid file.
}
```

By default, the validator will only accept the following fields:

- User-agent
- Allow
- Disallow
- Sitemap

If you want to accept files that feature other fields, such as `Crawl-delay`
or `Host`, you will have to specify them like so:

```dart
try {
  Robots.validate(
    '''
User-agent: *
Crawl-delay: 5
''',
    allowedFieldNames: {'Crawl-delay'},
  );
} on FormatException {
  // Code to handle an invalid file.
}
```

By default, the parser considers the `Allow` field to take precedence. This is
the standard approach to both writing and reading `robots.txt` files; however,
you can instruct the parser to follow a different approach:

```dart
robots.verifyCanAccess(
  '/path',
  userAgent: userAgent,
  typePrecedence: RuleTypePrecedence.disallow,
);
```

Similarly, fields defined **later** in the file are considered to take
precedence. This, too, is the standard approach, but you can instruct the
parser to rule otherwise:

```dart
robots.verifyCanAccess(
  '/path',
  userAgent: userAgent,
  comparisonMethod: PrecedenceStrategy.lowerTakesPrecedence,
);
```
2 changes: 1 addition & 1 deletion example/parse_example.dart
@@ -33,7 +33,7 @@ Future<void> main() async {
}
}

const userAgent = 'wordcollector';
const userAgent = 'WordCollector';

// False: it cannot.
print(
4 changes: 2 additions & 2 deletions example/validate_example.dart
@@ -3,12 +3,12 @@ import 'package:robots_txt/robots_txt.dart';
Future<void> main() async {
  // Validating an invalid file will throw a `FormatException`.
  try {
    Robots.validate('This is obviously an invalid robots.txt file.');
    Robots.validate('This is an obviously invalid robots.txt file.');
  } on FormatException {
    print('As expected, the first file is flagged as invalid.');
  }

  // Validating an already valid file.
  // Validating an already valid file will not throw anything.
  try {
    Robots.validate('''
User-agent: *
2 changes: 1 addition & 1 deletion pubspec.yaml
@@ -1,5 +1,5 @@
name: robots_txt
version: 2.1.0
version: 2.1.0+1

description: A complete, dependency-less and fully documented `robots.txt` ruleset parser.

2 changes: 1 addition & 1 deletion test/tests/parser_test.dart
@@ -49,7 +49,7 @@ void main() {
});
});

group('rules with logical applicability', () {
group('rules with logic-based applicability', () {
test('defined without a user agent.', () {
expect(
() => robots = Robots.parse(rulesWithoutUserAgent),
