misc: Update README.md.
vxern committed Jan 9, 2023
1 parent 6b8913a commit bf0d018
Showing 6 changed files with 106 additions and 13 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
## 2.1.0+1

- Updated README.md.

## 2.1.0

- Added a method `.validate()` for validating files.
105 changes: 97 additions & 8 deletions README.md
@@ -2,7 +2,7 @@

### Usage

The following code gets the `robots.txt` robot exclusion ruleset of a website.
You can obtain the robot exclusion rulesets for a particular website as follows:

```dart
// Get the contents of the `robots.txt` file.
@@ -11,13 +11,102 @@ final contents = /* Your method of obtaining the contents of a `robots.txt` file. */;
final robots = Robots.parse(contents);
```
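
The library itself does not prescribe how the contents of the `robots.txt` file are obtained. As a minimal sketch of one possibility, assuming the third-party `package:http` package and a hypothetical `fetchRobots()` helper (neither of which is part of this library):

```dart
import 'package:http/http.dart' as http;
import 'package:robots_txt/robots_txt.dart';

// Hypothetical helper: fetches `robots.txt` from the root of a host and
// parses it. `package:http` is an assumption here, not a requirement of
// this library.
Future<Robots> fetchRobots(Uri host) async {
  final response = await http.get(host.replace(path: '/robots.txt'));
  return Robots.parse(response.body);
}
```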

Now that the `robots.txt` file has been read, we can verify whether we can visit
a certain path or not:
Now that you have parsed the `robots.txt` file, you can perform checks to
establish whether or not a user-agent is allowed to visit a particular path:

```dart
final userAgent = /* Your user agent. */;
// False: it cannot.
print(robots.verifyCanAccess('/gist/', userAgent: userAgent));
// True: it can.
print(robots.verifyCanAccess('/wordcollector/robots_txt', userAgent: userAgent));
final userAgent = /* Your user-agent. */;
print(robots.verifyCanAccess('/gist/', userAgent: userAgent)); // False
print(robots.verifyCanAccess('/wordcollector/robots_txt/', userAgent: userAgent)); // True
```

If you are not concerned about rules pertaining to other user-agents and only
care about your own, you may instruct the parser to ignore the rest by
specifying only the user-agents that matter to you:

```dart
// Parse the contents, disregarding user-agents other than 'WordCollector'.
final robots = Robots.parse(contents, onlyApplicableTo: const {'WordCollector'});
```

The `Robots.parse()` function does not have any built-in structure validation.
It will not throw exceptions, and will fail silently wherever appropriate. If
the contents passed into it are not those of a valid `robots.txt` file, there
is no guarantee that it will produce useful data, nor that it will disallow a
bot wherever possible.
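
As a small sketch of this silent behaviour (the exact result of parsing malformed contents is not guaranteed):

```dart
// `Robots.parse()` does not throw, even for contents that do not form a
// valid `robots.txt` file; whatever cannot be interpreted is skipped.
final robots = Robots.parse('This is an obviously invalid robots.txt file.');
```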

If you wish to ensure before parsing that a particular file is valid, use the
`Robots.validate()` function. Unlike `Robots.parse()`, this one **will throw** a
`FormatException` if the file is not valid:

```dart
// Validating an invalid file will throw a `FormatException`.
try {
  Robots.validate('This is an obviously invalid robots.txt file.');
} on FormatException {
  print('As expected, this file is flagged as invalid.');
}

// Validating an already valid file will not throw anything.
try {
  Robots.validate('''
User-agent: *
Disallow: /
Allow: /file.txt
Sitemap: https://example.com/sitemap.xml
''');
  print('As expected also, this file is not flagged as invalid.');
} on FormatException {
  // Code to handle an invalid file.
}
```

By default, the validator will only accept the following fields:

- User-agent
- Allow
- Disallow
- Sitemap

If you want to accept files that feature other fields, such as `Crawl-delay`
or `Host`, you will have to specify them like so:

```dart
try {
  Robots.validate(
    '''
User-agent: *
Crawl-delay: 5
''',
    allowedFieldNames: {'Crawl-delay'},
  );
} on FormatException {
  // Code to handle an invalid file.
}
```

By default, the parser considers the `Allow` field to take precedence. This is
the standard approach to both writing and reading `robots.txt` files; however,
you can instruct the parser to follow a different approach:

```dart
robots.verifyCanAccess(
  '/path',
  userAgent: userAgent,
  typePrecedence: RuleTypePrecedence.disallow,
);
```

Similarly, fields defined **later** in the file are considered to take
precedence. This, too, is the standard approach, but you can instruct the
parser to rule otherwise:

```dart
robots.verifyCanAccess(
  '/path',
  userAgent: userAgent,
  comparisonMethod: PrecedenceStrategy.lowerTakesPrecedence,
);
```
2 changes: 1 addition & 1 deletion example/parse_example.dart
@@ -33,7 +33,7 @@ Future<void> main() async {
}
}

const userAgent = 'wordcollector';
const userAgent = 'WordCollector';

// False: it cannot.
print(
4 changes: 2 additions & 2 deletions example/validate_example.dart
@@ -3,12 +3,12 @@ import 'package:robots_txt/robots_txt.dart';
Future<void> main() async {
  // Validating an invalid file will throw a `FormatException`.
  try {
    Robots.validate('This is obviously an invalid robots.txt file.');
    Robots.validate('This is an obviously invalid robots.txt file.');
  } on FormatException {
    print('As expected, the first file is flagged as invalid.');
  }

  // Validating an already valid file.
  // Validating an already valid file will not throw anything.
  try {
    Robots.validate('''
User-agent: *
2 changes: 1 addition & 1 deletion pubspec.yaml
@@ -1,5 +1,5 @@
name: robots_txt
version: 2.1.0
version: 2.1.0+1

description: A complete, dependency-less and fully documented `robots.txt` ruleset parser.

2 changes: 1 addition & 1 deletion test/tests/parser_test.dart
@@ -49,7 +49,7 @@ void main() {
});
});

group('rules with logical applicability', () {
group('rules with logic-based applicability', () {
test('defined without a user agent.', () {
expect(
() => robots = Robots.parse(rulesWithoutUserAgent),
