# Robots Parser [![NPM downloads](https://img.shields.io/npm/dm/robots-parser)](https://www.npmjs.com/package/robots-parser) [![DeepScan grade](https://deepscan.io/api/teams/457/projects/16277/branches/344939/badge/grade.svg)](https://deepscan.io/dashboard#view=project&tid=457&pid=16277&bid=344939) [![GitHub license](https://img.shields.io/github/license/samclarke/robots-parser.svg)](https://github.com/samclarke/robots-parser/blob/master/license.md) [![Coverage Status](https://coveralls.io/repos/github/samclarke/robots-parser/badge.svg?branch=master)](https://coveralls.io/github/samclarke/robots-parser?branch=master)

NodeJS robots.txt parser.

It currently supports:

- User-agent:
- Allow:
- Disallow:
- Sitemap:
- Crawl-delay:
- Host:
- Paths with wildcards (\*) and EOL matching ($)
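The wildcard and end-of-line matching can be illustrated with a small standalone sketch. This is only an illustration of the pattern syntax, not the library's internal implementation, and `matchesRule` is a hypothetical helper:

```js
// Minimal sketch of robots.txt path matching: '*' matches any run of
// characters and a trailing '$' anchors the pattern to the end of the path.
// Illustrative only -- not the library's internals.
function matchesRule(pattern, path) {
	var anchored = pattern.endsWith('$');
	var body = anchored ? pattern.slice(0, -1) : pattern;
	// Escape regex metacharacters in the literal parts, then rejoin the
	// parts split on '*' with '.*'.
	var regex = body
		.split('*')
		.map(function (part) {
			return part.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
		})
		.join('.*');
	return new RegExp('^' + regex + (anchored ? '$' : '')).test(path);
}

console.log(matchesRule('/fish*.php', '/fishheads/catfish.php?p=1')); // true
console.log(matchesRule('/*.php$', '/filename.php')); // true
console.log(matchesRule('/*.php$', '/filename.php?p=1')); // false
```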
## Installation

Via npm:

```shell
npm install robots-parser
```

or via Yarn:

```shell
yarn add robots-parser
```

## Usage
```js
var robotsParser = require('robots-parser');

var robots = robotsParser(
    'http://www.example.com/robots.txt',
    [
        'User-agent: *',
        'Disallow: /dir/',
        'Disallow: /test.html',
        'Allow: /dir/test.html',
        'Allow: /test.html',
        'Crawl-delay: 1',
        'Sitemap: http://example.com/sitemap.xml',
        'Host: example.com'
    ].join('\n')
);

robots.isAllowed('http://www.example.com/test.html', 'Sams-Bot/1.0'); // true
robots.isAllowed('http://www.example.com/dir/test.html', 'Sams-Bot/1.0'); // true
robots.isDisallowed('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'); // true
robots.getCrawlDelay('Sams-Bot/1.0'); // 1
robots.getSitemaps(); // ['http://example.com/sitemap.xml']
robots.getPreferredHost(); // example.com
```

### isAllowed(url, [ua])

**boolean or undefined**

Returns true if crawling the specified URL is allowed for the specified user-agent.

This will return `undefined` if the URL isn't valid for this robots.txt.

### isDisallowed(url, [ua])

**boolean or undefined**

Returns true if crawling the specified URL is not allowed for the specified user-agent.

This will return `undefined` if the URL isn't valid for this robots.txt.

### getMatchingLineNumber(url, [ua])

**number or undefined**

Returns the line number of the matching directive for the specified URL and user-agent if any.

Line numbers start at 1 and go up (1-based indexing).

Returns -1 if there is no matching directive. If a rule is manually added without a lineNumber then this will return undefined for that rule.

### getCrawlDelay([ua])

**number or undefined**

Returns the number of seconds the specified user-agent should wait between requests.

Returns undefined if no crawl delay has been specified for this user-agent.

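A crawler might honor the returned delay by spacing out its requests. The following is a minimal sketch; `crawlWithDelay` and `fetchPage` are hypothetical stand-ins for the crawler's own scheduling and request functions, and `delaySeconds` would typically come from `getCrawlDelay()`:

```js
// Fetch each URL in turn, waiting delaySeconds between requests.
// fetchPage is whatever request function the crawler uses.
function crawlWithDelay(urls, delaySeconds, fetchPage) {
	var index = 0;
	function next() {
		if (index >= urls.length) {
			return;
		}
		fetchPage(urls[index]);
		index += 1;
		setTimeout(next, delaySeconds * 1000);
	}
	next();
}

var visited = [];
crawlWithDelay(['http://example.com/a'], 0, function (url) {
	visited.push(url);
});
// The first URL is fetched immediately; later URLs follow after the delay.
```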
### getSitemaps()

**array**

Returns an array of sitemap URLs specified by the `sitemap:` directive.

### getPreferredHost()

**string or null**

Returns the preferred host name specified by the `host:` directive or null if there isn't one.

# Changes

### Version 2.3.0:

- Fixed bug where if the user-agent passed to `isAllowed()` / `isDisallowed()` is called "constructor" it would throw an error.
- Added support for relative URLs. This does not affect the default behavior so it can safely be upgraded.

  Relative matching is only allowed if both the robots.txt URL and the URLs being checked are relative.

  For example:

  ```js
  var robots = robotsParser(
      '/robots.txt',
      [
          'User-agent: *',
          'Disallow: /dir/',
          'Disallow: /test.html',
          'Allow: /dir/test.html',
          'Allow: /test.html'
      ].join('\n')
  );

  robots.isAllowed('/test.html', 'Sams-Bot/1.0'); // false
  robots.isAllowed('/dir/test.html', 'Sams-Bot/1.0'); // true
  robots.isDisallowed('/dir/test2.html', 'Sams-Bot/1.0'); // true
  ```

### Version 2.2.0:

- Fixed a bug with matching wildcard patterns against some URLs
  &ndash; Thanks to @ckylape for reporting and fixing
- Changed matching algorithm to match Google's implementation in google/robotstxt
- Changed order of precedence to match the current spec

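The "order of precedence" here refers to the most-specific-match rule in the current Robots Exclusion Protocol spec: among all rules whose pattern matches a path, the longest pattern wins, and an Allow beats a Disallow on a length tie. A simplified sketch of that rule, using plain prefix patterns without wildcards (`isPathAllowed` and the rule objects are illustrative, not the library's internals):

```js
// Pick the matching rule with the longest pattern; on a tie, prefer Allow.
// Paths with no matching rule are allowed by default.
function isPathAllowed(rules, path) {
	var winner = null;
	rules.forEach(function (rule) {
		if (path.indexOf(rule.pattern) === 0) { // simple prefix match
			if (winner === null ||
				rule.pattern.length > winner.pattern.length ||
				(rule.pattern.length === winner.pattern.length &&
					rule.allow && !winner.allow)) {
				winner = rule;
			}
		}
	});
	return winner === null ? true : winner.allow;
}

var rules = [
	{ pattern: '/dir/', allow: false },          // Disallow: /dir/
	{ pattern: '/dir/test.html', allow: true }   // Allow: /dir/test.html
];

console.log(isPathAllowed(rules, '/dir/page.html')); // false
console.log(isPathAllowed(rules, '/dir/test.html')); // true (longer Allow wins)
```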
### Version 2.1.1:

- Fixed a bug that could be used to cause rule checking to take a long time
  &ndash; Thanks to @andeanfog

### Version 2.1.0:

- Removed use of punycode module APIs as the new URL API handles it
- Improved test coverage
- Added tests for percent-encoded paths and improved support
- Added `getMatchingLineNumber()` method
- Fixed bug with comments on the same line as a directive

### Version 2.0.0:

This release is not 100% backwards compatible as it now uses the new URL APIs, which are not supported in Node < 7.

- Updated code to not use the deprecated URL module APIs.
  &ndash; Thanks to @kdzwinel

### Version 1.0.2:

- Fixed error caused by invalid URLs missing the protocol.

### Version 1.0.1:

- Fixed bug with the "user-agent" rule being treated as case sensitive.
  &ndash; Thanks to @brendonboshell
- Improved test coverage.
  &ndash; Thanks to @schornio

### Version 1.0.0:

- Initial release.

# License

The MIT License (MIT)

Copyright (c) 2014 Sam Clarke

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.