Repair parsing error due to commented-out field token #59

gittaca · 2020-07-31T15:37:44Z

When a line starts with # but continues with one of the valid fields / tokens, this backtrace is produced:

Error: Test failed: 'Commented-out tokens get parsed correctly'
* 'names' attribute [2] must be the same length as the vector [0]
Backtrace:
 1. testthat::expect_true(...)
 5. robotstxt::parse_robotstxt(rtxt_ct)
 6. robotstxt::rt_get_fields(txt, "allow") …robotstxtRparse_robotstxt.R:7:2
 7. base::lapply(...) …robotstxtRrt_get_fields.R:38:2
 8. robotstxt:::FUN(X[[i]], ...)

After removing the # in this demo .txt, the test case instead fails with nrow(parse_robotstxt(rtxt_ct)$permissions) == 1 isn't true, which I take to mean that the parsing error itself is avoided.

Maybe the error originates somewhere in rt_get_fields.R? Perhaps one of the regexes detects the valid token (hence the [2]), while another treats the line as a comment (hence the [0]), so the two clash at names(fields) <- c("field", "value")?
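The length mismatch suspected here can be reproduced in base R without the package. The following is only a minimal sketch: the empty list is a hypothetical stand-in for whatever rt_get_fields.R produces internally for a commented-out line, not the package's actual intermediate value.

```r
# Hypothetical stand-in: assume the extraction step yields an empty
# result (length 0) because the line was treated as a comment.
fields <- list()

# Assigning two names to a zero-length object fails; capture the error
# instead of letting it abort the script.
err <- tryCatch(
  {
    names(fields) <- c("field", "value")
    NULL
  },
  error = function(e) e
)

# Show the captured error message.
print(conditionMessage(err))
```

In an English locale the printed message should match the one in the backtrace above, which is consistent with (but does not prove) the idea that an empty match result reaches the names() assignment.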

I've tinkered with the various regexes there for about an hour but found no solution. For now, I will remove the problematic file from my analysis so that I can continue with it.
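One possible direction, sketched under assumptions: strip comments before any field matching, so that a commented-out token can never be matched as a field and an empty result still has both columns. The helpers strip_comments() and extract_field() are hypothetical names for illustration; they are not functions of the robotstxt package and do not reflect how rt_get_fields.R is written.

```r
# Hypothetical helper: drop everything from the first '#' to the end of
# each line, then discard lines that end up empty.
strip_comments <- function(lines) {
  lines <- trimws(sub("#.*$", "", lines))
  lines[nzchar(lines)]
}

# Hypothetical helper: return a two-column data frame for one field,
# which has zero rows (but still two columns) when nothing matches.
extract_field <- function(lines, field) {
  lines <- strip_comments(lines)
  pattern <- paste0("^", field, "\\s*:\\s*")
  hits <- grep(pattern, lines, ignore.case = TRUE, value = TRUE)
  data.frame(
    field = rep(field, length(hits)),
    value = sub(pattern, "", hits, ignore.case = TRUE),
    stringsAsFactors = FALSE
  )
}

# Made-up input with one commented-out token and one real rule.
rtxt <- c("User-agent: *", "# Disallow: /private", "Disallow: /tmp")
res <- extract_field(rtxt, "disallow")
only_comment <- extract_field("# Allow: /x", "allow")
print(res)
```

Because the data frame is built with named columns from the start, there is no separate names() assignment that could fail on an empty match.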

Please feel free to take over the PR. I hope the test is a useful start to find and fix the problem.


petermeissner · 2020-08-02T20:24:59Z

Thanks. Can you open an issue for that, please?

petermeissner · 2020-08-05T07:39:48Z

@gittaca Also, can you provide a reproducible example, please?

gittaca · 2020-08-05T16:12:10Z

I included an inst/robotstxts/robots_commented_token.txt in this PR ;-)

Add regression test against error in rt_get_fields

c9eb21c

'names' attribute [2] must be the same length as the vector [0]

gittaca force-pushed the parse-commented-tokens branch from 188c301 to c9eb21c July 31, 2020 15:46

petermeissner changed the base branch from master to pr59 August 19, 2020 19:37

petermeissner merged commit 8391b6a into ropensci:pr59 Aug 19, 2020

petermeissner mentioned this pull request Aug 19, 2020

Parsing would fail for comment in last line #60

Closed
