Repair parsing error due to commented-out field token #59

Merged
petermeissner merged 1 commit into ropensci:pr59 on Aug 19, 2020

gittaca
Contributor

@gittaca gittaca commented Jul 31, 2020

When a line starts with # but continues with one of the valid fields / tokens, this backtrace is produced:

Error: Test failed: 'Commented-out tokens get parsed correctly'
* 'names' attribute [2] must be the same length as the vector [0]
Backtrace:
 1. testthat::expect_true(...)
 5. robotstxt::parse_robotstxt(rtxt_ct)
 6. robotstxt::rt_get_fields(txt, "allow") …robotstxtRparse_robotstxt.R:7:2
 7. base::lapply(...) …robotstxtRrt_get_fields.R:38:2
 8. robotstxt:::FUN(X[[i]], ...)

After removing the # in this demo .txt, the test case instead fails with nrow(parse_robotstxt(rtxt_ct)$permissions) == 1 isn't true, which I take to mean that the parsing error itself is avoided.

Maybe the error originates somewhere in rt_get_fields.R? Perhaps one of the regexes detects the valid token (hence the [2]) while another treats the line as a comment (hence the [0]), so the two clash at names(fields) <- c("field", "value")?
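To illustrate the suspected clash, here is a minimal sketch in base R. It does not use the package's code: `fields` is a hypothetical stand-in for whatever is left after a line has been discarded as a comment, and the only assumption is that this result is empty when the names are assigned.

```r
# Hypothetical stand-in: an empty result, as might remain after a line
# has been treated as a comment. This is NOT the package's actual object.
fields <- list()

# Assigning two names to a zero-length object raises an error;
# tryCatch captures the message instead of stopping the script.
msg <- tryCatch(
  {
    names(fields) <- c("field", "value")
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

If the hypothesis holds, the printed message should have the same shape as the one in the backtrace: two names offered to a vector of length zero.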

I've tinkered with the various reg-exes there for about an hour, but found no solution. Will instead remove the problematic file from my analysis to continue with that for now.
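One direction that might be worth trying, sketched here under assumptions: strip comments first, then match field names only at the start of a line, so that both steps agree on what counts as a field. The sample text and the variable names below are hypothetical and are not taken from rt_get_fields.R or from the file in this PR.

```r
# Hypothetical robots.txt content with a commented-out token.
txt <- c(
  "User-agent: *",
  "# Allow: /private/",
  "Disallow: /tmp/"
)

# Drop everything from the first '#' to the end of each line,
# then discard lines that are empty after trimming.
no_comments <- sub("#.*$", "", txt)
no_comments <- no_comments[nzchar(trimws(no_comments))]

# Match each field only at the start of a line, ignoring case, so that
# "Disallow" is not mistaken for "allow".
is_allow    <- grepl("^[[:space:]]*allow[[:space:]]*:", no_comments, ignore.case = TRUE)
is_disallow <- grepl("^[[:space:]]*disallow[[:space:]]*:", no_comments, ignore.case = TRUE)

allow_lines    <- no_comments[is_allow]
disallow_lines <- no_comments[is_disallow]

print(length(allow_lines))     # 0: the commented-out Allow line is gone
print(length(disallow_lines))  # 1
```

Because the comment is removed before any field matching happens, no later step can see the token inside it. Whether this fits the existing regexes in the package is an open question.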

Please feel free to take over the PR. I hope the test is a useful start to find and fix the problem.

'names' attribute [2] must be the same length as the vector [0]
@petermeissner
Contributor

petermeissner commented Aug 2, 2020

Thanks, can you open an issue for that, please?

@petermeissner
Contributor

@gittaca also, can you give a reproducible example, please?

@gittaca
Contributor Author

gittaca commented Aug 5, 2020

I included an inst/robotstxts/robots_commented_token.txt in this PR ;-)

@petermeissner petermeissner changed the base branch from master to pr59 August 19, 2020 19:37
@petermeissner petermeissner merged commit 8391b6a into ropensci:pr59 Aug 19, 2020