Line numbers in error messages should refer to the raw file #2428
Comments
Unfortunately it gets worse with large files: during multi-chunk reading each chunk knows how to find the beginning of a "next good line", but has no idea what line number that is. As a result, it may report errors like "Expecting 3 cols but row 12 contains only 2 cols ...", where that
Thanks @etienne-s for the great report.
Also, the line numbers on errors in large files, which @st-pasha pointed out were less than useful before, are now fixed as well. There is now a concept of a thread being at the head position, which corresponds to the last row written to the result, so that thread knows how many rows have passed. Only the thread at the head position now stops the team and throws the error, including an accurate row number.

But it is a row number, not a line number. If there are no embedded newlines, the two are the same and the line number reported will be correct (even when multi-chunked). In the event of embedded newlines, however, the error won't ever report a line number that could accurately be used with

Please test latest dev again and open a new issue if there are any problems. I'm pretty sure it should all be fine now in this area. Thanks again.
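To illustrate how a raw line number could be recovered in a multi-chunk read, here is a minimal sketch in Python. It is not how `fread` is implemented; the function names `chunk_start_lines` and `line_of_offset` are hypothetical, and the fixed-size chunking is an assumption made for simplicity. The idea is that each chunk counts its own newline bytes (work that could run in parallel), and a prefix sum over those counts gives the raw line on which every chunk begins. Because raw newlines are counted regardless of quoting, the result matches what a text editor would show.

```python
from itertools import accumulate


def chunk_start_lines(data: bytes, chunk_size: int) -> list[int]:
    """Return the 1-based raw line number on which each chunk begins."""
    # Count newline bytes per chunk; each count is independent of the others.
    counts = [data[i:i + chunk_size].count(b"\n")
              for i in range(0, len(data), chunk_size)]
    # Exclusive prefix sum: newlines seen before chunk k, plus 1 for 1-based.
    return [1 + before for before in accumulate([0] + counts[:-1])]


def line_of_offset(data: bytes, chunk_size: int, offset: int) -> int:
    """Return the raw line number of the byte at `offset`."""
    starts = chunk_start_lines(data, chunk_size)
    k = offset // chunk_size  # index of the chunk that contains `offset`
    # Add the newlines between the chunk start and the offset itself.
    return starts[k] + data[k * chunk_size:offset].count(b"\n")


# Hypothetical input: the third record contains an embedded newline.
data = b'a,b,c\n1,2,3\n4,"x\ny",6\n7,8\n'
print(line_of_offset(data, 8, data.index(b"7,8")))
```

With this approach the thread that detects a short row only needs the byte offset of that row to report a raw line number, whatever the chunk size.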
I believe this is still a useful feature to have, and not very hard to implement after your work in #2627. The @etienne-s example is better given as
Here the warning message says "stopped early on line 3 <<c,d,e>>", however this is the 4th line in the file. Of course, this example is trivial, but in a multi-GB file the discrepancy between line and row numbers may reach tens of thousands, making it hard to find the error.
Thanks @mattdowle! I've tested again and my particular problem is solved in the new version, since the new warning message includes a large part of the faulty line. (The old message included only the first 10 characters, and it was not enough to locate the problem with

In some other cases it might be useful to have an exact line number reported in the message instead of a row number.
Consider the following example:
The error message reports a problem on line 3, which is fine.
Now suppose `b` contains a line break:

The problem is reported to be on line 3, which is wrong.
This could seem harmless, but when reading a very large CSV with lots of records containing line breaks, the line number in the message is totally wrong, and we end up with no clue to fix the misformatted file.
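Until the message carries a raw line number, a reported row number can be translated by hand. The sketch below uses Python's standard `csv` module, whose reader exposes `line_num` (the number of physical lines consumed so far), to list the raw lines that each record spans. The helper name `row_line_spans` and the sample text are hypothetical, and counting the header as row 1 is an assumption that may not match how `fread` numbers rows.

```python
import csv
import io


def row_line_spans(text):
    """Yield (row_number, first_line, last_line) for each CSV record."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(text, newline=""))
    prev_line = 0
    for row_number, _fields in enumerate(reader, start=1):
        # reader.line_num counts physical lines read, not records returned.
        yield row_number, prev_line + 1, reader.line_num
        prev_line = reader.line_num


# Hypothetical file: the second record has a quoted field with a line break.
sample = 'a,b,c\n1,"x\ny",3\n4,5\n'
for row, first, last in row_line_spans(sample):
    print(f"row {row}: lines {first}-{last}")
```

In this sample the short record is row 3 but sits on raw line 4, which is the kind of offset that accumulates in a large file with many embedded line breaks.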
sessionInfo()