Duplicated last line in CSV.foreach #279
Comments
@kou |
Thanks for your report. Lines 299 to 308 in 22e62bc
@abcdefg-1234567 Great! Do you want to work on fixing this problem? |
@kou |
@kou
However, I do not know how to create the following state from code.
I have confirmed that the bug is reproduced when the following conditions are set manually using the IDE. |
Hey @abcdefg-1234567, you can use the following:
    File.open('test.csv', 'w') do |f|
      2499.times do
        f.print("AAAA1234567890\r\n")
      end
      f.print("AAAA1234567890")
    end
|
@GabrielNagy |
OK. As the next step, let's reduce the size of the reproducing CSV as much as possible to make debugging easier. |
I have confirmed that the bug does not reproduce if the CSV has fewer than 2048 rows. |
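The row-count threshold reported above can be probed with a short script. This is only a sketch: `row_count_mismatch?` and `PROBE_OPTIONS` are hypothetical names, the options are the ones from the issue description, and the probed sizes are arbitrary picks around the reported 2048-row boundary. On a release that contains the fix, no size should report a mismatch.

```ruby
require "csv"
require "tmpdir"

# Options taken from the issue description.
PROBE_OPTIONS = { strip: true, skip_lines: /\A,+\n?\z/ }.freeze

# Writes `count` CRLF-separated rows (no trailing row separator) to a
# temporary file and reports whether CSV.foreach yields a different
# number of rows than were written.
def row_count_mismatch?(count)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "repro.csv")
    lines = Array.new(count) { |i| format("A%013d", i) }
    File.binwrite(path, lines.join("\r\n"))
    CSV.foreach(path, **PROBE_OPTIONS).count != count
  end
end

# Probe a few sizes around the reported boundary (arbitrary choices).
first_mismatch = [10, 100, 1000, 2047, 2048, 2500].find do |count|
  row_count_mismatch?(count)
end

if first_mismatch
  puts "row count mismatch first seen at #{first_mismatch} rows"
else
  puts "no mismatch at the probed sizes"
end
```

Numbering the rows, rather than repeating one value, makes it easier to see afterwards which row was duplicated.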
I have confirmed the following: when @lineno is 2048, the result of "value = parse_column_value" (line 1030 of parser.rb) is "AAAA1234567890AAAAA1234567890". I also wonder whether changes are needed around the adjust_last_keep method. |
Sure.
I hope that this explanation helps you. |
Thank you for your detailed explanation! |
Including the line number in each line's contents will be helpful:
    File.open('/tmp/test.csv', 'w') do |f|
      lines = 2500.times.collect do |i|
        "A%013d" % i
      end
      f.print(lines.join("\r\n"))
    end
Output with the test file:
It seems that the last line was used twice. |
I could reproduce this with the script:
    ENV["CSV_PARSER_SCANNER_TEST"] = "yes"
    require "csv"
    csv = CSV.new("a\r\nb", row_sep: "\r\n", strip: true, skip_lines: /\A *\z/)
    csv.each do |row|
      pp row
    end
|
GitHub: fix ruby/csv#279
It happens when:
* `keep_start`/`keep_{drop,back}` are nested. (e.g.: `strip: true, skip_lines: /.../`)
* Row separator is `\r\n`.
* `InputScanner` is used. (Small input doesn't use `InputScanner`)
Reported by Gabriel Nagy. Thanks!!!
ruby/csv@183635ab56
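The conditions listed in the commit message can be checked side by side by parsing the same generated file with each option combination and each row separator. This is a sketch under assumptions: `parsed_row_counts`, `COMBINATIONS`, and `MATRIX_ROW_COUNT` are hypothetical names, and the 2500-row size is borrowed from the reproduction earlier in the thread on the assumption that it is large enough to avoid the small-input path. On a fixed release every combination should report the full row count.

```ruby
require "csv"
require "tmpdir"

MATRIX_ROW_COUNT = 2500

# Option combinations to compare; the last one nests both features
# named in the commit message.
COMBINATIONS = {
  "no options" => {},
  "strip only" => { strip: true },
  "skip_lines only" => { skip_lines: /\A,+\n?\z/ },
  "strip + skip_lines" => { strip: true, skip_lines: /\A,+\n?\z/ },
}.freeze

# Writes numbered rows joined by `row_sep` (no trailing separator) and
# returns a hash of combination label => number of rows CSV.foreach yields.
def parsed_row_counts(row_sep)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "matrix.csv")
    lines = Array.new(MATRIX_ROW_COUNT) { |i| format("A%013d", i) }
    File.binwrite(path, lines.join(row_sep))
    COMBINATIONS.transform_values do |options|
      CSV.foreach(path, **options).count
    end
  end
end

["\r\n", "\n"].each do |row_sep|
  parsed_row_counts(row_sep).each do |label, count|
    status = count == MATRIX_ROW_COUNT ? "ok" : "MISMATCH"
    puts format("%-6s %-20s %5d rows  %s", row_sep.inspect, label, count, status)
  end
end
```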
Thanks for fixing this! |
I found a bug that only reproduces with a very specific set of prerequisites (all of the following must be true):
CSV.foreach
strip: true
skip_lines: /\A,+\n?\z/
The following example (where original.csv is a file containing the line AAAA1234567890 ~2500 times) will print the last line duplicated:
As a workaround I used CSV.parse(File.read, ...) with the same options, but I still wanted to flag this issue.
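The workaround described above might look like the following sketch. The helper name `parse_whole_file` is hypothetical, and the file is generated with numbered rows purely for illustration; only `CSV.parse`, `File.read`, and the two options come from the report.

```ruby
require "csv"
require "tmpdir"

# Reads the whole file into a String first and parses that String,
# instead of letting CSV.foreach stream from the file.
def parse_whole_file(path, **options)
  CSV.parse(File.read(path), **options)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "original.csv")
  lines = Array.new(2500) { |i| format("A%013d", i) }
  File.binwrite(path, lines.join("\r\n"))

  rows = parse_whole_file(path, strip: true, skip_lines: /\A,+\n?\z/)
  puts "rows parsed: #{rows.size}"
  puts "last two rows: #{rows.last(2).inspect}"
end
```

The trade-off is memory: the entire file is held as a String before parsing, which may not suit very large inputs.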