Commit 4378f82
BUG: Corrects stopping logic when nrows argument is supplied (#7626)
closes #7626
Subsets of tabular files with different "shapes" will now load when a
valid skiprows/nrows is given as an argument.

Conditions for error:
1) There are different "shapes" within a tabular data file, i.e.
   different numbers of columns.
2) A "narrower" set of columns is followed by a "wider" (more columns)
   one, and the narrower set is laid out such that the end of a
   262144-byte block occurs within it.

Issue summary: The C engine for parsing files reads in 262144 bytes at
a time. Previously, the "start_lines" variable in
tokenizer.c/tokenize_bytes() was incorrectly set to the first line in
the current chunk, rather than the overall first row requested. This
led to incorrect logic on when to stop reading when nrows is supplied
by the user. The over-read always happened, but it only caused a crash
when a wider set of columns followed in the file. In other cases, extra
rows were read in but then harmlessly discarded. This pull request
always uses the first requested row for comparisons, so only nrows rows
will be parsed when supplied.
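
The stopping rule described in the issue summary can be illustrated with a small pure-Python toy model. This is only a sketch under stated assumptions, not the code in tokenizer.c: the function `tokenize`, the exception `TooManyFields`, the `fixed` flag, and the tiny `chunk_size` are all hypothetical names and values chosen for illustration. The only idea taken from the commit message is that the buggy variant resets `start_lines` at the start of every chunk, while the fixed variant keeps the first requested row as the baseline.

```python
from __future__ import annotations


class TooManyFields(ValueError):
    """Raised when a tokenized row is wider than the first row (toy model)."""


def tokenize(data: bytes, nrows: int, chunk_size: int,
             fixed: bool) -> tuple[list[list[bytes]], int]:
    """Toy chunked tokenizer; returns (first nrows rows, rows tokenized)."""
    rows: list[list[bytes]] = []  # every complete row tokenized so far
    partial = b""                 # row bytes that straddle a chunk boundary
    first_requested = 0           # row count when the nrows request began
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Buggy variant: the baseline is reset at the start of each chunk.
        # Fixed variant: the baseline stays at the first requested row.
        start_lines = first_requested if fixed else len(rows)
        for piece in chunk.splitlines(keepends=True):
            partial += piece
            if not partial.endswith(b"\n"):
                break  # incomplete row; finish it with the next chunk
            fields = partial.rstrip(b"\n").split(b",")
            partial = b""
            if rows and len(fields) > len(rows[0]):
                raise TooManyFields(
                    f"expected {len(rows[0])} fields, saw {len(fields)}")
            rows.append(fields)
            if len(rows) - start_lines >= nrows:
                return rows[:nrows], len(rows)
    # Rows read past nrows are dropped here (the "harmless" case).
    return rows[:nrows], len(rows)


# Six narrow rows that straddle several 10-byte chunks, then wider rows.
narrow_then_wide = b"1,2\n" * 6 + b"1,2,3,4\n" * 3
# A file with a single shape, where over-reading goes unnoticed.
all_narrow = b"1,2\n" * 20
```

With `chunk_size=10` and `nrows=6`, the fixed variant stops after exactly six rows of `narrow_then_wide`, whereas the buggy variant keeps reading into the wider rows and raises `TooManyFields`. On `all_narrow` the buggy variant tokenizes all 20 rows and then discards the extras, which mirrors the case the commit message describes as harmless.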
Author: Jeff Carey <jeff.carey@gmail.com>
Closes #14747 from jeffcarey/fix/7626 and squashes the following commits:
cac1bac [Jeff Carey] Removed duplicative test
6f1965a [Jeff Carey] BUG: Corrects stopping logic when nrows argument is supplied (Fixes #7626)
1 parent 53bf1b2 commit 4378f82
File tree
3 files changed, +21 -5 lines changed
- doc/source/whatsnew
- pandas
- io/tests/parser
- src/parser
Diff hunks (line numbers only):

| File (page order) | Added (new line numbers) | Removed (old line numbers) |
|---|---|---|
| 1 | 73 | none |
| 2 | 374-390 | none |
| 3 | 729, 731, 1385 | 729, 731, 737, 738, 1387 |