Commit 6dd3776
[data] Fix reading from zipped json (#58214)
## Description
### Status Quo
PR #54667 addressed OOM issues by sampling a few lines of the file.
However, that code always assumes the input file is seekable (i.e., not
compressed). As a result, reading zipped files is broken, as reported in
#55356.
### Potential Workaround
- Refactor the code shared between JsonDatasource and FileDatasource
- Default to 10000 if a zipped file is found
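
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `rows_per_chunk`, `ForwardOnlyStream`, `target_bytes`, and `sample_lines` are hypothetical names, and treating 10000 as a row count is an assumption, since the description does not say what the value measures.

```python
import io

# Figure taken from the description; interpreting it as a row count is an assumption.
FALLBACK_ROWS = 10000


class ForwardOnlyStream:
    """Hypothetical stand-in for a decompressing stream that cannot seek."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def seekable(self) -> bool:
        return False

    def readline(self) -> bytes:
        return self._inner.readline()


def rows_per_chunk(stream, target_bytes=1024, sample_lines=5, fallback=FALLBACK_ROWS):
    """Estimate how many rows fit in `target_bytes` by sampling a few lines.

    Sampling consumes input, so it is only done when the stream can be
    rewound afterwards; otherwise the fixed fallback is returned.
    """
    if not stream.seekable():
        return fallback

    start = stream.tell()
    sizes = []
    for _ in range(sample_lines):
        line = stream.readline()
        if not line:
            break
        sizes.append(len(line))
    stream.seek(start)  # rewind so the real read starts from the same position

    if not sizes:
        return fallback
    # Size the chunk by the largest sampled row to stay under the byte budget.
    return max(1, target_bytes // max(sizes))
```

With a plain `io.BytesIO` the helper samples and rewinds; with `ForwardOnlyStream` it skips sampling entirely and returns the fallback, so no bytes of the compressed input are consumed before the actual read.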
## Related issues
#55356
---------
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

1 parent 92d8471 commit 6dd3776
File tree

4 files changed, +98 -46 lines changed

- python/ray/data
  - _internal
    - datasource
  - datasource
  - tests