This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance and stability of writing to CSV #866

Merged
merged 5 commits into from
Feb 26, 2022

Conversation

ritchie46
Collaborator

@ritchie46 ritchie46 commented Feb 25, 2022

This addresses #865. It is a breaking change, but it gives us more control and very likely improves performance because we only use csv-core for utf8 fields.

This is possible because we have far fewer cases to handle than the general-purpose csv crate does. For example:

  • we know the schema,
  • we know that we will have an equal number of rows and fields
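The idea above can be sketched as follows. This is a minimal, std-only illustration and not the code from this PR: `Column`, `write_utf8_field`, and `write_rows` are hypothetical names, and the hand-rolled quoting function only stands in for the part that csv-core would handle. Because the column types are known up front, integers are formatted straight into the row buffer and only string fields pay for the quoting check.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a typed Arrow column; not an arrow2 type.
pub enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int64(values) => values.len(),
            Column::Utf8(values) => values.len(),
        }
    }
}

/// Quotes a string field only when it contains the delimiter, a quote, or a
/// line break. Assumption: this simplified routine stands in for csv-core.
fn write_utf8_field(buf: &mut Vec<u8>, value: &str, delimiter: u8) {
    let needs_quotes = value
        .bytes()
        .any(|b| b == delimiter || b == b'"' || b == b'\n' || b == b'\r');
    if !needs_quotes {
        buf.extend_from_slice(value.as_bytes());
        return;
    }
    buf.push(b'"');
    for b in value.bytes() {
        if b == b'"' {
            buf.push(b'"'); // escape a quote by doubling it
        }
        buf.push(b);
    }
    buf.push(b'"');
}

/// Writes all rows, dispatching on the known column type for each field.
pub fn write_rows<W: Write>(
    writer: &mut W,
    columns: &[Column],
    delimiter: u8,
) -> io::Result<()> {
    let num_rows = columns.first().map_or(0, Column::len);
    // The schema is known, so every column must supply one value per row.
    if columns.iter().any(|c| c.len() != num_rows) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "columns differ in length",
        ));
    }
    let mut buf: Vec<u8> = Vec::new(); // reused for every row
    for row in 0..num_rows {
        buf.clear();
        for (i, column) in columns.iter().enumerate() {
            if i > 0 {
                buf.push(delimiter);
            }
            match column {
                // Numbers can never need quoting, so they skip the check.
                Column::Int64(values) => write!(buf, "{}", values[row])?,
                Column::Utf8(values) => write_utf8_field(&mut buf, &values[row], delimiter),
            }
        }
        buf.push(b'\n');
        writer.write_all(&buf)?;
    }
    Ok(())
}
```

Calling `write_rows(&mut out, &columns, b',')` with `out` as a `Vec<u8>` (or any other `Write` implementor) fills it with one line per row. Reusing a single row buffer avoids one allocation per row, which is one plausible source of the speed-up discussed here.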

@codecov

codecov bot commented Feb 25, 2022

Codecov Report

Merging #866 (bbd900d) into main (10e6cd5) will decrease coverage by 0.03%.
The diff coverage is 60.78%.


@@            Coverage Diff             @@
##             main     #866      +/-   ##
==========================================
- Coverage   71.56%   71.52%   -0.04%     
==========================================
  Files         335      335              
  Lines       17952    17989      +37     
==========================================
+ Hits        12847    12867      +20     
- Misses       5105     5122      +17     
| Impacted Files | Coverage Δ |
|---|---|
| src/io/csv/write/mod.rs | 57.77% <50.00%> (-7.61%) ⬇️ |
| src/io/csv/write/serialize.rs | 60.16% <76.19%> (-0.61%) ⬇️ |
| src/compute/arithmetics/time.rs | 25.68% <0.00%> (-0.92%) ⬇️ |
| src/io/json/write/serialize.rs | 86.11% <0.00%> (-0.22%) ⬇️ |
| src/io/json/write/mod.rs | 100.00% <0.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ritchie46
Collaborator Author

ritchie46 commented Feb 25, 2022

Still have to do the parallel example.

@ritchie46 ritchie46 force-pushed the csv_write branch 2 times, most recently from 5cb5da2 to 2805aea, on February 25, 2022 11:42
Owner

@jorgecarleitao jorgecarleitao left a comment


Great PR!

Indeed this will be more performant. I left one comment.

src/io/csv/write/serialize.rs (outdated, resolved)
src/io/csv/write/mod.rs (outdated, resolved)
@ritchie46
Collaborator Author

It should be good to go.

@jorgecarleitao jorgecarleitao merged commit a26f95b into jorgecarleitao:main Feb 26, 2022
@jorgecarleitao jorgecarleitao changed the title from "change csv-writer" to "Improved performance and stability of writing to CSV" Feb 26, 2022
@jorgecarleitao
Owner

Indeed. Thanks a lot, @ritchie46 !

sydduckworth pushed a commit to mindx/arrow2 that referenced this pull request Mar 2, 2022
2 participants