Question about KDF performance for reading from CSV file #970
Comments
@Jolanrensen I remember you tested the new CSV reader implementation in one of your PRs. Could you please share some details?
Indeed! While the focus of DataFrame is on type safety and ease of use, we do consider performance. DF 0.15 will have a new experimental module, "dataframe-csv", which is built on the much faster Deephaven-csv library. The old implementation uses Apache Commons CSV (which had a habit of running out of memory on large CSV files). I made a little benchmark to compare the two implementations on a small (384 B), a medium (19.8 MB), and a large (784.5 MB) CSV file: #903 (comment). Note that DataFrame reads all data into JVM heap memory. This is what makes the aforementioned ease of use and type safety possible, so if your CSV is too large you can still run into limits, but increasing the JVM heap size could help here. Hope that answers your question :)
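For anyone who wants to time a CSV load on their own machine, here is a minimal sketch using only the Kotlin standard library. It is not the benchmark from #903: the helper names (`writeSampleCsv`, `readRows`, `timeLoad`) and the 100,000-row sample size are made up for illustration, and the plain `split(',')` reader is only a stand-in for whichever reader you actually want to measure. It also prints the maximum heap size the JVM will use, since everything is loaded into heap memory.

```kotlin
import java.io.File
import kotlin.time.measureTimedValue

// Hypothetical helper: writes a two-column CSV with a header and `rows` data rows.
fun writeSampleCsv(rows: Int): File {
    val file = File.createTempFile("sample", ".csv")
    file.deleteOnExit()
    file.bufferedWriter().use { out ->
        out.appendLine("id,value")
        repeat(rows) { i -> out.appendLine("$i,${i * 0.5}") }
    }
    return file
}

// Stand-in reader: naive comma split, no quoting support. Skips the header line.
fun readRows(file: File): List<List<String>> =
    file.useLines { lines -> lines.drop(1).map { it.split(',') }.toList() }

// Hypothetical helper: times one invocation of any loading function.
fun <T> timeLoad(label: String, load: () -> T): T {
    val (result, elapsed) = measureTimedValue(load)
    println("$label took $elapsed")
    return result
}

fun main() {
    // Upper bound on the heap the JVM will attempt to use (raise it with -Xmx).
    val maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024)
    println("Max heap: $maxHeapMiB MiB")

    val file = writeSampleCsv(100_000)
    val rows = timeLoad("naive reader") { readRows(file) }
    println("Rows read: ${rows.size}")
}
```

A single run like this includes JVM warm-up, so treat the number as a rough indication only. To measure DataFrame itself, replace the lambda passed to `timeLoad` with the reader call you want to compare.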
Interesting. I have also seen on Medium that someone tested the speed of reading CSV files with Deephaven, and it does indeed excel at speed. When will DF 0.15 be released? I want to use it in my project to read tabular correction-term files, such as the vsop2000 coefficient correction terms, which for the Sun and Moon data alone come to about 86 thousand rows.
We won't achieve the full speed of Deephaven (as written on Medium), since we work with boxed lists in memory, but at least it will be a lot faster than before :). We plan to have a release candidate for 0.15 out this week. If nothing comes up, the full release will follow soon thereafter. If you cannot wait and want to try it already, we always publish dev versions from our master branch: https://central.sonatype.com/artifact/org.jetbrains.kotlinx/dataframe/0.15.0-dev-5148. Make sure to also add the new experimental
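As a rough sketch, a Gradle Kotlin DSL setup for trying the dev build might look like the fragment below. The first coordinate is taken from the Maven Central link above. The second is an assumption: it supposes that the experimental "dataframe-csv" module mentioned earlier in this thread is published under the same group and version, which should be checked on Maven Central before use.

```kotlin
// build.gradle.kts (fragment)
repositories {
    mavenCentral()
}

dependencies {
    // Dev build referenced by the Maven Central link in this thread.
    implementation("org.jetbrains.kotlinx:dataframe:0.15.0-dev-5148")

    // ASSUMPTION: the experimental CSV module uses the same group and version.
    // Verify that this artifact exists before relying on it.
    implementation("org.jetbrains.kotlinx:dataframe-csv:0.15.0-dev-5148")
}
```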
Okay, thank you, I'll try it.
Before I use DataFrame, I would like to know whether anyone has tested how fast it is at reading and managing data from CSV files.
Please comment.