Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean and fast readtable implementation #287

Merged
merged 5 commits into from
Jun 20, 2013
Merged

Clean and fast readtable implementation #287

merged 5 commits into from
Jun 20, 2013

Conversation

johnmyleswhite
Copy link
Contributor

This is an implementation of readtable that is much faster than our previous version and uses keyword arguments to provide a lot of additional functionality. Some of the functionality is still not in place, but I think this is close to the correct outline of all the features we should add to our readtable function.

@johnmyleswhite
Copy link
Contributor Author

@StefanKarpinski and @ViralBShah, this is failing on Travis because :jl_substrtod can't be found. This doesn't cause any problems on my machine. Any thoughts?

@tshort and @HarlanH, we need to replace | with |> to follow Julia's new standards. Are you fine with that?

@kmsquire
Copy link
Contributor

It would be nice if julia started tagging subversions during development, so that the next version of DataFrames could depend on the version of Julia where |> was introduced. As of now, a new version of DataFrames will be released which will be incompatible with Julia from a week ago. Anyone on the development branch but slightly behind and who updates DataFrames without updating Julia first will be surprised when DataFrames stops working. :-(

For such reasons, it might actually be good to wait before changing | to |>.

I'm going to file this as a julia issue...

@johnmyleswhite
Copy link
Contributor Author

Just to let people know, I did forget one important thing: this implementation doesn't parse column names automatically.

@johnmyleswhite
Copy link
Contributor Author

Ugh. There are other bugs in this implementation since I blew out the entire old io.jl file. Let me fix that now.

@ViralBShah
Copy link
Contributor

@staticfloat Are the nightlies behind?

@staticfloat
Copy link
Contributor

Looks like they were last built on the 18th, so not significantly, no. The bundled libuv was recently updated as well.

@ViralBShah
Copy link
Contributor

Ok, I believe :jl_substrtod is more recent than that. Perhaps tomorrow's binaries should fix this.

@HarlanH
Copy link
Contributor

HarlanH commented Jun 20, 2013

What's the |> change? I can't find a core Julia issue that seems to refer to that, and I may have missed any announcement on the mailing list.

@johnmyleswhite
Copy link
Contributor Author

The pipe operator is now |> since we didn't want to allow | to have different semantics depending on type.

Change some keyword argument names
Allow user to select whether to ignore whitespace padding
Allow user to decide whether to convert strings to factors
Read in column names correctly
Allow user to skip lines at the start of a file
Restore printtable and writetable
Fix test that broke with Julia changes
@johnmyleswhite
Copy link
Contributor Author

This is now finished. It has all of the old functionality plus more (like keyword arguments, including one for automatic string to factor conversion if the user requests it). It's also super fast. @ViralBShah, it reads your really corrupt data set in about 7 seconds into a standard DataFrame with type inference.

Barring objections, I'll merge it as soon as the nightlies catch up with the :jl_substrtod function. We finally have a usable IO system now.

Deal with missing EOL's at the end of data files
Encode completely blank string fields as NA's
Provide more diagnostic information when data can't be read
@johnmyleswhite
Copy link
Contributor Author

Merging since master is currently broken thanks to the removal of memio.

johnmyleswhite added a commit that referenced this pull request Jun 20, 2013
Clean and fast `readtable` implementation
@johnmyleswhite johnmyleswhite merged commit 664f792 into master Jun 20, 2013
@johnmyleswhite johnmyleswhite deleted the readtable branch June 20, 2013 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants