Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert addition of weakrefstrings option #140

Closed
wants to merge 1 commit into from
Closed

Conversation

nalimilan
Copy link
Member

It broke tests, and has weird semantics since it completely overrides
explicit request for WeakRefString columns via the types argument.
This partly reverts commit ea2f215.

It broke tests, and has weird semantics since it completely overrides
explicit request for WeakRefString columns via the types argument.
This partly reverts commit ea2f215.
@alyst
Copy link
Contributor

alyst commented Dec 24, 2017

+1 for reverting it.

If this is the right context to discuss the issue: I think WeakRefString is a proper optimization, but we should keep this type as internal as possible:

  • stop exporting it from WeakRefStrings.jl (CSV.jl should explicitly import it)
  • do not mention it in the user docs and do not add weakrefstrings option to CSV. To my understanding, with Stop lying about WeakRefStringArray eltype WeakRefStrings.jl#17 (and WeakRefStringArray: make eltype respect Missing WeakRefStrings.jl#20) there's no easy way to leak WeakRefString object outside of WeakRefStringArray, unless indexing .elements directly. If I'm wrong, then we should do the transform to String unconditionally.
  • I don't think WeakRefString is a type that should be allowed to be specified in types=: a) the returned eltype for this column would be String; b) anyway, if it's not exported and not advertized in the docs, it's very unlikely that somebody will use it for types=

There's also DataStreams.weakrefstrings() method that IIUC already defaults to false for generic sinks and to true for CSV.Sink.
In theory, it should already facilitate WeakRefString => String transform for the appropriate sinks.

@quinnj
Copy link
Member

quinnj commented Dec 24, 2017

@alyst, yes, I had some offline travel time today and basically came to the same conclusions/plan. Basically, always use WeakRefStrings for parsing/internally, but to the user, they should essentially never see an actual WeakRefString.

@nalimilan
Copy link
Member Author

Makes sense. Though it could still be useful to have an argument to allow the user to choose whether to get a WeakRefStringArray or an Array{String}? What are the implications of keeping WeakRefStringArray columns in a data frame? Do they force keeping the whole CSV file in memory? Can they be serialized?

@quinnj
Copy link
Member

quinnj commented Jan 8, 2018

Leaving the option for now; WeakRefStrings always return Strings now, so WeakRefString usage should be almost entirely an internal implementation detail for users. Leaving the keyword arg as an escape hatch in case users really need to avoid having WeakRefStrings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants