0.6/0.7 compat & various minor fixes #143
…ing sure CSV.write respects writing Date & DateTime differently
src/Source.jl
Outdated
@@ -86,7 +84,7 @@ function Source(;fullpath::Union{AbstractString,IO}="",
 datarow = datarow == -1 ? (isa(header, Vector) ? 0 : last(header)) + 1 : datarow # by default, data starts on line after header
 rows = fs == 0 ? -1 : max(-1, rows - datarow + 1 - footerskip) # rows now equals the actual number of rows in the dataset

-# figure out # of columns and header, either an Integer, Range, or Vector{String}
+# figure out # of columns and header, either an Integer, AbstractRange, or Vector{String}
extra ws
addressed
src/Source.jl
Outdated
@@ -239,7 +239,7 @@ Keyword Arguments:
 * `decimal::Union{Char,UInt8}`: character to recognize as the decimal point in a float number, e.g. `3.14` or `3,14`; default `'.'`
 * `truestring`: string to represent `true::Bool` values in a csv file; default `"true"`. Note that `truestring` and `falsestring` cannot start with the same character.
 * `falsestring`: string to represent `false::Bool` values in a csv file; default `"false"`
-* `header`: column names can be provided manually as a complete Vector{String}, or as an Int/Range which indicates the row/rows that contain the column names
+* `header`: column names can be provided manually as a complete Vector{String}, or as an Int/ AbstractRange which indicates the row/rows that contain the column names
ws
addressed
src/TransposedSource.jl
Outdated
@@ -233,6 +234,9 @@ function TransposedSource(;fullpath::Union{AbstractString,IO}="",
 columntypes[c] = typ
 end
 end
+if !weakrefstrings
+columntypes = [T <: WeakRefString ? String : T for T in columntypes]
Since `T` could be e.g. `Union{Missing, WeakRefString{UInt8}}` (and `WeakRefStringArray` now supports that), the code here should do the same thing as the code a little above does when converting to categorical.
The same thing should be done for `Source`.
There's also #138 to consolidate column type detection code.
addressed
src/Source.jl
Outdated
Note by default, "string" or text columns will be parsed as the [`WeakRefString`](https://github.com/quinnj/WeakRefStrings.jl) type. This is a custom type that only stores a pointer to the actual byte data + the number of bytes.
To convert a `String` to a standard Julia string type, just call `string(::WeakRefString)`, this also works on an entire column.
Oftentimes, however, it can be convenient to work with `WeakRefStrings` depending on the ultimate use, such as transfering the data directly to another system and avoiding all the intermediate copying.
* `weakrefstrings::Bool=true`: whether WeakRefStrings should be used internally to speed up file parsing; can only be `=true` for Sinks that support WeakRefStringArrays; note that regular Strings are returned from WeakRefStringArray; WeakRefStrings are only used internally.
"WeakRefStringArrays" is not a proper type name, but maybe for clarity it's possible to adjust the sentence a bit, so that it uses "`WeakRefString`", "`String`" etc., e.g.
whether to use [`WeakRefStrings`](https://github.com/quinnj/WeakRefStrings.jl) package to speed up file parsing; can only be `=true` for the `Sink` objects that support `WeakRefStringArray` columns. Note that `WeakRefStringArray` still returns regular `String` elements.
addressed
Codecov Report
@@ Coverage Diff @@
## master #143 +/- ##
=========================================
Coverage ? 86.76%
=========================================
Files ? 8
Lines ? 869
Branches ? 0
=========================================
Hits ? 754
Misses ? 115
Partials ? 0
@@ -38,6 +38,9 @@ end
 return byte
 end

+substitute(::Type{Union{T, Missing}}, ::Type{T1}) where {T, T1} = Union{T1, Missing}
Actually, that's the kind of function that I wanted to have in `Missings`/0.7 Base, because it's implicitly used in several places at JuliaData (cc @nalimilan).
Maybe it should be called `substitute_type`?
In Base the convention seems to be `type*`, e.g. `typeintersect` and `typejoin` (we have discussed adding `typesubtract`). So why not `typesubstitute`? (But of course it's much less general than the others.)
It's true, for Base it should be more specific. `substitute_nonmissing_type`? `nonmissingtype_substitute`? I don't have a good variant that starts with `type`.
I'm not sure we'll be able to include all of these helper functions in Base, so for now it could be just `Missings.typesubstitute` or something like that.
@@ -38,6 +38,9 @@ end
 return byte
 end

+substitute(::Type{Union{T, Missing}}, ::Type{T1}) where {T, T1} = Union{T1, Missing}
+substitute(::Type{T}, ::Type{T1}) where {T, T1} = T1
I guess there should also be a `substitute(::Type{Missing}, ::Type{T1}) where {T1} = Missing` rule.
src/Source.jl
Outdated
@@ -179,7 +175,7 @@ function Source(;fullpath::Union{AbstractString,IO}="",
 end
 end
 if !weakrefstrings
-columntypes = [T <: WeakRefString ? String : T for T in columntypes]
+columntypes = [Missings.T(T) <: WeakRefString ? substitute(T, String) : T for T in columntypes]
That's where the `substitute(Missing, T)` rule is required, because `Missings.T(Missing) == Union{} <: WeakRefString`.
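For reference, the three `substitute` rules discussed in this thread can be tried in isolation. This is only a rough sketch, not the code in the PR: `FakeWeakRefString` is a hypothetical stand-in so that no packages are needed, and `Base.nonmissingtype` (available in newer Julia versions, 1.3 and later) is assumed to play the role of `Missings.T`.

```julia
# Hypothetical stand-in for WeakRefString, so the sketch needs no packages.
struct FakeWeakRefString <: AbstractString end

# The two rules from the diff under review:
substitute(::Type{Union{T, Missing}}, ::Type{T1}) where {T, T1} = Union{T1, Missing}
substitute(::Type{T}, ::Type{T1}) where {T, T1} = T1
# The extra rule suggested in the review for all-missing columns:
substitute(::Type{Missing}, ::Type{T1}) where {T1} = Missing

# Assumption: nonmissingtype stands in for Missings.T.
# nonmissingtype(Missing) is Union{}, which is a subtype of every type,
# so an all-missing column also passes the `<: FakeWeakRefString` check.
columntypes = [Int, FakeWeakRefString, Union{FakeWeakRefString, Missing}, Missing]
converted = [nonmissingtype(T) <: FakeWeakRefString ? substitute(T, String) : T
             for T in columntypes]
```

With the `Missing` rule in place, the all-missing column keeps the type `Missing`, while the other string columns become `String` or `Union{String, Missing}`.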
src/Source.jl
Outdated
if T >: Missing
columntypes[i] = Union{columntypes[i], Missing}
end
if length(levels[i]) / sum(values(levels[i])) < .67 && Missings.T(T) <: WeakRefString
The `T !== Missing` check was here to exclude all-missing columns from the conversion to categorical.
No description provided.