WIP - C Data Interface #179
83fd834
to
feb5fff
Compare
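using Arrow, PyCall
pd = pyimport("pandas")
pa = pyimport("pyarrow")
# Build a small pandas DataFrame and convert it to a pyarrow RecordBatch
df = pd.DataFrame(py"""{'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']}"""o)
rb = pa.record_batch(df)
# Export the schema through the C Data Interface into memory provided by Julia
sch = Arrow.CDataInterface.get_schema() do ptr
    rb.schema._export_to_c(Int(ptr))
end
# Export the record batch data the same way
arr = Arrow.CDataInterface.get_array() do ptr
    rb._export_to_c(Int(ptr))
end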
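For readers unfamiliar with the do-block pattern above, here is a minimal sketch of how such a helper could hand a pointer to the producer. The `CArrowSchema` struct mirrors the field layout of `struct ArrowSchema` from the Arrow C Data Interface specification; the names `CArrowSchema` and `with_schema_pointer` are hypothetical and are not taken from this PR, and the real `Arrow.CDataInterface.get_schema` may work differently.

```julia
# Hypothetical Julia mirror of the C `struct ArrowSchema` (field order follows the spec).
struct CArrowSchema
    format::Cstring
    name::Cstring
    metadata::Ptr{UInt8}
    flags::Int64
    n_children::Int64
    children::Ptr{Ptr{CArrowSchema}}
    dictionary::Ptr{CArrowSchema}
    release::Ptr{Cvoid}
    private_data::Ptr{Cvoid}
end

# Hypothetical helper: allocate space for one struct, let the callback fill it
# through a raw pointer, then return the filled-in value.
function with_schema_pointer(f)
    ref = Ref{CArrowSchema}()  # uninitialized storage owned by Julia
    GC.@preserve ref f(Base.unsafe_convert(Ptr{CArrowSchema}, ref))
    return ref[]
end

# Stand-in for a producer such as pyarrow's `_export_to_c`: write a struct at `ptr`.
sch = with_schema_pointer() do ptr
    unsafe_store!(ptr, CArrowSchema(Cstring(C_NULL), Cstring(C_NULL), C_NULL,
                                    0, 3, C_NULL, C_NULL, C_NULL, C_NULL))
end
```

A real consumer would also need to call the `release` callback once it is done with the exported data, which this sketch omits.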
missed a spot little refactor pycall and conda to extras little refactoring squash
feb5fff
to
130de79
Compare
precision = Int(splits[1])
scale = Int(splits[2])
if length(splits) == 3
bandwidth = splits[3]
should be `bitwidth` instead of `bandwidth`
if length(splits) == 3
bandwidth = splits[3]
end
#TODO return something here
In the `eltypes.jl` file, we define:
struct Decimal{P, S, T}
    value::T # only Int128 or Int256
end
which is what we should return here.
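A minimal sketch of what returning the decimal type could look like, assuming the C Data Interface format strings `d:P,S` (128-bit by default) and `d:P,S,NNN` (explicit bit width). The `Decimal` struct below is a local stand-in for the one quoted from `eltypes.jl`, and `parse_decimal_format` is a hypothetical name. It uses `parse(Int, ...)` because `Int(...)` does not convert strings, and it only handles 128 bits since Base Julia has no `Int256`.

```julia
# Local stand-in for the Decimal type quoted above from eltypes.jl.
struct Decimal{P, S, T}
    value::T
end

# Hypothetical helper: turn "d:19,10" or "d:19,10,128" into a Decimal type.
function parse_decimal_format(format_string::AbstractString)
    startswith(format_string, "d:") ||
        throw(ArgumentError("not a decimal format string: $format_string"))
    splits = split(format_string[3:end], ',')
    length(splits) in (2, 3) ||
        throw(ArgumentError("malformed decimal format string: $format_string"))
    precision = parse(Int, splits[1])
    scale = parse(Int, splits[2])
    # The bit width defaults to 128 when the third component is absent.
    bitwidth = length(splits) == 3 ? parse(Int, splits[3]) : 128
    # Assumption: 256-bit support would need an external 256-bit integer type.
    bitwidth == 128 ||
        throw(ArgumentError("unsupported decimal bitwidth: $bitwidth"))
    return Decimal{precision, scale, Int128}
end
```

For example, `parse_decimal_format("d:19,10")` yields `Decimal{19, 10, Int128}`.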
end
#TODO return something here
elseif format_string[1] == 'w'
#TODO figure out fixed width binary
This will just be the same as `Arrow.FixedSizeList`, but with `UInt8` as the element type
So a fixed width binary type won't have any children; it's like a hard-coded `UInt8` child type and so doesn't need to be parsed recursively.
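As a rough illustration of the point above, a fixed width binary format string (`w:N` in the C Data Interface, where `N` is the byte width) can be handled without looking at any children. The helper name `fixed_width_binary_type` is hypothetical, and representing the element as `NTuple{N, UInt8}` is an assumption about how the result would be modeled, not something this PR confirms.

```julia
# Hypothetical helper: "w:16" describes 16-byte fixed width binary values.
function fixed_width_binary_type(format_string::AbstractString)
    startswith(format_string, "w:") ||
        throw(ArgumentError("not a fixed width binary format string: $format_string"))
    nbytes = parse(Int, format_string[3:end])
    # No recursion needed: the element type is always UInt8.
    return NTuple{nbytes, UInt8}
end
```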
#TODO figure out fixed width binary
elseif format_string[1] == '+'
if format_string[2] == 'l' || format_string[2] == 'L'
Arrow.List
For these nested types, we'll need to parse the children recursively. So the overall method signature needs to take the full `ArrowSchema` type, and then access the format string at the top. Then when we get here, we'll call `get_type_from_format_string(sch.children[1])` and so on to get the List element type so we end up with a type like `Arrow.List{Vector{Int64}}` or whatever.
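The recursion described above can be sketched with a plain Julia tree standing in for the exported schema. `SchemaNode`, `PRIMITIVES`, and `julia_type` are hypothetical names; the real code would walk the `ArrowSchema` children pointers and would presumably return `Arrow.List{...}` rather than the bare `Vector{...}` used here to keep the example self-contained.

```julia
# Hypothetical pure-Julia stand-in for an ArrowSchema node.
struct SchemaNode
    format::String
    children::Vector{SchemaNode}
end
SchemaNode(format) = SchemaNode(format, SchemaNode[])

# A few primitive format strings from the C Data Interface specification.
const PRIMITIVES = Dict("b" => Bool, "i" => Int32, "l" => Int64,
                        "f" => Float32, "g" => Float64, "u" => String)

# Take the whole node (not just the format string) so nested types can recurse.
function julia_type(node::SchemaNode)
    fmt = node.format
    if haskey(PRIMITIVES, fmt)
        return PRIMITIVES[fmt]
    elseif fmt == "+l" || fmt == "+L"
        # List and large list: the element type comes from the single child.
        return Vector{julia_type(node.children[1])}
    else
        throw(ArgumentError("unsupported format string: $fmt"))
    end
end
```

For instance, a list of lists of `int64` is described as `SchemaNode("+l", [SchemaNode("+l", [SchemaNode("l")])])` and resolves to `Vector{Vector{Int64}}`.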
end
elseif format_string[1] == 't'
if format_string[2:3]
Date
The other thing we should probably support is a `convert::Bool=true` keyword arg to this function. It would allow the user to specify whether they'd like the native arrow type converted to a more natural Julia type or not. We support this in `Arrow.Table`. It's nice because I think there are some cases where the user just wants the raw arrow data type, but usually the user wants the nice Julia type. This is one of those cases where we have `Arrow.Date`, which is different from `Dates.Date`.
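To make the `convert` keyword idea concrete, here is a small sketch under stated assumptions: `RawDate32` is a hypothetical stand-in for the raw arrow date type (days since the UNIX epoch, as in the `tdD` format), and `read_dates` is an invented function name, not part of Arrow.jl.

```julia
using Dates

# Hypothetical stand-in for the raw arrow date representation.
struct RawDate32
    days::Int32  # days since 1970-01-01
end

const UNIX_EPOCH = Date(1970, 1, 1)

# Convert the raw representation to the more natural Dates.Date.
to_julia(x::RawDate32) = UNIX_EPOCH + Day(x.days)

# With convert=true (the default) the caller gets Dates.Date values;
# with convert=false the raw arrow-style values are returned untouched.
function read_dates(raw::Vector{Int32}; convert::Bool=true)
    wrapped = RawDate32.(raw)
    return convert ? to_julia.(wrapped) : wrapped
end
```

Calling `read_dates(Int32[0, 1])` gives `Date` values, while `read_dates(Int32[0, 1]; convert=false)` keeps the `RawDate32` wrappers.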
* fix propagation of maxdepth kwarg
* bump Project.toml
* ability to append partitions to an arrow file
This adds a method to `append` partitions to existing arrow files. Partitions to append are supplied in the form of any [Tables.jl](https://github.com/JuliaData/Tables.jl)-compatible table. Multiple record batches will be written based on the number of `Tables.partitions(tbl)` that are provided. Each partition being appended must have the same `Tables.Schema` as the destination arrow file that is being appended to. Other parameters that `append` accepts are similar to what `write` accepts.
* remove unused methods
* add more tests and some fixes
* allow appends to both seekable IO and files
* few changes to Stream, avoid duplication for append
store few additional stream properties in the `Stream` data type and avoid duplicating code for append functionality
* call Tables.schema on result of Tables.columns
* Ensure requested List type is requested on List getindex
Fixes apache#167. Not tested yet.
* add test
…pache#183)
* Add global metadata lock to ensure thread safety of global metadata store
Follow up to apache#90, based on discussions in that issue.
* fix
Arrow.Flatbuf.TimeUnitModule.NANOSECOND
end

timezone = length(format_string) == 4 ? nothing : format_string[5:end]
Hey @quinnj , this line of code is incorrect. I couldn't quite figure out how to new up the timezone type. Is there any documentation around this?
And in general, how does one new-up an ArrowVector? I was hoping to find a constructor that looks vaguely like this
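On the timezone part of the question, one possible approach is sketched below. It only parses a timestamp format string (`tss:`, `tsm:`, `tsu:`, or `tsn:`, optionally followed by a timezone name) into a unit and a timezone value. Carrying the timezone as a `Symbol` (or `nothing`) is an assumption about how it would be used as a type parameter, and `parse_timestamp_format` and `TIMESTAMP_UNITS` are hypothetical names rather than Arrow.jl API.

```julia
# Unit letters used by the C Data Interface timestamp format strings.
const TIMESTAMP_UNITS = Dict('s' => :SECOND, 'm' => :MILLISECOND,
                             'u' => :MICROSECOND, 'n' => :NANOSECOND)

# Hypothetical helper: "tsn:" has no timezone, "tsu:UTC" has timezone UTC.
function parse_timestamp_format(format_string::AbstractString)
    valid = startswith(format_string, "ts") &&
            length(format_string) >= 4 &&
            format_string[4] == ':' &&
            haskey(TIMESTAMP_UNITS, format_string[3])
    valid || throw(ArgumentError("not a timestamp format string: $format_string"))
    unit = TIMESTAMP_UNITS[format_string[3]]
    # Everything after the colon is the timezone name; empty means "no timezone".
    timezone = length(format_string) == 4 ? nothing : Symbol(format_string[5:end])
    return (unit = unit, timezone = timezone)
end
```

A `Symbol` can be used directly as a type parameter in Julia, which is why the sketch converts the timezone string rather than returning it as a `String`.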
…/c_data_interface
…a-/c_data_interface
Building off of this PR: #178