Add Julia example [WIP] #29
Thanks @simsurace!
That's correct. I can help run all the tests. We intentionally didn't build out any automated tests here because the time to build and maintain them would probably exceed the time they save. Ultimately the examples from here will find their way to a permanent place and we will add proper integration tests there. |
@simsurace I tested the Julia client against all the servers, and they all ran without error. I also added some temporary code to the Julia client to write the resulting data to an Arrow IPC file, then examined the file.
|
Hi, thanks for testing! Hmm, this may be a misunderstanding on my part. I was assuming that the record batches are tables with 4096 rows, so the columns a,b,c,d would be represented as EDIT: As the clients all run without error, can you share the code you used to write the file? |
Oops, please disregard my message above about the schema. I was writing the file incorrectly. I had added The right way to do it is like this:

```julia
open(Arrow.Writer, "output.arrow") do writer
    for batch in stream
        Arrow.write(writer, batch)
    end
end
```

When I do it that way, the nesting problem goes away. |
|
Hmm ok, I think I'm still confused about the nomenclature. Looking at the other implementations (e.g. Python), record batches seem to be small tables (i.e. 4096 rows), so wouldn't you expect the same schema/format as the full table, which has 100 million rows? |
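The question above (whether each record batch carries the same schema as the full table) can be checked with a small sketch. It is not taken from the PR: the three 4-row batches and the single column `a` are hypothetical stand-ins for the 4096-row batches with columns a, b, c, d, and it assumes Arrow.jl and Tables.jl are installed.

```julia
using Arrow, Tables

# Hypothetical stand-in data: three 4-row batches with one Int64 column.
batches = [(a = collect(Int64, i:i+3),) for i in (1, 5, 9)]

# Write all batches into one in-memory Arrow IPC stream.
# Tables.partitioner makes Arrow.write emit one record batch per element.
io = IOBuffer()
Arrow.write(io, Tables.partitioner(batches))
seekstart(io)

# Arrow.Stream yields one Arrow.Table per record batch.
batch_rows = Int[]
for batch in Arrow.Stream(io)
    # Every batch exposes the same column names as the full table.
    @assert Tables.schema(batch).names == (:a,)
    push!(batch_rows, length(batch.a))
end
println(batch_rows)
```

Under these assumptions each batch behaves like a small table with the same columns as the whole, which matches the reading of record batches given in the comment above.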
On closer inspection: actually |
@simsurace I'm not able to get the server example working on macOS. It starts successfully, but when a client connects to it (any client), it throws an error:
% julia --project=.. server.jl
Serving on localhost:8008...
[ Info: Listening on: 127.0.0.1:8008, thread id: 1
┌ Error: handle_connection handler error.
│
│ ===========================
│ HTTP Error message:
│
│ ERROR: IOError: write: invalid argument (EINVAL)
│ Stacktrace:
│ [1] uv_write(s::Sockets.TCPSocket, p::Ptr{UInt8}, n::UInt64)
│ @ Base ./stream.jl:1066
│ [2] unsafe_write(s::Sockets.TCPSocket, p::Ptr{UInt8}, n::UInt64)
│ @ Base ./stream.jl:1120
│ [3] unsafe_write
│ @ ~/.julia/packages/HTTP/PnoHb/src/Connections.jl:129 [inlined]
│ [4] unsafe_write(http::HTTP.Streams.Stream{HTTP.Messages.Request, HTTP.Connections.Connection{Sockets.TCPSocket}}, p::Ptr{UInt8}, n::UInt64)
│ @ HTTP.Streams ~/.julia/packages/HTTP/PnoHb/src/Streams.jl:95
│ [5] unsafe_write
│ @ ./io.jl:698 [inlined]
│ [6] write(s::HTTP.Streams.Stream{HTTP.Messages.Request, HTTP.Connections.Connection{Sockets.TCPSocket}}, a::Vector{UInt8})
│ @ Base ./io.jl:721
│ [7] (::HTTP.Handlers.var"#1#2"{typeof(get_stream)})(stream::HTTP.Streams.Stream{HTTP.Messages.Request, HTTP.Connections.Connection{Sockets.TCPSocket}})
│ @ HTTP.Handlers ~/.julia/packages/HTTP/PnoHb/src/Handlers.jl:61
│ [8] #invokelatest#2
│ @ ./essentials.jl:892 [inlined]
│ [9] invokelatest
│ @ ./essentials.jl:889 [inlined]
│ [10] handle_connection(f::Function, c::HTTP.Connections.Connection{Sockets.TCPSocket}, listener::HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, readtimeout::Int64, access_log::Nothing)
│ @ HTTP.Servers ~/.julia/packages/HTTP/PnoHb/src/Servers.jl:469
│ [11] (::HTTP.Servers.var"#16#17"{HTTP.Handlers.var"#1#2"{typeof(get_stream)}, HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, Set{HTTP.Connections.Connection}, Int64, Nothing, ReentrantLock, Base.Semaphore, HTTP.Connections.Connection{Sockets.TCPSocket}})()
│ @ HTTP.Servers ~/.julia/packages/HTTP/PnoHb/src/Servers.jl:401
│ request =
│ HTTP.Messages.Request:
│ """
│ GET / HTTP/1.1
│ Accept-Encoding: identity
│ Host: localhost:8008
│ User-Agent: Python-urllib/3.12
│ Connection: close
│
│ """
└ @ HTTP.Servers ~/.julia/packages/HTTP/PnoHb/src/Servers.jl:483 |
Yes, on macOS there is a known issue that I think will be fixed in upcoming releases: JuliaLang/julia#54225 |
Great, thanks. It works fine for me if I reduce |
I successfully tested this Julia server with all the other client examples 🎉 Just one small thing: The int64 columns created in the server example are non-nullable (they have no validity bitmap). All the other server examples create nullable int64 columns (with a validity bitmap). Is it possible to make them nullable here for consistency? |
Sure! I will do that. EDIT: done in 5f0a1a0 |
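For readers wondering what a nullable column looks like in Arrow.jl, here is a minimal sketch. It is one plausible approach and not necessarily what commit 5f0a1a0 does; the column name `a` and the values are hypothetical. The assumption is that giving a column the element type `Union{Missing, Int64}` makes Arrow.jl mark the field as nullable in the schema. Whether a validity bitmap buffer is physically written when no value is missing is not checked here.

```julia
using Arrow

# Hypothetical columns: same values, different element types.
plain    = (a = Int64[1, 2, 3],)                  # non-nullable Int64
nullable = (a = Union{Missing, Int64}[1, 2, 3],)  # nullable Int64, no missing values

# Round-trip a column table through an in-memory Arrow IPC stream.
function roundtrip(tbl)
    io = IOBuffer()
    Arrow.write(io, tbl)
    seekstart(io)
    return Arrow.Table(io)
end

plain_tbl = roundtrip(plain)
nullable_tbl = roundtrip(nullable)

# The element type read back reflects the nullability recorded in the schema.
println(eltype(plain_tbl.a))
println(eltype(nullable_tbl.a))
```

Other Arrow implementations should then see the second column as a nullable int64 field, in line with the other server examples mentioned above.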
I would like to ask for feedback on the Julia community channels on how to make this more performant, but we can also do this in a follow-up PR. EDIT: opened a thread on Discourse. |
I think this is ready to merge as a functional first version. I would propose to put any possible performance enhancement in a new PR. |
Thank you @simsurace! |
This is a basic Julia example. I will update below when I complete tests locally. It seems there is no automated testing in this repo.
Julia client tested with
Julia server tested with
Closes apache/arrow-julia#502