Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][FlightRPC] Memory alignment related warning #37195

Open
cwang9208 opened this issue Aug 16, 2023 · 11 comments
Open

[C++][FlightRPC] Memory alignment related warning #37195

cwang9208 opened this issue Aug 16, 2023 · 11 comments

Comments

@cwang9208
Copy link

Describe the usage question you have. Please include as many useful details as possible.

It seems the RecordBatch returned from doGet in Flight RPC is not memory aligned.

I tested the simple program below:

import duckdb
import pyarrow as pa
import pyarrow.flight as flight

# connect to an in-memory database
con = duckdb.connect()

client = pa.flight.connect(('localhost', 8815))

reader = client.do_get(flight.Ticket(b'ints')).to_reader()

results = con.execute("SELECT * FROM reader").arrow()

print(results)

The code finishes, tut the following warning:

C:/arrow/cpp/src/arrow/acero/source_node.cc:76: An input buffer was poorly aligned.  This could lead to crashes or poor performance on some hardware.  Please ensure that all Acero sources generate aligned buffers, or change the unaligned buffer handling configuration to silence this warning.

I'm wondering is there any option I can turn on to make the record batch returned by flight rpc aligned?

Thanks.

Component(s)

C++, FlightRPC

@mapleFU
Copy link
Member

mapleFU commented Aug 16, 2023

I'm not so familiar with FlightRPC. The warning message is introduced in #35565 .

And seems that it unalign some arrow-ipc buffer here.

@kou
Copy link
Member

kou commented Aug 16, 2023

@cwang9208
Copy link
Author

@kou Thanks for your reply. I'm thinking about an alternative: is there any (C++) API to copy a recordbatch to a new memory-aligned recordbatch?

@cwang9208
Copy link
Author

@mapleFU Thanks for your reply. I'm thinking about an alternative: I'm wondering is there any (C++) API to copy a recordbatch to a new memory-aligned recordbatch?

@kou
Copy link
Member

kou commented Aug 17, 2023

It seems that your Flight RPC server produces non-aligned record batches.
Is it implemented in C++ API?

@cwang9208
Copy link
Author

@kou The server is a python-based flight rpc server. But I guess that should not matter because the client side just reuse the gRPC-allocated memory for the returned record batch.

@kou
Copy link
Member

kou commented Aug 17, 2023

Hmm. Does this mean that the official gRPC client library aligns but Apache Arrow C++'s Flight RPC client breaks it?
Can you confirm whether your Flight RPC server returns aligned record batches or not?

@cwang9208
Copy link
Author

@kou the official gRPC does not guarantee memory alignment.

@kou
Copy link
Member

kou commented Aug 17, 2023

Then, what do you want to explain?

@kou kou changed the title memory alignment in Flight rpc [C++][FlightRPC] Memory alignment related warning Aug 17, 2023
@agoncharuk
Copy link

agoncharuk commented Apr 5, 2024

Hello folks, I am observing a similar issue when communicating with a Java-based FlightSQL server with a C++ flight SQL client.
The C++ part is compiled with sanitizers which fails on an attempt to load an element offset from the binary array:

/usr/include/arrow/array/array_binary.h:74:23: runtime error: load of misaligned address 0x625000025f51 for type 'const offset_type', which requires 4 byte alignment

From the replies above I understand that the Java side is not required to produce a properly aligned response buffer. However, given that the unaligned cast is a UB, shouldn't the client side make proper alignments before passing the data to the user?
BTW, setting ACERO_ALIGNMENT_HANDLING=reallocate does not fix the issue.

@janosik47
Copy link
Contributor

The problem is even worse when one attempt to 'consume' the record batches produces by a Flight server in the datafusion lib. The rust is stricter and just panics ....
I my case I am using Java-based Flight server, java seems to attempt to align all arrays into 8 bytes, which is not good enough :(

thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-50.0.0/src/buffer/scalar.rs:138:17:
Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type. Before importing buffer through FFI, please make sure the allocation is aligned.
stack backtrace:
...
  15:     0x7f64fe835924 - core::panicking::panic_fmt::ha328cac7a9cfe5f7
                               at /rustc/5119208fd78a77547c705d1695428c88d6791263/library/core/src/panicking.rs:72:14
  16:     0x7f64fec6ad57 - arrow_buffer::buffer::scalar::ScalarBuffer<T>::new::h0250dc699e8f33a4
  17:     0x7f64ffe75d32 - arrow_array::array::get_offsets::h09013149e2c97dc4
  18:     0x7f64fec5f0c0 - arrow_array::array::make_array::h8a4502f718fae730
  19:     0x7f64fec63e13 - <arrow_array::array::struct_array::StructArray as core::convert::From<arrow_data::data::ArrayData>>::from::hdb29f2e18b297be5
  20:     0x7f64feb28ac3 - <arrow_array::record_batch::RecordBatch as arrow::pyarrow::FromPyArrow>::from_pyarrow::h3dcb8844281b2862
  21:     0x7f64fe8c4a27 - <datafusion_physical_plan::stream::RecordBatchStreamAdapter<S> as futures_core::stream::Stream>::poll_next::hd4a58bbb60ee7626
  22:     0x7f64fff9cb00 - datafusion_physical_plan::repartition::RepartitionExec::pull_from_input::{{closure}}::h407088640b03822d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants