Describe the enhancement requested
As noted by @wgtmac in #48717 (comment):
Since it allows ptr to be nullptr for empty string, following Copy may be a UB if min is an empty string with nullptr.
template <>
inline void TypedStatisticsImpl<ByteArrayType>::Copy(const ByteArray& src, ByteArray* dst,
                                                     ResizableBuffer* buffer) {
  if (dst->ptr == src.ptr) return;
  PARQUET_THROW_NOT_OK(buffer->Resize(src.len, false));
  std::memcpy(buffer->mutable_data(), src.ptr, src.len);
  *dst = ByteArray(src.len, buffer->data());
}
When ByteArray has len=0 and ptr=nullptr (the default-constructed state per types.h:649), calling std::memcpy with a nullptr source is undefined behavior according to the C++ standard, even when the size is 0.
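One possible shape for a fix is to skip the `std::memcpy` call when there is nothing to copy. The sketch below is only an illustration, not the actual patch: `ByteArraySketch` and `CopyByteArray` are hypothetical names, and a `std::vector<uint8_t>` stands in for `ResizableBuffer` so the example compiles without Arrow headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for parquet::ByteArray: a length plus a non-owning pointer.
struct ByteArraySketch {
  ByteArraySketch() = default;
  ByteArraySketch(uint32_t l, const uint8_t* p) : len(l), ptr(p) {}
  uint32_t len = 0;
  const uint8_t* ptr = nullptr;
};

// Copies src into storage and points dst at the copy.
// std::vector replaces ResizableBuffer here (an assumption of this sketch).
inline void CopyByteArray(const ByteArraySketch& src, ByteArraySketch* dst,
                          std::vector<uint8_t>* storage) {
  assert(dst != nullptr && storage != nullptr);
  if (dst->ptr == src.ptr) return;  // same early return as the code above
  storage->resize(src.len);
  if (src.len > 0) {
    // Only call memcpy when there are bytes to copy, so a null src.ptr
    // paired with len == 0 never reaches it.
    std::memcpy(storage->data(), src.ptr, src.len);
  }
  *dst = ByteArraySketch(src.len, storage->data());
}
```

With this guard, copying a default-constructed (empty, `nullptr`) value over a previously populated `dst` sets `dst->len` to 0 without passing a null pointer to `std::memcpy`. The same check could plausibly be applied at the other call sites listed below.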
This issue also exists in other places:
- DictDecoderImpl<ByteArrayType>::SetDict in decoder.cc:1061
- Test utilities in statistics_test.cc and column_writer_test.cc
Component(s)
C++, Parquet