[C++][Parquet] Fix undefined behavior in memcpy with nullptr for empty ByteArray #48744

@rynewang

Describe the enhancement requested

As noted by @wgtmac in #48717 (comment):

Since it allows ptr to be nullptr for empty string, following Copy may be a UB if min is an empty string with nullptr.

template <>
inline void TypedStatisticsImpl<ByteArrayType>::Copy(const ByteArray& src, ByteArray* dst,
                                                     ResizableBuffer* buffer) {
  if (dst->ptr == src.ptr) return;
  PARQUET_THROW_NOT_OK(buffer->Resize(src.len, false));
  std::memcpy(buffer->mutable_data(), src.ptr, src.len);
  *dst = ByteArray(src.len, buffer->data());
}

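When a ByteArray has len=0 and ptr=nullptr (the default-constructed state per types.h:649), Copy passes a null source pointer to std::memcpy. That is undefined behavior even when the size is 0, because the C library rules that the C++ standard incorporates require both pointer arguments to be valid regardless of the byte count.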

This issue also exists in other places:

  • DictDecoderImpl<ByteArrayType>::SetDict in decoder.cc:1061
  • Test utilities in statistics_test.cc and column_writer_test.cc
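
One possible shape for a fix, shown as a minimal standalone sketch rather than the actual Arrow patch: guard the copy so that std::memcpy is never reached when the length is 0. The `ByteArray` struct below is a simplified stand-in for the Parquet type, `SafeMemcpy` and `CopyByteArray` are hypothetical names, and a `std::vector<uint8_t>` stands in for `ResizableBuffer`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for parquet::ByteArray (assumption: only len and ptr
// matter for this illustration).
struct ByteArray {
  ByteArray() = default;
  ByteArray(uint32_t l, const uint8_t* p) : len(l), ptr(p) {}
  uint32_t len = 0;
  const uint8_t* ptr = nullptr;
};

// Hypothetical helper: skips memcpy entirely for a zero-length copy, so a
// null src (or dst) is never handed to the library function.
inline void SafeMemcpy(void* dst, const void* src, std::size_t n) {
  if (n > 0) {
    std::memcpy(dst, src, n);
  }
}

// Sketch of a Copy that tolerates an empty source with ptr == nullptr.
// std::vector is used here in place of ResizableBuffer.
inline void CopyByteArray(const ByteArray& src, ByteArray* dst,
                          std::vector<uint8_t>* buffer) {
  if (dst->ptr == src.ptr) return;
  buffer->resize(src.len);
  SafeMemcpy(buffer->data(), src.ptr, src.len);
  *dst = ByteArray(src.len, buffer->data());
}
```

With this guard, copying a default-constructed `ByteArray` over a previously populated destination leaves the destination with len 0 and never calls std::memcpy. The same length check could be applied at the other call sites listed above.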

Component(s)

C++, Parquet
