Parquet IntDecoder fail #2821

Closed
jinchengchenghh opened this issue Oct 12, 2022 · 4 comments
Labels: bug, io

jinchengchenghh commented Oct 12, 2022

When I read a TPC-DS Parquet table, I get a core dump because bitWidth is 0.

SQL:

select s_store_sk from store where s_state = 'TN';

Substrait plan:

{"extensions":[{"extensionFunction":{"name":"is_not_null:str"}},{"extensionFunction":{"functionAnchor":1,"name":"equal:str_str"}},{"extensionFunction":{"functionAnchor":2,"name":"and:bool_bool"}}],"relations":[{"root":{"input":{"project":{"common":{"direct":{}},"input":{"read":{"common":{"direct":{}},"baseSchema":{"names":["s_store_sk","s_state"],"struct":{"types":[{"i64":{"nullability":"NULLABILITY_NULLABLE"}},{"string":{"nullability":"NULLABILITY_NULLABLE"}}]},"partitionColumns":{"columnType":["NORMAL_COL","NORMAL_COL"]}},"filter":{"scalarFunction":{"functionReference":2,"outputType":{"bool":{"nullability":"NULLABILITY_NULLABLE"}},"arguments":[{"value":{"scalarFunction":{"outputType":{"bool":{"nullability":"NULLABILITY_REQUIRED"}},"arguments":[{"value":{"selection":{"directReference":{"structField":{"field":1}}}}}]}}},{"value":{"scalarFunction":{"functionReference":1,"outputType":{"bool":{"nullability":"NULLABILITY_NULLABLE"}},"arguments":[{"value":{"selection":{"directReference":{"structField":{"field":1}}}}},{"value":{"literal":{"string":"TN"}}}]}}}]}},"localFiles":{"items":[{"uriFile":"file:///tmp/tpcds-generated/store/part-00000-4de8c37f-0e0f-4016-bd78-421a3af40edd-c000.snappy.parquet","length":"9171","parquet":{}}]}}},"expressions":[{"selection":{"directReference":{"structField":{}}}}]}},"names":["s_store_sk#3631"]}}]}

Plan Node

Plan Node:
-- Project[expressions: (n1_0:BIGINT, ROW["n0_0"])] -> n1_0:BIGINT
  -- TableScan[table: hive_table, range filters: [(s_state, BytesRange: [TN, TN] no nulls)]] -> n0_0:BIGINT, n0_1:VARCHAR

gdb backtrace:

#7  0x00007fd9aed703e7 in facebook::velox::dwio::common::IntDecoder<false>::decodeBitsLE<int> (bits=0x7fdbe801a178, bitOffset=0,
    rows=..., rowBias=0, bitWidth=0 '\000',
    bufferEnd=0x7fdbe801a178 '\241' <repeats 24 times>, "\255\336\335ں\335ںP\246\001\350\333\177", result=0x7fdbe802a2c0)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/velox/dwio/common/IntDecoder.cpp:2602
#8  0x00007fd9ad7b56df in facebook::velox::parquet::RleDecoder<false>::processRun<true, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe802a440, rows=0x7fdbe802af00, rowIndex=0, currentRow=0, numRows=2, scatterRows=0x7fdbe802a700,
    filterHits=0x7fdbe802a1c0, values=0x7fdbe802a2c0, numValues=@0x7fdc8aef1bac: 0, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/RleDecoder.h:259
#9  0x00007fd9ad7a873e in facebook::velox::parquet::RleDecoder<false>::bulkScan<true, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe802a440, nonNullRows=..., scatterRows=0x7fdbe802a700, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/RleDecoder.h:338
#10 0x00007fd9ad79fca3 in facebook::velox::parquet::RleDecoder<false>::fastPath<true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe802a440, nulls=0x7fdbe802a5c0, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/RleDecoder.h:209
#11 0x00007fd9ad77b69f in facebook::velox::parquet::RleDecoder<false>::readWithVisitor<true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe802a440, nulls=0x7fdbe802a5c0, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/RleDecoder.h:84
#12 0x00007fd9ad77409c in facebook::velox::parquet::PageReader::callDecoder<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true>, 0> (this=0x7fdbe802ac70, nulls=0x7fdbe802a5c0, nullsFromFastPath=@0x7fdc8aef1f48: true, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/PageReader.h:196
#13 0x00007fd9ad76bf1f in facebook::velox::parquet::PageReader::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe802ac70, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/PageReader.h:325
#14 0x00007fd9ad76a47e in facebook::velox::parquet::ParquetData::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe8029bd0, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/ParquetData.h:118
#15 0x00007fd9ad768861 in facebook::velox::parquet::StringColumnReader::readHelper<facebook::velox::common::BytesRange, true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> > (this=0x7fdbe8029a30, filter=0x7fdbe8014c50, rows=...,
    extractValues=...) at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/velox/dwio/parquet/reader/StringColumnReader.cpp:38
#16 0x00007fd9ad767fed in facebook::velox::parquet::StringColumnReader::processFilter<true, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader> > (this=0x7fdbe8029a30, filter=0x7fdbe8014c50, rows=..., extractValues=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/velox/dwio/parquet/reader/StringColumnReader.cpp:70
#17 0x00007fd9ad767175 in facebook::velox::parquet::StringColumnReader::read (this=0x7fdbe8029a30, offset=0, rows=..., incomingNulls=0x0)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/velox/dwio/parquet/reader/StringColumnReader.cpp:111
#18 0x00007fd9aeda54b9 in facebook::velox::dwio::common::SelectiveStructColumnReader::read (this=0x7fdbe80147c0, offset=0, rows=...,
    incomingNulls=0x0) at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/velox/dwio/common/SelectiveStructColumnReader.cpp:139
#19 0x00007fd9aeda5111 in facebook::velox::dwio::common::SelectiveStructColumnReader::next (this=0x7fdbe80147c0, numValues=2, result=
    std::shared_ptr<class facebook::velox::BaseVector> (use count 1, weak count 0) = {...}, incomingNulls=0x0)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/velox/dwio/common/SelectiveStructColumnReader.cpp:104
#20 0x00007fd9ad4faff1 in facebook::velox::parquet::ParquetRowReader::next (this=0x7fdbe801ed20, size=32768,
(gdb) f 8

   #8  0x00007fd9ad7b56df in facebook::velox::parquet::RleDecoder<false>::processRun<true, false, true, facebook::velox::dwio::common::StringDictionaryColumnVisitor<facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true> > (this=0x7fdbe802a440,
    rows=0x7fdbe802af00, rowIndex=0, currentRow=0, numRows=2, scatterRows=0x7fdbe802a700, filterHits=0x7fdbe802a1c0,
    values=0x7fdbe802a2c0, numValues=@0x7fdc8aef1bac: 0, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/RleDecoder.h:259
warning: Source file is more recent than executable.
259         super::decodeBitsLE(
(gdb) p this
$1 = (facebook::velox::parquet::RleDecoder<false> * const) 0x7fdbe802a440
(gdb) p *this
$2 = {<facebook::velox::dwio::common::IntDecoder<false>> = {
    _vptr.IntDecoder = 0x7fd9b478b608 <vtable for facebook::velox::parquet::RleDecoder<false>+16>,
    static kMinDenseBatch = <error reading variable: Missing ELF symbol "facebook::velox::dwio::common::IntDecoder<false>::kMinDenseBatch".>, inputStream = std::unique_ptr<facebook::velox::dwio::common::SeekableInputStream> = {get() = 0x0},
    bufferStart = 0x7fdbe801a178 '\241' <repeats 24 times>, "\255\336\335ں\335ںP\246\001\350\333\177",
    bufferEnd = 0x7fdbe801a178 '\241' <repeats 24 times>, "\255\336\335ں\335ںP\246\001\350\333\177", useVInts = false, numBytes = 0},
  bitWidth_ = 0 '\000', byteWidth_ = 0 '\000', bitMask_ = 0, lastSafeWord_ = 0x7fdbe801a170 "\002", remainingValues_ = 8, value_ = 14,
  bitOffset_ = 0 '\000', repeating_ = false}

The relevant code, where bitWidth is used as a divisor, is:

    if (redZone > 0) {
      anyUnsafe = true;
      auto numRed = (redZone + 1) * 8 / bitWidth;
      int32_t lastSafeIndex = rows.back() - numRed;
      --numSafeRows;
      for (; numSafeRows >= 1; --numSafeRows) {
        if (rows[numSafeRows - 1] < lastSafeIndex) {
          break;
        }
      }
    }
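
The division by bitWidth in the snippet above appears to be where a zero width becomes fatal: integer division by zero is undefined behavior in C++ and typically raises SIGFPE on x86. As a minimal sketch only (this is not the actual Velox fix, and `numRedValues` is a hypothetical helper name that does not exist in Velox), a guard could treat a zero-width run as having no red-zone values, since zero-width values occupy no bytes:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Velox: number of trailing values whose
// bits may extend into the last `redZone` bytes of the buffer. Mirrors the
// expression (redZone + 1) * 8 / bitWidth from the snippet above.
inline int32_t numRedValues(int32_t redZone, uint8_t bitWidth) {
  assert(redZone >= 0);
  if (bitWidth == 0) {
    // Assumption: zero-width values occupy no bytes, so none of them can
    // reach into the red zone. This also avoids dividing by zero.
    return 0;
  }
  return (redZone + 1) * 8 / bitWidth;
}
```

Whether the right fix is a guard at this point or an earlier special case for zero-width runs in the caller is a design choice this sketch does not settle.
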
(gdb) f 12
#12 0x00007fd9ad77409c in facebook::velox::parquet::PageReader::callDecoder<facebook::velox::dwio::common::ColumnVisitor<folly::Range<char const*>, facebook::velox::common::BytesRange, facebook::velox::dwio::common::ExtractToReader<facebook::velox::parquet::StringColumnReader>, true>, 0> (this=0x7fdbe802ac70, nulls=0x7fdbe802a5c0, nullsFromFastPath=@0x7fdc8aef1f48: true, visitor=...)
    at /mnt/DP_disk1/code/gluten/tools/build/velox_ep/./velox/dwio/parquet/reader/PageReader.h:196
196             rleDecoder_->readWithVisitor<true>(nulls, dictVisitor);
(gdb) p *this
$5 = {pool_ = @0x7fdbe801adb0, inputStream_ = std::unique_ptr<facebook::velox::dwio::common::SeekableInputStream> = {
    get() = 0x7fdbe802bfb0}, type_ = std::shared_ptr<const facebook::velox::parquet::ParquetTypeWithId> (use count 4, weak count 0) = {
    get() = 0x7fdbe8023d90}, maxRepeat_ = 0, maxDefine_ = 1, codec_ = facebook::velox::parquet::thrift::CompressionCodec::SNAPPY,
  chunkSize_ = 60, bufferStart_ = 0x7fdbe802b991 "", bufferEnd_ = 0x7fdbe802b991 "", tempNulls_ = {px = 0x0}, nullsInReadRange_ = {
    px = 0x0}, multiPageNulls_ = {px = 0x0}, repeatDecoder_ = std::unique_ptr<facebook::velox::parquet::RleDecoder<false>> = {
    get() = 0x0}, defineDecoder_ = std::unique_ptr<facebook::velox::parquet::RleDecoder<false>> = {get() = 0x7fdbe802a3e0},
  encoding_ = facebook::velox::parquet::thrift::Encoding::PLAIN_DICTIONARY, rowOfPage_ = 0, numRowsInPage_ = 2, pageBuffer_ = {px = 0x0},
  uncompressedData_ = {px = 0x7fdbe801a130}, pageData_ = 0x7fdbe801a176 "", dictionary_ = {values = {px = 0x7fdbe8017220}, strings = {
      px = 0x7fdbe802a350}, numValues = 1, sorted = false},
  dictionaryEncoding_ = facebook::velox::parquet::thrift::Encoding::PLAIN_DICTIONARY, pageStart_ = 60, pageDataStart_ = 50,
  encodedDataSize_ = 2, visitorRows_ = 0x7fdbe802af00, numVisitorRows_ = 2, initialRowOfPage_ = 0, currentVisitorRow_ = 2,
  firstUnvisited_ = 2, visitBase_ = 0, rowsCopy_ = 0x7fdbe8029b70, rowNumberBias_ = 0, nullConcatenation_ = {pool_ = @0x7fdbe801adb0,
    buffer_ = 0x0, numBits_ = 0, hasZeros_ = false}, dictionaryValues_ = std::shared_ptr<facebook::velox::BaseVector> (empty) = {
    get() = 0x0}, directDecoder_ = std::unique_ptr<facebook::velox::dwio::common::DirectDecoder<true>> = {get() = 0x0},
  rleDecoder_ = std::unique_ptr<facebook::velox::parquet::RleDecoder<false>> = {get() = 0x7fdbe802a440},
  stringDecoder_ = std::unique_ptr<facebook::velox::parquet::StringDecoder> = {get() = 0x0}}
(gdb) p this->pageData_
$6 = 0x7fdbe801a176 ""
(gdb) p *this->pageData_
$7 = 0 '\000'
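
The dump above shows dictionary_.numValues = 1 and a first page-data byte of 0. In Parquet's dictionary encoding, a data page starts with a one-byte bit width for the dictionary indices, and a one-entry dictionary needs zero bits per index, which suggests that bitWidth = 0 may be legitimate input here rather than corruption. The following is a rough, standalone sketch of that relationship; `indexBitWidth` and `unpackIndices` are hypothetical names and are not Velox APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bits needed to represent dictionary indices 0 .. dictionarySize - 1.
inline uint8_t indexBitWidth(uint32_t dictionarySize) {
  assert(dictionarySize > 0);
  uint8_t width = 0;
  for (uint32_t maxIndex = dictionarySize - 1; maxIndex != 0; maxIndex >>= 1) {
    ++width;
  }
  return width;
}

// Unpacks numValues bit-packed indices, least significant bit first.
// With bitWidth == 0 every index is 0 and `data` is never read.
inline std::vector<uint32_t> unpackIndices(
    const uint8_t* data, uint8_t bitWidth, size_t numValues) {
  std::vector<uint32_t> out(numValues, 0);
  if (bitWidth == 0) {
    return out;
  }
  size_t bit = 0;
  for (size_t i = 0; i < numValues; ++i) {
    uint32_t value = 0;
    for (uint8_t b = 0; b < bitWidth; ++b, ++bit) {
      uint32_t bitValue = (data[bit / 8] >> (bit % 8)) & 1u;
      value |= bitValue << b;
    }
    out[i] = value;
  }
  return out;
}
```

Under these assumptions, `indexBitWidth(1)` is 0, and `unpackIndices(nullptr, 0, 2)` yields two zero indices, matching the two rows in this page that would all point at the single dictionary entry.
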
mbasmanova added the bug label Oct 12, 2022

mbasmanova commented

@oerling Orri, would you check it out?

CC: @yingsu00 @majetideepak

mbasmanova commented

@jinchengchenghh Thank you for reporting a crash in the Parquet reader. Next time, please use three backticks to format blocks of code for readability. It looks like you have already done some debugging. Would you like to send a PR with the fix?

Yuhta added the io label Oct 12, 2022

mbasmanova commented

According to @oerling, this might be fixed in #2807.

jinchengchenghh commented

I verified locally that #2807 does not fix this issue. I will try to reproduce it with a unit test and continue debugging.
