Data issue after the optimization of bloom_filter_length #3249

@wangyum

Description

Describe the bug, including details regarding any error messages, version, and platform.

Parquet version: From 1.14.0 to 1.15.2

There is a data issue after the bloom_filter_length optimization: with that patch applied, the query below returns an empty result even though it should match data in the file. My query is:

CREATE TEMPORARY VIEW parquetTable
USING org.apache.spark.sql.parquet
OPTIONS (
  path "viewfs://cluster/path/to/file/00000-62-eff62af9-bea8-4eb6-b9c9-c3ee29fd795e-0-00001.parquet"
);

SELECT *
FROM parquetTable
WHERE JNL_DEDUP_ID = 'PG.PG_CHARGE_ID.25030465176'
AND ACCT_ID = 4920012120302 AND SEQ_NUM = 734343586;

We built the bloom filter on the JNL_DEDUP_ID column using Parquet 1.13.1 and read the data using Parquet 1.15.2. The column's data type is string.
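
A possible way to check whether the bloom filter is what drops the row is to rerun the same query with bloom filtering disabled on the read side. This is only a sketch, assuming the reader honors the parquet-java `parquet.filter.bloom.enabled` read option and that Spark forwards session-level settings to the Hadoop configuration used by the Parquet reader. I have not included any output here.

```sql
-- Assumption: this session setting reaches the Parquet reader and
-- turns off bloom-filter-based row group pruning.
SET parquet.filter.bloom.enabled=false;

-- Same query as above, against the same temporary view.
SELECT *
FROM parquetTable
WHERE JNL_DEDUP_ID = 'PG.PG_CHARGE_ID.25030465176'
AND ACCT_ID = 4920012120302 AND SEQ_NUM = 734343586;
```

If the row comes back with the setting disabled, that would point to the bloom filter read path rather than to the data itself.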

Component(s)

Core
