[BUG] Protocol column in VPC flow log parquet file is INT32, but Spark tried to read it as BIGINT #167

YANG-DB · 2024-06-28T03:45:44Z

What is the bug?
The protocol column in the VPC flow log Parquet file is INT32, but Spark tried to read it as BIGINT, which caused the streaming job to fail.

org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://kmf-zero-etl-demo/AWSLogs/aws-account-id=****/aws-service=vpcflowlogs/aws-region=us-east-2/year=2024/month=05/day=25/hour=05/****_vpcflowlogs_us-east-2_fl-*****.log.parquet. Column: [protocol], Expected: bigint, Found: INT32
    at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:724)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:397)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:227)
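
For triage, the tail of the exception message already names the column, the type the table declares, and the type found in the file. The following is a minimal sketch, not part of the integration, that pulls those three values out of a message shaped like the one above. The helper name `parse_mismatch`, the regular expression, and the `<file>` placeholder in the sample message are all assumptions made for illustration, and the pattern only handles simple type names such as `bigint`.

```python
import re

# Matches the tail of a "Parquet column cannot be converted" message,
# in the form seen in the trace above. Assumed pattern, simple type names only.
MISMATCH_RE = re.compile(
    r"Column: \[(?P<column>[^\]]+)\], "
    r"Expected: (?P<expected>\w+), "
    r"Found: (?P<found>\w+)"
)


def parse_mismatch(message):
    """Return (column, expected, found), or None if no mismatch is reported."""
    match = MISMATCH_RE.search(message)
    if match is None:
        return None
    return match.group("column"), match.group("expected"), match.group("found")


# Sample message; "<file>" is a placeholder, not a real path.
message = (
    "Parquet column cannot be converted in file <file>. "
    "Column: [protocol], Expected: bigint, Found: INT32"
)
print(parse_mismatch(message))
```

Here "Expected" is the type declared in the table definition and "Found" is the physical type stored in the Parquet file.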

What is the expected behavior?
The VPC flow log SQL table definition should match the original VPC flow log specification:

The protocol column is INT32 in the VPC documentation, but the Athena CREATE TABLE statement, which I think our integration is based on, declares it as BIGINT.
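
The mismatch described here can be caught before a job runs by comparing the declared column types with the types the Parquet file actually stores. The sketch below is a hypothetical check, not the integration's code: the function `find_mismatches` and the two example dictionaries are illustrative assumptions, and only the `protocol` entry is taken from this issue. It relies on the default mapping of Parquet INT32 to Spark SQL `int` and INT64 to `bigint`.

```python
# Spark SQL type that each Parquet physical integer type maps to by default.
PARQUET_TO_SPARK = {"INT32": "int", "INT64": "bigint"}


def find_mismatches(declared, parquet_types):
    """List (column, declared_type, file_type) where the declaration differs
    from the Spark SQL type the Parquet column maps to."""
    mismatches = []
    for column, physical in parquet_types.items():
        file_type = PARQUET_TO_SPARK.get(physical)
        declared_type = declared.get(column)
        # Skip columns we cannot map or that the table does not declare.
        if file_type is None or declared_type is None:
            continue
        if declared_type.lower() != file_type:
            mismatches.append((column, declared_type, file_type))
    return mismatches


# Illustrative excerpt: only "protocol" reflects this issue, "packets" is assumed.
declared = {"protocol": "bigint", "packets": "bigint"}
file_types = {"protocol": "INT32", "packets": "INT64"}

for column, declared_type, file_type in find_mismatches(declared, file_types):
    print(f"{column}: declared {declared_type}, file maps to {file_type}")
```

Under these assumptions the check flags only `protocol`, which matches the fix requested here: declare the column with the 32-bit integer type instead of BIGINT.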



YANG-DB added the bug and untriaged labels Jun 28, 2024

YANG-DB mentioned this issue Jun 28, 2024

[BUG] fix vpc protocol field to match vpc original declaration #168

Merged

YANG-DB removed the untriaged label Jul 3, 2024

YANG-DB added the content integration / getting-started content label Aug 1, 2024

YANG-DB closed this as completed Aug 1, 2024

YANG-DB added this to catalog Aug 1, 2024
