What is the bug?
The protocol column in the VPC flow log Parquet files is written as INT32, but Spark tries to read it as BIGINT, which causes the streaming job to fail:
org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://kmf-zero-etl-demo/AWSLogs/aws-account-id=****/aws-service=vpcflowlogs/aws-region=us-east-2/year=2024/month=05/day=25/hour=05/****_vpcflowlogs_us-east-2_fl-*****.log.parquet. Column: [protocol], Expected: bigint, Found: INT32
at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:724)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:397)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:227)
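A minimal sketch of how the mismatch surfaces, assuming PySpark; the S3 path and column list below are placeholders, not the real integration. Reading the Parquet file with a schema that declares protocol as LONG reproduces the conversion error, because the vectorized Parquet reader (at least in the Spark version shown in the trace) does not widen INT32 to BIGINT, while declaring it as INT reads the file cleanly.

# Hedged sketch: placeholder path and minimal schema, for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, IntegerType

spark = SparkSession.builder.appName("vpc-flow-log-schema-check").getOrCreate()

# Declaring protocol as LONG (BIGINT) trips the "Parquet column cannot be converted" error,
# since the file stores the column as physical INT32.
bad_schema = StructType([StructField("protocol", LongType(), True)])

# Declaring protocol as INT matches the Parquet physical type and reads without error.
good_schema = StructType([StructField("protocol", IntegerType(), True)])

path = "s3://placeholder-bucket/AWSLogs/.../vpcflowlogs/*.log.parquet"  # placeholder path
df = spark.read.schema(good_schema).parquet(path)
df.printSchema()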
What is the expected behavior?
The VPC flow log SQL table definition should match the original VPC flow log specification: the protocol column is INT32 in the VPC documentation, but the Athena CREATE TABLE example uses BIGINT, which I believe is what our integration is based on.
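A minimal sketch of the expected definition, assuming the table is declared through Spark SQL; the table name, location, and the non-protocol columns are illustrative placeholders, not the integration's actual DDL. The key change is declaring protocol as INT (matching the Parquet INT32 physical type) rather than BIGINT as in the Athena-style DDL.

# Hedged sketch of the corrected table definition; names and location are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vpc-flow-log-ddl").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS vpc_flow_logs (
        srcaddr  STRING,
        dstaddr  STRING,
        protocol INT  -- was BIGINT in the Athena-style DDL; Parquet stores INT32
    )
    USING PARQUET
    LOCATION 's3://placeholder-bucket/AWSLogs/'
""")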