
fix: Upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. #3841

Merged 1 commit into feast-dev:master on Nov 18, 2023

Conversation

shuchu
Collaborator

@shuchu shuchu commented Nov 17, 2023

What this PR does / why we need it:
Update pyarrow to the latest version, v14.0.1, which includes the fix for GHSA-5wvp-7f3h-6wmm.

Which issue(s) this PR fixes:
Fixes #3832

  1. Before this PR, Feast used pyarrow v10.0.1, whose default Parquet format version for "pyarrow.parquet.write_table()" is "2.4" (see the "version" argument).
  2. With Parquet format v2.4, write_table() changes the timestamps' resolution. The documentation of the "coerce_timestamps" argument says:
    "By default, for version='1.0' (the default) and version='2.4', nanoseconds are cast to microseconds (‘us’),"
  3. After upgrading Feast to "pyarrow==14.0.1", the default "version" is "2.6". As a result, timestamp columns keep the type "datetime[ns]" instead of being cast to "datetime[us]" as before. This change broke writing a pyarrow table to Google BigQuery and AWS Redshift: in my debugging, both raise a ValueError for columns of "datetime" type.
  4. I explicitly set the "coerce_timestamps" argument to "us" so that write_table() keeps the same timestamp behavior as before this pyarrow upgrade.
  5. Not all calls of "pyarrow.parquet.write_table()" are updated. Here is the list of call sites found:

```
./python/feast/transformation_server.py:54: writer.write_table(result_arrow)
./python/feast/infra/offline_stores/file.py:109: pyarrow.parquet.write_table(
./python/feast/infra/offline_stores/file.py:470: writer.write_table(new_table)
./python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py:236: pq.write_table(table, tmp_file.name)
./python/feast/infra/offline_stores/contrib/mssql_offline_store/mssql.py:373: pyarrow.parquet.write_table(
./python/feast/infra/offline_stores/bigquery.py:358: pyarrow.parquet.write_table(table=data, where=parquet_temp_file, coerce_timestamps="us")
./python/feast/infra/offline_stores/bigquery.py:407: pyarrow.parquet.write_table(table=table, where=parquet_temp_file, coerce_timestamps="us")
./python/feast/infra/utils/aws_utils.py:207: pq.write_table(table, file_path)
./python/feast/infra/utils/aws_utils.py:356: pq.write_table(table, parquet_temp_file, coerce_timestamps="us")
./python/feast/infra/utils/aws_utils.py:1049: pq.write_table(table, parquet_temp_file)
```

I tried to keep the change minimal. If an error shows up in the future, for example in the "upload_arrow_table_to_athena()" function at "aws_utils.py:1049", a new PR can be created with the necessary unit tests and integration tests.

Signed-off-by: Shuchu Han <shuchu.han@gmail.com>
@shuchu shuchu changed the title fix: upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. fix: Upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. Nov 17, 2023
@achals achals merged commit 052182b into feast-dev:master Nov 18, 2023
31 of 33 checks passed
@cburroughs cburroughs mentioned this pull request Jan 13, 2024