feat(api): time travel query #8517
Conversation
```diff
@@ -139,6 +139,8 @@ services:
       AZURE_ABFS_OAUTH_SECRET: ""
       AZURE_ABFS_OAUTH_ENDPOINT: ""
       AZURE_WASB_ACCESS_KEY: ""
+    ports:
+      - 9083:9083
```
Why was this necessary? If you're looking to expose the hive metastore to the flink container, do that by adding this container to the flink network, not by exposing the port to the host.
Thanks. I actually first tried adding

```yaml
networks:
  - flink
```

for `hive-metastore`, but that led to:

```
E  Caused by: java.net.ConnectException: Connection refused (Connection refused)
```

Digging deeper into this, I found that `TestConfForStreaming` (the `TestConf` used by the tests in `ibis/backends/flink/tests/`) creates a local Flink environment, which means the tests do not run in the dockerized environment. This is why we have to expose the metastore to the local env.

I have put the env creation code in a new function, `get_table_env(remote_env: bool, streaming_mode: bool)`, to make it more transparent and configurable. I switched to the remote env in `TestConfForStreaming` to see if we can run the tests without exposing the metastore port, but that gave me:

```
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8065e44d3abe8da591f9c169cde351cd)
...
Caused by: java.io.IOException: Failed to create the parent directory: /private/var/folders/9g/7ncp4wjj22q25spxdy7t44080000gn/T/pytest-of-mehmet/pytest-520/test_table0
```

The error seems to be due to the Flink client being on a file system that is not shared with the Flink TaskManager. I guess this is why the env was left local in `TestConfForStreaming` at the time.
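For reference, a minimal sketch of what such a `get_table_env(remote_env, streaming_mode)` helper could look like with PyFlink. This is not the PR's actual implementation; the `rest.address`/`rest.port` values and the builder wiring are assumptions:

```python
from pyflink.common import Configuration
from pyflink.table import EnvironmentSettings, TableEnvironment


def get_table_env(remote_env: bool, streaming_mode: bool) -> TableEnvironment:
    """Create a local or remote Flink TableEnvironment (illustrative sketch)."""
    builder = EnvironmentSettings.new_instance()
    builder = builder.in_streaming_mode() if streaming_mode else builder.in_batch_mode()
    if remote_env:
        # Point the client at a remote (e.g. dockerized) JobManager.
        # Host and port are assumptions; adjust to the actual cluster.
        config = Configuration()
        config.set_string("rest.address", "localhost")
        config.set_integer("rest.port", 8081)
        builder = builder.with_configuration(config)
    return TableEnvironment.create(builder.build())
```

With `remote_env=False` this behaves like the in-process mini-cluster the tests currently use; with `remote_env=True`, jobs are submitted to the cluster behind the REST endpoint, which is what surfaced the shared-filesystem failure above.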
```python
@pytest.fixture
def use_hive_catalog(con):
    # Flink related
    download_jar_for_package(
```
Can you do this in the flink dockerfile?
Adding JAR files to the container won't help in this case, because the Flink-specific tests do not run in the dockerized env but in the local env, as described above.
Downloading JARs is failing on CI:

```
package_name = 'apache-flink'
jar_name = 'paimon-flink-1.18-0.8-20240301.002155-30'
jar_url = 'https://repository.apache.org/content/groups/snapshots/org/apache/paimon/paimon-flink-1.18/0.8-SNAPSHOT/paimon-flink-1.18-0.8-20240301.002155-30.jar'

    def download_jar_for_package(
        package_name: str,
        jar_name: str,
        jar_url: str,
    ):
        import os
        from importlib import metadata

        import requests

        # Find the path to the package lib
        try:
            distribution = metadata.distribution(package_name)
            lib_path = distribution.locate_file("")
        except metadata.PackageNotFoundError:
            lib_path = None

        # Check if the JAR already exists
        jar_path = os.path.join(lib_path, "pyflink/lib", f"{jar_name}.jar")
        if os.path.exists(jar_path):
            return jar_path

        # Download the JAR
        response = requests.get(jar_url, stream=True)
        if response.status_code != 200:
>           raise SystemError(
                f"Failed to download the JAR file \n"
                f"\t jar_url= {jar_url} \n"
                f"\t response.status_code= {response.status_code}"
            )
E           SystemError: Failed to download the JAR file
E             jar_url= https://repository.apache.org/content/groups/snapshots/org/apache/paimon/paimon-flink-1.18/0.8-SNAPSHOT/paimon-flink-1.18-0.8-20240301.002155-30.jar
E             response.status_code= 404
```
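The 404 is consistent with how Apache snapshot repositories behave: timestamped snapshot builds get rotated out, so a hard-coded snapshot URL eventually goes stale. One way around that (a sketch, not something this PR does) is to resolve the current build from the repository's `maven-metadata.xml`:

```python
import xml.etree.ElementTree as ET

import requests

# Snapshot directory for the artifact in question (taken from the failing URL).
BASE = (
    "https://repository.apache.org/content/groups/snapshots/"
    "org/apache/paimon/paimon-flink-1.18/0.8-SNAPSHOT"
)


def latest_snapshot_jar_url() -> str:
    """Build the URL of the newest snapshot JAR instead of hard-coding a timestamp."""
    resp = requests.get(f"{BASE}/maven-metadata.xml", timeout=30)
    resp.raise_for_status()
    snapshot = ET.fromstring(resp.content).find("versioning/snapshot")
    timestamp = snapshot.findtext("timestamp")  # e.g. 20240301.002155
    build = snapshot.findtext("buildNumber")    # e.g. 30
    return f"{BASE}/paimon-flink-1.18-0.8-{timestamp}-{build}.jar"
```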
The "easy" way to deal with this whole thing of downloading JAR's in Python would be skipping these tests by default and leaving them only for manual local testing.
Can we drop all the Flink-related changes and just implement this for BigQuery and Snowflake (and perhaps one other backend)?
Similar to #8412, I don't have time to maintain this code right now.
Description of changes
Aims to address #8203.
This PR adds:

- `time_travel()` for `Table` (see the usage sketch below)
- `ibis.expr.types.temporal.TimeTravelTable` and `ops.TimeTravelDatabaseTable`
- `visit_TimeTravelDatabaseTable()` in the compiler for BigQuery and Flink, and `time_travel_table_sql()` in the SQL generator for Flink

Notes:
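For illustration, a hedged sketch of how the new API might be used end to end; the exact `time_travel()` signature is not spelled out in this description, so the single-timestamp argument (and the table/column names) below are assumptions:

```python
import datetime

import ibis

con = ibis.bigquery.connect(project_id="my-project", dataset_id="my_dataset")
t = con.table("events")

# Read the table as it existed at a fixed point in time
# (argument shape is an assumption, not the PR's confirmed signature).
as_of = datetime.datetime(2024, 3, 1, tzinfo=datetime.timezone.utc)
snapshot = t.time_travel(as_of)

# On BigQuery this should compile to a `FOR SYSTEM_TIME AS OF` clause.
print(snapshot.filter(snapshot.user_id == 42).compile())
```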
Issues closed