feat(api): time travel query #8517

Conversation

mfatihaktas (Contributor)

Description of changes

Aims to address #8203.

This PR

  • Adds time_travel() for Table (see the usage sketch below).
  • Adds ibis.expr.types.temporal.TimeTravelTable and ops.TimeTravelDatabaseTable.
  • Adds visit_TimeTravelDatabaseTable() in the compiler for BigQuery and Flink, and time_travel_table_sql() in the SQL generator for Flink.
  • Adds tests for time travel in BigQuery and Flink backends.
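
A hypothetical usage sketch of the proposed API, for readers skimming the thread. The connection details and the exact time_travel() signature are illustrative assumptions, not taken from the PR diff:

import datetime

import ibis

con = ibis.bigquery.connect(project_id="my-project")  # assumed connection details
t = con.table("events")

# Query the table as it existed one hour ago.
one_hour_ago = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(hours=1)
snapshot = t.time_travel(one_hour_ago)

# On BigQuery, this would presumably compile to something like:
#   SELECT * FROM `events` FOR SYSTEM_TIME AS OF TIMESTAMP '...'
print(snapshot.limit(5).execute())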

Notes:

  • Could not add a functional test for BigQuery, as I do not have access to a GCP environment to run functional tests.
  • Will add more unit and functional tests, especially for timestamp expressions and timestamps with time zones.

@mfatihaktas mfatihaktas self-assigned this Mar 1, 2024
@mfatihaktas mfatihaktas added the backends Issues related to all backends label Mar 1, 2024
@@ -139,6 +139,8 @@ services:
       AZURE_ABFS_OAUTH_SECRET: ""
       AZURE_ABFS_OAUTH_ENDPOINT: ""
       AZURE_WASB_ACCESS_KEY: ""
+    ports:
+      - 9083:9083
Member

Why was this necessary? If you're looking to expose the hive metastore to the flink container, do that by adding this container to the flink network, not by exposing the port to the host.

mfatihaktas (Contributor, Author)

Thanks. I actually first tried adding networks: - flink to hive-metastore, but that led to

E   Caused by: java.net.ConnectException: Connection refused (Connection refused)

Digging deeper into this, I found that TestConfForStreaming (the TestConf used by the tests in ibis/backends/flink/tests/) creates a local Flink environment, which means the tests do not run in the dockerized environment but locally. This is why we have to expose the metastore port to the host.

I have put the env creation code in a new function get_table_env(remote_env: bool, streaming_mode: bool) to make it more transparent and configurable (sketched after this comment). I then switched to the remote env in TestConfForStreaming to see whether we could run the tests without exposing the metastore port, but that gave me

Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 8065e44d3abe8da591f9c169cde351cd)
...
Caused by: java.io.IOException: Failed to create the parent directory: /private/var/folders/9g/7ncp4wjj22q25spxdy7t44080000gn/T/pytest-of-mehmet/pytest-520/test_table0

The error seems to be caused by the Flink client running on a file system that is not shared with the Flink TaskManager. I guess this is why the env was left local in TestConfForStreaming in the first place.
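
A minimal sketch of what the get_table_env() helper mentioned above might look like, built only on the public PyFlink API. How the remote connection is wired in (here via the rest.address/rest.port options) is an assumption; the PR's actual implementation may differ:

from pyflink.table import EnvironmentSettings, TableEnvironment


def get_table_env(*, remote_env: bool, streaming_mode: bool) -> TableEnvironment:
    builder = EnvironmentSettings.new_instance()
    builder = builder.in_streaming_mode() if streaming_mode else builder.in_batch_mode()
    table_env = TableEnvironment.create(builder.build())
    if remote_env:
        # Point the client at a remote JobManager instead of a local
        # mini-cluster. "rest.address"/"rest.port" are standard Flink options,
        # but configuring remote execution this way is an assumption.
        table_env.get_config().set("rest.address", "localhost")
        table_env.get_config().set("rest.port", "8081")
    return table_env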

@pytest.fixture
def use_hive_catalog(con):
    # Flink related
    download_jar_for_package(
Member

Can you do this in the flink dockerfile?

mfatihaktas (Contributor, Author)

Adding JAR files to the container won't help in this case, because the Flink-specific tests do not run in the dockerized env but in the local env, as described above.

mfatihaktas (Contributor, Author)

Downloading JARs is failing on CI:

package_name = 'apache-flink'
jar_name = 'paimon-flink-1.18-0.8-20240301.002155-30'
jar_url = 'https://repository.apache.org/content/groups/snapshots/org/apache/paimon/paimon-flink-1.18/0.8-SNAPSHOT/paimon-flink-1.18-0.8-20240301.002155-30.jar'

    def download_jar_for_package(
        package_name: str,
        jar_name: str,
        jar_url: str,
    ):
        import os
        from importlib import metadata
    
        import requests
    
        # Find the path to package lib
        try:
            distribution = metadata.distribution(package_name)
            lib_path = distribution.locate_file("")
        except metadata.PackageNotFoundError:
            lib_path = None
    
        # Check if the JAR already exists
        jar_path = os.path.join(lib_path, "pyflink/lib", f"{jar_name}.jar")
        if os.path.exists(jar_path):
            return jar_path
    
        # Download the JAR
        response = requests.get(jar_url, stream=True)
        if response.status_code != 200:
>           raise SystemError(
                f"Failed to download the JAR file \n"
                f"\t jar_url= {jar_url} \n"
                f"\t response.status_code= {response.status_code}"
            )
E           SystemError: Failed to download the JAR file 
E           	 jar_url= https://repository.apache.org/content/groups/snapshots/org/apache/paimon/paimon-flink-1.18/0.8-SNAPSHOT/paimon-flink-1.18-0.8-20240301.002155-30.jar 
E           	 response.status_code= 404
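
Separately from the 404, note that the helper's lib_path = None fallback would make the subsequent os.path.join raise a TypeError whenever the package is not installed. A defensive variant, including the write step the traceback truncates (a sketch, not the PR's code):

import os
from importlib import metadata

import requests


def download_jar_for_package(package_name: str, jar_name: str, jar_url: str) -> str:
    # Fail with a clear error instead of falling through with lib_path = None.
    try:
        distribution = metadata.distribution(package_name)
    except metadata.PackageNotFoundError as err:
        raise SystemError(f"Package not found: {package_name}") from err
    lib_path = distribution.locate_file("")

    # Reuse the JAR if a previous run already downloaded it.
    jar_path = os.path.join(lib_path, "pyflink/lib", f"{jar_name}.jar")
    if os.path.exists(jar_path):
        return jar_path

    response = requests.get(jar_url, stream=True)
    if response.status_code != 200:
        raise SystemError(
            f"Failed to download the JAR file\n"
            f"\t jar_url= {jar_url}\n"
            f"\t response.status_code= {response.status_code}"
        )
    with open(jar_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return jar_path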

The "easy" way to deal with this whole thing of downloading JAR's in Python would be skipping these tests by default and leaving them only for manual local testing.

@mfatihaktas mfatihaktas force-pushed the feat/support-time-travel-query branch 3 times, most recently from 38c7e9b to 87b050f on March 21, 2024 15:51
@mfatihaktas mfatihaktas marked this pull request as ready for review March 21, 2024 16:36
@mfatihaktas mfatihaktas force-pushed the feat/support-time-travel-query branch from 87b050f to 9304ad9 on April 4, 2024 14:03
@mfatihaktas mfatihaktas force-pushed the feat/support-time-travel-query branch from 9304ad9 to 31ec7f2 on April 4, 2024 14:40
@cpcloud cpcloud added this to the 9.2 milestone Jun 13, 2024
cpcloud (Member) left a comment

Can we drop all the flink-related changes and just implement this for bigquery and snowflake (and perhaps one other backend)?

@cpcloud cpcloud removed this from the 9.2 milestone Jul 16, 2024
cpcloud (Member) commented Sep 30, 2024

Similar to #8412, I don't have time to maintain this code right now.

@cpcloud cpcloud closed this Sep 30, 2024