error when using CDF functionality in dev branch #8

Open
januz opened this issue Aug 22, 2023 · 5 comments


@januz

januz commented Aug 22, 2023

@zacdav-db Thanks so much for this package!

I first tried out the standard functionality currently available in main, which works great. I also wanted to test the change data feed (CDF) functionality, so I installed the dev version.

I'm getting the following error message when trying to load data:

> ds_tbl_cdf$set_cdf_options(starting_version = 1)
> ds_tbl_cdf_tibble <- ds_tbl_cdf$load_tibble(changes = TRUE)
deleting 1 files that are no longer referenced
Error: rapi_prepare: Failed to prepare query create or replace table '/var/folders/bs/m184ytk15hddvjmg4zqrq8rh0000gq/T//RtmptNIogA/8e685855-71f0-43a3-9f6f-e176a36655f7/_table_changes/' as
with changes as (
  select *
  from read_parquet('/var/folders/bs/m184ytk15hddvjmg4zqrq8rh0000gq/T//RtmptNIogA/8e685855-71f0-43a3-9f6f-e176a36655f7/_table_changes//*__cdf_*', filename=true)
),
other as (
  select * exclude (filename), null as _change_type, filename
  from read_parquet('/var/folders/bs/m184ytk15hddvjmg4zqrq8rh0000gq/T//RtmptNIogA/8e685855-71f0-43a3-9f6f-e176a36655f7/_table_changes//*__[i|d]*_*', filename=true)
),
all_data as (
  select * from changes
  union
  select * from other
),
dataset as (
  select
    *,
    str_split(regexp_extract(filename, '.*\/(.*)\..*', 1), '_') as metadata,
    to_timestamp(metadata[5]::bigint/1000)::string as _change_timestamp,
    metadata[6]::int as _change_version,
    from all_data
)
select
  *
  exclude (filename, metadata)
  replace (coalesce(_chang
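For context, the query above derives `_change_timestamp` and `_change_version` from each change file's name. `regexp_extract(filename, '.*\/(.*)\..*', 1)` keeps the text after the last `/` and before the last `.`, `str_split(..., '_')` splits that stem on underscores, and the 5th and 6th fields (DuckDB lists are 1-indexed) are read as a millisecond epoch timestamp and an integer version. The sketch below mirrors that parsing in Python. It is illustrative only: `parse_change_metadata` and the sample path are hypothetical, not part of the package, and the real file-naming scheme may differ.

```python
import re
from datetime import datetime, timezone


def parse_change_metadata(path: str) -> tuple[str, int]:
    """Hypothetical helper mirroring the SQL's file-name parsing."""
    # Same pattern as regexp_extract(filename, '.*\/(.*)\..*', 1): the greedy
    # match keeps everything after the last '/' and before the last '.'.
    m = re.match(r".*/(.*)\..*", path)
    if m is None:
        raise ValueError(f"unexpected file name: {path}")
    metadata = m.group(1).split("_")
    # DuckDB's metadata[5] / metadata[6] are Python's indices 4 / 5.
    # metadata[5] is milliseconds since the epoch, so divide by 1000.
    ts = datetime.fromtimestamp(int(metadata[4]) / 1000, tz=timezone.utc)
    version = int(metadata[5])
    return ts.isoformat(), version


# Hypothetical file name with six underscore-separated fields.
print(parse_change_metadata("/tmp/_table_changes/a_b_c_d_1692700000000_3.parquet"))
```

Because the version is cast straight to an integer, any file whose stem does not have at least six underscore-separated fields (or whose 6th field is not numeric) would make the equivalent SQL cast fail.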
@zacdav-db
Owner

Hey @januz, thanks for the feedback.

Currently the dev branch differs substantially from main because the underlying method has changed.
I'm aware of a number of issues I need to resolve to fix this and am currently thinking through how to implement the changes so that things are robust.

Predominantly, the dev branch is not robust to partitions nested to arbitrary levels, etc. That is relatively easy to add once I've decided on the direction.

Hopefully I'll find the time in the coming weeks to add this!

@januz
Author

januz commented Aug 22, 2023

Thanks for the background, @zacdav-db!

I'll check back in a few weeks to test your new implementation when it's ready. It would be fantastic if you could keep this issue open and update it once you have implemented these changes. Thanks so much!

@januz
Author

januz commented Nov 14, 2023

@zacdav-db Sorry to bug you again with this, but are you still working on adding the CDF functionality? We're about to set up an ETL pipeline for regularly importing data from one of our vendors, who share their data using Delta Sharing, and the CDF functionality would make it really easy to keep their database tables and ours in sync without having to re-download lots of data. Thank you!!

@zacdav-db
Owner

@januz If you need anything for important use cases in the near term, I'd suggest not waiting on this; I cannot make any concrete commitments to timelines.

The Python connector should support your use case.

With regards to direction, I have decided that I will likely first try implementing this functionality using the Delta Kernel. One downside is that it is in Java, and I may choose to wait for the Rust implementation. If the performance of the connector using the kernel is poor, I'll likely revisit the more complex and labour-intensive effort of doing it all myself.

@januz
Author

januz commented Nov 15, 2023

Thanks for the update, @zacdav-db!
