[Bug]: DuckDB query doesn't show updated results if the object changes elsewhere #5494

Closed
rmoff opened this issue Mar 15, 2023 · 1 comment · Fixed by #5536
Labels
bug Something isn't working contributor team/versioning-engine Team versioning engine

Comments

@rmoff
Contributor

rmoff commented Mar 15, 2023

What happened?

Current Behavior:

If you execute a query in the DuckDB pane of the object page, change the underlying object elsewhere, and then re-execute the query, the results don't change.

Steps to Reproduce:

  1. Spin up the Docker Compose from https://github.com/treeverse/lakeFS/tree/docs/devex-173-quickstart/quickstart
  2. From http://127.0.0.1:8000/repositories/quickstart/object?ref=main&path=lakes.parquet run the default DuckDB query and note the results.

[Screenshot: CleanShot 2023-03-15 at 16 13 40@2x]

  3. Get a DuckDB CLI prompt:

    docker exec -it duckdb duckdb

  4. Load the Parquet file as a table, delete some rows, and write it back to lakeFS:

    -- Point DuckDB's S3 client at the local lakeFS endpoint
    SET s3_endpoint='lakefs:8000';
    SET s3_access_key_id='AKIA-EXAMPLE-KEY';
    SET s3_secret_access_key='EXAMPLE-SECRET';
    SET s3_url_style='path';
    SET s3_region='us-east-1';
    SET s3_use_ssl=false;

    -- Copy the file into a local table, keep only the rows for Denmark, and overwrite the object
    CREATE TABLE lakes AS SELECT * FROM read_parquet('s3://quickstart/main/lakes.parquet');
    DELETE FROM lakes WHERE country != 'Denmark';
    COPY lakes TO 's3://quickstart/main/lakes.parquet' (FORMAT 'PARQUET', ALLOW_OVERWRITE TRUE);

  5. Read the Parquet file back directly to verify the change to the data:

    SELECT *
    FROM read_parquet('s3://quickstart/main/lakes.parquet')
    LIMIT 20;

  6. In the same browser window as before, click Execute. Note that the data does not change. Even if you change the value in the LIMIT clause (e.g. from 20 to 5), the new data is not shown.

    Refresh the web page using the browser's controls and note that the correct data is now shown.

    [Video: CleanShot.2023-03-15.at.16.18.19-converted.mp4]
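
The symptom in the steps above is what you would see if the query pane kept reusing a copy of the file it fetched on first use. The Python sketch below is only a toy model of that behavior, not lakeFS or DuckDB code: `FakeObjectStore` and `CachingReader` are hypothetical names, and the assumption that the data is cached per page session is inferred from the fact that a browser reload fixes it.

```python
class FakeObjectStore:
    """Hypothetical stand-in for an object store such as lakeFS."""

    def __init__(self):
        self._objects = {}
        self.reads = 0  # number of full object reads served

    def put(self, path, rows):
        self._objects[path] = list(rows)

    def get(self, path):
        self.reads += 1
        return list(self._objects[path])


class CachingReader:
    """Hypothetical reader that fetches each path once and then reuses it."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def query(self, path, limit):
        if path not in self._cache:
            self._cache[path] = self._store.get(path)
        # Changing the LIMIT only re-slices the cached rows.
        return self._cache[path][:limit]


store = FakeObjectStore()
store.put("main/lakes.parquet", ["Denmark", "Sweden", "Norway"])
reader = CachingReader(store)

before = reader.query("main/lakes.parquet", limit=20)

# Another client overwrites the object, as the DuckDB CLI does in the steps above.
store.put("main/lakes.parquet", ["Denmark"])

# Same "page session": the cached rows are returned, so the result is stale.
after_execute = reader.query("main/lakes.parquet", limit=5)

# A new reader models a browser reload: nothing is cached, so the data is fresh.
after_reload = CachingReader(store).query("main/lakes.parquet", limit=5)
```

In this model `after_execute` still contains all three countries while `after_reload` contains only Denmark, which mirrors the difference between clicking Execute and reloading the page.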

Expected Behavior

When you run a query with DuckDB it should show the current data in the file.

If it is not going to do this, then the UI should indicate very clearly that the data could be stale and provide a button to force a refresh without requiring the user to reload the page (and thus lose their SQL query).
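
To make the second option concrete, here is a rough Python sketch of a pane that flags possibly stale results and offers a refresh that leaves the SQL text alone. Everything in it (`QueryPane`, `stale_possible`, `refresh`, the `fetch` callback) is hypothetical and chosen for illustration; it does not describe the lakeFS UI code, and the SQL string is stored but not actually executed.

```python
from typing import Callable, List, Optional


class QueryPane:
    """Hypothetical model of a query pane: the user's SQL plus cached data."""

    def __init__(self, fetch: Callable[[str], List[str]], path: str, sql: str):
        self._fetch = fetch
        self.path = path
        self.sql = sql  # what the user typed; refresh() never touches it
        self._cached: Optional[List[str]] = None
        self.stale_possible = False

    def execute(self) -> List[str]:
        if self._cached is None:
            self._cached = self._fetch(self.path)
            self.stale_possible = False
        else:
            # Rows come from an earlier fetch, so the UI should warn the user.
            self.stale_possible = True
        return self._cached

    def refresh(self) -> List[str]:
        """Drop the cached data and run again; the SQL text is kept."""
        self._cached = None
        return self.execute()


objects = {"lakes.parquet": ["Denmark", "Sweden"]}
pane = QueryPane(lambda p: list(objects[p]), "lakes.parquet",
                 "SELECT * FROM lakes LIMIT 5")

pane.execute()
objects["lakes.parquet"] = ["Denmark"]   # the object changes elsewhere

stale = pane.execute()                   # cached rows, flagged as possibly stale
was_flagged = pane.stale_possible
fresh = pane.refresh()                   # re-fetches without a page reload
```

The point of the design is that the refresh path only clears the data cache, so the editor contents survive, unlike a full page reload.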

lakeFS Version

0.96.1

Deployment

Docker

Affected Clients

No response

Relevant logs output

No response

Contact Details

No response

@rmoff rmoff added bug Something isn't working contributor labels Mar 15, 2023
@nopcoder nopcoder added the team/versioning-engine Team versioning engine label Mar 20, 2023
@johnnyaug
Contributor

Looks like this was done here: #4903. There is a trade-off between performance and data freshness here, and we decided to side with performance. However, I agree that having no way to refresh the data is a problem.
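
For illustration only, one common middle ground in this kind of trade-off is to keep the cache but check a cheap version token before reusing it, similar in spirit to an HTTP ETag check. The Python sketch below is an assumption-laden toy: `VersionedStore`, `RevalidatingReader`, and the integer version counter are hypothetical, and nothing here is claimed to be what #4903 or the eventual fix does.

```python
class VersionedStore:
    """Hypothetical store with a cheap version lookup next to full reads."""

    def __init__(self):
        self._data = {}      # path -> (version, rows)
        self.full_reads = 0  # counts the expensive reads only

    def put(self, path, rows):
        version = self._data.get(path, (0, None))[0] + 1
        self._data[path] = (version, list(rows))

    def head(self, path):
        # Cheap metadata-only check, no object body transferred.
        return self._data[path][0]

    def get(self, path):
        self.full_reads += 1
        version, rows = self._data[path]
        return version, list(rows)


class RevalidatingReader:
    """Hypothetical reader that reuses cached rows only if the version matches."""

    def __init__(self, store):
        self._store = store
        self._cache = {}  # path -> (version, rows)

    def read(self, path):
        cached = self._cache.get(path)
        if cached is None or cached[0] != self._store.head(path):
            self._cache[path] = self._store.get(path)
        return self._cache[path][1]


store = VersionedStore()
store.put("lakes.parquet", ["Denmark", "Sweden"])
reader = RevalidatingReader(store)

first = reader.read("lakes.parquet")
second = reader.read("lakes.parquet")    # unchanged: served from the cache
store.put("lakes.parquet", ["Denmark"])
third = reader.read("lakes.parquet")     # changed: fetched again
```

The cost is one extra metadata check per query, while the expensive full read only happens when the object has actually changed.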


3 participants