sqlmesh re-runs backfills for lookback intervals when you re-apply the same plan #2985

Closed
sudokai opened this issue Aug 6, 2024 · 0 comments
Labels
Bug Something isn't working

sudokai commented Aug 6, 2024

When I re-apply the same plan, a model with a custom cron expression (not @daily) and a lookback should not re-run backfills for its lookback intervals, but it does so every time.

I can reproduce it with this setup (BigQuery, sqlmesh 0.115.1):

model (
    name staging.test5
    , kind incremental_by_time_range (
        time_column (event_timestamp, '%Y-%m-%d %H:%M:%S.%f')
        , lookback 2
    )
    , partitioned_by timestamp_trunc(event_timestamp, day)
    , cron '0 12 * * *'
    , start '2024-08-01'
);

select
    event_timestamp::timestamp
from staging.seed
where
    event_timestamp >= @start_ts
    and event_timestamp < @end_ts
;

model (
    name staging.seed
    , kind seed (
        path '$root/seeds/events.csv'
    )
);
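With this config, the cron '0 12 * * *' and the daily interval unit appear to explain why the first plan in the output below backfills staging.test5 only through 2024-08-04 while the @daily seed goes through 2024-08-05. Here is a minimal Python sketch of that arithmetic. It assumes the plan ran on Aug 6 before 12:00 UTC (the issue does not give the time) and that daily intervals are aligned to UTC midnight; last_complete_interval is a hypothetical helper, not a SQLMesh API, and it is only a toy model of how the end date might be derived.

```python
from datetime import datetime, timedelta, timezone


def last_complete_interval(now: datetime, cron_hour: int) -> datetime:
    """Hypothetical toy model: return the start of the last daily interval
    [day, day + 1) that ends at or before the most recent tick of a
    "0 <cron_hour> * * *" schedule."""
    # Most recent cron tick at or before `now`.
    tick = now.replace(hour=cron_hour, minute=0, second=0, microsecond=0)
    if tick > now:
        tick -= timedelta(days=1)
    # Floor the tick to midnight: that midnight ends the last complete
    # daily interval, which therefore starts one day earlier.
    interval_end = tick.replace(hour=0)
    return interval_end - timedelta(days=1)


# ASSUMED plan time: Aug 6, 2024, 09:00 UTC.
now = datetime(2024, 8, 6, 9, 0, tzinfo=timezone.utc)
print(last_complete_interval(now, cron_hour=12).date())  # 2024-08-04 (staging.test5)
print(last_complete_interval(now, cron_hour=0).date())   # 2024-08-05 (staging.seed, @daily)
```

Under these assumptions, a noon cron has not yet ticked on Aug 6, so the Aug 5 interval is not complete for staging.test5, while the @daily seed already includes it.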

CSV test data:

event_timestamp,event_name,user_id
2024-08-06 00:00:00.000000 UTC,login,1234
2024-08-02 00:00:00.000000 UTC,login,1234
2024-08-04 00:00:00.000000 UTC,login,1236
2024-08-03 00:00:00.000000 UTC,login,1235
2024-08-05 00:00:00.000000 UTC,login,1237

The command sequence and outputs:

user@mac ~/D/sqlmesh-example [1]> sqlmesh plan -v                                   
New environment `prod` will be created from `prod`
Summary of differences against `prod`:
Models:
└── Added:
    ├── staging.seed
    └── staging.test5
Models needing backfill (missing dates):
├── staging.seed: 2024-08-05 - 2024-08-05
└── staging.test5: 2024-08-01 - 2024-08-04
Apply - Backfill Tables [y/n]: y
staging.seed created
staging.test5 created
Creating physical table ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2/2 • 0:00:16

All model versions have been created successfully

[1/1] staging.seed evaluated in 9.57s
[1/1] staging.test5 evaluated in 17.24s
Evaluating models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2/2 • 0:00:26                                                         
                                                                                                                                          

All model batches have been executed successfully

staging.seed promoted
staging.test5 promoted
Virtually Updating 'prod' ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 0:00:02

The target environment has been updated successfully

user@mac ~/D/sqlmesh-example> sqlmesh plan -v                                        
No differences when compared to `prod`
Models needing backfill (missing dates):
└── staging.test5: 2024-08-03 - 2024-08-04
Apply - Backfill Tables [y/n]: y
staging.test5 created
staging.seed created
Creating physical table ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2/2 • 0:00:02

All model versions have been created successfully

[1/1] staging.test5 evaluated in 11.82s
Evaluating models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 1/1 • 0:00:11                                                         
                                                                                                                                          

All model batches have been executed successfully

Virtually Updating 'prod' ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0% • 0:00:14

The target environment has been updated successfully

user@mac ~/D/sqlmesh-example> sqlmesh plan -v                              
No differences when compared to `prod`
Models needing backfill (missing dates):
└── staging.test5: 2024-08-03 - 2024-08-04
Apply - Backfill Tables [y/n]:
izeigerman added the Bug (Something isn't working) label Aug 13, 2024
tobymao added a commit that referenced this issue Aug 13, 2024
refactor of how lookback works. the current implementation has some
weird interactions with cron that make non-standard crons and lookback
always backfill no matter what.

this refactors missing_intervals to only check for lookback when there
are any missing intervals to begin with, otherwise we don't bother with
lookback.
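The fix described in the commit message can be illustrated with a toy Python model. This is not SQLMesh's actual missing_intervals code: intervals are daily and identified by their start date, processed is the set already backfilled, and missing_buggy / missing_fixed are hypothetical names. The buggy variant models only the symptom (it always flags the trailing lookback intervals, ignoring the cron interaction that triggers it in practice), which reproduces the repeated 2024-08-03 - 2024-08-04 prompt. The fixed variant widens the range by lookback only when at least one interval is genuinely missing.

```python
from datetime import date, timedelta


def daily_intervals(start: date, last: date) -> list[date]:
    """Daily intervals [day, day + 1), identified by their start date."""
    return [start + timedelta(days=i) for i in range((last - start).days + 1)]


def missing_buggy(intervals: list[date], processed: set[date], lookback: int) -> list[date]:
    # Symptom from the issue: the last `lookback` intervals are always
    # re-flagged, even when everything has already been backfilled.
    missing = {d for d in intervals if d not in processed}
    tail = set(intervals[-lookback:]) if lookback > 0 else set()
    return sorted(missing | tail)


def missing_fixed(intervals: list[date], processed: set[date], lookback: int) -> list[date]:
    # Described fix: only consider lookback when something is actually missing.
    missing = [d for d in intervals if d not in processed]
    if not missing:
        return []
    first = intervals.index(missing[0])
    # Also re-run the `lookback` intervals immediately before the first gap.
    window = intervals[max(0, first - lookback):first]
    return sorted(set(window) | set(missing))


def fmt(days: list[date]) -> list[str]:
    return [d.isoformat() for d in days]


# State after the first plan: 2024-08-01 - 2024-08-04 are backfilled.
intervals = daily_intervals(date(2024, 8, 1), date(2024, 8, 4))
processed = set(intervals)
print(fmt(missing_buggy(intervals, processed, lookback=2)))  # ['2024-08-03', '2024-08-04']
print(fmt(missing_fixed(intervals, processed, lookback=2)))  # []

# Once 2024-08-05 becomes due, the fixed variant still re-runs the lookback days.
next_day = daily_intervals(date(2024, 8, 1), date(2024, 8, 5))
print(fmt(missing_fixed(next_day, processed, lookback=2)))
# ['2024-08-03', '2024-08-04', '2024-08-05']
```

In this model, lookback still does its job of reprocessing recent intervals for late-arriving data, but only as a side effect of a real gap, so re-applying an unchanged plan produces no backfill.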