[BUG] Can't save large pipeline #1318

Closed
1 task done
Mr-Geekman opened this issue Jul 18, 2023 · 0 comments · Fixed by #1335
Labels: bug (Something isn't working), priority/high (High priority task)

Comments

@Mr-Geekman
Contributor

🐛 Bug Report

Saving a very large pipeline (larger than 4 GB) fails with an error.

Exception:

RuntimeError: File size unexpectedly exceeded ZIP64 limit

Expected behavior

No exception.

This can probably be fixed by passing force_zip64=True to ZipFile.open when the archive is created.
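To illustrate the suggestion above, here is a minimal standard-library sketch, not the actual etna code or the change made in #1335. The helper name write_member_streaming and the member name model.pkl are hypothetical. It streams data into an archive member through zipfile.ZipFile.open with force_zip64=True, which the Python documentation recommends when the final size is not known in advance and may be large. The demo writes only a few kilobytes, so it shows the call pattern rather than reproducing the failure.

```python
import os
import tempfile
import zipfile
from typing import Iterable


def write_member_streaming(archive_path: str, member_name: str, chunks: Iterable[bytes]) -> int:
    """Stream chunks into a single archive member and return the bytes written.

    Hypothetical helper for illustration; etna's real save code may differ.
    """
    written = 0
    with zipfile.ZipFile(archive_path, "w") as archive:
        # The member size is unknown when the handle is opened, so ask for
        # ZIP64 headers up front instead of failing when the handle is closed.
        with archive.open(member_name, "w", force_zip64=True) as member:
            for chunk in chunks:
                member.write(chunk)
                written += len(chunk)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.zip")
        # Small payload: enough to show the round trip, far below any limit.
        total = write_member_streaming(path, "model.pkl", (b"x" * 1024 for _ in range(4)))
        with zipfile.ZipFile(path) as archive:
            print(total, archive.read("model.pkl") == b"x" * total)
```

Archives written this way carry ZIP64 extensions for that member, so very old unzip tools may not read them, which is presumably an acceptable trade-off for multi-gigabyte pipelines.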

How To Reproduce

It is an artificial example:

import numpy as np

from etna.models import CatBoostMultiSegmentModel
from etna.datasets import TSDataset, generate_ar_df
from etna.transforms import LagTransform
from etna.pipeline import Pipeline

HORIZON = 7


def main():
    df = generate_ar_df(n_segments=10, start_time="2020-01-01", periods=100)
    df_wide = TSDataset.to_dataset(df)
    ts = TSDataset(df=df_wide, freq="D")

    model = CatBoostMultiSegmentModel()
    # Attach about 8 GB of float64 zeros so that the saved pipeline becomes very large.
    model.some_attr = np.zeros((100_000, 10_000))
    transforms = [
        LagTransform(in_column="target", lags=list(range(HORIZON, 50)), out_column="lags")
    ]
    pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)

    pipeline.fit(ts)

    pipeline.save("fitted.zip")


if __name__ == "__main__":
    main()
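As a rough check of why the script above produces such a large file, the sketch below computes the raw size of the array attached to the model without allocating it. It assumes the default float64 dtype of np.zeros (8 bytes per element); the function name array_nbytes is hypothetical, and the size of the pickled pipeline on disk will differ somewhat from this raw figure.

```python
# Shape taken from the reproduction script; np.zeros defaults to float64.
ROWS, COLS = 100_000, 10_000
FLOAT64_BYTES = 8  # assumption: default float64 dtype


def array_nbytes(rows: int, cols: int, itemsize: int = FLOAT64_BYTES) -> int:
    """Raw payload size in bytes of a dense rows x cols array."""
    return rows * cols * itemsize


nbytes = array_nbytes(ROWS, COLS)
print(f"array payload: {nbytes / 1e9:.1f} GB")
# Compare against the 4 GB figure quoted in the report.
print("larger than 4 GB:", nbytes > 4 * 10**9)
```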

Environment

No response

Additional context

No response

Checklist

  • Bug appears at the latest library version
Mr-Geekman added the bug and priority/high labels on Jul 18, 2023
Mr-Geekman self-assigned this on Jul 27, 2023