input_images must be a zip file error when running a training #338

Closed
tobias-varden opened this issue Aug 22, 2024 · 18 comments · Fixed by #343

Comments

@tobias-varden

I am trying to create a Flux LoRA using https://replicate.com/ostris/flux-dev-lora-trainer/train.
However, it does not accept the zip file I provide and reports that it is not a zip file. To be sure, I also verified on my end with the zipfile library that the file is a valid zip archive.
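
A minimal sketch of what such a local check could look like, using only the standard `zipfile` module. The helper name `check_zip` is hypothetical and is not part of the original report or of any library.

```python
import zipfile


def check_zip(path):
    """Return (is_zip, first_bad_member) for the archive at `path`.

    Hypothetical helper for illustration only.
    """
    if not zipfile.is_zipfile(path):
        return False, None
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the name of the first
        # one that fails its CRC or header check, or None if all are intact.
        return True, zf.testzip()
```

If this returns `(True, None)`, the archive is readable locally, which points at the upload or the server-side check rather than the file itself.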

Model created: tobias-varden/test-fifth-lora
<_io.BufferedReader name='C:\temp\dreambooth\dreambooth.zip'>
Training started: starting
Training URL: https://replicate.com/p/xxxxxxxxxxxxxxxxx
Training status: failed
Training failed or was canceled. Status: failed
Training logs: Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/worker.py", line 354, in _predict
    result = predict(**payload)
  File "/src/train.py", line 127, in train
    extract_zip(input_images, INPUT_DIR)
  File "/src/train.py", line 287, in extract_zip
    raise ValueError("input_images must be a zip file")
ValueError: input_images must be a zip file

I am currently using replicate==0.31.0.
I am on Windows 11.
The code:

    import replicate

    # image_path, token, and model are defined earlier in the script.
    with open(image_path, "rb") as f:
        print(f)  # show which file handle is being passed
        training = replicate.trainings.create(
            version="ostris/flux-dev-lora-trainer:4ffd32160efd92e956d39c5338a9b8fbafca58e03f791f6d8011f3e20e8ea6fa",
            input={
                "input_images": f,
                "steps": 1000,
                "prefix": f"A photo of {token}, "
            },
            destination=f"{model.owner}/{model.name}"
        )

@tobias-varden
Author

tobias-varden commented Aug 22, 2024

The image path is written with double backslashes (\\), but they are not showing up in the output above for some reason.

@nikitalokhmachev-ai

Same issue here!

@mattt
Contributor

mattt commented Aug 23, 2024

@tobias-varden @nikitalokhmachev-ai I just released a new version of the Python client yesterday that adds support for the new files API. Can you try upgrading to 0.32.0 and trying again?

@nikitalokhmachev-ai

@mattt I've tried the new version. I can see in the UI that the .zip archive is now being uploaded, but I am still getting the same error.

@RikNieuwoudt

I'm getting the same error using the latest version as well.

@tobias-varden
Author

tobias-varden commented Aug 24, 2024

> @tobias-varden @nikitalokhmachev-ai I just released a new version of the Python client yesterday that adds support for the new files API. Can you try upgrading to 0.32.0 and trying again?

Hi Matt, thanks for the update! I updated the package to 0.32.0, but I still get the same issue.

@tobias-varden
Author

I see the new code. Do I need to upload the file before starting the training and then refer to the uploaded file somehow in input_images?

@RikNieuwoudt

@tobias-varden I got this to work by just uploading my zip to R2 and referencing the public URL in input_images. I'm assuming file uploads from the OS just don't work for some reason

@DylanDDeng

> @tobias-varden I got this to work by just uploading my zip to R2 and referencing the public URL in input_images. I'm assuming file uploads from the OS just don't work for some reason

Could you explain what R2 stands for? Thanks!

@ayakut16

> > @tobias-varden I got this to work by just uploading my zip to R2 and referencing the public URL in input_images. I'm assuming file uploads from the OS just don't work for some reason
>
> Could you explain what R2 stands for? Thanks!

https://developers.cloudflare.com/r2/

@RikNieuwoudt

> > > @tobias-varden I got this to work by just uploading my zip to R2 and referencing the public URL in input_images. I'm assuming file uploads from the OS just don't work for some reason
> >
> > Could you explain what R2 stands for? Thanks!
>
> https://developers.cloudflare.com/r2/

Yep, Cloudflare R2. S3 should work too, as would hosting it yourself somewhere.
In other words, passing a URL works, whereas passing a file directly does not.
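
A rough sketch of the URL workaround, assuming the zip is already hosted at a publicly reachable address. The helper names `build_training_input` and `start_training` are hypothetical; the `steps` and `prefix` values are copied from the original report, and only `replicate.trainings.create` with `version`, `input`, and `destination` is taken from the code shown in this thread.

```python
from urllib.parse import urlparse


def build_training_input(zip_url, token):
    """Build the `input` dict with a hosted zip URL instead of a file handle.

    Hypothetical helper; the keys mirror the original report.
    """
    parsed = urlparse(zip_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected a public http(s) URL, got {zip_url!r}")
    return {
        "input_images": zip_url,  # a plain URL string, not an open file
        "steps": 1000,
        "prefix": f"A photo of {token}, ",
    }


def start_training(version, destination, zip_url, token):
    # Imported lazily so build_training_input can be used without the client.
    import replicate

    return replicate.trainings.create(
        version=version,
        input=build_training_input(zip_url, token),
        destination=destination,
    )
```

Keeping the input construction separate from the network call makes it easy to check the payload before starting a paid training run.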

@mattt
Contributor

mattt commented Aug 26, 2024

Apologies for the inconvenience, folks. I can confirm that the issues are a result of incorrect validation logic in the model that looks for a .zip file extension. I've opened an upstream PR. In the meantime, please try passing a file handle (open("path/to/file.zip")) with a .zip extension, or if that fails, use the suggested workaround of uploading to S3 or R2.
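
To illustrate the difference between the two kinds of validation, here is a small sketch. It is not the trainer's actual code; both function names are hypothetical, and the only library call used is the standard `zipfile.is_zipfile`.

```python
import zipfile


def looks_like_zip_by_name(filename):
    # Extension-only check: rejects a valid archive whenever the upload
    # arrives without its original ".zip" filename.
    return filename.lower().endswith(".zip")


def looks_like_zip_by_content(fileobj):
    # Content check: is_zipfile accepts a path or a seekable file-like
    # object and looks for the zip end-of-central-directory record.
    return zipfile.is_zipfile(fileobj)
```

A valid archive that reaches the model under a generated name would fail the first check and pass the second, which matches the behaviour described in this thread.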

@fa9r

fa9r commented Aug 26, 2024

@mattt FYI, since the last release I get a similar error with other Replicate models when passing local .jpg files as inputs: replicate.exceptions.ModelError: Please provide png, jpg or jpeg images.

Edit: Pinning replicate to an older version fixed the issue for me.

mattt added a commit that referenced this issue Aug 30, 2024
Resolves #338 

Follow up to #226 

Filenames from open file handles were not being passed correctly in
`create` / `async_create`. This PR fixes that and adds some test
coverage.

---------

Signed-off-by: Mattt Zmuda <mattt@replicate.com>
@mattt mattt reopened this Aug 30, 2024
@mattt
Contributor

mattt commented Aug 30, 2024

Hey everyone. Thanks again for your feedback and patience. I found a problem in how file uploads work when passing a file handle that caused filenames to not be passed correctly. This was fixed by #343, and is now available in 0.32.1. Updating to that version should sort out the problems y'all are seeing. (If not, please let me know!)
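
For readers wondering why the filename matters, the sketch below shows one way a filename can be derived from an open file handle. This is an assumption-laden illustration, not the client's implementation; `filename_from_handle` and its `default` parameter are hypothetical.

```python
import os


def filename_from_handle(fileobj, default="file"):
    """Best-effort filename for an open file handle (hypothetical helper)."""
    name = getattr(fileobj, "name", None)
    # In-memory buffers have no name, and handles opened from a raw file
    # descriptor report an int, so fall back to the default in both cases.
    if not isinstance(name, str):
        return default
    return os.path.basename(name)
```

If an upload falls back to a name without the `.zip` suffix, an extension-based check on the receiving side would reject an otherwise valid archive.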

@mattt mattt closed this as completed Aug 30, 2024
@tobias-varden
Author

Thanks @mattt, this works for me now!

@dw820

dw820 commented Sep 11, 2024

Hi @mattt, I am still getting this error, and my replicate version is 0.32.1.

Here's the code I used to zip the folder of images:

import os
import zipfile

def zip_folder(folder_path, output_zip):
    # Add every file under folder_path, stored with paths relative to it.
    with zipfile.ZipFile(output_zip, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_path):
            for file in files:
                file_path = os.path.join(root, file)
                zipf.write(file_path, os.path.relpath(file_path, folder_path))

folder_to_zip = './data'
output_zip_file = 'data.zip'

zip_folder(folder_to_zip, output_zip_file)

Then I call replicate.trainings.create:

training = replicate.trainings.create(
    version="stability-ai/sdxl:xxx",
    input={
        "input_images": open("data.zip","rb"),
        "token_string": "TOK",
        "caption_prefix": "a photo of TOK, "
    },
    destination=f"{model.owner}/{model.name}"
)

After that, I got an input_images link from Replicate. Downloading it gives me exactly the zip file I have locally, but I still get this error.
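
One way to confirm that a downloaded copy is byte-for-byte identical to the local archive is to compare digests. This is a minimal sketch using the standard `hashlib` module; `sha256_of` and `same_file` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(local_path, downloaded_path):
    return sha256_of(local_path) == sha256_of(downloaded_path)
```

Matching digests would rule out corruption during upload and leave the model's own input handling as the likely cause.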

Here's the full error log:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/cog/server/worker.py", line 354, in _predict
    result = predict(**payload)
  File "train.py", line 142, in train
    input_dir = preprocess(
  File "/src/preprocess.py", line 118, in preprocess
    assert False, "input_images_filetype must be zip or tar"
AssertionError: input_images_filetype must be zip or tar

@mattt
Contributor

mattt commented Sep 11, 2024

Hi @dw820. That looks like a model-specific problem. The original issue had to do with https://replicate.com/ostris/flux-dev-lora-trainer/train (which is quite a big step up from SDXL, so it'd be worth your while to check it out!)

@dw820

dw820 commented Sep 12, 2024

Got it, thanks for the context!
