Infinite file growth when downloading checkpoints in chunks #129

Open
Mooon opened this issue Aug 30, 2024 · 0 comments

Mooon commented Aug 30, 2024

I’m encountering an issue with the download script: during the chunking process it enters an infinite loop, so the files grow indefinitely and the download never completes.

This happens when downloading large models, such as the 405B-MP16 version, where each checkpoint (consolidated.XX.pth) is downloaded in chunks. Expected behavior: the script downloads each chunk, concatenates the chunks, and finishes. Actual behavior: the script keeps downloading chunks without ever completing, so the files grow without bound.
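To make the expected behavior concrete: a chunked download can only terminate if the loop over byte ranges is bounded by a known total size. The following is a minimal sketch of that idea, not the project's actual chunking code. The helper name `chunk_ranges` and its two parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the download script): print one "start-end"
# byte range per line for a file of total_size bytes, split into chunks of
# at most chunk_size bytes.
chunk_ranges() {
    local total_size=$1 chunk_size=$2
    local start=0 end

    # A chunk size of zero or less would never advance, so refuse it.
    (( chunk_size > 0 )) || return 1

    while (( start < total_size )); do
        end=$(( start + chunk_size - 1 ))
        # Clamp the final chunk to the last byte of the file.
        if (( end >= total_size )); then
            end=$(( total_size - 1 ))
        fi
        printf '%d-%d\n' "$start" "$end"
        start=$(( end + 1 ))
    done
}
```

Each printed range could then be fetched with an HTTP range request (for example `curl --range "$range"`) and appended to the target file. Because `start` always advances and is compared against the total size, the loop ends once the last byte is covered.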

Potential fix:
I was able to work around the issue by skipping chunking altogether: instead of downloading each consolidated.XX.pth file in chunks, the modified script downloads each file directly. Each checkpoint file is up to 48GB, so this approach is manageable on systems with sufficient resources.

To implement this fix, set the variable PTH_FILE_CHUNK_COUNT=0. Additionally, I parallelized the downloads of the checkpoint files, which reduces the overall download time and simplifies the script.

Modified Script:

if [[ $PTH_FILE_COUNT -ge 0 ]]; then
    for s in $(seq -f "%02g" 0 "${PTH_FILE_COUNT}"); do
        (
            # Download one whole checkpoint file, without chunking
            printf "Downloading consolidated.%s.pth\n" "${s}"
            url=${PRESIGNED_URL/'*'/"${MODEL_PATH}/consolidated.${s}.pth"}
            # Quote the URL and path so characters such as ? and & are passed through unchanged
            wget --continue "${url}" -O "${TARGET_FOLDER}/${MODEL_PATH}/consolidated.${s}.pth"
        ) &
    done

    # Wait for all file downloads to complete
    wait
fi

I recognize that this solution may not be suitable for all users, particularly those on systems with limited resources. For this reason, I am opening this issue to discuss alternative solutions or additional options for users with different system capabilities.
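One possible option for systems with limited resources is to cap how many checkpoint files download at once. The following is a rough sketch under stated assumptions, not a tested patch: `run_in_batches` and `download_checkpoint` are hypothetical helper names, while `PRESIGNED_URL`, `MODEL_PATH`, `TARGET_FOLDER` and `PTH_FILE_COUNT` are the variables already used in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one item per line from stdin and run
# "<command> <item>" in the background, at most max_jobs at a time.
run_in_batches() {
    local max_jobs=$1
    shift
    local item count=0

    (( max_jobs > 0 )) || return 1

    while IFS= read -r item; do
        "$@" "$item" &
        count=$(( count + 1 ))
        # After every max_jobs launches, let the whole batch finish.
        if (( count % max_jobs == 0 )); then
            wait
        fi
    done
    wait  # Wait for the final, possibly partial, batch.
}

# Hypothetical wrapper around the wget call from the modified script.
download_checkpoint() {
    local s=$1
    local url=${PRESIGNED_URL/'*'/"${MODEL_PATH}/consolidated.${s}.pth"}
    printf 'Downloading consolidated.%s.pth\n' "$s"
    wget --continue "$url" -O "${TARGET_FOLDER}/${MODEL_PATH}/consolidated.${s}.pth"
}

# Example usage (assumes the variables above are set), two files at a time:
#   seq -f "%02g" 0 "${PTH_FILE_COUNT}" | run_in_batches 2 download_checkpoint
```

Batching is simpler than a true job pool, at the cost that one slow file holds up the rest of its batch. With a batch size of 1 this reduces to a plain sequential download.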

Mooon changed the title from "Chunking issue in the download script" to "Infinite file growth when downloading checkpoints in chunks" on Aug 30, 2024