I’m encountering an issue with the download script where it enters an infinite loop during the chunking process, resulting in files that grow indefinitely and the download never completes.
This happens when downloading large models, like the 405B-MP16 version, where each checkpoint (consolidated.XX.pth) is downloaded in chunks. The script should download each chunk, concatenate the chunks, and then finish. Instead, it keeps downloading chunks without ever completing, so the files grow indefinitely.
Potential fix:
I was able to work around the issue by simplifying the process. Instead of downloading each consolidated.XX.pth file in chunks, I modified the script to download each file directly, without splitting it into chunks. Given that each checkpoint file is up to 48GB in size, this approach is manageable on systems with sufficient resources.
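For anyone unsure whether their system counts as having "sufficient resources", a rough disk-space check along these lines might help before starting. This is only a sketch: the function names `required_gb`, `available_gb`, and `has_space` are hypothetical and not part of the download script, the 48 figure is the upper bound mentioned above, and the shard count has to be supplied by the user. The comparison mixes GB and GiB, so treat it as approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original download script.

# required_gb SHARD_COUNT PER_SHARD_GB
# Upper bound on the space needed, in GB.
required_gb() {
    echo $(( $1 * $2 ))
}

# available_gb DIR
# Free space on the filesystem holding DIR, in whole GiB (rounded down).
# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
available_gb() {
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# has_space DIR SHARD_COUNT PER_SHARD_GB
# Succeeds if DIR appears to have room for all shards.
has_space() {
    [ "$(available_gb "$1")" -ge "$(required_gb "$2" "$3")" ]
}
```

For example, assuming 16 shards of up to 48GB each, `required_gb 16 48` prints 768, and `has_space "$TARGET_FOLDER" 16 48 || echo "not enough space"` would warn before any download begins.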
To implement this fix, set the variable PTH_FILE_CHUNK_COUNT=0. Additionally, I parallelized the downloads of the checkpoint files, which reduces the overall download time and simplifies the script.
Modified Script:
if [[ $PTH_FILE_COUNT -ge 0 ]]; then
    for s in $(seq -f "%02g" 0 ${PTH_FILE_COUNT}); do
        (
            printf "Downloading consolidated.${s}.pth\n"
            wget --continue ${PRESIGNED_URL/'*'/"${MODEL_PATH}/consolidated.${s}.pth"} -O ${TARGET_FOLDER}"/${MODEL_PATH}/consolidated.${s}.pth"
        ) &
    done
    # Wait for all file downloads to complete
    wait
fi
I recognize that this solution may not be suitable for all users, particularly those on systems with limited resources. For this reason, I am opening this issue to consider alternative solutions or to provide additional options for users with different system capabilities.
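One possible option for more constrained systems would be to cap the number of simultaneous downloads instead of starting every shard at once. The sketch below is an assumption on my part, not something the script offers today: `MAX_JOBS`, `shard_ids`, `download_shard`, and `download_all` are hypothetical names, and it assumes `PRESIGNED_URL`, `MODEL_PATH`, and `TARGET_FOLDER` are set the same way as in the modified script above.

```shell
#!/usr/bin/env bash
# Sketch: download shards with at most MAX_JOBS running at once.
# All names below are hypothetical and not part of the original script.

MAX_JOBS="${MAX_JOBS:-4}"

# Print zero-padded shard indices 00..N, using the same seq format as above.
shard_ids() {
    seq -f "%02g" 0 "$1"
}

# Download one shard directly, without chunking.
# Assumes PRESIGNED_URL, MODEL_PATH and TARGET_FOLDER are already set.
download_shard() {
    local s="$1"
    local name="${MODEL_PATH}/consolidated.${s}.pth"
    printf "Downloading consolidated.%s.pth\n" "$s"
    wget --continue "${PRESIGNED_URL/'*'/"$name"}" -O "${TARGET_FOLDER}/${name}"
}

# download_all LAST_INDEX
# Start shards 00..LAST_INDEX in the background, never more than MAX_JOBS at a time.
download_all() {
    local last="$1" s
    for s in $(shard_ids "$last"); do
        # Poll until fewer than MAX_JOBS background downloads are running.
        while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
            sleep 1
        done
        download_shard "$s" &
    done
    # Wait for the remaining downloads to complete.
    wait
}
```

With this in place, `MAX_JOBS=2; download_all "$PTH_FILE_COUNT"` would fetch the shards two at a time. Polling with `jobs` is used here rather than `wait -n` so the sketch also runs on older bash versions.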
Mooon changed the title from "Chunking issue in the download script" to "Infinite file growth when downloading checkpoints in chunks" on Aug 30, 2024