Model downloads just *hang* #1186
To expand on this further, I restarted the download for …
Are you sure you have enough space in the path linked to …?
@OlivierDehaene There are no space issues (over 1T free). Is it possible to enable logging so we can see the error?
The downloader is definitely flaky... When downloading …, we killed and restarted the docker process and of course it re-downloaded the first file, then another copy of the second file, then hung again at 4.3G. We then deleted the tmp files and re-ran the download, and it worked correctly... This morning we tried downloading …. We disabled HF_HUB_ENABLE_HF_TRANSFER but still don't see any logging. It's impossible for us to debug this when we can't see the errors. Please help.
@vgoklani hey, were you able to fix this? I'm facing a similar issue.
@StephennFernandes Something is definitely flaky; unfortunately we can't see the logging. One thing to do is disable HF_HUB_ENABLE_HF_TRANSFER:
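For illustration, a minimal sketch of what that looks like with docker run (the model id, ports, and volume path here are placeholders, not taken from this thread):

```bash
# Sketch only: disable the hf_transfer fast-download path so the plain Python
# downloader (with its normal error handling) is used instead.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $(pwd)/data:/data \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  ghcr.io/huggingface/text-generation-inference:1.1.1 \
  --model-id mistralai/Mistral-7B-Instruct-v0.1
```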
and then, when the downloads get stuck, kill the docker process, delete the tmp* files and restart. Do this several times and cross your fingers that it eventually finishes successfully. The problem is most likely server-side, but unfortunately it's hard to be sure since we can't see the logs.
Any updates on this, @OlivierDehaene? I know your team is busy, but we're trying to download this: …
The first file downloads successfully in the temp directory (we see 6.4G) and then nothing happens. The second file never starts downloading, and there are no updates on the console. We then kill the download and restart; it re-downloads the first file into a new temp folder, and then just hangs again. Could you please help? We removed the AWQ file and went to the full 33B model, but the same thing keeps happening.
We are seeing the same issue with ….
The process output: …
We then ran strace to see the last output of the download-weights command (see the sketch below).
It always hangs at the exact same spot.
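For reference, a sketch of how strace can be attached to the running downloader (the pgrep pattern is an assumption based on the process name mentioned above):

```bash
# Find the download-weights process and attach strace to it to see which
# syscall it is blocked on (e.g. a read on the socket vs. a write to stdout).
PID=$(pgrep -f download-weights)
strace -f -p "$PID" -e trace=read,write,recvfrom,poll
```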
Update: …
Cheers, are you all behind a corporate proxy? We are having the same problem (behind a corporate proxy); maybe the proxy is the problem.
Nope, tested in multiple environments!
Same here, no activity from the docker container. Compose file:
version: "3.8"
services:
  hf-inference:
    restart: always
    image: ghcr.io/huggingface/text-generation-inference:1.1.1
    command: --model-id TheBloke/deepseek-coder-1.3B-base-AWQ
    volumes:
      - "./data:/data"
    ports:
      - 8080:80
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
The model has already downloaded to a temp file (854MB of 895MB downloaded).
@VfBfoerst No, I use a plain internet connection without a proxy, and I got the same issue.
Hi, any solution yet?
We are never able to reproduce this. Can you provide more information about the environment you are running in? Try both HF_HUB_ENABLE_HF_TRANSFER=0 and =1 (in the docker environment). It really looks like some kind of environment issue, but we really cannot help if we cannot reproduce.
@Narsil Reproducing this issue is quite challenging, and the problem doesn't always occur (especially if you have a high-quality internet connection).
The downloads are indeed …. We don't have any issues using ….
We deploy many models per day without any hitch, ever. Everything is using …. I'm sorry, but I really cannot help with the problems in your environment. At this point, just try to isolate the problem so we can actually help. Launching …. I'm seriously suspecting a bug in the environment: something limiting the file descriptors, some hard CPU rate limiting of the host, some firewall issue.
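Along those lines, a couple of quick checks that can be run inside the container to rule out the usual suspects (my own sketch, not from the thread; the cgroup path assumes cgroup v2):

```bash
# Open file descriptor limit visible to the container.
ulimit -n
# CPU quota imposed by the cgroup, if any ("max" means no limit; cgroup v2 path).
cat /sys/fs/cgroup/cpu.max
```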
Hey, I found a workaround (at least in my case): …. It detected the already downloaded weights and started the webserver. After that, generation via the API worked as expected.
I was watching this video on Mixtral where the speaker is discussing TGI, and then he says "well, there are some issues downloading weights when using TGI"... https://youtu.be/EXFbZfp8xCI?si=TEv3dvWgI3hZKwWJ&t=928 He's running this on RunPod, which I believe has a "very stable network".
I'm experiencing the same issue on certain networks with MODEL_ID=TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ.
It hangs forever.
If I invoke …
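For what it's worth, invoking the weight download step by hand inside the container looks roughly like this (a sketch; the subcommand name comes from the download-weights process mentioned earlier in the thread, and any extra flags the launcher passes are omitted):

```bash
# Run the downloader directly in the running TGI container (sketch).
docker exec -it <container-name> \
  text-generation-server download-weights TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
```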
Encountering the same issue while waiting for Mixtral to download, without any feedback. It stays stuck in the download phase.
@Narsil I can reliably reproduce this on Runpod with ….
Please give it a try. Also, like peterschmidt85, I find that the value of the …
@Narsil I'm also experiencing this issue when downloading Mixtral-8x7B-Instruct-v0.1 (both the original and a duplicate I made on HF), under Kubernetes on Azure using the ghcr.io/huggingface/text-generation-inference:1.3.4 image. Let me know if you want a copy of the Kubernetes yaml files — it's extremely replicable. That model has 19 ~4.9GB .safetensors chunks, and the download consistently hangs after either 12, 13, or 14 chunks. I have exec-ed into the pod and confirmed that the downloader process is still running (it eventually times out around 20 minutes later) and that the machine has plenty of disk space. Looking in the downloaded blobs directory, it appears to have all full-sized chunks with no partial chunk, so possibly the hang occurs while starting the download of the next chunk. I found #354 from back in May, which sounded very similar. A poster on that issue thread claimed to have worked around the issue by passing the option …,
without success. I also confirmed that if I exec into the pod and manually run the downloader command:
then that completes just fine, and the text-generation-launcher spawns another weights download, which picks up the downloaded model. So now I have a running pod that I can only restart or scale via a slow manual process, thus defeating the entire point of Kubernetes. This issue has already been open for 2 1/2 months and is a blocker, sadly, so I'll be switching us over to using vLLM (yes, I searched, and they don't have a similar issue reported).
@RDearnaley I wouldn't say that the issue is "extremely replicable". We have yet to see a reproducible example, and we do not have the issue in any of our prod environments, our integration tests, or even our dev VMs. It's extremely hard to make progress on this issue in these conditions. Even in this thread, we have seen numerous talented people trying to reproduce/find out what is happening, and nobody has a clue yet besides @peterschmidt85, who found that it might be linked to temporary files. For example, you state:
I also confirmed that if I exec into the pod and manually run the downloader command then that completes just fine.
The launcher runs the same command.
Sassy comments do not ever lead to magic fixes; they just lead to open source dev burnout. Be aware that they also use the huggingface_hub lib to download the weights.
@RDearnaley I wouldn't say that the issue is "extremely replicable"
What I meant by:
Let me know if you want a copy of the Kubernetes yaml files — it's extremely replicable.
is that it's extremely replicable *in our Azure Kubernetes pod*, which is why I was asking if you wanted a copy of our Kubernetes script, since I already understood that you have so far been unable to replicate it.
The launcher runs the same command.
I'm aware of that (I said "manually run the downloader command"; perhaps I should have been more explicit and said "copy-paste the downloader command from ps -elf and simply rerun that myself from the command line"), and also of the obvious implication that this is a Heisenbug. FWIW, I believe I was running this as root. *I did notice that running it manually showed progress bars, which I didn't see in the logs of Kubernetes running it.* I was hoping that this might be a clue for you.
I regret that my comment explaining my company's situation of being unable to continue using your product apparently came across as "sassy" to you — that was not my intent. I was simply attempting to express that this is a complete blocker for us and is unfortunately forcing us to stop using your product. I am hoping that the day-plus of work I put in on trying to help debug this difficult issue in your product, and the results of that which I included above, will be of some help.
The sass was definitely uncalled for. @OlivierDehaene @Narsil Here is a step-by-step replication which always produces the infinite "hang" for me; I've tried it probably a dozen times with different machines on Runpod, including Community Cloud and Secure Cloud. This is exactly what I did several times a week ago, and I've just tested these steps again today on two different Runpod machines (one in Community and one in Secure): …
Here's a video of that process: simplescreenrecorder-2024-01-13_19.34.44.mp4
Then click the "Pods" page via the button in the side bar and the machine will be listed. Once it starts up, click the logs button and go to the "Container Logs" tab. You'll see it either: …
So far as I can see, this is 100% replicable with the above steps. That is of course not to say that this makes it easy to fix, but I hope this helps the investigation somewhat.
I have a simple setup for my personal tests: no K8s, local storage, no proxy, and basic Docker, running TGI with several 70B models and many smaller ones. The downloads generally work smoothly. However, occasionally I encounter a model with one large ….
These are my attempts to download a model that has one 36GB ….
PS: I forgot to share my personal workarounds. If I use the download via …
Maybe this comment isn't very useful, as I don't have any new insights to add.
I've been struggling with this issue for a while now, getting very similar errors and no problem downloading via …. In my case it was the following (facepalm incoming, but I hope it helps others): I blindly copied over …. I still got some unclear hangs, but a restart gets me a bit further, so I'm downloading bit by bit, but at least it gets me started :)
I totally agree with this suggestion from @vgoklani. It seems this issue won't be resolved anytime soon, so it's really helpful to be able to serve models from a pre-downloaded model directory, without having to use …
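A sketch of that approach, assuming huggingface-cli is available on the host and that TGI accepts a local path as --model-id (both assumptions on my part; model id and paths are placeholders):

```bash
# Pre-download the weights on the host, outside of TGI.
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.1 \
  --local-dir ./models/mistral-7b-instruct

# Then point TGI at the local directory instead of letting it download at startup.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $(pwd)/models:/models \
  ghcr.io/huggingface/text-generation-inference:1.1.1 \
  --model-id /models/mistral-7b-instruct
```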
The culprit
I think I've tracked this down, and it's not at all what I expected. I've tracked it down to the progress bar in huggingface_hub.
What I did
Like others have mentioned, I ran …. With the magic of print debugging, I added my own debug statements and wrote them out to a file. Here's a diff of my changes:
diff --git a/src/huggingface_hub/file_download.py b/src/huggingface_hub/file_download.py
index abc82e1..428afe1 100644
--- a/src/huggingface_hub/file_download.py
+++ b/src/huggingface_hub/file_download.py
@@ -18,6 +18,11 @@ from pathlib import Path
from typing import Any, BinaryIO, Dict, Generator, Literal, Optional, Tuple, Union
from urllib.parse import quote, urlparse
+def write_debug(msg: str):
+ with open("/data/debug-log.txt", "a") as f:
+ f.write(msg)
+ f.flush()
+
import requests
from filelock import FileLock
@@ -375,6 +380,7 @@ def _raise_if_offline_mode_is_enabled(msg: Optional[str] = None):
def _request_wrapper(
method: HTTP_METHOD_T, url: str, *, follow_relative_redirects: bool = False, **params
) -> requests.Response:
+ write_debug(f"{method}: {url}\n")
"""Wrapper around requests methods to add several features.
What it does:
@@ -422,7 +428,12 @@ def _request_wrapper(
return response
# Perform request and return if status_code is not in the retry list.
- response = get_session().request(method=method, url=url, **params)
+ write_debug("calling get_session()\n")
+ s = get_session()
+ write_debug("finished get_session()\n")
+ write_debug("Making request\n")
+ response = s.request(method=method, url=url, **params)
+ write_debug("finished request\n")
hf_raise_for_status(response)
return response
@@ -460,6 +471,7 @@ def http_get(
" available in your environment. Try `pip install hf_transfer`."
)
+ #import pudb; pu.db
initial_headers = headers
headers = copy.deepcopy(headers) or {}
if resume_size > 0:
@@ -512,6 +524,7 @@ def http_get(
"using `pip install -U hf_transfer`."
)
try:
+ write_debug("before hf_transfer.download()\n")
hf_transfer.download(
url=url,
filename=temp_file.name,
@@ -522,7 +535,9 @@ def http_get(
max_retries=5,
**({"callback": progress.update} if supports_callback else {}),
)
+ write_debug("after hf_transfer.download()\n")
except Exception as e:
+ write_debug("hf_transfer.download() runtime error\n")
raise RuntimeError(
"An error occurred while downloading using `hf_transfer`. Consider"
" disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling."
@@ -538,14 +553,24 @@ def http_get(
return
new_resume_size = resume_size
try:
+ write_debug("before chunk iter\n")
for chunk in r.iter_content(chunk_size=DOWNLOAD_CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
+ write_debug("got chunk\n")
+ write_debug(f"len(chunk): {len(chunk)}\n")
+ write_debug("updating progress\n")
progress.update(len(chunk))
+ write_debug("progress updated\n")
temp_file.write(chunk)
+ write_debug("temp_file.write() returned\n")
new_resume_size += len(chunk)
# Some data has been downloaded from the server so we reset the number of retries.
_nb_retries = 5
+ else:
+ write_debug("not chunk!!\n")
+ write_debug("after chunk iter\n")
except (requests.ConnectionError, requests.ReadTimeout) as e:
+ write_debug(f"Chunk iter error {e}\n")
# If ConnectionError (SSLError) or ReadTimeout happen while streaming data from the server, it is most likely
# a transient error (network outage?). We log a warning message and try to resume the download a few times
# before giving up. Tre retry mechanism is basic but should be enough in most cases.
@@ -1457,6 +1482,8 @@ def hf_hub_download(
_check_disk_space(expected_size, os.path.dirname(blob_path))
if local_dir is not None:
_check_disk_space(expected_size, local_dir)
+ else:
+ write_debug("Warning! Expected_size is None\n")
http_get(
url_to_download,
I then ran the modified container with the debugging output and monitored both the temporary file size and the output file. I was able to see that the ….
Hypothesis
It's my hypothesis that, because the docker container is not displaying stdout, the ….
Why this might make sense
Why this might not make sense
UPDATE!
To further test this hypothesis, I added the following loop just before the download iterates:
@@ -538,14 +553,29 @@ def http_get(
return
new_resume_size = resume_size
try:
+ write_debug("Trying to fill buffers...")
+ for i in range (0, 10_000_000):
+ write_debug(f"progress: {i}\n")
+ print("a"*64)
+ print("a"*64, file=sys.stderr)
+ write_debug("before chunk iter\n")
for chunk in r.iter_content(chunk_size=DOWNLOAD_CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
The program consistently freezes after 1006 iterations of the new loop.
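As a toy illustration of the back-pressure being hypothesized here (my own sketch, unrelated to the TGI code): a process whose stdout is never drained blocks as soon as the pipe buffer fills.

```bash
# The writer fills the ~64KB pipe buffer and then blocks on write(), because
# `sleep` never reads its stdin. The command appears to hang; stop it with Ctrl-C.
yes "simulated progress bar output" | sleep 1000
```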
Just an idea: try to run the container with …; see: https://docs.docker.com/config/containers/logging/configure/
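Presumably the suggestion is the non-blocking log delivery mode from that page; a sketch (the log driver flags are standard docker options, while the buffer size and model id are placeholders):

```bash
# Deliver container logs asynchronously so a slow log driver cannot stall
# the process writing to stdout/stderr.
docker run --gpus all --shm-size 1g -p 8080:80 \
  --log-driver json-file \
  --log-opt mode=non-blocking \
  --log-opt max-buffer-size=8m \
  ghcr.io/huggingface/text-generation-inference:1.1.1 \
  --model-id mistralai/Mistral-7B-Instruct-v0.1
```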
@skydiablo That looks like it should work, but it doesn't seem to, which casts some doubt on my hypothesis. The documentation for ….
I did another experiment to try and confirm my hypothesis. Again, here's my loop:
@@ -538,14 +553,29 @@ def http_get(
return
new_resume_size = resume_size
try:
+ write_debug("Trying to fill buffers...")
+ for i in range (0, 10_000_000):
+ write_debug(f"progress: {i}\n")
+ print("a"*64)
+ print("a"*64, file=sys.stderr)
+ write_debug("before chunk iter\n")
for chunk in r.iter_content(chunk_size=DOWNLOAD_CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
This is the behavior even when ….
Unfortunately, the original issue with an unmodified container does not appear reproducible for me at the present moment. The next time I'm able to reproduce the issue with the original container, I'll give these logging options a shot to see if they help.
I tried this already and in my case it didn't help :(
Same for me. Changing the log driver or mode does not seem to have an effect. The only thing I can find that's consistent is that the version where I commented out the progress bar update always works, whereas the download hangs more often than not when I allow the progress bar to update.
And there is no way to force it to show, like …?
For me, with -e HF_HUB_ENABLE_HF_TRANSFER="true" at least I can download at faster speeds; the hang then occurs on the 2nd or 3rd safetensors file, so I re-run the process until all safetensors files are downloaded. But for models with no sharded files you will need a bandwidth of many MB/sec.
I submitted huggingface/huggingface_hub#2000 to …
@mssalvatore it seems that you are right. #1486 should fix the issue.
* Disable tqdm progress bar if no TTY attached
When dockerized applications write to STDOUT/STDERR, the applications can block due to logging back pressure (see https://docs.docker.com/config/containers/logging/configure/#configure-the-delivery-mode-of-log-messages-from-container-to-log-driver). HuggingFace's TGI container is one such example (see huggingface/text-generation-inference#1186). Setting tqdm's `disable=None` will disable the progress bar if no tty is attached and help to resolve TGI's issue #1186.
References:
huggingface/text-generation-inference#1186 (comment)
huggingface/text-generation-inference#1186 (comment)
* Disable tqdm progress bar if no TTY attached in lfs.py
Can you guys test …?
It's fixed for me. Thank you!
As 1.4 is out with the fix, I will close this issue.
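For anyone landing here later, the fixed image should be available under the usual path (the tag shown is an assumption based on the 1.4 release mentioned above):

```bash
docker pull ghcr.io/huggingface/text-generation-inference:1.4
```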
Met this again with docker compose file: …
System Info
Information
Tasks
Reproduction
Run the docker script above, and then check the download folder. I can see the first 3 safetensors download successfully, and then it just hangs.
When I switched the model_id to mistralai/Mistral-7B-Instruct-v0.1, the first tensor gets downloaded and then the process hangs.
Expected behavior
I expect this to just work out of the box, since it's using docker, and there is literally nothing for me to screw up ;)