Process doesn't exit / hangs at the end on Windows #64

Open
thewh1teagle opened this issue Jun 20, 2024 · 10 comments
Labels
help wanted Extra attention is needed


@thewh1teagle
Contributor

thewh1teagle commented Jun 20, 2024

I tried examples/nllb.rs and it works, but when the translation finishes, the process hangs for a few minutes, and Ctrl+C doesn't exit it.
Is some cleanup happening at the end? Can we speed it up?

And another question:
Is there a place where I can download the model folder needed for translation that works on macOS, Windows, and Linux? I use facebook/nllb-200-distilled-600M.
I tried zipping the folder created on macOS and using it on Windows, but it didn't work. I had to use transformers to create it on each platform.
The goal is to have something I can use right away in a cross-platform desktop app for offline translation.

Thanks for this amazing library!

Update

It looks like it hangs on drop(t), where t is the translator instance.
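To confirm that dropping the translator is what blocks, one option is to move the value onto a helper thread, drop it there, and wait for a completion signal with a timeout. The sketch below is a generic, hypothetical debugging helper (drop_with_timeout and SlowDrop are not part of ct2rs). It assumes the translator type implements Send; if the drop really hangs, the helper thread is simply left running.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical debugging helper (not part of ct2rs): drops `value` on a
/// separate thread and reports whether the drop finished within `timeout`.
/// On timeout, the helper thread is left running (detached).
fn drop_with_timeout<T: Send + 'static>(value: T, timeout: Duration) -> bool {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        drop(value);
        // The receiver may already be gone if we timed out; ignore that error.
        let _ = tx.send(());
    });
    rx.recv_timeout(timeout).is_ok()
}

/// Hypothetical stand-in for a value whose destructor is slow or stuck.
struct SlowDrop(Duration);

impl Drop for SlowDrop {
    fn drop(&mut self) {
        thread::sleep(self.0);
    }
}

fn main() {
    let fast = drop_with_timeout(SlowDrop(Duration::ZERO), Duration::from_secs(1));
    let slow = drop_with_timeout(SlowDrop(Duration::from_secs(5)), Duration::from_millis(100));
    println!("fast drop finished: {fast}, slow drop finished: {slow}");
}
```

With a real translator, the call would be drop_with_timeout(t, Duration::from_secs(30)); a false result points at the destructor rather than at the translation itself.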

@jkawamoto
Owner

Thank you for reporting this issue. I was able to reproduce it and will investigate what is blocking the termination.

Regarding model files, I am using Hugging Face and have uploaded some models for CTranslate2. You can create an account and upload your model files there.

This code snippet downloads the model files and returns the directory path:

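    use std::path::PathBuf;

    use anyhow::{anyhow, Result};

    // Wrapper function added so the snippet compiles on its own.
    fn download_model() -> Result<PathBuf> {
        let api = hf_hub::api::sync::Api::new()?;
        // Api::model takes an owned String.
        let repo = api.model("<your account name>/<repo>".to_string());

        // Download every file in the repo and remember the local directory.
        let mut res = None;
        for f in repo.info()?.siblings {
            let path = repo.get(&f.rfilename)?;
            if res.is_none() {
                res = path.parent().map(PathBuf::from);
            }
        }

        // path to the directory that contains the model files
        res.ok_or_else(|| anyhow!("no model files are found"))
    }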

@thewh1teagle
Contributor Author

> Thank you for reporting this issue. I was able to reproduce it and will investigate what is blocking the termination.

Let me know if you have any ideas about where it might be. I tried to debug it but couldn't find a good way to do so on Windows. If you find a way, I'd appreciate it if you could share your insights.

> Regarding model files, I am using Hugging Face and have uploaded some models for CTranslate2. You can create an account and upload your model files there.
>
> This code snippet downloads the model files and returns the directory path:

Thanks for the code. Eventually I created a custom downloader, since I needed progress callbacks.
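A downloader with progress callbacks mostly comes down to copying the response body in chunks and invoking a callback after each chunk. The sketch below assumes the HTTP body is available as any std::io::Read (most HTTP clients expose one); copy_with_progress and its callback signature are hypothetical, not the custom downloader mentioned here.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: copies `reader` into `writer` in fixed-size chunks,
/// calling `on_progress(bytes_so_far, total)` after every chunk.
/// `total` is the expected size if known (e.g. from a Content-Length header).
fn copy_with_progress<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    total: Option<u64>,
    mut on_progress: impl FnMut(u64, Option<u64>),
) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut copied = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        copied += n as u64;
        on_progress(copied, total);
    }
    writer.flush()?;
    Ok(copied)
}

fn main() -> io::Result<()> {
    // Simulate a 20 KiB download from an in-memory source.
    let data = vec![7u8; 20 * 1024];
    let mut out = Vec::new();
    let total = Some(data.len() as u64);
    copy_with_progress(&data[..], &mut out, total, |done, total| {
        if let Some(t) = total {
            println!("{done}/{t} bytes ({:.0}%)", done as f64 * 100.0 / t as f64);
        }
    })?;
    assert_eq!(out, data);
    Ok(())
}
```

In a desktop app, the callback would typically forward the byte counts to the UI thread instead of printing them.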

@jkawamoto
Owner

It looks like the join method here blocks forever:

https://github.com/OpenNMT/CTranslate2/blob/master/src/thread_pool.cc#L106

However, I’m not sure why this happens, since the worker threads appear to end correctly.
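For context, the pattern at that line is a thread pool whose destructor tells every worker to stop and then joins each one, so destruction cannot finish until every worker thread has fully exited. The Rust sketch below is a simplified analogue written for illustration; it is not a translation of CTranslate2's thread_pool.cc, and ThreadPool, Job, and execute are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Simplified pool (hypothetical): workers pull jobs from a shared channel.
struct ThreadPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl ThreadPool {
    fn new(size: usize) -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        let workers = (0..size)
            .map(|_| {
                let rx = Arc::clone(&rx);
                thread::spawn(move || loop {
                    // The lock guard is a temporary, released at the end of this statement.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed and drained: shut down
                    }
                })
            })
            .collect();
        ThreadPool { sender: Some(tx), workers }
    }

    fn execute(&self, job: impl FnOnce() + Send + 'static) {
        let job: Job = Box::new(job);
        self.sender.as_ref().unwrap().send(job).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // Closing the channel makes each worker's recv() fail once the queue is empty.
        drop(self.sender.take());
        // join() returns only after the worker thread has completely exited;
        // if anything on that exit path never finishes, this call blocks forever.
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    {
        let pool = ThreadPool::new(4);
        for _ in 0..8 {
            let c = Arc::clone(&counter);
            pool.execute(move || {
                c.fetch_add(1, Ordering::SeqCst);
            });
        }
    } // pool dropped here: queued jobs finish, then every worker is joined
    println!("jobs run: {}", counter.load(Ordering::SeqCst));
}
```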

@thewh1teagle
Contributor Author

thewh1teagle commented Jul 18, 2024

I've identified the root cause of the issue. As I initially suspected, cxx or the FFI wasn't freeing the model class properly at the end (the destructor isn't called). Typically, I implement Drop for FFI objects using bindgen. While I'm not entirely sure how to do this with cxx, adding the following snippet at the end of the Whisper example resolved the issue:

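    unsafe {
        // into_raw() releases ownership without running the C++ destructor;
        // as discussed later in this thread, the model's memory is not freed.
        std::ptr::drop_in_place(model.ptr.into_raw());
    }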

Make sure to make ptr public in whisper.rs, like this:

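    use std::ffi::OsString;
    use cxx::UniquePtr;

    pub struct Whisper {
        model: OsString,
        // Made public so callers can take ownership of the raw C++ pointer.
        pub ptr: UniquePtr<ffi::Whisper>,
    }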
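Exposing ptr is one way to bypass the destructor. A more contained variant, sketched here as a hypothetical alternative (not necessarily what the ct2rs fix does), keeps the field private and wraps it in std::mem::ManuallyDrop, so Rust never runs its destructor when the wrapper is dropped. The trade-off is the same as with into_raw(): the native object is leaked rather than freed. To run without cxx, the sketch uses a hypothetical stand-in type, NativeHandle, with a counting Drop in place of UniquePtr<ffi::Whisper>.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the stand-in destructor has run.
static DESTRUCTOR_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for UniquePtr<ffi::Whisper>: its Drop plays the
/// role of the C++ destructor that may hang.
struct NativeHandle;

impl Drop for NativeHandle {
    fn drop(&mut self) {
        DESTRUCTOR_CALLS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical wrapper that deliberately never runs the native destructor.
#[allow(dead_code)]
struct Model {
    ptr: ManuallyDrop<NativeHandle>,
}

impl Model {
    fn new() -> Self {
        Model { ptr: ManuallyDrop::new(NativeHandle) }
    }
}

fn main() {
    let model = Model::new();
    // ManuallyDrop suppresses NativeHandle::drop, so the handle is leaked.
    drop(model);
    println!("destructor calls: {}", DESTRUCTOR_CALLS.load(Ordering::SeqCst));
}
```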

@jkawamoto
Copy link
Owner

Thank you for investigating this issue. Your suggestion works for me and resolves the hang, although bypassing the release of UniquePtr might fail to release resources. I’m wondering if your VRAM is being freed correctly.

I opened PR #74 and will merge it if there appear to be no resource leaks.

@thewh1teagle
Copy link
Contributor Author

thewh1teagle commented Jul 19, 2024

After checking again, I think it doesn't release the memory.
As you said, join() on the thread hangs even though the worker reaches its last line (and may actually return from the function and exit).
I suspect that as the worker is about to return, the destructors of the objects in the worker are invoked and one of them hangs for some reason.
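The mechanism behind this hypothesis can be shown in isolation: locals owned by a worker are destroyed as its function returns, after its last line, and join() does not return until that has finished. The Rust sketch below uses a hypothetical guard type, SlowGuard, whose destructor is slow; a destructor that never returns would make join() block forever in the same way. It only illustrates the mechanism and does not identify which CTranslate2 object is responsible.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an object whose destructor is slow (or stuck).
struct SlowGuard(Duration);

impl Drop for SlowGuard {
    fn drop(&mut self) {
        thread::sleep(self.0);
    }
}

/// Spawns a worker that owns a SlowGuard and returns the time from spawn
/// until join() returns.
fn join_time(destructor_delay: Duration) -> Duration {
    let start = Instant::now();
    let handle = thread::spawn(move || {
        let _guard = SlowGuard(destructor_delay);
        // Last line of the worker body: _guard is still dropped after this,
        // as the closure returns, and join() waits for that as well.
    });
    handle.join().unwrap();
    start.elapsed()
}

fn main() {
    let waited = join_time(Duration::from_millis(300));
    println!("join() returned after about {} ms", waited.as_millis());
}
```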

@jkawamoto
Copy link
Owner

jkawamoto commented Jul 20, 2024

jkawamoto added the help wanted (Extra attention is needed) label on Jul 22, 2024
@jkawamoto
Owner

With the workaround implemented in #74, VRAM and RAM are not released even when a Translator, Generator, or Whisper instance is dropped. However, the RAM is released when the main process terminates. I think this is still better than having the process blocked forever, so I'll merge the PR.

jkawamoto mentioned this issue on Oct 12, 2024
@thewh1teagle
Copy link
Contributor Author

Hi @jkawamoto,

Any news on this issue?

@jkawamoto
Copy link
Owner

When I use CUDA, the process doesn't get blocked for me, but when I use the CPU, it still does.

It’s still a mystery why joining the threads blocks in some cases.
