
Cache folders not created when transcribing audio #181

Closed
exengo opened this issue Dec 8, 2024 · 6 comments · Fixed by #190
Labels
bug Something isn't working

Comments

exengo commented Dec 8, 2024

I use UltraSinger with a model to transcribe Swedish songs, specifically like this:

python UltraSinger.py -i 'some-swedish-song' --whisper_align_model 'KBLab/wav2vec2-large-voxrex-swedish'

It fails with a FileNotFoundError.
Traceback (most recent call last):
  File "/home/user/Dev/UltraSinger/src/UltraSinger.py", line 693, in <module>
    main(sys.argv[1:])
  File "/home/user/Dev/UltraSinger/src/UltraSinger.py", line 573, in main
    run()
  File "/home/user/Dev/UltraSinger/src/UltraSinger.py", line 147, in run
    TranscribeAudio(process_data)
  File "/home/user/Dev/UltraSinger/src/UltraSinger.py", line 353, in TranscribeAudio
    transcription_result = transcribe_audio(process_data.process_data_paths.cache_folder_path,
  File "/home/user/Dev/UltraSinger/src/UltraSinger.py", line 483, in transcribe_audio
    with open(transcription_path, "w", encoding=FILE_ENCODING) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Dev/UltraSinger/src/output/Kent - Ingen kunde röra oss/cache/whisper_large-v3_cuda_KBLab/wav2vec2-large-voxrex-swedish_KBLab/wav2vec2-large-voxrex-swedish_16_None_None.json'

I figured out that the folders were not created, so the open call failed. Adding this at line 483 fixes the issue:
os.makedirs(os.path.dirname(transcription_path), exist_ok=True)
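A minimal, self-contained sketch of the failure mode (using a throwaway temp directory, not UltraSinger's real paths): open() in write mode never creates missing parent directories, so writing the cache file fails until os.makedirs is called.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested cache path, mimicking cache_folder_path + "<config>.json"
    transcription_path = os.path.join(tmp, "cache", "whisper_large-v3", "result.json")

    # Without the parent folders, open() in write mode raises FileNotFoundError.
    try:
        with open(transcription_path, "w", encoding="utf-8") as file:
            file.write("{}")
        failed_without_makedirs = False
    except FileNotFoundError:
        failed_without_makedirs = True

    # The suggested one-line fix: create any missing parents first.
    os.makedirs(os.path.dirname(transcription_path), exist_ok=True)
    with open(transcription_path, "w", encoding="utf-8") as file:
        file.write("{}")

    print(failed_without_makedirs, os.path.exists(transcription_path))  # True True
```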

@Calamdor

Thanks for this, was trying to figure out why every model I tried was not working!


agwosdz commented Dec 14, 2024

Weird, folder creation occurs in CreateProcessAudio, which is called at line 142 (process_data.process_data_paths.processing_audio_path = CreateProcessAudio(process_data)) and creates the cache folder at line 423 (os_helper.create_folder(process_data.process_data_paths.cache_folder_path)).

I wonder what is happening there. transcription_path is just a .json file inside cache_folder_path (transcription_path = os.path.join(cache_folder_path, f"{transcription_config}.json")).

Was there any other error message prior?

@rakuri255 rakuri255 added the bug Something isn't working label Dec 16, 2024
@rakuri255
Owner

Can someone make a PR or give a song link?


agwosdz commented Dec 17, 2024

> Can someone make a PR or give a song link?

https://www.youtube.com/watch?v=17HIRea5C6Y


agwosdz commented Dec 17, 2024

I am pretty sure I know what the problem is. Will create a PR.

The issue is the "/" in the --whisper_align_model option.

The "/" gets interpreted as a path separator, so Python treats part of the cache file name as a nested directory that does not exist.
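A short illustration of the problem (hypothetical paths; posixpath is used so the example behaves the same on any platform): a Hugging Face model id containing "/" becomes an extra directory level in the cache file name, while replacing "/" with "_" keeps the file directly inside the cache folder.

```python
import posixpath  # POSIX path semantics, platform-independent for this demo

cache_folder_path = "output/song/cache"  # hypothetical cache folder
align_model = "KBLab/wav2vec2-large-voxrex-swedish"

# Embedding the raw model id in the file name smuggles a "/" into the path,
# so the .json "file name" suddenly contains a directory that never exists.
raw = posixpath.join(cache_folder_path, f"whisper_{align_model}.json")
print(posixpath.dirname(raw))   # output/song/cache/whisper_KBLab

# Sanitizing the id keeps the cache file directly inside cache_folder_path.
safe = posixpath.join(cache_folder_path, f"whisper_{align_model.replace('/', '_')}.json")
print(posixpath.dirname(safe))  # output/song/cache
```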


agwosdz commented Dec 17, 2024

A workaround, if you are in a hurry, is to edit UltraSinger.py at line 466:

def transcribe_audio(cache_folder_path: str, processing_audio_path: str) -> TranscriptionResult:
    """Transcribe audio with AI"""
    transcription_result = None
    ### whisper_align_model_string = None
    if settings.transcriber == "whisper":
        ### if settings.whisper_align_model is not None:
        ###     whisper_align_model_string = settings.whisper_align_model.replace("/", "_")
        ### transcription_config = f"{settings.transcriber}{settings.whisper_model.value}{settings.pytorch_device}{whisper_align_model_string}{settings.whisper_batch_size}{settings.whisper_compute_type}{settings.language}"
        transcription_path = os.path.join(cache_folder_path, f"{transcription_config}.json")
        cached_transcription_available = check_file_exists(transcription_path)
        if settings.skip_cache_transcription or not cached_transcription_available:
            transcription_result = transcribe_with_whisper(
                processing_audio_path,
                settings.whisper_model,
                settings.pytorch_device,
                settings.whisper_align_model,
                settings.whisper_batch_size,
                settings.whisper_compute_type,
                settings.language,
            )
            with open(transcription_path, "w", encoding=FILE_ENCODING) as file:
                file.write(transcription_result.to_json())
        else:
            print(f"{ULTRASINGER_HEAD} {green_highlighted('cache')} reusing cached transcribed data")
            with open(transcription_path) as file:
                json = file.read()
            transcription_result = TranscriptionResult.from_json(json)
    else:
        raise NotImplementedError
    return transcription_result

The changed lines are marked with ###:

Essentially, we check whether the --whisper_align_model argument was provided, and if it was, we replace any "/" in the model name with "_" when building the cache file name. The setting remains valid for the model, but the OS no longer interprets the "/" as a directory separator (which made it look for a nonexistent folder and raise the file-not-found error).
