When transcribing and diarizing larger, longer audio files, the job completes, but all of the transcribed text is attributed to a single speaker. I was able to reproduce this consistently using two files. Both are .mp3 files generated by extracting the audio from a .mp4 file with VLC Media Player. The first file is 71 minutes long and 122 MB. The second file is 58 minutes long and 54 MB. The audio quality is clear in both files, and there is distinctly more than one speaker. I was able to successfully transcribe and diarize several smaller files, the largest of which was 15 minutes long.
I am running this locally.
Windows 10
32 GB RAM
AMD Radeon RX 5700 XT
Python v3.11
The logs from an instance of this issue occurring are as follows:
C:\Users\Sean\Documents\Burst\whisper-diarization>python diarize.py -a C:\Users\Sean\Documents\Burst\whisper-diarization\audio_files\674190422_v_134384700710901_100.mp3
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\htdemucs
Separating track C:\Users\Sean\Documents\Burst\whisper-diarization\audio_files\674190422_v_134384700710901_100.mp3
100%|████████████████████████████████████████████████| 4276.349999999999/4276.349999999999 [21:12<00:00, 3.36seconds/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\demucs\separate.py", line 228, in <module>
main()
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\demucs\separate.py", line 211, in main
save_audio(res.pop(args.stem), str(stem), **kwargs)
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\demucs\audio.py", line 261, in save_audio
ta.save(str(path), wav, sample_rate=samplerate,
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\utils.py", line 313, in save
return backend.save(
^^^^^^^^^^^^^
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\soundfile.py", line 44, in save
soundfile_backend.save(
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\soundfile_backend.py", line 457, in save
soundfile.write(file=filepath, data=src, samplerate=sample_rate, subtype=subtype, format=format)
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 345, in write
f.write(data)
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1020, in write
written = self._array_io('write', data, len(data))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1344, in _array_io
return self._cdata_io(action, cdata, ctype, frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1354, in _cdata_io
_error_check(self._errorcode)
File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1407, in _error_check
raise LibsndfileError(err, prefix=prefix)
soundfile.LibsndfileError: System error.
WARNING:root:Source splitting failed, using original audio file. Use --no-stem argument to disable it.
[NeMo W 2024-12-06 18:58:33 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Loading pretrained diar_msdd_telephonic model from NGC
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Found existing object C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\diar_msdd_telephonic\3c3697a0a46f945574fa407149975a13\diar_msdd_telephonic.nemo.
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Re-using file from: C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\diar_msdd_telephonic\3c3697a0a46f945574fa407149975a13\diar_msdd_telephonic.nemo
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Instantiating model from pre-trained checkpoint
[NeMo W 2024-12-06 19:15:49 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: true
[NeMo W 2024-12-06 19:15:49 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
[NeMo W 2024-12-06 19:15:49 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
seq_eval_mode: false
[NeMo I 2024-12-06 19:15:49 nemo_logging:381] PADDING: 16
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] PADDING: 16
[NeMo W 2024-12-06 19:15:50 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\core\connectors\save_restore_connector.py:608: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
return torch.load(model_weights, map_location='cpu')
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Model EncDecDiarLabelModel was successfully restored from C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\diar_msdd_telephonic\3c3697a0a46f945574fa407149975a13\diar_msdd_telephonic.nemo.
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] PADDING: 16
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Loading pretrained vad_multilingual_marblenet model from NGC
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Found existing object C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\vad_multilingual_marblenet\670f425c7f186060b7a7268ba6dfacb2\vad_multilingual_marblenet.nemo.
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Re-using file from: C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\vad_multilingual_marblenet\670f425c7f186060b7a7268ba6dfacb2\vad_multilingual_marblenet.nemo
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Instantiating model from pre-trained checkpoint
[NeMo W 2024-12-06 19:15:50 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: /manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json
sample_rate: 16000
labels:
- background
- speech
batch_size: 256
shuffle: true
is_tarred: false
tarred_audio_filepaths: null
tarred_shard_strategy: scatter
augmentor:
shift:
prob: 0.5
min_shift_ms: -10.0
max_shift_ms: 10.0
white_noise:
prob: 0.5
min_level: -90
max_level: -46
norm: true
noise:
prob: 0.5
manifest_path: /manifests/noise_0_1_musan_fs.json
min_snr_db: 0
max_snr_db: 30
max_gain_db: 300.0
norm: true
gain:
prob: 0.5
min_gain_dbfs: -10.0
max_gain_dbfs: 10.0
norm: true
num_workers: 16
pin_memory: true
[NeMo W 2024-12-06 19:15:50 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json
sample_rate: 16000
labels:
- background
- speech
batch_size: 256
shuffle: false
val_loss_idx: 0
num_workers: 16
pin_memory: true
[NeMo W 2024-12-06 19:15:50 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
sample_rate: 16000
labels:
- background
- speech
batch_size: 128
shuffle: false
test_loss_idx: 0
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] PADDING: 16
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Model EncDecClassificationModel was successfully restored from C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\vad_multilingual_marblenet\670f425c7f186060b7a7268ba6dfacb2\vad_multilingual_marblenet.nemo.
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Multiscale Weights: [1, 1, 1, 1, 1]
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Clustering Parameters: {
"oracle_num_speakers": false,
"max_num_speakers": 8,
"enhanced_count_thres": 80,
"max_rp_threshold": 0.25,
"sparse_search_volume": 30,
"maj_vote_spk_count": false,
"chunk_cluster_count": 50,
"embeddings_per_chunk": 10000
}
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Number of files to diarize: 1
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Split long audio file to avoid CUDA memory issue
splitting manifest: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.77s/it]
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] Perform streaming frame-level VAD
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] Dataset loaded with 86 items, total duration of 1.19 hours.
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] # 86 files loaded accounting to # 1 labels
vad: 0%| | 0/86 [00:00<?, ?it/s][NeMo W 2024-12-06 19:15:52 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py:226: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with autocast():
[NeMo W 2024-12-06 19:15:52 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\amp\autocast_mode.py:266: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
[NeMo W 2024-12-06 19:15:52 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\parts\preprocessing\features.py:433: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
vad: 100%|█████████████████████████████████████████████████████████████████████████████| 86/86 [01:27<00:00, 1.02s/it]
[NeMo I 2024-12-06 19:17:20 nemo_logging:381] Generating predictions with overlapping input segments
[NeMo I 2024-12-06 19:18:06 nemo_logging:381] Converting frame level prediction to speech/no-speech segment in start and end times format.
creating speech segments: 100%|██████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.64s/it]
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Subsegmentation for embedding extraction: scale0, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale0.json
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Dataset loaded with 2782 items, total duration of 0.43 hours.
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] # 2782 files loaded accounting to # 1 labels
[1/5] extract embeddings: 0%| | 0/44 [00:00<?, ?it/s][NeMo W 2024-12-06 19:18:14 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py:362: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with autocast():
[NeMo W 2024-12-06 19:18:14 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\parts\submodules\jasper.py:476: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.amp.autocast(enabled=False):
[1/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 44/44 [01:14<00:00, 1.68s/it]
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Subsegmentation for embedding extraction: scale1, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale1.json
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Dataset loaded with 2945 items, total duration of 0.45 hours.
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] # 2945 files loaded accounting to # 1 labels
[2/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 47/47 [01:04<00:00, 1.37s/it]
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Subsegmentation for embedding extraction: scale2, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale2.json
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Dataset loaded with 3194 items, total duration of 0.47 hours.
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] # 3194 files loaded accounting to # 1 labels
[3/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 50/50 [00:59<00:00, 1.19s/it]
[NeMo I 2024-12-06 19:21:32 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:21:32 nemo_logging:381] Subsegmentation for embedding extraction: scale3, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale3.json
[NeMo I 2024-12-06 19:21:32 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:21:33 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-12-06 19:21:33 nemo_logging:381] Dataset loaded with 3726 items, total duration of 0.51 hours.
[NeMo I 2024-12-06 19:21:33 nemo_logging:381] # 3726 files loaded accounting to # 1 labels
[4/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 59/59 [00:51<00:00, 1.16it/s]
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Subsegmentation for embedding extraction: scale4, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale4.json
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Dataset loaded with 5155 items, total duration of 0.57 hours.
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] # 5155 files loaded accounting to # 1 labels
[5/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 81/81 [00:56<00:00, 1.42it/s]
[NeMo I 2024-12-06 19:23:21 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo W 2024-12-06 19:23:21 nemo_logging:393] cuda=False, using CPU for eigen decomposition. This might slow down the clustering process.
clustering: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.94s/it]
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Outputs are saved in C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs directory
[NeMo W 2024-12-06 19:23:27 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:0 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale0_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:1 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale1_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:2 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale2_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:3 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale3_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:4 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale4_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading cluster label file from C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale4_cluster.label
[NeMo I 2024-12-06 19:23:28 nemo_logging:381] Filtered duration for loading collection is 0.000000.
[NeMo I 2024-12-06 19:23:28 nemo_logging:381] Total 1 session files loaded accounting to # 1 audio clips
0%| | 0/1 [00:00<?, ?it/s][NeMo W 2024-12-06 19:23:29 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\models\msdd_models.py:1332: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with autocast():
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.53it/s]
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] [Threshold: 0.7000] [use_clus_as_main=False] [diar_window=50]
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo W 2024-12-06 19:23:29 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo W 2024-12-06 19:23:29 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo W 2024-12-06 19:23:30 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:30 nemo_logging:381]
[NeMo W 2024-12-06 19:23:30 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
The video and transcript are internal IP, so I'll share them directly with you, Mahmoud, via email.
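For anyone trying to confirm the same symptom without reading the whole transcript, one option is to count the distinct speaker labels in the diarizer's RTTM output. This is a minimal sketch, not part of whisper-diarization: the function name `speaker_durations` is hypothetical, and it assumes the standard RTTM layout, where the turn duration is the fifth whitespace-separated field and the speaker name is the eighth. Where the RTTM file is written is not shown in the logs above, so the sketch takes the file contents as a string.

```python
from collections import Counter


def speaker_durations(rttm_text: str) -> Counter:
    """Sum speaking time per speaker label from RTTM-formatted text."""
    totals = Counter()
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # turn duration in seconds
        speaker = fields[7]          # speaker name, e.g. "speaker_0"
        totals[speaker] += duration
    return totals


# Made-up sample data for illustration only, not taken from the failing run.
sample = """SPEAKER demo 1 0.00 2.50 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER demo 1 2.50 1.50 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER demo 1 4.00 3.00 <NA> <NA> speaker_0 <NA> <NA>"""

for name, seconds in sorted(speaker_durations(sample).items()):
    print(f"{name}: {seconds:.2f} s")
```

If only one label comes back for a recording that clearly has several speakers, the collapse happened in the diarization stage rather than in the later step that assigns words to speakers.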
I also got this same issue with large-v3-turbo on a mono, 16-bit signed .wav file (79:14 long, 409 MB).
Just to make sure it works otherwise, I used a 90-second clip of the same people talking, and it did work.
Going to put this on a Linux machine right away. Thank you!
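For reference, a short test clip like the 90-second one mentioned above can be cut from an uncompressed PCM .wav file with Python's standard `wave` module. This is only a sketch under stated assumptions: the helper name `cut_wav_clip` and its parameters are hypothetical, it handles plain PCM WAV input only (not .mp3), and it is not necessarily how the clip in this comment was produced.

```python
import wave


def cut_wav_clip(src_path, dst_path, start_s=0.0, duration_s=90.0):
    """Copy up to duration_s seconds of PCM audio, starting at start_s, into dst_path.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        channels = src.getnchannels()
        width = src.getsampwidth()
        # Clamp the start position so it never points past the end of the file.
        src.setpos(min(int(start_s * rate), src.getnframes()))
        # readframes returns fewer frames if the file ends early.
        frames = src.readframes(int(duration_s * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return len(frames) // (width * channels)
```

Calling `cut_wav_clip("long.wav", "clip.wav", start_s=0, duration_s=90)` (file names are placeholders) would produce a clip that can be run through `diarize.py -a` to check whether the short excerpt is diarized correctly while the full recording is not.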