Diarization of larger, longer audio files does not work #281

Seanm-burst · 2024-12-07T01:22:25Z

When transcribing and diarizing larger, longer audio files, the job completes, but all of the transcribed text is attributed to a single speaker. I can reproduce this consistently with two files. Both are .mp3 files created by extracting the audio from an .mp4 file with VLC Media Player. The first file is 71 minutes long and 122 MB; the second is 58 minutes long and 54 MB. The audio is clear in both files, and there is clearly more than one speaker. I was able to successfully transcribe and diarize several smaller files, the largest of which was 15 minutes long.
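One way to confirm the single-speaker symptom independently of the transcript is to sum the speaking time per speaker label in NeMo's predicted RTTM file. This is a minimal sketch that assumes the RTTM is still present under `temp_outputs\pred_rttms\`. That directory and filename are assumptions, and the pipeline may clean up `temp_outputs` when it finishes. `speaker_durations` is a hypothetical helper. It relies only on the standard RTTM layout, in which field 5 is the segment duration and field 8 is the speaker name.

```python
from collections import defaultdict
from pathlib import Path


def speaker_durations(rttm_path):
    """Sum speaking time in seconds per speaker label in an RTTM file.

    RTTM speaker lines look like:
    SPEAKER <file-id> <chan> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    totals = defaultdict(float)
    for line in Path(rttm_path).read_text().splitlines():
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        totals[fields[7]] += float(fields[4])  # fields[4] = duration, fields[7] = speaker
    return dict(totals)
```

For example, a call such as `speaker_durations(r"temp_outputs\pred_rttms\mono_file.rttm")` (hypothetical path) that returns a single key would match the behavior described in this issue.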

I am running this locally.
Windows 10
32 GB RAM
AMD Radeon RX 5700 XT
Python v3.11
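The NeMo log below warns that CUDA is not available and that the clustering eigendecomposition runs on the CPU (`cuda=False`). With an AMD Radeon RX 5700 XT on Windows this is probably expected: as far as I know, PyTorch's CUDA builds only support NVIDIA GPUs, and its ROCm builds for AMD GPUs are published for Linux only. The hypothetical helper `describe_torch_device` below is a quick way to confirm which device the PyTorch-based steps (Demucs, NeMo) can see. It uses only `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def describe_torch_device():
    """Return which device PyTorch-based steps (e.g. Demucs, NeMo) can use."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # ROCm builds of PyTorch also expose AMD GPUs through the torch.cuda API
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


print(describe_torch_device())
```

On this machine it would likely print `cpu`, which matches the warnings in the log.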

Here are the logs from one run in which the issue occurred:

C:\Users\Sean\Documents\Burst\whisper-diarization>python diarize.py -a C:\Users\Sean\Documents\Burst\whisper-diarization\audio_files\674190422_v_134384700710901_100.mp3
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\htdemucs
Separating track C:\Users\Sean\Documents\Burst\whisper-diarization\audio_files\674190422_v_134384700710901_100.mp3
100%|████████████████████████████████████████████████| 4276.349999999999/4276.349999999999 [21:12<00:00,  3.36seconds/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\demucs\separate.py", line 228, in <module>
    main()
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\demucs\separate.py", line 211, in main
    save_audio(res.pop(args.stem), str(stem), **kwargs)
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\demucs\audio.py", line 261, in save_audio
    ta.save(str(path), wav, sample_rate=samplerate,
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\utils.py", line 313, in save
    return backend.save(
           ^^^^^^^^^^^^^
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\soundfile.py", line 44, in save
    soundfile_backend.save(
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\_backend\soundfile_backend.py", line 457, in save
    soundfile.write(file=filepath, data=src, samplerate=sample_rate, subtype=subtype, format=format)
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 345, in write
    f.write(data)
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1020, in write
    written = self._array_io('write', data, len(data))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1344, in _array_io
    return self._cdata_io(action, cdata, ctype, frames)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1354, in _cdata_io
    _error_check(self._errorcode)
  File "C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\soundfile.py", line 1407, in _error_check
    raise LibsndfileError(err, prefix=prefix)
soundfile.LibsndfileError: System error.
WARNING:root:Source splitting failed, using original audio file. Use --no-stem argument to disable it.
[NeMo W 2024-12-06 18:58:33 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
      warnings.warn(

[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Loading pretrained diar_msdd_telephonic model from NGC
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Found existing object C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\diar_msdd_telephonic\3c3697a0a46f945574fa407149975a13\diar_msdd_telephonic.nemo.
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Re-using file from: C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\diar_msdd_telephonic\3c3697a0a46f945574fa407149975a13\diar_msdd_telephonic.nemo
[NeMo I 2024-12-06 19:15:48 nemo_logging:381] Instantiating model from pre-trained checkpoint
[NeMo W 2024-12-06 19:15:49 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: true

[NeMo W 2024-12-06 19:15:49 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
    Validation config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false

[NeMo W 2024-12-06 19:15:49 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false
    seq_eval_mode: false

[NeMo I 2024-12-06 19:15:49 nemo_logging:381] PADDING: 16
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] PADDING: 16
[NeMo W 2024-12-06 19:15:50 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\core\connectors\save_restore_connector.py:608: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
      return torch.load(model_weights, map_location='cpu')

[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Model EncDecDiarLabelModel was successfully restored from C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\diar_msdd_telephonic\3c3697a0a46f945574fa407149975a13\diar_msdd_telephonic.nemo.
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] PADDING: 16
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Loading pretrained vad_multilingual_marblenet model from NGC
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Found existing object C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\vad_multilingual_marblenet\670f425c7f186060b7a7268ba6dfacb2\vad_multilingual_marblenet.nemo.
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Re-using file from: C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\vad_multilingual_marblenet\670f425c7f186060b7a7268ba6dfacb2\vad_multilingual_marblenet.nemo
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Instantiating model from pre-trained checkpoint
[NeMo W 2024-12-06 19:15:50 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config :
    manifest_filepath: /manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json
    sample_rate: 16000
    labels:
    - background
    - speech
    batch_size: 256
    shuffle: true
    is_tarred: false
    tarred_audio_filepaths: null
    tarred_shard_strategy: scatter
    augmentor:
      shift:
        prob: 0.5
        min_shift_ms: -10.0
        max_shift_ms: 10.0
      white_noise:
        prob: 0.5
        min_level: -90
        max_level: -46
        norm: true
      noise:
        prob: 0.5
        manifest_path: /manifests/noise_0_1_musan_fs.json
        min_snr_db: 0
        max_snr_db: 30
        max_gain_db: 300.0
        norm: true
      gain:
        prob: 0.5
        min_gain_dbfs: -10.0
        max_gain_dbfs: 10.0
        norm: true
    num_workers: 16
    pin_memory: true

[NeMo W 2024-12-06 19:15:50 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
    Validation config :
    manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json
    sample_rate: 16000
    labels:
    - background
    - speech
    batch_size: 256
    shuffle: false
    val_loss_idx: 0
    num_workers: 16
    pin_memory: true

[NeMo W 2024-12-06 19:15:50 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config :
    manifest_filepath: null
    sample_rate: 16000
    labels:
    - background
    - speech
    batch_size: 128
    shuffle: false
    test_loss_idx: 0

[NeMo I 2024-12-06 19:15:50 nemo_logging:381] PADDING: 16
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Model EncDecClassificationModel was successfully restored from C:\Users\Sean\.cache\torch\NeMo\NeMo_2.0.0rc0\vad_multilingual_marblenet\670f425c7f186060b7a7268ba6dfacb2\vad_multilingual_marblenet.nemo.
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Multiscale Weights: [1, 1, 1, 1, 1]
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Clustering Parameters: {
        "oracle_num_speakers": false,
        "max_num_speakers": 8,
        "enhanced_count_thres": 80,
        "max_rp_threshold": 0.25,
        "sparse_search_volume": 30,
        "maj_vote_spk_count": false,
        "chunk_cluster_count": 50,
        "embeddings_per_chunk": 10000
    }
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Number of files to diarize: 1
[NeMo I 2024-12-06 19:15:50 nemo_logging:381] Split long audio file to avoid CUDA memory issue
splitting manifest: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.77s/it]
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] Perform streaming frame-level VAD
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] Dataset loaded with 86 items, total duration of  1.19 hours.
[NeMo I 2024-12-06 19:15:52 nemo_logging:381] # 86 files loaded accounting to # 1 labels
vad:   0%|                                                                                      | 0/86 [00:00<?, ?it/s][NeMo W 2024-12-06 19:15:52 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py:226: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
      with autocast():

[NeMo W 2024-12-06 19:15:52 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\amp\autocast_mode.py:266: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
      warnings.warn(

[NeMo W 2024-12-06 19:15:52 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\parts\preprocessing\features.py:433: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
      with torch.cuda.amp.autocast(enabled=False):

vad: 100%|█████████████████████████████████████████████████████████████████████████████| 86/86 [01:27<00:00,  1.02s/it]
[NeMo I 2024-12-06 19:17:20 nemo_logging:381] Generating predictions with overlapping input segments
[NeMo I 2024-12-06 19:18:06 nemo_logging:381] Converting frame level prediction to speech/no-speech segment in start and end times format.
creating speech segments: 100%|██████████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.64s/it]
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Subsegmentation for embedding extraction: scale0, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale0.json
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] Dataset loaded with 2782 items, total duration of  0.43 hours.
[NeMo I 2024-12-06 19:18:14 nemo_logging:381] # 2782 files loaded accounting to # 1 labels
[1/5] extract embeddings:   0%|                                                                 | 0/44 [00:00<?, ?it/s][NeMo W 2024-12-06 19:18:14 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\models\clustering_diarizer.py:362: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
      with autocast():

[NeMo W 2024-12-06 19:18:14 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\parts\submodules\jasper.py:476: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
      with torch.cuda.amp.autocast(enabled=False):

[1/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 44/44 [01:14<00:00,  1.68s/it]
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Subsegmentation for embedding extraction: scale1, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale1.json
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] Dataset loaded with 2945 items, total duration of  0.45 hours.
[NeMo I 2024-12-06 19:19:28 nemo_logging:381] # 2945 files loaded accounting to # 1 labels
[2/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 47/47 [01:04<00:00,  1.37s/it]
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Subsegmentation for embedding extraction: scale2, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale2.json
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] Dataset loaded with 3194 items, total duration of  0.47 hours.
[NeMo I 2024-12-06 19:20:33 nemo_logging:381] # 3194 files loaded accounting to # 1 labels
[3/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 50/50 [00:59<00:00,  1.19s/it]
[NeMo I 2024-12-06 19:21:32 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:21:32 nemo_logging:381] Subsegmentation for embedding extraction: scale3, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale3.json
[NeMo I 2024-12-06 19:21:32 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:21:33 nemo_logging:381] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2024-12-06 19:21:33 nemo_logging:381] Dataset loaded with 3726 items, total duration of  0.51 hours.
[NeMo I 2024-12-06 19:21:33 nemo_logging:381] # 3726 files loaded accounting to # 1 labels
[4/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 59/59 [00:51<00:00,  1.16it/s]
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Subsegmentation for embedding extraction: scale4, C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale4.json
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Extracting embeddings for Diarization
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] Dataset loaded with 5155 items, total duration of  0.57 hours.
[NeMo I 2024-12-06 19:22:24 nemo_logging:381] # 5155 files loaded accounting to # 1 labels
[5/5] extract embeddings: 100%|████████████████████████████████████████████████████████| 81/81 [00:56<00:00,  1.42it/s]
[NeMo I 2024-12-06 19:23:21 nemo_logging:381] Saved embedding files to C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings
[NeMo W 2024-12-06 19:23:21 nemo_logging:393] cuda=False, using CPU for eigen decomposition. This might slow down the clustering process.
clustering: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.94s/it]
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Outputs are saved in C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs directory
[NeMo W 2024-12-06 19:23:27 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:0 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale0_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:1 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale1_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:2 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale2_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:3 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale3_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading embedding pickle file of scale:4 at C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\embeddings\subsegments_scale4_embeddings.pkl
[NeMo I 2024-12-06 19:23:27 nemo_logging:381] Loading cluster label file from C:\Users\Sean\Documents\Burst\whisper-diarization\temp_outputs\speaker_outputs\subsegments_scale4_cluster.label
[NeMo I 2024-12-06 19:23:28 nemo_logging:381] Filtered duration for loading collection is 0.000000.
[NeMo I 2024-12-06 19:23:28 nemo_logging:381] Total 1 session files loaded accounting to # 1 audio clips
  0%|                                                                                            | 0/1 [00:00<?, ?it/s][NeMo W 2024-12-06 19:23:29 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\nemo\collections\asr\models\msdd_models.py:1332: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
      with autocast():

100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.53it/s]
[NeMo I 2024-12-06 19:23:29 nemo_logging:381]      [Threshold: 0.7000] [use_clus_as_main=False] [diar_window=50]
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo W 2024-12-06 19:23:29 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo W 2024-12-06 19:23:29 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:29 nemo_logging:381] Number of files to diarize: 1
[NeMo W 2024-12-06 19:23:30 nemo_logging:393] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-12-06 19:23:30 nemo_logging:381]

[NeMo W 2024-12-06 19:23:30 nemo_logging:393] C:\Users\Sean\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
      warnings.warn(

The video and transcript are internal IP, so I'll share them with you directly via email, Mahmoud.
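The log shows that Demucs source separation ran for about 21 minutes, then failed with `LibsndfileError: System error.` while saving the separated track, and the pipeline fell back to the original audio. Because the diarizer used the original audio anyway, rerunning with `--no-stem` (the option the warning suggests) should give it the same input while skipping the failing step. The sketch below shows such a rerun. It assumes `ffmpeg` is on PATH and that `diarize.py` accepts a WAV path through `-a`, as in the command at the top of the log. Converting to 16 kHz mono first is optional. I chose it because the NeMo configs in the log use `sample_rate: 16000`. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def to_16k_mono_cmd(src, dst):
    """ffmpeg command that downmixes to mono (-ac 1) and resamples to 16 kHz (-ar 16000)."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]


def diarize_no_stem_cmd(audio, script="diarize.py"):
    """diarize.py command using -a and --no-stem, as shown in the log."""
    return [sys.executable, script, "-a", str(audio), "--no-stem"]


def rerun_without_stem(src):
    """Hypothetical helper: convert the input, then diarize without Demucs separation."""
    src = Path(src)
    wav = src.with_name(src.stem + "_16k_mono.wav")
    subprocess.run(to_16k_mono_cmd(src, wav), check=True)
    subprocess.run(diarize_no_stem_cmd(wav), check=True)
```

If the output is still attributed to a single speaker, that would suggest the failed source separation is not the cause.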


MahmoudAshraf97 · 2024-12-07T09:59:23Z

This is probably related to #147; I'll investigate soon.

Seanm-burst · 2024-12-07T10:35:31Z

Sounds good, I appreciate it.

genewitch · 2025-01-23T02:43:04Z

I also hit the same issue with large-v3-turbo on a mono, 16-bit signed .wav file that is 79:14 long (409 MB).

To make sure it works otherwise, I tried a 90-second clip of the same people talking, and it did work.
I'm going to put this on a Linux machine right away. Thank you!
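Both reports point the same way. Short inputs diarize correctly: a 90-second clip here, and files up to 15 minutes in the original report. Files of 58 to 79 minutes collapse to one speaker. To narrow down where that happens, you could run progressively longer prefixes of the same recording through the pipeline. The sketch below does this with only Python's standard `wave` module. `write_prefix` is a hypothetical helper and is not part of whisper-diarization. It only handles uncompressed PCM WAV, so the .mp3 files from the original report would need to be converted first.

```python
import wave


def write_prefix(src_path, dst_path, minutes, chunk_frames=1 << 16):
    """Copy the first `minutes` of a PCM WAV file to dst_path; return frames written."""
    with wave.open(src_path, "rb") as src, wave.open(dst_path, "wb") as dst:
        dst.setparams(src.getparams())  # same channels, sample width, and rate
        frame_size = src.getsampwidth() * src.getnchannels()
        remaining = min(src.getnframes(), int(minutes * 60 * src.getframerate()))
        written = 0
        while remaining > 0:
            # Stream in chunks so a ~400 MB file is never loaded into memory at once
            data = src.readframes(min(chunk_frames, remaining))
            if not data:
                break
            dst.writeframes(data)
            count = len(data) // frame_size
            remaining -= count
            written += count
    return written  # the header's frame count is patched when dst is closed
```

For example, `write_prefix("full.wav", "first_30min.wav", 30)` followed by a diarize.py run on the output, then again with 45 and 60 minutes, and so on.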
