From 8997ba2d248d63f3884ec237d471beef8d18c8b2 Mon Sep 17 00:00:00 2001
From: monica-sekoyan <166123533+monica-sekoyan@users.noreply.github.com>
Date: Thu, 7 Nov 2024 19:43:50 +0400
Subject: [PATCH] download from BRANCH (#11206)

Signed-off-by: monica-sekoyan
---
 .../asr/Self_Supervised_Pre_Training.ipynb | 38 ++++++++++---------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/tutorials/asr/Self_Supervised_Pre_Training.ipynb b/tutorials/asr/Self_Supervised_Pre_Training.ipynb
index 91709d230c2c..18861759237b 100644
--- a/tutorials/asr/Self_Supervised_Pre_Training.ipynb
+++ b/tutorials/asr/Self_Supervised_Pre_Training.ipynb
@@ -17,7 +17,9 @@
     "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
     "4. Run this cell to set up dependencies.\n",
     "5. Restart the runtime (Runtime -> Restart Runtime) for any upgraded packages to take effect\n",
-    "\n\nNOTE: User is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use.\n",
+    "\n",
+    "\n",
+    "NOTE: User is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use.\n",
     "\"\"\"\n",
     "# If you're using Google Colab and not running locally, run this cell.\n",
     "\n",
@@ -272,8 +274,8 @@
    "source": [
     "## Grab the configs we'll use in this example\n",
     "!mkdir configs\n",
-    "!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/asr/conf/ssl/citrinet/citrinet_ssl_1024.yaml\n",
-    "!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/asr/conf/citrinet/citrinet_1024.yaml\n"
+    "!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/asr/conf/ssl/citrinet/citrinet_ssl_1024.yaml\n",
+    "!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/asr/conf/citrinet/citrinet_1024.yaml\n"
    ]
   },
   {
@@ -380,16 +382,16 @@
   },
   {
    "cell_type": "markdown",
+   "metadata": {
+    "id": "4JnepitBZ3ta"
+   },
    "source": [
     "Note that for this loss the outputs must match the inputs, so since we are using Citrinet architecture with 8x stride, we would need to either set \"combine_time_steps\" to 8, or put additional stride layers in the decoder. By default for Citrinet with 8x stride we use \"combine_time_steps=4\" and \"stride_layers=1\" to match the 8x stride.\n",
     "\n",
     "Since in MaskedPatchAugmentation we set mask_patches to 0.5 and our min_durations are set to 3.2, we are guaranteed to have 1.6 masked second per utterance, or 160 masked steps. Since combine_time_steps is set to 4, this means that 160 / 4 = 40 total negatives can be sampled, so we set num_negatives to 40 (unless you set sample_from_same_utterance_only to false or sample_from_non_masked to true, but this tends to make results worse).\n",
     "\n",
     "In the default configs we assume that min_duration for samples is higher (8 seconds by default), so there we can set patch_size to 48 for a total of 480 masked steps, and use 100 sampled negatives. If the min_duration of samples that you are training on allows, the amount of masked steps as well as negatives can be increased further (masking around 50% of the sample duration tends to work well)."
-    ],
-   "metadata": {
-    "id": "4JnepitBZ3ta"
-   }
+    ]
   },
   {
    "cell_type": "markdown",
@@ -482,7 +484,7 @@
    "outputs": [],
    "source": [
     "!mkdir scripts\n",
-    "!wget -P scripts/ https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tokenizers/process_asr_text_tokenizer.py\n",
+    "!wget -P scripts/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/tokenizers/process_asr_text_tokenizer.py\n",
     "\n",
     "!python ./scripts/process_asr_text_tokenizer.py \\\n",
     " --manifest=\"{data_dir}/an4/train_manifest.json\" \\\n",
@@ -650,23 +652,23 @@
   },
   {
    "cell_type": "markdown",
-   "source": [
-    "We can optionally freeze the encoder and only fine-tune the decoder during training. This can be done to lower the memory and time requirements of fine-tuning, but will likely result in a higher word error rate."
-   ],
    "metadata": {
     "id": "S5aVb2F8WuAR"
-   }
+   },
+   "source": [
+    "We can optionally freeze the encoder and only fine-tune the decoder during training. This can be done to lower the memory and time requirements of fine-tuning, but will likely result in a higher word error rate."
+   ]
   },
   {
    "cell_type": "code",
-   "source": [
-    "#asr_model.encoder.freeze()"
-   ],
+   "execution_count": null,
    "metadata": {
     "id": "LpF_YQUmXUR8"
    },
-   "execution_count": null,
-   "outputs": []
+   "outputs": [],
+   "source": [
+    "#asr_model.encoder.freeze()"
+   ]
   },
   {
    "cell_type": "markdown",
@@ -711,7 +713,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.7"
+   "version": "3.10.12"
   },
   "pycharm": {
    "stem_cell": {
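The $BRANCH placeholders introduced by this patch depend on IPython expanding a Python variable named BRANCH inside "!" shell commands; that variable is set in the notebook's setup cell, which lies outside the hunks shown here. A minimal sketch of the pattern in plain Python, where the branch value and the local file name are assumptions for illustration:

# Illustrative sketch only: how a $BRANCH placeholder in the patched wget cells resolves.
# In the notebook, IPython expands the Python variable BRANCH inside "!" commands, e.g.
#   !wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/asr/conf/ssl/citrinet/citrinet_ssl_1024.yaml
# Plain-Python equivalent of one of the patched downloads:
import os
import urllib.request

BRANCH = "main"  # assumption: set this to the NeMo branch or tag matching your install

url = ("https://raw.githubusercontent.com/NVIDIA/NeMo/"
       f"{BRANCH}/examples/asr/conf/ssl/citrinet/citrinet_ssl_1024.yaml")
os.makedirs("configs", exist_ok=True)
urllib.request.urlretrieve(url, os.path.join("configs", "citrinet_ssl_1024.yaml"))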