NVIDIA · nithinraok · Oct 2, 2023 · Sep 30, 2023 · Sep 30, 2023
diff --git a/tutorials/asr/ASR_Confidence_Estimation.ipynb b/tutorials/asr/ASR_Confidence_Estimation.ipynb
@@ -422,7 +422,7 @@
     "1. Initialize _ConfidenceConfig_\n",
     "2. Put the created _ConfidenceConfig_ into the model decoding config.\n",
     "\n",
-    "The folloving cell contains an example of _ConfidenceConfig_ initialization and updating the the model's decoding config.\n",
+    "The following cell contains an example of _ConfidenceConfig_ initialization and updating the model's decoding config.\n",
     "\n",
     "For the _ConfidenceConfig_ there are also listed possible values for its parameters.\n",
     "\n",
@@ -627,14 +627,14 @@
     "4. Normalized Cross Entropy ($\\mathrm{NCE}$): how close of confidence for correct predictions to $1.0$ and of incorrect predictions to $0.0$. It ranges from $-\\infty$ to $1.0$, with negative scores indicating that the conﬁdence method performs worse than the setting confidence score to $1-\\mathrm{WER}$. This metric is also known as Normalized Mutual Information.\n",
     "5. Expected Calibration Error ($\\mathrm{ECE}$): a weighted average over the absolute accuracy/confidence difference. It ranges from $0.0$ to $1.0$ with the best value $0.0$.\n",
     "\n",
-    "Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be condsidered. They are:\n",
+    "Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be considered. They are:\n",
     "1. Area Under the Youden's curve ($\\mathrm{AUC}_\\mathrm{YC}$): the rate of the effective threshold range (i.e. the adjustability or responsiveness). It ranges from $0.0$ to $1.0$ with the best value $0.5$.\n",
     "2. Maximum of the Youden's curve $\\mathrm{MAX}_\\mathrm{YC}$: the optimal $\\mathrm{TNR}$ vs. $\\mathrm{FNR}$ tradeoff. It's unnormalized version can be used as a criterion for selecting the optimal $\\tau$. It ranges from $0.0$ to $1.0$ with the best value $1.0$.\n",
     "3. The standard deviation of the Youden's curve values ($\\mathrm{STD}_\\mathrm{YC}$): indicates that $\\mathrm{TNR}$ and $\\mathrm{FNR}$ increase at different rates (viz. $\\mathrm{TNR}$ grows faster) as the $\\tau$ increases. It ranges from $0.0$ to $0.5$ with the best value around $0.25$.\n",
     "\n",
-    "When selecting/tuning a confidence method, it is recommended to maximize $\\mathrm{AUC}_\\mathrm{ROC}$ first as this is the main mectic of confidence estimation quality. Then, for overconfident models, maximizing $\\mathrm{AUC}_\\mathrm{NT}$ should take precedence over $\\mathrm{AUC}_\\mathrm{PR}$. Finally, a trade-off between $\\mathrm{NCE}$/$\\mathrm{ECE}$ and the family of $\\mathrm{YC}$ metrics considered as a compromise between formal correctness and controllability.\n",
+    "When selecting/tuning a confidence method, it is recommended to maximize $\\mathrm{AUC}_\\mathrm{ROC}$ first as this is the main metric of confidence estimation quality. Then, for overconfident models, maximizing $\\mathrm{AUC}_\\mathrm{NT}$ should take precedence over $\\mathrm{AUC}_\\mathrm{PR}$. Finally, a trade-off between $\\mathrm{NCE}$/$\\mathrm{ECE}$ and the family of $\\mathrm{YC}$ metrics considered as a compromise between formal correctness and controllability.\n",
     "\n",
-    "Let's see how well our confidence performs according to the metrcis above."
+    "Let's see how well our confidence performs according to the metrics above."
    ]
   },
   {
@@ -891,7 +891,7 @@
     "id": "dbb82877"
    },
    "source": [
-    "## 4.1. Small WER improvenent\n",
+    "## 4.1. Small WER improvement\n",
     "\n",
     "Good confidence scores can slightly reduce WER by removing low confidence words from recognition results.\n",
     "\n",
@@ -1190,7 +1190,7 @@
    "id": "f28da61f",
    "metadata": {},
    "source": [
-    "The original examples contain speech, music, or noise. The resulring audio recordings are considered to contain no recognizable speech.\n",
+    "The original examples contain speech, music, or noise. The resulting audio recordings are considered to contain no recognizable speech.\n",
     "\n",
     "You can listen to an example of the audios."
    ]
@@ -1397,7 +1397,7 @@
    },
    "source": [
     "# Summary\n",
-    "This tutorial covered the basics of ASR confidence estimation and two examples of using ASR word confidence: WER reduction and hallusinations removal.\n",
+    "This tutorial covered the basics of ASR confidence estimation and two examples of using ASR word confidence: WER reduction and hallucinations removal.\n",
     "\n",
     "You can follow this tutorial on [ASR Confidence-based Ensembles](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb) to see another important application of ASR confidence estimation."
    ]

diff --git a/tutorials/asr/Confidence_Ensembles.ipynb b/tutorials/asr/Confidence_Ensembles.ipynb
@@ -48,7 +48,7 @@
     "\n",
     "# clone SDP and install requirements\n",
     "!git clone https://github.com/NVIDIA/NeMo-speech-data-processor $WORKSPACE_DIR/NeMo-speech-data-processor\n",
-    "!pip install -r $WORKSPACE_DIR/NeMo-speech-data-processor/requirements.txt\n",
+    "!pip install -r $WORKSPACE_DIR/NeMo-speech-data-processor/requirements/main.txt\n",
     "\n",
     "\"\"\"\n",
     "Remember to restart the runtime for the kernel to pick up any upgraded packages.\n",
@@ -106,13 +106,13 @@
     "\n",
     "A short answer — you can use any ASR models. E.g., you can combine a number of CTC models, or Transducer models, or even mix-and-match. \n",
     "\n",
-    "A more detailed answer is that hte performance of the confidence ensemble is upper-bounded by the performance of the best model on each of the input examples. Thus you will benefit if some of your models work really well on part of the input compared to other models. This way you will get more gains compared to each separate model, and it will also make correct model identification easier.\n",
+    "A more detailed answer is that the performance of the confidence ensemble is upper-bounded by the performance of the best model on each of the input examples. Thus you will benefit if some of your models work really well on part of the input compared to other models. This way you will get more gains compared to each separate model, and it will also make correct model identification easier.\n",
     "\n",
     "### How to estimate a model's confidence?\n",
     "\n",
     "Good news, we have a whole separate [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_Confidence_Estimation.ipynb) on this topic! You can go through it if you want to know all the details about different ways to estimate confidence of NeMo ASR models. There are different confidence measures and aggregation functions and for the absolute best performance, you will need to run a grid-search to pick the best confidence estimation way for your specific models and data.\n",
     "\n",
-    "That being said, we found that there exist a set of confidence parameters that work pretty well on a large set of models and datsets. They are default in NeMo and so you might not need to worry about running the search. If you do want to maximize the performance by tuning the confidence parameters, you only need to add [a few extra config lines](#Building-and-evaluating-ensemble-(tuned-parameters)).\n",
+    "That being said, we found that there exist a set of confidence parameters that work pretty well on a large set of models and datasets. They are default in NeMo and so you might not need to worry about running the search. If you do want to maximize the performance by tuning the confidence parameters, you only need to add [a few extra config lines](#Building-and-evaluating-ensemble-(tuned-parameters)).\n",
     "\n",
     "### How to calibrate confidence values?\n",
     "\n",