Fix typos in confidence tutorial notebooks #7581

Merged
merged 2 commits into from
Oct 2, 2023
14 changes: 7 additions & 7 deletions tutorials/asr/ASR_Confidence_Estimation.ipynb
@@ -422,7 +422,7 @@
"1. Initialize _ConfidenceConfig_\n",
"2. Put the created _ConfidenceConfig_ into the model decoding config.\n",
"\n",
"The folloving cell contains an example of _ConfidenceConfig_ initialization and updating the the model's decoding config.\n",
"The following cell contains an example of _ConfidenceConfig_ initialization and updating the model's decoding config.\n",
"\n",
"For the _ConfidenceConfig_ there are also listed possible values for its parameters.\n",
"\n",
@@ -627,14 +627,14 @@
"4. Normalized Cross Entropy ($\\mathrm{NCE}$): how close of confidence for correct predictions to $1.0$ and of incorrect predictions to $0.0$. It ranges from $-\\infty$ to $1.0$, with negative scores indicating that the confidence method performs worse than the setting confidence score to $1-\\mathrm{WER}$. This metric is also known as Normalized Mutual Information.\n",
"5. Expected Calibration Error ($\\mathrm{ECE}$): a weighted average over the absolute accuracy/confidence difference. It ranges from $0.0$ to $1.0$ with the best value $0.0$.\n",
"\n",
"Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be condsidered. They are:\n",
"Metrics based on the Youden's curve (see https://en.wikipedia.org/wiki/Youden%27s_J_statistic) can also be considered. They are:\n",
"1. Area Under the Youden's curve ($\\mathrm{AUC}_\\mathrm{YC}$): the rate of the effective threshold range (i.e. the adjustability or responsiveness). It ranges from $0.0$ to $1.0$ with the best value $0.5$.\n",
"2. Maximum of the Youden's curve $\\mathrm{MAX}_\\mathrm{YC}$: the optimal $\\mathrm{TNR}$ vs. $\\mathrm{FNR}$ tradeoff. It's unnormalized version can be used as a criterion for selecting the optimal $\\tau$. It ranges from $0.0$ to $1.0$ with the best value $1.0$.\n",
"3. The standard deviation of the Youden's curve values ($\\mathrm{STD}_\\mathrm{YC}$): indicates that $\\mathrm{TNR}$ and $\\mathrm{FNR}$ increase at different rates (viz. $\\mathrm{TNR}$ grows faster) as the $\\tau$ increases. It ranges from $0.0$ to $0.5$ with the best value around $0.25$.\n",
"\n",
"When selecting/tuning a confidence method, it is recommended to maximize $\\mathrm{AUC}_\\mathrm{ROC}$ first as this is the main mectic of confidence estimation quality. Then, for overconfident models, maximizing $\\mathrm{AUC}_\\mathrm{NT}$ should take precedence over $\\mathrm{AUC}_\\mathrm{PR}$. Finally, a trade-off between $\\mathrm{NCE}$/$\\mathrm{ECE}$ and the family of $\\mathrm{YC}$ metrics considered as a compromise between formal correctness and controllability.\n",
"When selecting/tuning a confidence method, it is recommended to maximize $\\mathrm{AUC}_\\mathrm{ROC}$ first as this is the main metric of confidence estimation quality. Then, for overconfident models, maximizing $\\mathrm{AUC}_\\mathrm{NT}$ should take precedence over $\\mathrm{AUC}_\\mathrm{PR}$. Finally, a trade-off between $\\mathrm{NCE}$/$\\mathrm{ECE}$ and the family of $\\mathrm{YC}$ metrics considered as a compromise between formal correctness and controllability.\n",
"\n",
"Let's see how well our confidence performs according to the metrcis above."
"Let's see how well our confidence performs according to the metrics above."
]
},
{
@@ -891,7 +891,7 @@
"id": "dbb82877"
},
"source": [
"## 4.1. Small WER improvenent\n",
"## 4.1. Small WER improvement\n",
"\n",
"Good confidence scores can slightly reduce WER by removing low confidence words from recognition results.\n",
"\n",
@@ -1190,7 +1190,7 @@
"id": "f28da61f",
"metadata": {},
"source": [
"The original examples contain speech, music, or noise. The resulring audio recordings are considered to contain no recognizable speech.\n",
"The original examples contain speech, music, or noise. The resulting audio recordings are considered to contain no recognizable speech.\n",
"\n",
"You can listen to an example of the audios."
]
@@ -1397,7 +1397,7 @@
},
"source": [
"# Summary\n",
"This tutorial covered the basics of ASR confidence estimation and two examples of using ASR word confidence: WER reduction and hallusinations removal.\n",
"This tutorial covered the basics of ASR confidence estimation and two examples of using ASR word confidence: WER reduction and hallucinations removal.\n",
"\n",
"You can follow this tutorial on [ASR Confidence-based Ensembles](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Confidence_Ensembles.ipynb) to see another important application of ASR confidence estimation."
]
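
For readers of this diff: the Expected Calibration Error that the notebook text in the `@@ -627` hunk describes as "a weighted average over the absolute accuracy/confidence difference" can be sketched roughly as below. This is a minimal illustration and not the NeMo implementation. The function name `expected_calibration_error`, the `n_bins` parameter, and the equal-width binning scheme are all assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width confidence bins (assumed scheme, not NeMo's code).

    confidences: list of floats in [0.0, 1.0], one per predicted word.
    correct:     list of bools/0-1 values, True if the word was recognized correctly.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence bin; a confidence of 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(ok)))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(o for _, o in items) / len(items)
        # Each bin contributes its |accuracy - confidence| gap, weighted by bin size.
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece
```

With this sketch, perfectly calibrated scores give `0.0` (the best value mentioned in the notebook), and fully confident but always wrong scores give `1.0`.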
6 changes: 3 additions & 3 deletions tutorials/asr/Confidence_Ensembles.ipynb
@@ -48,7 +48,7 @@
"\n",
"# clone SDP and install requirements\n",
"!git clone https://github.com/NVIDIA/NeMo-speech-data-processor $WORKSPACE_DIR/NeMo-speech-data-processor\n",
"!pip install -r $WORKSPACE_DIR/NeMo-speech-data-processor/requirements.txt\n",
"!pip install -r $WORKSPACE_DIR/NeMo-speech-data-processor/requirements/main.txt\n",
"\n",
"\"\"\"\n",
"Remember to restart the runtime for the kernel to pick up any upgraded packages.\n",
@@ -106,13 +106,13 @@
"\n",
"A short answer — you can use any ASR models. E.g., you can combine a number of CTC models, or Transducer models, or even mix-and-match. \n",
"\n",
"A more detailed answer is that hte performance of the confidence ensemble is upper-bounded by the performance of the best model on each of the input examples. Thus you will benefit if some of your models work really well on part of the input compared to other models. This way you will get more gains compared to each separate model, and it will also make correct model identification easier.\n",
"A more detailed answer is that the performance of the confidence ensemble is upper-bounded by the performance of the best model on each of the input examples. Thus you will benefit if some of your models work really well on part of the input compared to other models. This way you will get more gains compared to each separate model, and it will also make correct model identification easier.\n",
"\n",
"### How to estimate a model's confidence?\n",
"\n",
"Good news, we have a whole separate [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_Confidence_Estimation.ipynb) on this topic! You can go through it if you want to know all the details about different ways to estimate confidence of NeMo ASR models. There are different confidence measures and aggregation functions and for the absolute best performance, you will need to run a grid-search to pick the best confidence estimation way for your specific models and data.\n",
"\n",
"That being said, we found that there exist a set of confidence parameters that work pretty well on a large set of models and datsets. They are default in NeMo and so you might not need to worry about running the search. If you do want to maximize the performance by tuning the confidence parameters, you only need to add [a few extra config lines](#Building-and-evaluating-ensemble-(tuned-parameters)).\n",
"That being said, we found that there exist a set of confidence parameters that work pretty well on a large set of models and datasets. They are default in NeMo and so you might not need to worry about running the search. If you do want to maximize the performance by tuning the confidence parameters, you only need to add [a few extra config lines](#Building-and-evaluating-ensemble-(tuned-parameters)).\n",
"\n",
"### How to calibrate confidence values?\n",
"\n",