Add perplexity loss algorithm #10718
Conversation
Did the code pass the pre-commit tests?
This comment was marked as outdated.
>>> y_pred = np.array( \
    [[[0.28, 0.19, 0.21, 0.15, 0.15], \
    [0.24, 0.19, 0.09, 0.18, 0.27]], \
    [[0.03, 0.26, 0.21, 0.18, 0.30], \
    [0.28, 0.10, 0.33, 0.15, 0.12]]] \
)
PEP 8: backslash line continuation should be avoided in Python. Suggested change:
>>> y_pred = np.array( \
    [[[0.28, 0.19, 0.21, 0.15, 0.15], \
    [0.24, 0.19, 0.09, 0.18, 0.27]], \
    [[0.03, 0.26, 0.21, 0.18, 0.30], \
    [0.28, 0.10, 0.33, 0.15, 0.12]]] \
)
>>> y_pred = np.array(
... [[[0.28, 0.19, 0.21, 0.15, 0.15],
... [0.24, 0.19, 0.09, 0.18, 0.27]],
... [[0.03, 0.26, 0.21, 0.18, 0.30],
... [0.28, 0.10, 0.33, 0.15, 0.12]]],
... )
All loss function files were consolidated into machine_learning/loss_functions.py
in #10737. Could you move your new code into that file?
machine_learning/loss_functions.py
Outdated
# Add small constant to avoid getting inf for log(0)
epsilon = 1e-7
Please make `epsilon` an optional function parameter so that users can change its value.
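To illustrate the reviewer's request, here is a minimal sketch of how the perplexity loss could expose `epsilon` as an optional keyword parameter with a default value. The function name, signature, and docstring are illustrative assumptions, not the PR's exact code; only the `filter_matrix` / `true_class_pred` / `perp_losses` steps are taken from the snippets quoted in this review.

```python
import numpy as np


def perplexity_loss(
    y_true: np.ndarray, y_pred: np.ndarray, epsilon: float = 1e-7
) -> float:
    """Mean perplexity over a batch of sentences (illustrative sketch).

    y_true: (batch, seq_len) integer indices of the true classes.
    y_pred: (batch, seq_len, vocab) predicted class probabilities.
    epsilon: optional small constant to avoid log(0), as requested
             by the reviewer; callers can now override it.
    """
    vocab_size = y_pred.shape[2]
    # One-hot mask that selects the predicted probability of the true class
    filter_matrix = np.eye(vocab_size)[y_true]
    true_class_pred = np.sum(y_pred * filter_matrix, axis=2)
    # Perplexity per sentence: exp of the negative mean log-probability
    perp_losses = np.exp(
        np.negative(np.mean(np.log(true_class_pred + epsilon), axis=1))
    )
    return float(np.mean(perp_losses))
```

With this change, a caller who needs a different numerical floor can pass, e.g., `perplexity_loss(y_true, y_pred, epsilon=1e-9)` without editing the function body.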
machine_learning/loss_functions.py
Outdated
# Getting the matrix containing prediction for only true class
true_class_pred = np.sum(y_pred * filter_matrix, axis=2)

# Calculating perplexity for each sentence
perp_losses = np.exp(
    np.negative(np.mean(np.log(true_class_pred + epsilon), axis=1))
)
# Getting the matrix containing prediction for only true class
true_class_pred = np.sum(y_pred * filter_matrix, axis=2)
# Calculating perplexity for each sentence
perp_losses = np.exp(
    np.negative(np.mean(np.log(true_class_pred + epsilon), axis=1))
)
# Getting the matrix containing prediction for only true class
# Clip values to avoid log(0)
true_class_pred = np.sum(y_pred * filter_matrix, axis=2).clip(epsilon, 1)
# Calculating perplexity for each sentence
perp_losses = np.exp(np.negative(np.mean(np.log(true_class_pred), axis=1)))
You can use `.clip()` to restrict the range of the array's values instead of adding `epsilon` to every entry. This way, only the problematic values get changed while valid values remain the same. Note that this may change the value of your doctests.
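A small example of the difference the reviewer is pointing out (the array values here are made up for illustration): adding `epsilon` perturbs every entry, while `np.ndarray.clip(min, max)` only alters entries outside the `[epsilon, 1]` range.

```python
import numpy as np

epsilon = 1e-7
probs = np.array([0.0, 0.5, 1.0])

# Adding epsilon shifts every entry, including perfectly valid ones
shifted = probs + epsilon          # [1e-07, 0.5000001, 1.0000001]

# Clipping only changes out-of-range entries; 0.5 and 1.0 are untouched
clipped = probs.clip(epsilon, 1)   # [1e-07, 0.5, 1.0]
```

Because clipping leaves in-range probabilities exactly as they are, the computed perplexities differ slightly from the epsilon-addition version, which is why the doctest expected values may need updating.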