
Add perplexity loss algorithm #10718

Closed
wants to merge 0 commits into from

Conversation

PoojanSmart
Contributor

Describe your change:

  • Perplexity loss function, which is used in NLP (natural language processing) to evaluate a model based on how certain it is of its predictions.
  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
  • Documentation change?
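As a hedged sketch of the approach this PR describes (function name, shapes, and the `epsilon` default here are illustrative, not necessarily the PR's exact code):

```python
import numpy as np


def perplexity_loss(
    y_true: np.ndarray, y_pred: np.ndarray, epsilon: float = 1e-7
) -> float:
    """Mean perplexity over a batch of sentences.

    y_true: (batch, seq_len) integer class indices.
    y_pred: (batch, seq_len, vocab) predicted probabilities.
    """
    batch, seq_len = y_true.shape
    # Probability the model assigned to the true class at each position
    true_class_pred = y_pred[
        np.arange(batch)[:, None], np.arange(seq_len), y_true
    ]
    # Clip to avoid log(0)
    true_class_pred = true_class_pred.clip(epsilon, 1)
    # Perplexity per sentence, then averaged over the batch
    perp = np.exp(-np.mean(np.log(true_class_pred), axis=1))
    return float(np.mean(perp))
```

Lower perplexity means the model is more confident in the true tokens; a model that assigns probability 1 to every true token has perplexity 1.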

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

@algorithms-keeper algorithms-keeper bot added awaiting reviews This PR is ready to be reviewed tests are failing Do not merge until tests pass labels Oct 20, 2023
Contributor

@imSanko imSanko left a comment


Did the code pass the pre-commit tests?

@PoojanSmart

This comment was marked as outdated.

@PoojanSmart
Contributor Author

Once #10713 is merged, this issue should get resolved.
@cclauss please review and merge #10713.

One more thing: when I run pre-commit run --all-files --show-diff-on-failure locally, I do not get the same types of errors as in the GitHub Actions log.

@algorithms-keeper algorithms-keeper bot removed the tests are failing Do not merge until tests pass label Oct 20, 2023
Comment on lines 32 to 37
>>> y_pred = np.array( \
[[[0.28, 0.19, 0.21 , 0.15, 0.15], \
[0.24, 0.19, 0.09, 0.18, 0.27]], \
[[0.03, 0.26, 0.21, 0.18, 0.30], \
[0.28, 0.10, 0.33, 0.15, 0.12]]]\
)
Member


PEP8: Backslash line continuation should be avoided in Python.

Suggested change
>>> y_pred = np.array( \
[[[0.28, 0.19, 0.21 , 0.15, 0.15], \
[0.24, 0.19, 0.09, 0.18, 0.27]], \
[[0.03, 0.26, 0.21, 0.18, 0.30], \
[0.28, 0.10, 0.33, 0.15, 0.12]]]\
)
>>> y_pred = np.array(
... [[[0.28, 0.19, 0.21 , 0.15, 0.15],
... [0.24, 0.19, 0.09, 0.18, 0.27]],
... [[0.03, 0.26, 0.21, 0.18, 0.30],
... [0.28, 0.10, 0.33, 0.15, 0.12]]],
... )

@cclauss cclauss self-assigned this Oct 20, 2023
@cclauss
Copy link
Member

cclauss commented Oct 20, 2023

pre-commit autoupdate

@cclauss cclauss changed the title Adds perplexity loss algorithm Add perplexity loss algorithm Oct 20, 2023
@algorithms-keeper algorithms-keeper bot added tests are failing Do not merge until tests pass and removed tests are failing Do not merge until tests pass labels Oct 20, 2023
@PoojanSmart PoojanSmart requested a review from cclauss October 22, 2023 10:29
Contributor

@tianyizheng02 tianyizheng02 left a comment


All loss function files were consolidated into machine_learning/loss_functions.py in #10737. Could you move your new code into that file?

Comment on lines 315 to 316
# Add small constant to avoid getting inf for log(0)
epsilon = 1e-7
Contributor

Please make epsilon an optional function parameter so that users can change its value.

Comment on lines 332 to 338
# Getting the matrix containing prediction for only true class
true_class_pred = np.sum(y_pred * filter_matrix, axis=2)

# Calculating perplexity for each sentence
perp_losses = np.exp(
np.negative(np.mean(np.log(true_class_pred + epsilon), axis=1))
)
Contributor

Suggested change
# Getting the matrix containing prediction for only true class
true_class_pred = np.sum(y_pred * filter_matrix, axis=2)
# Calculating perplexity for each sentence
perp_losses = np.exp(
np.negative(np.mean(np.log(true_class_pred + epsilon), axis=1))
)
# Getting the matrix containing prediction for only true class
# Clip values to avoid log(0)
true_class_pred = np.sum(y_pred * filter_matrix, axis=2).clip(epsilon, 1)
# Calculating perplexity for each sentence
perp_losses = np.exp(np.negative(np.mean(np.log(true_class_pred), axis=1)))

You can use .clip() to restrict the range of the array's values instead of adding epsilon to every entry. This way, only the problematic values get changed while OK values remain the same. Note that this may change the value of your doctests.
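The difference the reviewer describes can be seen in a small standalone sketch (values illustrative):

```python
import numpy as np

probs = np.array([0.0, 0.5, 1.0])
eps = 1e-7

# Adding epsilon shifts every entry, even the already-valid ones
added = probs + eps

# Clipping only alters entries outside [eps, 1]
clipped = probs.clip(eps, 1)

print(added)    # every value shifted by 1e-7
print(clipped)  # only the 0.0 entry changed, to 1e-7
```

With `.clip()`, probabilities that are already safe for `np.log` pass through unchanged, so doctest expected values for well-behaved inputs stay exact.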

Labels
awaiting reviews This PR is ready to be reviewed

4 participants