Fix for Issue #266 #267

Merged (2 commits, Nov 15, 2022)

Max-Bladen (Collaborator)

The error that prompted this PR was raised by auroc(), but the issue stems from the predict() function.

The lack of a more explicit warning against near-zero variance features in block.splsda() will be addressed in a separate PR.

Consider the framework presented in the reprex in the associated GitHub issue (#266).

Take a given feature in one of the predictor blocks. If it's all 0s:

  • Centered and scaled in block.splsda(). This leaves the same all-zero vector, as center = 0 and scale = 0.
  • object$X is used as the newdata parameter for the predict() call in auroc().
  • Within predict(), the 0 values have 0 subtracted from them (centering) and are then divided by 0 (scaling), resulting in NaN for those predictor values (in R, 0/0 is NaN).
  • These NaNs can be handled safely (ignored) by the remainder of the function, resulting in valid predictions.
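To make the all-zero case concrete, here is a minimal TypeScript sketch. TypeScript numbers follow the same IEEE 754 rules as R doubles, so `(0 - 0) / 0` is NaN here too. Both `centerScale` and `projectSkippingNaN` are hypothetical helpers, not mixOmics functions, and modelling "handled safely (ignored)" as skipping NaN entries in a weighted sum is an assumption about how predict() treats missing values.

```typescript
// Hypothetical helper: centre and scale one feature using stored attributes,
// the way predict() transforms newdata with the attributes of object$X.
function centerScale(values: number[], center: number, scale: number): number[] {
  return values.map((v) => (v - center) / scale);
}

// Hypothetical helper: weighted sum that treats NaN as missing and skips it
// (an assumption standing in for predict()'s missing-value handling).
function projectSkippingNaN(row: number[], loadings: number[]): number {
  let score = 0;
  for (let j = 0; j < row.length; j++) {
    if (Number.isNaN(row[j])) continue; // missing value: ignored
    score += row[j] * loadings[j];
  }
  return score;
}

// All-zero feature: object$X holds zeros, with center = 0 and scale = 0.
const allZero = centerScale([0, 0, 0, 0], 0, 0);
console.log(allZero); // every entry is NaN, since (0 - 0) / 0 is NaN

// One sample: the all-zero feature (now NaN) plus two ordinary features.
const row = [allZero[0], 2, -1];
const loadings = [0, 0.5, 0.25];
console.log(projectSkippingNaN(row, loadings)); // 0.75: the NaN entry is skipped
```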

If that feature's values are all the same non-zero value (e.g. all equal to 1):

  • Centered and scaled in block.splsda(). This results in an all-zero vector, but with center = 1 and scale = 0. The result is stored in object$X.
  • object$X is used as the newdata parameter for the predict() call in auroc().
  • Within predict(), the 0 values have 1 subtracted from them (centering) and are then divided by 0 (scaling), resulting in infinite values for those predictors (here -Inf, since in R (0 - 1)/0 is -Inf).
  • Infs CANNOT be handled safely by the remainder of the function, causing all predictions made on that block to be NaN. This results in downstream issues, like the error raised along the call chain `auroc() -> statauc() -> roc.default() -> roc.utils.perfs.fast.all.threshold() -> cut()`.
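A minimal TypeScript sketch of the constant non-zero case, using hypothetical helpers `centerScale` and `projectSkippingNaN` (illustrations only, not mixOmics code). The NaN-skipping step cannot rescue this case because -Infinity is not NaN; likewise, in R `is.na()` is TRUE for NaN but FALSE for Inf. For illustration, the sketch assumes the constant feature has a zero loading, as might happen in a sparse model. Then -Infinity * 0 is NaN (as in R), and the whole score becomes NaN. With a non-zero loading the score would be infinite instead, which is no more usable.

```typescript
// Hypothetical helper: centre and scale one feature using stored attributes.
function centerScale(values: number[], center: number, scale: number): number[] {
  return values.map((v) => (v - center) / scale);
}

// Hypothetical helper: weighted sum that skips NaN (treated as missing).
// Infinity and -Infinity are not NaN, so they are NOT skipped.
function projectSkippingNaN(row: number[], loadings: number[]): number {
  let score = 0;
  for (let j = 0; j < row.length; j++) {
    if (Number.isNaN(row[j])) continue;
    score += row[j] * loadings[j];
  }
  return score;
}

// Feature that was all 1s in training: object$X stores zeros, center = 1, scale = 0.
const constantFeature = centerScale([0, 0, 0, 0], 1, 0);
console.log(constantFeature); // every entry is -Infinity, since (0 - 1) / 0 is -Infinity

// Assumed zero loading for the constant feature: -Infinity * 0 is NaN,
// and NaN then propagates through the rest of the sum.
const row = [constantFeature[0], 2, -1];
const loadings = [0, 0.5, 0.25];
console.log(projectSkippingNaN(row, loadings)); // NaN
```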

Hence, after newdata has been centered and scaled using the attributes of object$X, the function now checks whether any of the values are not finite. If so, Inf and -Inf values are replaced by NaN.
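A minimal TypeScript sketch of this fail-safe, under the assumption that downstream code skips NaN as a missing value. `replaceNonFinite` and `projectSkippingNaN` are hypothetical names. The actual patch is R code in mixOmics and is not reproduced here.

```typescript
// Hypothetical fail-safe: map Infinity / -Infinity to NaN. Number.isFinite is
// false for Infinity, -Infinity and NaN, so existing NaNs also stay NaN.
function replaceNonFinite(values: number[]): number[] {
  return values.map((v) => (Number.isFinite(v) ? v : NaN));
}

// Hypothetical helper: weighted sum that skips NaN (treated as missing).
function projectSkippingNaN(row: number[], loadings: number[]): number {
  let score = 0;
  for (let j = 0; j < row.length; j++) {
    if (Number.isNaN(row[j])) continue;
    score += row[j] * loadings[j];
  }
  return score;
}

// A transformed newdata row in which a constant feature became -Infinity.
const transformed = [-Infinity, 2, -1];
const loadings = [0, 0.5, 0.25];

console.log(projectSkippingNaN(transformed, loadings)); // NaN without the fail-safe

const safeRow = replaceNonFinite(transformed);
console.log(safeRow); // first entry is now NaN; the finite values are unchanged
console.log(projectSkippingNaN(safeRow, loadings)); // 0.75
```

Mapping non-finite values to NaN, rather than raising an error, lets the remaining features in the block still contribute to the prediction. The trade-off is that those values are silently treated as missing.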

fix: added fail-safe for when `Inf` or `-Inf` are found in the transformed `newdata` data frame.

Changes them to `NaN`, which can be safely handled by downstream functions.
Max-Bladen added the wip (work-in-progress) and bug-fix labels on Nov 15, 2022
Max-Bladen self-assigned this on Nov 15, 2022
tests: added test to maintain coverage
Max-Bladen merged commit 7252d3b into master on Nov 15, 2022
Max-Bladen deleted the issue-266 branch on November 15, 2022 at 22:58
Max-Bladen linked an issue on Nov 15, 2022 that may be closed by this pull request
Max-Bladen removed the wip (work-in-progress) label
Labels: bug-fix (For PR's that address an Issue with `bug` label)
Development: successfully merging this pull request may close these issues:

predict() produces all NaNs for a given block
1 participant