Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax restrictions on model type in resampling (evaluate!) #985

Merged
merged 3 commits into from
Jul 7, 2024

Conversation

ablaom
Copy link
Member

@ablaom ablaom commented Jul 1, 2024

Current behaviour

Currently a model in evaluate/evaluate! must subtype Supervised or Annotator. However, in the current implementation, all that's essential for the model is:

  • It implements a predict method
  • It's fit method consumes a target y (which must be provided in the evaluate call, or machine constructor).
  • In the common case that operation is left unspecified, MLJModelnterface.prediction_type must be one of: :deterministic, :probabilistic, :interval.

Subtypes of Annotator and Supervised (e.g,Probabilistic, Deterministic, Interval) have the obvious prediction_type fallbacks, but the fallback for Unsupervised models is :unknown.

What this PR does

This PR:

  • Removes the current model type restriction: Now any <:Model is now allowed.
  • Adds the currently missing check that a target y has been provided.
  • In the case that an operation is not specified, and the prediction_type is not one of :deterministic, :probabilistic, :interval, the error which is already thrown in that case is made more informative. It now suggests two remedies: (i) explicitly specify operation(s), or (ii) post an issue to have the model's prediction_type reviewed.

Context

More and more we are relying on traits and less on the abstract model hierarchy. Currently, however, only Unsupervised models can have their transform method propagated in a Pipeline (and predict is propagated by Supervised models). So some models want to be Unsupervised for pipeline use, despite the fact they are also "supervised" - a case in point is RecursiveFeatureElimination (currently Supervised and so not pipelinable).

To do

Just to be sure, let's wait for

  • Run local MLJ integration tests (@ablaom)

Afterword

We should ultimately like to evaluate models that do not consume a target in training, such as clustering models. In that case, the target is only supplied for the test sets (e.g., for computing the Rand index). However, that requires a more substantial redesign of evaluate.

@ablaom ablaom marked this pull request as draft July 1, 2024 02:01
@ablaom ablaom marked this pull request as ready for review July 1, 2024 03:37
@ablaom ablaom requested a review from OkonSamuel July 1, 2024 03:37
Copy link
Member

@OkonSamuel OkonSamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good to me!!!.

@ablaom ablaom merged commit 07f7ca6 into dev Jul 7, 2024
3 checks passed
@ablaom ablaom deleted the relax-resampling-type branch July 7, 2024 19:48
This was referenced Jul 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants