Unsupervised learning interfaces - is transformer too narrow? #51
I think this is a good point. There are two choices for exposing extra functionality at present: (i) one returns it as part of the report dictionary output by fit, or (ii) one implements methods beyond the standard fit/transform/predict.
@ablaom, I think the report dictionary returned by fit should, at most, contain diagnostic reports of the fitting itself and not be abused for parameter inference or reporting. I'd personally introduce a single method for all models, e.g., fitted_params, which could return a dictionary of model parameters and diagnostics. These would be different for each model; for ordinary least squares regression, for example, it might return coefficients, CI, R-squared, and t/F test results. What we may want to be careful about is the interaction with the parameter interface. I usually like to distinguish hyper-parameters (set externally, not changed by fit) from model parameters (no external access, set by fit).
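To make the distinction concrete, here is a minimal sketch with invented names (OLSRegressor, fitted_params), not any actual MLJ API: hyper-parameters sit in the model struct and are set externally, model parameters are produced by fit and exposed through a single accessor.

```julia
# Hypothetical sketch (not MLJ's actual interface): hyper-parameters vs model parameters.

struct OLSRegressor              # hyper-parameters only, set externally
    fit_intercept::Bool
end

struct OLSFitResult              # model parameters, set by fit
    coefficients::Vector{Float64}
    rsquared::Float64
end

function fit(model::OLSRegressor, X::Matrix{Float64}, y::Vector{Float64})
    A = model.fit_intercept ? hcat(ones(size(X, 1)), X) : X
    beta = A \ y                                    # least-squares solution
    resid = y .- A * beta
    ybar = sum(y) / length(y)
    r2 = 1 - sum(abs2, resid) / sum(abs2, y .- ybar)
    return OLSFitResult(beta, r2)
end

# one accessor for all fitted parameters and cheap diagnostics
fitted_params(fr::OLSFitResult) =
    Dict(:coefficients => fr.coefficients, :rsquared => fr.rsquared)
```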
Two issues here: (1) the type of information to be accessed after a fit; (2) the method of access, dictionary or method. The original idea of the dictionary was that it would be a persistent kind of thing, or even some kind of log/history. A dictionary has the added convenience that one adds keys according to circumstance (e.g., if I set a hyperparameter requesting some extra computation, a key for the result appears). I like the simplicity of returning a single object to report all information of possible interest, computed after every fit, whether it be fitted parameters or whatever. What is less clear to me is whether information that requires extra computation should be accessed: (i) by requesting the computation through an "instruction" hyperparameter and returning the result in the same report, or (ii) by having a dedicated method dispatched on the fit-result, like predict. Your thoughts?
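For what it's worth, option (i) might look something like the following sketch (all names invented, not actual MLJ code): a Boolean "instruction" hyper-parameter requests the extra computation, and the result turns up as an extra key in the report returned with every fit.

```julia
using LinearAlgebra   # for the identity matrix I

struct RidgeRegressor
    lambda::Float64
    compute_feature_importance::Bool   # "instruction" hyper-parameter
end

function fit(model::RidgeRegressor, X::Matrix{Float64}, y::Vector{Float64})
    p = size(X, 2)
    coefs = (X'X + model.lambda * Matrix{Float64}(I, p, p)) \ (X'y)
    report = Dict{Symbol,Any}(:coefficients => coefs)
    if model.compute_feature_importance
        # extra item, only computed if the user asked for it
        report[:feature_importance] = abs.(coefs) ./ sum(abs.(coefs))
    end
    return coefs, report    # fit-result plus report
end
```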
Some thoughts (after a longer time of thinking): I think it would be a good idea to have a dedicated interface for fitted parameters, just as we have for hyperparameters, i.e., dictionary-style, and following exactly the same structure, nesting and accessor conventions for the fitting result as we have for the models.

What is automatically returned in this extension of the fitresult are "standard model parameters that are easy to compute", i.e., it can be more than what predict needs but shouldn't add a lot of computational overhead. It should also be data-agnostic model structure parameters (e.g., model coefficients), or easy-to-obtain intermediate results for diagnostics (e.g., R-squared?). Separate from this should be operations on the model that require significant computational overhead over fit/predict (e.g., variable importance), or that are data-dependent (e.g., F-test in-sample).

The standard stuff, i.e., standard methodology for diagnostics and parameter inference (e.g., for OLS: t-tests, CI, F-test, R-squared, diagnostic plots), I'd put in fixed dispatch methods: diagnose (returns a pretty-printable dict-like of summaries) or diagnose_visualize (produces plots/visualizations). Advanced and non-standard diagnostics (e.g., specialized diagnostics or non-canonical visualizations) should be external, but these will be facilitated through the standardized model parameter interface once it exists. Thoughts?
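To illustrate the nesting convention I have in mind (model names and values made up), a composite's fitted parameters would mirror its hyper-parameter structure, so the same accessor path works for both:

```julia
# hyper-parameters (what the user sets)
pipe_hyperparams = (standardizer = (features = [:x1, :x2],),
                    regressor    = (lambda = 0.1,))

# fitted parameters (what fit produces), following the same nesting
pipe_fitted_params = (standardizer = (means = [0.2, 1.3], stds = [1.0, 2.1]),
                      regressor    = (coefficients = [0.7, -0.4],))

pipe_hyperparams.regressor.lambda           # 0.1
pipe_fitted_params.regressor.coefficients   # [0.7, -0.4]
```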
@fkiraly I have come around to accepting your suggestion for a dedicated method to retrieve fitted parameters, separate from the report field of a machine. I also agree that ... I am working on implementing these various things simultaneously.
A noteworthy difference is that a NamedTuple is immutable; could that cause a problem here?
@ablaom, I'm on board with a NamedTuple or dictionary returned by a method. The method should be able to return abstract structs in its fields, and should be able to change with each run of fit. Regarding user interface: I'd make it a method (by dispatch), and call it "inspect" unless you have a better idea. On a side note, I think this would also help greatly with the issue highlighted in the visualization issue #85, the "report" being possibly arcane and non-standardized. Further to this, I think computationally expensive diagnostics such as "interpretable machine learning" style meta-methods should not be bundled with "inspect", but rather with external "interpretability meta-methods" (to be dealt with at a much later point).
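On the immutability point raised above: I don't think it is a blocker, since the accessor can simply return a fresh NamedTuple on each call, and its fields may hold mutable containers. A tiny illustration:

```julia
nt = (coefficients = [1.0, 2.0], niters = 10)

# nt.niters = 11                   # would error: NamedTuples are immutable
push!(nt.coefficients, 3.0)        # but fields holding arrays remain mutable
nt2 = merge(nt, (niters = 11,))    # "updating" means building a fresh one
```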
Hm, maybe another two default interface points, "print" and "plot", would be great? "print" gives back a written summary, for example:

```
Call:
lm(formula = weight ~ group - 1)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4938 0.0685 0.2462 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
groupCtl 5.0320 0.2202 22.85 9.55e-15 ***
groupTrt 4.6610 0.2202 21.16 3.62e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798
F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16
```

"plot" produces a series of standard diagnostic plots, which may differ by model type and/or task. I would conjecture there are some that you always want for a task (e.g., cross-plot and residual plot for deterministic supervised regression; calibration curves for probabilistic classification), and some that you only want for a specific model class (e.g., learning curves for SGD-based methods, heatmaps for tuning methods).
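In Julia, the "print" half could simply mean overloading Base.show for whatever summary object is returned; a sketch with an invented summary type:

```julia
struct OLSSummary
    coefficients::Vector{Float64}
    rsquared::Float64
end

# printed automatically at the REPL and via display/print
function Base.show(io::IO, ::MIME"text/plain", s::OLSSummary)
    println(io, "Coefficients:")
    for (i, c) in enumerate(s.coefficients)
        println(io, "  beta_", i, " = ", round(c, digits = 4))
    end
    print(io, "Multiple R-squared: ", round(s.rsquared, digits = 4))
end

# display(OLSSummary([5.032, 4.661], 0.9818)) prints the summary above
```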
Interesting question: where would "cross-plots out-of-sample" sit? Probably only available in the evaluation/validation phase, i.e., with the benchmark orchestrator. |
Actually, I notice you already made a suggestion for a name: fitted_params. |
Also, I realize I've already said some of these things, albeit slightly differently, on Feb 4.
To clarify the existing design, we have these methods (dispatched on machines): ...

As laid out in the guide (see below), whether or not a computationally expensive item is actually computed is controlled by an "instruction" hyperparameter of the model. If a default value is not overridden, the item is empty (but the key is still there), a clue to the user that more is available. I prefer this to a separate method, to avoid method name proliferation. I think the above covers mlr's "print" method, but we could overload ... Not so keen on changing the name of "report", as this is breaking. @tlienart I think every item of ...

From the guide:
...

```julia
MLJBase.fitted_params(model::SomeSupervisedModelType, fitresult) -> friendly_fitresult::NamedTuple
```

For a linear model, for example, one might declare something like ... The fallback is to return ...
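As a sketch only (the struct name is made up, and the fitresult layout is an assumption), overloading the hook above for a linear regressor might look like this:

```julia
import MLJBase

struct MyLinearRegressor end   # stand-in for a supervised model type

# assume this model's fitresult is a (coefficients, intercept) tuple
function MLJBase.fitted_params(::MyLinearRegressor, fitresult)
    coefficients, intercept = fitresult
    return (coefficients = coefficients, intercept = intercept)
end

# MLJBase.fitted_params(MyLinearRegressor(), ([0.7, -0.4], 1.2))
# returns (coefficients = [0.7, -0.4], intercept = 1.2)
```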
Very sensible. Maybe you want to make plot a specified/uniform interface point as well, along the lines of your suggestion in #85 (and/or mine above)? A small detail regarding your reference to "mlr's print": it is actually the R language itself (i.e., base R) which has "print" and "plot" as designated interface points.
"report" could be "inspect" the next time we write an MLJ, but let's not change a working system. |
At the moment the Plots.jl package's "plot" function is just about the "standard" Julia interface point for plotting, although the future is not clear to me and others may have a better crystal ball. Plots.jl is a front end for plotting and, at present, most of the backends are still wrapped C/Python/Java code. It is a notorious nuisance to load and execute the first time. However, there is a "PlotsBase" (called PlotRecipes) which allows you to import the "plot" function you overload in your application without loading Plots or a backend (until you need it).
... we could factor this out into an MLJplots module, thus solving the dependency issue?
No, no, this is not necessary. We only need PlotsBase (lightweight) as a dependency. The user does need to manually load Plots.jl if they want to plot, but I don't think that's a big deal. The backends get lazy-loaded (i.e., as needed).
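For concreteness, and assuming the lightweight package meant here is RecipesBase.jl, the pattern could look like this sketch (the report type is made up): the recipe is defined against the small dependency, and plots only render once the user loads Plots.jl themselves.

```julia
using RecipesBase

struct TuningReport
    lambdas::Vector{Float64}
    scores::Vector{Float64}
end

# recipe defined without depending on Plots.jl or any backend
@recipe function f(r::TuningReport)
    seriestype := :path
    xlabel --> "lambda"
    ylabel --> "cross-validated score"
    r.lambdas, r.scores
end

# later, in user code:
#   using Plots
#   plot(TuningReport([0.1, 1.0, 10.0], [0.8, 0.9, 0.7]))
```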
@fkiraly and others. Returning to your original comment opening this thread, where should one-class classification fit into our scheme? Unsupervised, yes? |
In terms of taxonomy, I'd consider that something completely different, i.e., neither supervised nor unsupervised. I'd consider one-class classifiers (including the one-class kernel SVM) as an instance of outlier detectors, or anomaly detectors (if also online). Even in the case where labelled outliers/artefacts/anomalies are provided in the training set, it's different from the (semi-)supervised task, since there is a designated "normal" class. It's also different from unsupervised, since unsupervised methods have no interface point to feed back "this is an anomaly". I.e., naturally, the one-class SVM would have a task-specific fit/detect interface (or similar, I'm not too insistent on naming here). One could also consider it sitting in the wider class of "annotator" tasks.
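As a rough sketch of such a fit/detect interface (everything here is invented, including the toy z-score rule): the "normal" class is implicit in the training data, and detect flags new observations as anomalous or not.

```julia
struct ZScoreDetector
    threshold::Float64               # hyper-parameter: how many stds counts as "abnormal"
end

struct ZScoreFitResult
    mean::Float64
    std::Float64
end

function fit(detector::ZScoreDetector, x::Vector{Float64})
    m = sum(x) / length(x)
    s = sqrt(sum(abs2, x .- m) / (length(x) - 1))
    return ZScoreFitResult(m, s)
end

# returns true for observations flagged as anomalies
detect(detector::ZScoreDetector, fr::ZScoreFitResult, xnew::Vector{Float64}) =
    abs.(xnew .- fr.mean) ./ fr.std .> detector.threshold
```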
Does this mean the type hierarchy is not granular enough? Maybe it should be traits.
@datnamer, that's an interesting question for @ablaom: where do we draw the distinction between type and trait? If I recall an earlier discussion correctly, whenever we need to dispatch or inherit differently? It's just a feeling, but I think anomaly detectors and (un)supervised learners should be different: you can use the latter to do the former, so it feels more like a wrapper/reduction rather than trait variation.
Some coarse distinctions are realised in a type hierarchy. From the docs: the ultimate supertype of all models is Model:

```julia
abstract type Supervised <: Model end
abstract type Unsupervised <: Model end

abstract type Probabilistic <: Supervised end
abstract type Deterministic <: Supervised end
```

All further distinctions are realised with traits, some of which take values in the scitype hierarchy or in types derived from them. An example of such a trait is ... So, I suppose we create a new abstract subtype of ... Obviously this is not a priority right now, but it did recently come up.
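Purely as a sketch of what that could look like (none of these names exist in MLJ; Model is redefined locally just to keep the snippet self-contained):

```julia
abstract type Model end                        # stands in for MLJ's own Model

abstract type Supervised <: Model end
abstract type Unsupervised <: Model end
abstract type AnomalyDetector <: Model end     # the new coarse category

# finer distinctions expressed as traits rather than further subtypes
is_online(::Type{<:AnomalyDetector}) = false   # default, overridden per detector
```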
@ablaom regarding ... Regarding unsupervised learners: have we progressed on the distinction between (i) and (ii) at least, from the first post? For #161 especially, a "transformer" type (or sub-type? aspect?) as in (i) would be necessary. Update: actually, I think we will be fine with (i), i.e., transformer-style behaviour only, for ManifoldLearning.jl in #161.
Regarding unsupervised models such as PCA, k-means, etc., discussed in #44:
I know these are commonly encapsulated within the transformer formalism, but that would do the methodology behind them an injustice, as feature extraction is only one major use case of unsupervised models. More precisely, there are, as far as I can see, three use cases:
(i) feature extraction. For clusterers, create a column with cluster assignment. For continuous dimension reducers, create multiple continuous columns.
(ii) model structure inference - essentially, inspection of the fitted parameters, e.g., PCA components and loadings, cluster separation metrics, etc. These may be of interest in isolation, or used as a (hyper-parameter) input to other atomic models in a learning pipeline.
(iii) full probabilistic modelling aka density estimation. This behaves as a probabilistic multivariate regressor/classifier on the input variables.
For the start it makes sense to implement only "transformer" functionality, but it is maybe good to keep in mind for the implementation that eventually one may like to expose the other outputs via interfaces (a rough sketch of all three is below), e.g., the estimated multivariate density in a fully probabilistic implementation of k-means.
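As a rough sketch of how one unsupervised model could eventually expose all three (everything here is invented; only (i) would be implemented at first):

```julia
struct KMeansLike
    k::Int
end

struct KMeansFit
    centroids::Matrix{Float64}       # k x p
end

nearest(fr::KMeansFit, x::Vector{Float64}) =
    argmin([sum(abs2, x .- fr.centroids[j, :]) for j in 1:size(fr.centroids, 1)])

# (i) feature extraction: one new column with the cluster assignment
transform(fr::KMeansFit, X::Matrix{Float64}) =
    [nearest(fr, X[i, :]) for i in 1:size(X, 1)]

# (ii) model structure inference: inspect the fitted parameters directly
fitted_params(fr::KMeansFit) = (centroids = fr.centroids,)

# (iii) density estimation, e.g., an equal-weight spherical Gaussian mixture
#       centred on the centroids (one of many possible probabilistic readings)
function density(fr::KMeansFit, x::Vector{Float64}; sigma = 1.0)
    k, p = size(fr.centroids)
    comps = [exp(-sum(abs2, x .- fr.centroids[j, :]) / (2sigma^2)) /
             ((2pi * sigma^2)^(p / 2)) for j in 1:k]
    return sum(comps) / k
end
```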