Visualization tool for one and two-parameter tuning #85
Do you think using UnicodePlots may make some sense here, so that one could keep working in a REPL? It's also much faster, and for simple visualisations it may be enough. In a similar vein there's PrettyTables.jl, a lightweight package to quickly visualise dataframes and highlight elements, which might be good too (for quickly showing benchmark results, for instance, highlighting the best scores according to a range of metrics). Possibly the visualisation could be inferred from the environment, so that if one is in Juno or IJulia, richer multimedia tools are used?
UnicodePlots is also a Plots.jl backend. I would recommend having plot recipes for Plots.jl and then eventually Makie. Then the user can choose the backend. Edit: Maybe @mkborregaard can discuss his experiences with StatsPlots and how they might help here.
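A minimal user-recipe sketch of that idea, using only RecipesBase (the container type and attribute choices here are illustrative, not MLJ's actual API):

```julia
using RecipesBase  # lets a package define recipes without depending on Plots.jl

# Hypothetical container for 2D tuning results; MLJ's actual report format differs.
struct TuningGrid
    x::Vector{Float64}  # values of the first hyperparameter
    y::Vector{Float64}  # values of the second hyperparameter
    z::Vector{Float64}  # performance estimate at each (x, y) grid point
end

# User recipe: after `using Plots`, `plot(g::TuningGrid)` dispatches here and
# works with any backend the user has selected, including `unicodeplots()`.
@recipe function f(g::TuningGrid)
    seriestype --> :scatter
    marker_z --> g.z        # colour the markers by performance estimate
    markersize --> 6
    xguide --> "parameter 1"
    yguide --> "parameter 2"
    g.x, g.y
end
```

With this in place, `using Plots; unicodeplots(); plot(g)` renders in the terminal, while switching to `gr()` upgrades the very same call to a richer backend.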
All sound good to me. I'm leaving this one for someone else to implement!
I agree that recipes would be a good choice. There are recipes in MCMCChain that I helped develop (https://github.com/TuringLang/MCMCChain.jl/blob/master/src/plot.jl) which might be similar to what is required here?
Regarding "tuning plots", it might be best to dispatch the plot method on the fitted model? Though I see issues with what the default would do:
One thing that the user also expects is "learning curves", though not all models may return these. Maybe separate the grid tuning plot from the learning curves?
I think that MLR uses bubbles on the grid points (for the 2D case), where the bubble radius reflects the performance estimate. Maybe that's a better fallback? @mkborregaard, it would be awesome if you could contribute. Some technical detail for implementers:
What you'll want is:
Sounds good. Could you provide an example of the kind of end result plot you have in mind?
Here is the aforementioned example of a bubble plot from MLR, but I'm not fixed on this if others want to chime in with a different suggestion. Since performance estimates are generally close together, a proportional bubble size obviously doesn't work (i.e. some recentring/rescaling is called for). By default, add the following annotations (in the title?): the value of …
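One way the recentring/rescaling could be done (a sketch, not something MLR or MLJ provides): map the estimates affinely onto a fixed radius range, so that scores that are numerically close still get visibly different bubble sizes.

```julia
# Rescale performance estimates z onto bubble radii in [rmin, rmax].
# Guards against a flat landscape where all estimates are equal.
function bubble_sizes(z::AbstractVector{<:Real}; rmin=2.0, rmax=12.0)
    lo, hi = extrema(z)
    hi == lo && return fill((rmin + rmax) / 2, length(z))
    rmin .+ (rmax - rmin) .* (z .- lo) ./ (hi - lo)
end
```

For example, `bubble_sizes([0.90, 0.91, 0.92])` spreads three nearly identical scores across the full radius range rather than rendering three visually identical bubbles.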
Looks good, just a small note: aren't the axes mislabelled? Shouldn't C/sigma in an SVM always be positive? These are probably the untransformed values, or log(C) and log(sigma) for some logarithm base. Is this a bug in mlr?
Dunno. Just copied and pasted from MLR slides.
Hm, also, shouldn't "area under the (ROC) curve" be at least 0.5? Something is fishy in this plot... Anyway, I personally like the heatmap more for this purpose, but that's really just a matter of taste.
Hm, would it look better if you:
But if it were to be general, would that require examining the object somehow to see that one axis was log spaced, then scale that? |
That's nice! Just chipping in, but it seems to me quite common that models end up having very similar performances over a range of hyperparameters, especially trivial models like KNN, so it's probably fair to just return a flat landscape where that's the case. That being said, it seems to me KNN will in general only vary for very low K (so from K=1 to K=10, maybe), so perhaps the range shown here on the plots is a bit too big? Finally, I feel the first heatmap is the clearest visualisation of the ones posted here, but maybe that's just personal preference.
IMO 3D plots look cool but are in general hard to read. Also, given the idea that we could potentially have MLJ revert to the UnicodePlots backend in a "fully-REPL mode", 3D would not be the best option, I think?
I agree with your first point in principle, but not, in fact, for fitting surfaces. Anyway, the heatmap recipe would allow the user to default to …
This might highlight the challenge of determining proper axis ranges (or dot size ranges). Scaling to min/max ± epsilon might be necessary.
The scale used is part of the information accessible to you from the TunedModel object. Because of the nested nature of the hyperparameters, it is not in the most convenient form for 2D plotting, but I will work on this over the next day or so. I will arrange to have the scales output as …
Okay, scales are now available; see issue #92 for details and example. |
And did you prefer the heatmap or the dots? Which of the above appeals? |
@mkborregaard Marvellous work, by the way. I guess I prefer the bubbles as a default because you can see where the samples were actually taken and, at a glance, what the resolution was, and so forth. Also, a bubble size is immediately understood, while a colour needs to be interpreted. Despite your concern for "honest" reproduction, I would rescale, as others have also suggested. How else can I distinguish estimates that are typically very close together? A really honest plot would have an indication of the uncertainty of the estimate at each point (available in principle in the CV case but not holdout), but I think these plots serve more of a diagnostic purpose than a final reporting one, no? That said, I do not have a strong preference and you are never going to please everyone. As long as I can quickly see where the low or unusual points are, I am happy.
Side issue: How would we present a 2D parameter plot using UnicodePlots? I would love the option of a REPL plot without having to load the Plots.jl frontend first (regular pain!!).
BTW: The log plot spacing is not uniform in your bubble plot above because K is integer valued, so rounding forces non-uniform spacing on the log scale. So this is just the way it is; nothing wrong with using the log scale there (which was the scale used to generate the grid).
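On the side issue: UnicodePlots can also be called directly, without loading the Plots.jl frontend at all. A sketch, assuming the measurements have been rearranged into a matrix over the grid (the data and labels below are invented):

```julia
using UnicodePlots  # terminal plotting, no Plots.jl frontend required

# Hypothetical grid of CV estimates, one entry per (K, lambda) combination.
z = [0.71 0.74 0.73 0.70;
     0.75 0.79 0.78 0.74;
     0.74 0.80 0.77 0.73;
     0.70 0.73 0.72 0.69]

# Renders a coloured heatmap directly in the REPL.
display(heatmap(z; xlabel = "lambda index", ylabel = "K index", title = "CV estimate"))
```

This trades axis flexibility (no log-scaled ticks) for zero startup cost, which may be exactly the "fully-REPL mode" trade-off discussed above.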
Hm, plot-wise I think the heatmap is the clearest, with the surface plot also not bad, though it is more difficult to read and to objectively compare z-values. @ablaom, regarding the uncertainty estimate: absolutely agreed. But it is unclear how to show it in a 2D heatmap or a 3D plot; uncertainty envelopes/tunnels are, however, good to show in a 1D performance-vs-parameter-value plot (if there's one parameter only). Which brings us to the topic of how to compute confidence intervals/regions, one of my favourite topics that you should not have gotten me started on. Spoiler: I don't think using cross-validation re-samples gives good confidence intervals.
Me too!
Looking forward to the discussion at the meeting next week.
@mkborregaard Please note that reports have just become named tuples, so access is by property, not index. See NEWS.md.
I've got to say I'm spending almost all of my time here searching for a way to generate the object you want plotted, rather than coming up with a recipe. The old one I had generated from your readme doesn't work anymore, and the tuned model generated in your tour doesn't seem to be of the type you're interested in. There's an example over in #92, but it's missing something called …
@mkborregaard Thanks for the feedback; this might be taken to mean that the interface for inspecting fitted models may have to be improved in general (see the ongoing #51 discussion). Thus, it might make sense to assume you have the data already in a nice format, and wait for the interface extension to give the objects to you nicely, rather than finding the best way to pry them from the learning machine's cold, dead hands. Unless, of course, @ablaom recommends another way to proceed.
I'm sorry to hear about your frustrations. The main issue, I believe, is the unlucky breaking change in the format of …
I'm interested in this example. I have posted a distilled, complete, and tested version of the tour example, showing how to get everything I expect you need. You need to update your MLJ installation (including MLJBase and MLJModels) to the version posted about 17 hours ago to get it to work, but let me know if you have problems. I doubt very much there will be further API changes in the foreseeable future that will break code based on this example. Let me know if and when you decide to have another go, and thanks again for the work so far. Your method (adapted into a recipe) will look something like:

```julia
function plot(mach::MLJ.Machine{<:MLJ.EitherTunedModel})
    r = report(mach)
    xlab, ylab = r.parameter_names
    xscale, yscale = r.parameter_scales
    x = r.parameter_values[:, 1]
    y = r.parameter_values[:, 2]
    z = r.measurements
    # <code to generate plot>
end
```
Thanks for making it so easy for me now - I'll post the recipe as soon as I have a moment :-) |
I find it quite useful to visualize hyperparameter tuning for more than two parameters as well. Simply plotting ranges vs function values for all parameters is a reasonable way of presenting the information. You can't determine interactions between parameters from this, but you can see overall trends for individual parameters, and it quickly becomes apparent if one parameter is much more important than the others.
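That "range vs value" idea for more than two parameters might be sketched like this with Plots.jl (the tuning history below is fabricated for illustration):

```julia
using Plots

# Fabricated tuning history: one loss per evaluated parameter combination.
history = (
    K      = [1, 3, 5, 7, 9, 11],
    lambda = [0.1, 0.5, 1.0, 0.1, 0.5, 1.0],
    loss   = [0.30, 0.22, 0.21, 0.28, 0.20, 0.24],
)

# One scatter panel per hyperparameter: interactions are lost, but
# per-parameter trends and relative importance stand out.
panels = [scatter(getfield(history, p), history.loss;
                  xlabel = string(p), ylabel = "loss", legend = false)
          for p in (:K, :lambda)]
plot(panels...; layout = (1, 2))
```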
@baggepinnen's plot looks nice. Should this issue have been closed, though?
Closed in favour of #416.
Would be nice to have some visualization tools (Plots.jl recipes?) for looking at the results of one- or two-parameter tuning, as output in the `report` field of a `TunedModel` machine. This might be as simple as adding a suitable instruction to the tuning docstring on how to call an existing recipe.