forked from mlr-org/mlr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
958 lines (870 loc) · 39.3 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
mlr 2.14:
--------------------------------------------------------------------------------
general:
* add option to use fully predefined indices in resampling (`makeResampleDesc(fixed = TRUE)`)
functions - general:
* getResamplingIndices(inner = TRUE) now correctly returns the inner indices (before inner indices referred to the subset of the respective outer level train set)
filter - new:
* praznik.JMI
* praznik.DISR
* praznik.JMIM
* praznik.MIM
* praznik.NJMIM
* praznik.MRMR
* praznik.CMIM
filter - general:
* Replaced filters `information.gain`, `gainratio` and `symmetrical.uncertainty`depending on package `FSelector` by package `FSelectorRcpp`.
The change comes with a ~ 100 times speedup.
learners - new:
* classif.liquidSVM
* regr.liquidSVM
featSel - general:
* The FeatSelResult object now contains an additional slot `x.bit.names` that stores the optimal bits
* The slot `x` now always contains the real feature names and not the bit.names
* This fixes a bug and makes `makeFeatSelWrapper` usable with custom `bit.names`.
mlr 2.13:
--------------------------------------------------------------------------------
general:
* Disabled unit tests for CRAN, we test on travis only now
* Suppress messages with show.learner.output = FALSE
functions - general:
* plotHyperParsEffect: add colors
functions - new:
* getResamplingIndices
* createSpatialResamplingPlots
learners - general:
* regr.nnet: Removed unneeded params linout, entropy, softmax and censored
* regr.ranger: Add weight handling
learners - removed:
* {classif,regr}.blackboost: broke API with new release
* regr.elmNN : package was removed from CRAN
* classif.lqa : package was removed from CRAN
mlr 2.12:
--------------------------------------------------------------------------------
general:
* Support for functional data (fda) using matrix columns has been added.
* Relaxed the way wrappers can be nested -- the only explicitly forbidden
combination is to wrap a tuning wrapper around another optimization wrapper
* Refactored the resample progress messages to give a better overview and
distinguish between train and test measures better
* calculateROCMeasures now returns absolute instead of relative values
* Added support for spatial data by providing spatial partitioning methods "SpCV" and "SpRepCV".
* Added new spatial.task classification task.
* Added new spam.task classification task.
* Classification tasks now store the class distribution in the
class.distribution member.
* mlr now predicts NA for data that contains NA and learners that do not support
missing values.
* Tasks are now subsetted in the "train" function and the factor levels (for
classification tasks) based on this subset. This means that the factor level
distribution is not necessarily the same as for the entire task, and that the
task descriptions of models in resampling reflect the respective subset, while
the task description of resample predictions reflect the entire task and not
necessarily the task of any individual model.
* Added support for growing and fixed window cross-validation for forecasting
through new resample methods "GrowingWindowCV" and "FixedWindowCV".
functions - general:
* generatePartialDependenceData: depends now on the "mmpf" package,
removed parameter: "center", "resample", "fmin", "fmax" and "gridsize"
added parameter: "uniform" and "n" to configure the grid for the partial dependence plot
* batchmark: allow resample instances and reduction of partial results
* resample, performance: new flag "na.rm" to remove NAs during aggregation
* plotTuneMultiCritResultGGVIS: new parameters "point.info" and "point.trafo" to
control interactivity
* calculateConfusionMatrix: new parameter "set" to specify whether confusion
matrix should be computed for "train", "test", or "both" (default)
* PlotBMRSummary: Add parameter "shape"
* plotROCCurves: Add faceting argument
* PreprocWrapperCaret: Add param "ppc.corr", "ppc.zv", "ppc.nzv", "ppc.n.comp", "ppc.cutoff", "ppc.freqCut", "ppc.uniqueCut"
functions - new:
* makeClassificationViaRegressionWrapper
* getPredictionTaskDesc
* helpLearner, helpLearnerParam: open the help for a learner or get a
description of its parameters
* setMeasurePars
* makeFunctionalData
* hasFunctionalFeatures
* extractFDAFeatures, reextractFDAFeatures
* extractFDAFourier, extractFDAFPCA, extractFDAMultiResFeatures, extractFDAWavelets
* makeExtractFDAFeatMethod
* makeExtractFDAFeatsWrapper
* getTuneResultOptPath
* makeTuneMultiCritControlMBO: Allows model based multi-critera / multi-objective optimization using mlrMBO
functions - removed:
* Removed plotViperCharts
measures - general:
* measure "arsq" now has ID "arsq"
* measure "measureMultiLabelF1" was renamed to "measureMultilabelF1" for consistency
measures - new:
* measureBER, measureRMSLE, measureF1
* cindex.uno, iauc.uno
learners - general:
* unified {classif,regr,surv}.penalized{ridge,lasso,fusedlasso} into {classif,regr,surv}.penalized
* fixed a bug where surv.cforest gave wrong risk predictions (#1833)
* fixed bug where classif.xgboost returned NA predictions with multi:softmax
* classif.lda learner: add 'prior' hyperparameter
* ranger: update hyperpar 'respect.unordered.factors', add 'extratrees' and 'num.random.splits'
* h20deeplearning: Rename hyperpar 'MeanSquare' to 'Quadratic'
* h20*: Add support for "missings"
learners - new:
* classif.adaboostm1
* classif.fdaknn
* classif.fdakernel
* classif.fdanp
* classif.fdaglm
* classif.mxff
* regr.fdaFDboost
* regr.mxff
learners - removed:
* {classif,regr}.bdk: broke our API, stability issues
* {classif,regr}.xyf: broke our API, stability issues
* classif.hdrda: package removed from CRAN
* surv.penalized: stability issues
aggregations - new:
* testgroup.sd
filter - new:
* auc
* ranger.permutation, ranger.impurity
mlr 2.11:
--------------------------------------------------------------------------------
general:
* The internal class naming of the task descriptions have been changed causing probable incompatibilities with tasks generated under old versions.
* New option on.error.dump to include dumps that can be inspected with the
debugger with errors
* mlr now supports tuning with Bayesian optimization with mlrMBO
functions - general:
* tuneParams: fixed a small and obscure bug in logging for extremely large ParamSets
* getBMR-operators: now support "drop" argument that simplifies the resulting list
* configureMlr: added option "on.measure.not.applicable" to handle situations where performance
cannot be calculated and one wants NA instead of an error - useful in, e.g., larger benchmarks
* tuneParams, selectFeatures: removed memory stats from default output for
performance reasons (can be restored by using a control object with "log.fun"
= "memory")
* listLearners: change check.packages default to FALSE
* tuneParams and tuneParamsMultiCrit: new parameter `resample.fun` to specify a custom resampling function to use.
* Deprecated: getTaskDescription, getBMRTaskDescriptions, getRRTaskDescription.
New names: getTaskDesc, getBMRTaskDescs, getRRTaskDesc.
functions - new:
* getOOBPreds: get out-of-bag predictions from trained models for learners that store them -- these learners have the new "oobpreds" property
* listTaskTypes, listLearnerProperties
* getMeasureProperties, hasMeasureProperties, listMeasureProperties
* makeDummyFeaturesWrapper: fuse a learner with a dummy feature creator
* simplifyMeasureNames: shorten measure names to the actual measure, e.g.
mmce.test.mean -> mmce
* getFailureModelDump, getPredictionDump, getRRDump: get error dumps
* batchmark: Function to run benchmarks with the batchtools package on high performance computing clusters
* makeTuneControlMBO: allows Bayesian optimization
measures - new:
* kendalltau, spearmanrho
learners - general:
* classif.plsdaCaret: added parameter "method".
* regr.randomForest: refactored se-estimation code, improved docs and default is now se.method = "jackknife".
* regr.xgboost, classif.xgboost: removed "factors" property as these learners do not handle categorical features
-- factors are silently converted to integers internally, which may misinterpret the structure of the data
* glmnet: control parameters are reset to factory settings before applying
custom settings and training and set back to factory afterwards
learners - removed:
* {classif,regr}.avNNet: no longer necessary, mlr contains a bagging wrapper
mlr 2.10:
--------------------------------------------------------------------------------
functions - general:
* fixed bug in resample when using predict = "train" (issue #1284)
* update to irace 2.0 -- there are algorithmic changes in irace that may affect
performance
* generateFilterValuesData: fixed a bug wrt feature ordering
* imputeLearner: fixed a bug when data actually contained no NAs
* print.Learner: if a learner hyperpar was set to value "NA" this was not
displayed in printer
* makeLearner, setHyperPars: if you mistype a learner or hyperpar name, mlr
uses fuzzy matching to suggest the 3 closest names in the message
* tuneParams: tuning with irace is now also parallelized, i.e., different
learner configs are evaluated in parallel.
* benchmark: mini fix, arg 'learners' now also accepts class strings
* object printers: some mlr printers show head previews of data.frames.
these now also print info on the total nr of rows and cols and are less confusing
* aggregations: have better properties now, they know whether they require training or
test set evals
* the filter methods have better R docs
* filter randomForestSRC.var.select: new arg "method"
* filter mrmr: fixed some smaller bugs and updated properties
* generateLearningCurveData: also accepts single learner, does not require a list
* plotThreshVsPerf: added "measures" arg
* plotPartialDependence: can create tile plots with joint partial dependence
on two features for multiclass classification by facetting across the classes
* generatePartialDependenceData and generateFunctionalANOVAData: expanded
"fun" argument to allow for calculation of weights
* new "?mlrFamilies" manual page which lists all families and the functions
belonging to it
* we are converging on data.table as a standard internally, this should not
change any API behavior on the outside, though
* generateHyperParsEffectData and plotHyperParsEffect now support more than 2
hyperparameters
* linear.correlation, rank.correlation, anova.test: use Rfast instead of
FSelector/custom implementation now, performance should be much better
* use of our own colAUC function instead of the ROCR package for AUC calculation
to improve performance
* we output resample performance messages for every iteration now
* performance improvements for the auc measure
* createDummyFeatures supports vectors now
* removed the pretty.names argument from plotHyperParsEffect -- labels can be set
though normal ggplot2 functions on the returned object
* Fixed a bad bug in resample, the slot "runtime" or a ResampleResult,
when the runtime was measured not in seconds but e.g. mins. R measures then potentially in mins,
but mlr claimed it would be seconds.
* New "dummy" learners (that disregard features completely) can be fitted now for baseline comparisons,
see "featureless" learners below.
functions - new:
* filter: randomForest.importance
* generateFeatureImportanceData: permutation-based feature importance and local
importance
* getFeatureImportanceLearner: new Learner API function
* getFeatureImportance: top level function to extract feature importance
information
* calculateROCMeasures
* calculateConfusionMatrix: new confusion-matrix like function that calculates
and tables many receiver operator measures
* makeLearners: create multiple learners at once
* getLearnerId, getLearnerType, getLearnerPredictType, getLearnerPackages
* getLearnerParamSet, getLearnerParVals
* getRRPredictionList
* addRRMeasure
* plotResiduals
* getLearnerShortName
* mergeBenchmarkResults
functions - renamed:
* Renamed rf.importance filter (now deprecated) to randomForestSRC.var.rfsrc
* Renamed rf.min.depth filter (now deprecated) to randomForestSRC.var.select
* Renamed getConfMatrix (now deprecated) to calculateConfusionMatrix
* Renamed setId (now deprecated) to setLearnerId
functions - removed:
* mergeBenchmarkResultLearner, mergeBenchmarkResultTask
learners - general:
* classif.ada: fixed some param problem with rpart.control params
* classif.cforest, regr.cforest, surv.cforest:
removed parameters "minprob", "pvalue", "randomsplits"
as these are set internally and cannot be changed by the user
* regr.GPfit: some more params for correlation kernel
* classif.xgboost, regr.xgboost: can now properly handle NAs (property was missing and other problems), added "colsample_bylevel" parameter
* adapted {classif,regr,surv}.ranger parameters for new ranger version
learners - new:
* multilabel.cforest
* surv.gbm
* regr.cvglmnet
* {classif,regr,surv}.gamboost
* classif.earth
* {classif,regr}.evtree
* {classif,regr}.evtree
learners - removed:
* classif.randomForestSRCSyn, regr.randomForestSRCSyn: due to continued stability issues
measures - new:
* ssr, qsr, lsr
* rrse, rae, mape
* kappa, wkappa
* msle, rmsle
mlr 2.9:
--------------------------------------------------------------------------------
functions - general:
* various cleanups that removed unused code
* subsetTask, getTaskData: arg "features" now also accepts logical and integer
* removeConstantFeatures now also operates on data.frames and
makeRemoveConstantFeaturesWrapper can be used to augment a learner with this
preprocessing step.
* normalizeFeatures, createDummyFeatures: arg 'exclude' was replaced by 'cols'
* normalizeFeatures is now S3 and can be called also on data.frames
* SMOTEWrapper: fix a bug where "sw.nn" was not correctly passed down
* fixed a bug that caused hyperparameters to be not passed on correctly in the
ModelMultiplexer in some cases
* fix bug with NoFeaturesModel and ModelMultiplexer
* fix small bug in DownsampleWrapper when trained with weights
* getNestedTuneResultsOptPathDf: added new arg "trafo"
* improve documentation for permutation.importance filter and perform slight
argument renaming to fix potential name clashes
* plotPartialDependence can plot classification tasks with more than one
interacted features now
* generateFilterValuesData: added argument 'more.args'
* add pretty.names arguments to plots that show learner short names instead of IDs
* addition of 'data' argument to plotPartialDependence which adds the training
data to the graph
* added new arguments "facet.wrap.nrow" and "facet.wrap.ncol" which enable
arrangement of facets in
rows and columns to plotting functions
functions - new:
* generateHyperParsEffectData, plotHyperParsEffect
* makeMultilabelClassifierChainsWrapper, makeMultilabelDBRWrapper
makeMultilabelNestedStackingWrapper, makeMultilabelStackingWrapper
* makeConstantClassWrapper
* generateFunctionalANOVAData
functions - removed:
* getParamSet generic (now in ParamHelpers package)
functions - renamed:
* generatePartialPrediction to generatePartialDependence
* plotPartialPrediction to plotPartialDependence
* plotPartialPredictionGGVIS to plotPartialDependenceGGVIS
learners - general:
* fixed weight handling and weight tag for some learners
* remove unnecessary linear.output parameter for classif.neuralnet
* remove unsupported KSVM parameter value stringdot
* fix some bartMachine compatibility issues
* classif.ranger, regr.ranger and surv.ranger: now respect unordered factors by
default
* clean up randomForestSRC and randomForestSRCSyn learners
* the "penalized" learner were restructured and improved (params were added), also
see below.
* add stability.nugget parameter for "regr.km"
* classif.blackboost, regr.blackboost: made sure that arg "stump" is passed on
correctly
* fixed parameter values for WEKA learners IBk, J48, PART, EM, SimpleKMeans, XMeans
* classif.glmboost, regr.glmboost: add parameters stopintern and trace
learners - new:
* classif.C50
* classif.gausspr
* classif.penalized.fusedlasso
* classif.penalized.lasso
* classif.penalized.ridge
* classif.h2o.deeplearning
* classif.h2o.gbm
* classif.h2o.glm
* classif.h2o.randomForest
* classif.rrf
* regr.penalized.fusedlasso
* regr.gausspr
* regr.glm
* regr.GPfit
* regr.h2o.deeplearning
* regr.h2o.gbm
* regr.h2o.glm
* regr.h2o.randomForest
* regr.rrf
* surv.cv.CoxBoost
* surv.penalized.fusedlasso
* surv.penalized.lasso
* surv.penalized.ridge
* cluster.kkmeans
* multilabel.randomforestSRC
learners - removed:
* surv.optimCoxBoostPenalty
* surv.penalized (split up, see new learners above)
measures - general:
* updated gmean measure and unit test, added reference to formula of gmean
* makeCostMeasure: removed arg "task", names of cost matrix are checked on measure
calculation
measures - new:
* multiclass.brier
* brier.scaled
* logloss
* multilabel.subset01, multilabel.f1, multilabel.acc, multilabel.ppv,
multilabel.tpr
* multiclass.au1p, multiclass.au1u, multiclass.aunp, multiclass.aunu
measures - renamed:
* multiclass.auc to multiclass.au1u
* hamloss to multilabel.hamloss
mlr 2.8:
--------------------------------------------------------------------------------
* Feature filter "univariate" had a bad name, was deprecated and is now called
"univariate.model.score". The new one also has better defaults.
* (generate/plot)PartialPrediction: added new arg "geom" for tile plots
* small fix for plotBMRSummary
* the ModelMultiplexer inherits its predict.type from the base learners now
* check that learners in an ensemble have the same predict.type
* new function getBMRModels to extract stored models from a benchmark result
* Fixed a bug where several learners from the LiblineaR package
("classif.LiblineaRL2LogReg", "classif.LiblineaRL2SVC", "regr.LiblineaRL2L2SVR")
were calling the wrong value for "type" (0) and thus training the wrong model.
* Fixed a bug where the resampling objects hout, cv2, cv3, cv5, cv10 were not
documented in the ResampleDesc help page
* regr.xgboost, classif.xgboost: add feval param
* fixed a bug in irace tuning interface with unamed discrete values
* Fixed bugs in "jackknife" and "bootstrap" se estimators for regr.randomForest.
* Added "sd" estimator for regr.randomForest.
* Fixed a mini bug in ModelMultiplexer where hyperpars that are only needed in
predict were not passed down correctly
* Fixed a bug where the function capLargeValues wasn't working if you passed a
task.
* capLargeValues now has a new argument "target", to prevent from capping response
values.
* classif.gbm, regr.gbm: Updated possible 'distribution' settings a bit.
* oversample, undersample, makeOversampleWrapper, makeUndersampleWrapper,
makeOverBaggingWrapper:
Added arguments to specifically select the sampled class.
API changes:
* listLearners now returns a data frame with properties of the learners if
create is false
new functions:
* getBMRModels
removed functions:
* generateROCRCurvesData, plotROCRCurves, plotROCRCurvesGGVIS
new learners:
* classif.randomForestSRCSyn
* classif.cvglmnet
* regr.randomForestSRCSyn
* cluster.dbscan
new measures:
* rsq, arsq, expvar
mlr 2.7:
--------------------------------------------------------------------------------
* New argument "models" for function benchmark
* fixed a bug where 'keep.pred' was ignored in the benchmark function
* some of the very new functions for benchmark plots had to be refactored and/or
renamed.
these names are gone from the API:
plotBenchmarkResult, generateRankMatrixAsBarData, plotRankMatrixAsBar, generateBenchmarkSummaryData, plotBenchmarkSummary,
this is the new API:
plotBMRSummary, plotBMRBoxplots, plotBMRRanksAsBarChart
mlr 2.6:
--------------------------------------------------------------------------------
* cluster.kmeans: added support for fuzzy clustering (property "prob")
* regr.lm: removed some erroneous param settings
* regr.glmnet: added 'family' param and allowed 'gaussian', but also 'poisson'
* disabled plotViperCharts unit tests as VC seems to be offline currently
* multilabel: improve few task getter functions, especially getTaskFormula is
now correct
new learners:
* regr.glmboost
* cluster.Cobweb
mlr 2.5:
--------------------------------------------------------------------------------
* fixed a bug that caused performance() to return incorrect values with
ResamplePredictions
* we have (somewhat experimental) support for multilabel classification.
so we now have a task, a new baselearner (rFerns),
and a generic reduction-to-binary algorithm (MultilabelWrapper)
* tuning: added 'budget' parameter in makeTuneControl* (single-objective)
and makeTuneMultiCritControl* (multi-objective scenarios), allowing to define
a maximum "number of evaluations" budget for tuning algorithms
* tuning: added 'budget' parameter in makeTuneMultiCritControl*, allowing to
define a maximum "number of evaluations" budget for tuning algorithms
in the single-objective case
* makeTuneControlGenSA: optimized function will be considered non-smooth
per default (change via ... args)
* classif.svm, regr.svm: added 'scale' param
* ksvm: added 'cache' param
* plotFilterValuesGGVIS: sort and n_show are interactive, interactive flag removed
* renamed getProbabilities to getPredictionProbabilities and deprecated
getProbabilities
* plots now use long names for measures where possible
* there was a nasty bug in measure "mcc". fixed and unit tested. and apologies.
* removed getTaskFormulaAsString and improved getTaskFormula so the former is
not needed anymore
* aggregations now have a 'name' property, which is a long name
* generateLearningCurveData and generateThreshVsPerfData now append the
aggregation id to the output column name if the measure ids are the same
* plotLearningCurve, plotLearningCurveGGVIS, plotThreshVsPerf,
plotThreshVsPerfGGVIS now have an argument
'pretty.names' which plots the 'name' element of the measures instead of the 'id'.
* makeCustomResampledMeasure now has arguments 'measure.id' and 'aggregation.id'
instead of only 'id' which corresponded to the measure. Also, 'name' and note (corresponding to the measure)
as well as 'aggregation.name' have been added.
* makeCostMeasure now has arguments 'name' and 'id'.
* classification learner now can have a property 'class.weights', supported by
'class.weights.param'. The latter indicates which of the parameters provides
that class weights information to the learner.
* class weights integrated in the learner will be used as default for 'wcw.param'
in 'makeWeightedClassesWrapper'
* listLearners with create = FALSE does not load packages anymore and is
therefore faster and more reliable; it also supports the additional parameter
check.packages now that will check whether required packages are installed
without loading them
* many new functions for statistical benchmark comparisons are added, see below
* rename hasProperties, getProperties to hasLearnerProperties and
getLearnerProperties
* Learner properties are now implemented object oriented as a state of a Learner.
Only RLearners have the properties stored in a slot.
For each class the getter can be overwritten.
* The hill climbing algorithm for stacking (Caruana 04) is implemented as method
'hill.climb' in 'makeStackedLearner' to select models from base learners, which
is equivalent to weighted average.
* The model compression algorithm for stacking (Caruana 06) is implemented as
method 'compress' in 'makeStackedLearner' to first select models from base
learners and then mimic the behaviour with a super learner. The default super
learner is neural network.
* relativeOverfitting provides a way to estimate how much a model overfits to
the training data according to a measure.
* restructured the LiblineaR learners to a more convenient format. These old ones
were removed:
classif.LiblineaRBinary, classif.LiblineaRLogReg, classif.LiblineaRMultiClass.
For the new ones, see below.
* Added some commonly used ResampleDesc description objects, to save typing in
resample experiments:
hout, cv2, cv3, cv5, cv10.
* regr.randomForest: changed default nodesize to 5 (according to randomForest
defaults)
new functions:
* getDefaultMeasure
* getTaskClassLevels
* getPredictionTruth, getPredictionResponse, getPredictionSE
* convertMLBenchObjToTask
* getBMRLearners, getBMRMeasures, getBMRMeasureIds
* makeMultilabelTask, makeMultilabelWrapper, getMultilabelBinaryPerformances
* generatePartialPredictionData, plotPartialPrediction, and
plotPartialPredictionGGVIS
* getClassWeightParam
* plotBenchmarkResult, convertBMRToRankMatrix, generateRankMatrixAsBarData,
plotRankMatrixAsBar, generateBenchmarkSummaryData, plotBenchmarkSummary,
friedmanTestBMR, friedmanPostHocTestBMR, generateCritDifferencesData,
plotCritDifferences
* getCaretParamSet
* generateCalibrationData and plotCalibration
* relativeOverfitting
* plotROCCurves
new measures:
* hamloss
new learners:
* multilabel.rFerns
* classif.avNNet
* classif.neuralnet
* regr.avNNet
* classif.clusterSVM
* classif.dcSVM
* classif.gaterSVM
* classif.mlp
* classif.saeDNN
* classif.dbnDNN
* classif.nnTrain
* classif.rknn
* regr.rknn
* classif.xgboost
* regr.xgboost
* classif.rotationForest
* classif.LiblineaRL1L2SVC
* classif.LiblineaRL1LogReg
* classif.LiblineaRL2L1SVC
* classif.LiblineaRL2LogReg
* classif.LiblineaRL1LMultiClassSVC
* regr.LiblineaRL2L1SVR
* regr.LiblineaRL2L2SVR
* classif.ranger
* regr.ranger
* surv.ranger
new filters:
* permutation.importance
removed functions:
* setProperties, addProperties, removeProperties
mlr 2.4:
--------------------------------------------------------------------------------
* WrappedModel printer was slightly improved
* ReampleResult now stores the runtime it took to resample in a slot
* getTaskFormula / getTaskFormulaAsString have new argument 'explicit.features'
* getTaskData now has recodeY = "drop.levels" which drops empty factor levels
* option fix.factors in makeLearner was renamed to fix.factors.prediction for
clarity
* showHyperPars was removed. getParamSet does exactly the same thing
* 'resample' and 'benchmark' got the argument keep.pred,
setting it to FALSE allows to discard the prediction objects to save memory
* we had to slightly change how the mem usage is reported in tuning and feature
selection
See TuneControl and FeatSelControl where it is documented what is done now.
* tuneIrace: allows to set the precision / digits within irace (using the argument
'digits' in makeTuneControlIrace); default is maximum precision
* for plotting in general we try to introduce a "data layer", so the data can be
generated independently of the plotting first, into well-defined objects;
these can then be plotted with mlr or custom code;
the naming scheme is always generate<Foo>Data and plot<Foo>
* getFilterValues is deprecated in favor of generateFilterValuesData
* plotFilterValues can now plot multiple filter methods using facetting
* plotROCRCurves has been rewritten to use ggplot2
* classif.ada: added "loss" hyperpar
* add missings properties to all ctree and cforest methods:
regr/classif for ctree, regr/classif/surv for cforest, and regr/classif for blackboost
* learner xgboost was removed, because the package is not on CRAN anymore,
unfortunately
* reg.km: added param 'iso'
* classif.mda: added param 'start.method' and changed its default to 'lvq', added
params 'sub.df', 'tot.df' and 'criterion'
* classif.randomForest: 'sampsize' can now be an int vector (instead of a scalar)
* plotThreshVsPerf and plotLearningCurve now have param 'facet'
new functions:
* getTaskSize
* getNestedTuneResultsX, getNestedTuneResultsOptPathDf
* tuneDesign
* generateROCRCurvesData, generateFilterValuesData, generateLearningCurveData,
plotLearningCurve, generateThreshVsPerfData, plotThreshVsPerf,
* generateThreshVsPerfData accepts Prediction, ResampleResult, lists of
ResampleResult, and BenchmarkResult objects.
* experimental ggvis functions: plotROCRCurvesGGVIS, plotLearningCurveGGVIS,
plotTuneMultiCritResultGGVIS, plotThreshVsPerfGGVIS, and plotFilterValuesGGVIS
new learners::
* classif.bst
* classif.hdrda
* classif.nodeHarvest
* classif.pamr
* classif.rFerns
* classif.sparseLDA
* regr.bst
* regr.frbs
* regr.nodeHarvest
* regr.slim
new measures::
* brier
mlr 2.3:
--------------------------------------------------------------------------------
* resample now returns an object of class ResampleResult (downward compatible)
to allow for a print method.
* resampling on features now supported for an arbitrary number of factor features
* mlr supports ViperCharts plots now
* ROC plot via ROCR can now be created automatically, before you had to call
asROCRPrediction,
then construct the plots via ROCR your self. See plotROCRCurves
* all mlr measures now have slots "name" and "note"
* exported a few very simple "getters" for tasks, see below
* in makeLearner a probability predict.threshold can be set for classifiers, also
see setPredictThreshold
* in the control objects for tuning and feature selection, the user can now enable
threshold tuning
* in the control objects for tuning and feature selection, the user can now define
his own logging function
* default console logging for tuneParams and selectFeatures is more informative,
it displays time and memory info
* updated some properties of some learners
* Default arguments of classif.bartMachine, classif.randomForestSRC,
regr.randomForestSRC and sur.randomForestSRC
have been changed to allow missing data support with default settings.
* externalized measure functions to be used on vectors.
* some minor bug fixes
* required basic learner packages are not loaded into the global namespace
anymore, requireNamespace
is used internally instead. this ensures less name clashes and name shadowing
* resample passes dot arguments to the learner hyperpars
* new option "on.par.out.of.bounds" to disable out-of-bound checks for model
parameters
* measures were slightly internally changed. they expose more properties (check
?Measure) and some now unnecessary object slots were removed
* classif.lda and classif.qda now have hyperpar "predict.method"
* filterFeatures and makeFilterWrapper gain an argument for mandatory features
* plotLearnerPrediction has new option "err.size"
* classif.plsDA and cluster.DBscan for now removed because of problems with the
underlying learning algorithm
* new aggregation test.join
* the following models now can handle factors and ordereds by extra dummy or int
encoding:
classif.glmnet, regr.glmnet, surv.glmnet, surv.cvglmnet, surv.penalized,
surv.optimCoxBoostPenalty, surv.glmboost, surv.CoxBoost
new functions:
* getTaskType, getTaskId, getTaskTargetNames
* plotROCRCurves
* plotViperCharts
* measureSSE, measureMSE, measureRMSE, measureMEDSE, ...
* PreprocWrapperCaret
* setPredictThreshold
new learners::
* classif.bdk
* classif.binomial
* classif.extraTrees
* classif.probit
* classif.xgboost
* classif.xyf
* regr.bartMachine
* regr.bcart
* regr.bdk
* regr.bgp
* regr.bgpllm
* regr.blm
* regr.brnn
* regr.btgp
* regr.btgpllm
* regr.btlm
* regr.cubist
* regr.elmNN
* regr.extraTrees
* regr.laGP
* regr.xgboost
* regr.xyf
* surv.rpart
mlr 2.2:
--------------------------------------------------------------------------------
* The web tutorial was MUCH improved!
* more example tasks and data sets
* Learners and tasks now support ordered factors as features.
The task description knows whether ordered factors are present and it is checked
whether the learner supports such a feature. We have set this property 'ordered'
very conservatively, so very few learners have it, where we are sure ordered
inputs are handled correctly during training.
If you know of more models that support this, please inform us.
* basic R learners now have new slots: name (a descriptive name of the algorithm),
short.name (abbreviation that can be used in plots and tables) and note
(notes regarding slight changes for the mlr integration of the learner and such).
* makeLearner now supports some options regarding learner error handling and
output which could before only be set globally via configureMlr
* Additional arguments for imputation functions to allow a more fine-grain
control of dummy column creation
* imputeMin and imputeMax now subtract or add a multiple of the range of
the data from the minimum or to the maximum, respectively.
* cluster methods now have property 'prob' when they support fuzzy cluster
membership probabilities,
and also then support predict.type = 'prob'. Everything basically works the same
as for posterior probabilities in classif.* methods.
* predict preserves the rownames of the input in its output
* fixed a bug in createDummyFeatures that caused an error when the data contained
missing values.
* plotLearnerPrediction works for clustering and allows greyscale plots (for
printing or articles)
* the whole object-oriented structure behind feature filtering was much
improved. Smaller changes in the signature of makeFilterWrapper and
filterFeatures have become necessary.
* fixed a bug in filter methods of the FSelector package that caused an error when
variable names contained accented letters
* filterFeatures can now be also applied to the result of getFilterValues
* We dropped the data.frame version of some preprocessing operations like
mergeFactorLevelsBySize,
joinClassLevels and removeConstantFeatures for consistency. These now always require tasks as input.
* We support a pretty generic framework for stacking / super-learning now, see
makeStackedLearner
* imbalancy correction + smote:
** fix a bug in "smote" when only factor features are present
** change to oversampling: sample new observations only (with replacement)
** extension to smote algorithm (sampling): minority class observations in
binary classification
are either chosen via sampling or alternatively, each minority class observation
is used an equal number of times
* made the getters for BenchmarkResult more consistent. These are now:
getBMRTaskIds, getBMRLearnerIds, getBMRPredictions, getBMRPerformances,
getBMRAggrPerformances
getBMRTuneResults, getFeatSelResults, getBMRFilteredFeatures
The following methods do not work for BenchmarkResult anymore: getTuneResult, getFeatSelResult
* Removed getFilterResult, because it does the same as getFilteredFeatures
new learners::
* classif.bartMachine
* classif.lqa
* classif.randomForestSRC
* classif.sda
* regr.ctree
* regr.plsr
* regr.randomForestSRC
* cluster.cmeans
* cluster.DBScan
* cluster.kmeans
* cluster.FarthestFirst
* surv.cvglmnet
* surv.optimCoxBoostPenalty
new filters::
* variance
* univariate
* carscore
* rf.importance, rf.min.depth
* anova.test, kruskal.test
* mrmr
new functions:
* makeMulticlassWrapper
* makeStackedLearner, getStackedBaseLearnerPredictions
* joinClassLevels
* summarizeColumns, summarizeLevels
* capLargeValues, mergeFactorLevelsBySize
mlr 2.1:
--------------------------------------------------------------------------------
* mlr now supports multi-criteria tuning
* mlr now supports cluster analysis (experimental)
* improve makeWeightedClassesWrapper: Hyperparams for class weighting are now
supported, too.
* removed fix.factors option from randomForest, but added it in general to
makeLearner, so it now works for all learners.
Helps when feature factor levels where dropped in newdata prediction data.frames
* more consistent results for tuning algorithms and parameters with "trafos" :
we always return the optimal settings on the transformed scale, but in the opt.path in the original scale.
* fix a bug when feature filtering resulted in a NoFeatureModel
* resample now returns a data.frame "err.mgs" or error messages that might have
occurred during resampling
* stratified resampling for survival
new learners::
* classif.cforest
* classif.glmnet
* classif.plsdaCaret
* regr.cforest
* regr.glmnet
* regr.svm
* surv.cforest
* cluster.SimpleKMeans
* cluster.EM
* cluster.XMeans
new measures:
* bac
* db, dunn, g1, g2, silhouette
new functions:
* makeClusterTask
* removeHyperPars
* tuneParamsMultiCrit
* makeTuneMultiCritControlGrid, makeTuneMultiCritControlRandom,
makeTuneMultiCritControlNSGA2
* plotTuneMultiCritResult
* getFailureModelMsg
mlr 2.0:
--------------------------------------------------------------------------------
* mlr now supports survival analysis models (experimental)
* mlr now supports cost-sensitive learning with example-specific costs
experimental)
* Some example tasks and data sets were added for simple access
* added FeatSelWrapper and getFeatSelResult
* performance functions now allows to compute multiple measures
* added multiclass.roc performance measure
* observation weights can now also be specified in the task
* added option on.learner.warning to configureMlr to suppress warnings in learners
* fixed a bug in stratified CV where elements where not distributed as evenly as
possible when the split number did not divide the number of observation
* added class.weights param for classif.svm
* add fix.factors.prediction option to randomForest
* generic standard error estimation in randomForest and BaggingWrapper
* added fixup.data option to task constructors, so basic data cleanup can be
performed
* show.info is now an option in configureMlr
* learners now support taggable properties that can be queried and changed. also
see below.
* listLearners(forTask) was unified
* removed tuning via R' optim method (makeTuneControlOptim), as the optimizers in
there really make no sense for tuning
* Grid search was improved so one does not have to discretize parameters manually
anymore (although this is still possible). Instead one now passes a 'resolution' argument. Internally we
now use ParamHelpers::generateGridDesign for this.
* toy tasks were added for convenient usage: iris.task, sonar.task, bh.task
they also also have corresponding resampling instances, so you directly start
working, e.g., iris.rin
new learners::
* classif.knn
* classif.IBk
* classif.LiblineaRBinary
* classif.LiblineaRLogReg
* classif.LiblineaRMultiClass
* classif.linDA
* classif.plr
* classif.plsDA
* classif.rrlda
* regr.crs
* regr.IBk
* regr.mob
* surv.CoxBoost
* surv.coxph
* surv.glmboost
* surv.glmnet
* surv.penalized
* surv.randomForestSRC
new measures:
* multiauc
* cindex
* meancosts, mcp
new functions:
* removeConstantFeatures, normalizeFeatures, dropFeatures, createDummyFeatures
* getTaskNFeats
* hasProperties, getProperties, setProperties, addProperties, removeProperties
* showHyperPars
* setId
* listMeasures
* isFailureModel
* plotLearnerPrediction
* plotThreshVsPerf
* holdout, subsample, crossval, repcv, bootstrapOOB, bootstrapB632,
bootstrapB632plus
* listFilterMethods, getFilterValues, filterFeatures, makeFilterWrapper,
plotFilterValues
* benchmark
* getPerformances, getAggrPerformances, getPredictions, getFilterResult,
getTuneResult, getFeatSelResult
* oversample, undersample, makeOversampleWrapper, makeUndersampleWrapper
* smote, makeSmoteWrapper
* downsample, makeDownsampleWrapper
* makeWeightedClassesWrapper
* makeTuneControlGenSA
* makeModelMultiplexer, makeModelMultiplexerParamSet
* makeCostSensTask, makeCostSensClassifWrapper, makeCostSensRegrWrapper,
makeCostsSensWeightedPairsLearner
* makeSurvTask
* impute, reimpute, makeImputeWrapper, lots of impute<Method>, makeImputeMethod
mlr 1.1-18:
--------------------------------------------------------------------------------
* Initial release to CRAN