-
Notifications
You must be signed in to change notification settings - Fork 12
/
NEWS
90 lines (67 loc) · 3.87 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
activelearning 0.1.2
-------------
UPDATES
* `query_bagging()`: The example was breaking due to a misspecified `predict_f`
function.
activelearning 0.1.1
-------------
UPDATES
* `query_bagging()`: To shorten code, `query_by_bagging()` is now
`query_bagging()`. Although `query_by_bagging()` still exists, it will be
deprecated in a future release.
* `query_committee()`: To shorten code, `query_by_committee()` is now
`query_committee()`. Although `query_by_committee()` still exists, it will be
deprecated in a future release.
* `query_random()`: For uniformity, `random_query()` is now `query_random()`.
Although `random_query()` still exists, it will be deprecated in a future
release.
* `activelearning()`: Deprecated and then removed this method.
* `activelearning_sim()`: Deprecated and then removed this method.
activelearning 0.1
-------------
NEW FEATURES
* First version of the `activelearning` package. With this package, we aim to
provide tools for the machine-learning paradigm, active learning. Specifically,
this package is a collection of various active-learning methods from the
literature for choosing unlabeled observations in a data set to query for their
true labels. The framework is particularly useful when there are very few
labeled observations relative to a large number of unlabeled observations, and
the user seeks to determine as few true labels as possible to achieve highly
accurate classifiers.
ACTIVE LEARNING METHODS
* `activelearning()` is a wrapper function that applies our implementation of one
of several active-learning methods to select observations that should be
labeled to improve classification performance based on the current data set.
* `uncertainty_sampling()` determines the least certain unlabeled observations,
which are selected to query for their true classification labels. The user must
specify a classifier from the 'caret' R package to train a classifier from the
observations that are already labeled. The trained classifier is then applied
to the unlabeled observations to determine the least certain ones. We provide
several methods to estimate the certainty of each observation. By default, the
entropy of the posterior probabilities of classification is used.
* `query_by_bagging()` is an active-learning method that applies a classifier
from the 'caret' package to a data set using the bagging paradigm by training
a specified classifier using 'C' randomly sampled data sets. The unlabeled
observations that have the maximum disagreement among the 'C' trained
classifiers are to be queried. We provide several measures of disagreement.
By default, the Kullback-Leibler divergence between the label distributions
of any one committee member and the consensus of the 'C' classifiers is used.
* `query_by_committee()` is an active-learning method similar to
`query_by_bagging()`. Rather than applying a bagging method using the same
classifier to 'C' randomly sampled data sets, a committee of 'C' different
classification methods is used.
* `random_query()` is a function to query unlabeled observations at random.
Random querying is often used as a baseline to which proposed active-learning
methods are compared.
ACTIVE LEARNING SIMULATION
* `activelearning_sim()` is a function that simulates the classification
performance of a specified active-learning method for a given classifier (or
classifiers) using a given data set. The simulation function is useful to
investigate empirical classification performance of various active-learning
methods as well as the effect of various training sample sizes.
MISCELLANEOUS
* `validate_classifier()` is a helper function that validates a classifier
specified from the 'caret' package.
* `query_oracle()` is a helper function that reports the true classification for
a set of unlabeled observations, if known. This is useful for simulation
studies.