Commit 87a85c1

Merge pull request scikit-learn#417 from larsmans/multilabel
MRG : ENH multilabel learning in OneVsRestClassifier
2 parents 97bf1ad + 6c6d9e3 commit 87a85c1

16 files changed: 1075 additions, 512 deletions

doc/datasets/index.rst (+1)

@@ -116,6 +116,7 @@ can be used to build artifical datasets of controled size and complexity.
    :template: function.rst
 
    make_classification
+   make_multilabel_classification
    make_regression
    make_blobs
    make_friedman1

doc/modules/classes.rst (+34)

@@ -145,6 +145,7 @@ Samples generator
    :template: function.rst
 
    datasets.make_classification
+   datasets.make_multilabel_classification
    datasets.make_regression
    datasets.make_blobs
    datasets.make_friedman1
@@ -640,6 +641,39 @@ Pairwise metrics
    mixture.VBGMM
 
 
+.. _multiclass_ref:
+
+:mod:`sklearn.multiclass`: Multiclass and multilabel classification
+===================================================================
+
+.. automodule:: sklearn.multiclass
+   :no-members:
+   :no-inherited-members:
+
+**User guide:** See the :ref:`multiclass` section for further details.
+
+.. currentmodule:: sklearn
+
+.. autosummary::
+   :toctree: generated
+   :template: class.rst
+
+   multiclass.OneVsRestClassifier
+   multiclass.OneVsOneClassifier
+   multiclass.OutputCodeClassifier
+
+.. autosummary::
+   :toctree: generated
+   :template: function.rst
+
+   multiclass.fit_ovr
+   multiclass.predict_ovr
+   multiclass.fit_ovo
+   multiclass.predict_ovo
+   multiclass.fit_ecoc
+   multiclass.predict_ecoc
+
+
 .. _naive_bayes_ref:
 
 :mod:`sklearn.naive_bayes`: Naive Bayes

doc/modules/multiclass.rst (+37 -7)

@@ -1,17 +1,23 @@
 
 .. _multiclass:
 
-=====================
-Multiclass algorithms
-=====================
+====================================
+Multiclass and multilabel algorithms
+====================================
 
 .. currentmodule:: sklearn.multiclass
 
-This module implements multiclass learning algorithms:
+This module implements multiclass and multilabel learning algorithms:
 - one-vs-the-rest / one-vs-all
 - one-vs-one
 - error correcting output codes
 
+Multiclass classification means classification with more than two classes.
+Multilabel classification is a different task, where a classifier is used to
+predict a set of target labels for each instance; i.e., the set of target
+classes is not assumed to be disjoint as in ordinary (binary or multiclass)
+classification. This is also called any-of classification.
+
 The estimators provided in this module are meta-estimators: they require a base
 estimator to be provided in their constructor. For example, it is possible to
 use these estimators to turn a binary classifier or a regressor into a
@@ -26,9 +32,15 @@ improves.
 multiclass classification out-of-the-box. Below is a summary of the
 classifiers supported in scikit-learn grouped by the strategy used.
 
-- Inherently multiclass: Naive Bayes, LDA.
-- One-Vs-One: SVC.
-- One-Vs-All: LinearSVC, LogisticRegression, SGDClassifier, RidgeClassifier.
+- Inherently multiclass: Naive Bayes, :class:`LDA`.
+- One-Vs-One: :class:`SVC`.
+- One-Vs-All: :class:`LinearSVC`, :class:`LogisticRegression`,
+  :class:`SGDClassifier`, :class:`RidgeClassifier`.
+
+.. note::
+
+    At the moment there are no evaluation metrics implemented for multilabel
+    learning.
 
 
 One-Vs-The-Rest
@@ -57,6 +69,24 @@ fair default choice. Below is an example::
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
 
+Multilabel learning with OvR
+----------------------------
+
+``OneVsRestClassifier`` also supports multilabel classification.
+To use this feature, feed the classifier a list of tuples containing
+target labels, like in the example below.
+
+
+.. figure:: ../auto_examples/images/plot_multilabel_1.png
+    :target: ../auto_examples/plot_multilabel.html
+    :align: center
+    :scale: 75%
+
+
+.. topic:: Examples:
+
+    * :ref:`example_plot_multilabel.py`
+
 
 One-Vs-One
 ==========
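
Note on the multilabel input format: the section added above says that ``OneVsRestClassifier`` is fed a list of tuples of target labels. One way to picture how a one-vs-rest scheme can consume such input is to derive one binary target per class, fit one binary classifier per target, and merge the per-class decisions back into label tuples. The pure-Python sketch below illustrates only that bookkeeping. ``binarize_labels`` and ``combine_predictions`` are hypothetical helper names introduced here, not functions from ``sklearn.multiclass``, and the implementation in this commit may differ.

```python
def binarize_labels(label_sets, classes=None):
    """Turn a list of label tuples into one 0/1 target list per class.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if classes is None:
        # Collect every label that occurs in at least one tuple.
        classes = sorted({c for labels in label_sets for c in labels})
    columns = {
        c: [1 if c in labels else 0 for labels in label_sets]
        for c in classes
    }
    return classes, columns


def combine_predictions(classes, per_class_predictions):
    """Merge per-class 0/1 decisions back into one label tuple per sample."""
    n_samples = len(next(iter(per_class_predictions.values())))
    return [
        tuple(c for c in classes if per_class_predictions[c][i])
        for i in range(n_samples)
    ]


# The third sample carries both labels, which is what makes this multilabel.
Y = [(0,), (1,), (0, 1), (1,)]
classes, columns = binarize_labels(Y)

# Each entry of `columns` is an ordinary binary problem ("class c or not").
for c in classes:
    print(c, columns[c])

# Feeding the true targets back through the merge step recovers Y.
print(combine_predictions(classes, columns))
```

Because every class gets its own independent yes/no decision, a sample can end up with several labels or with none, which matches the "any-of" description in the documentation text.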

examples/plot_multilabel.py (+75)

@@ -0,0 +1,75 @@
+"""
+=========================
+Multilabel classification
+=========================
+
+This example simulates a multi-label document classification problem. The
+dataset is generated randomly based on the following process:
+
+- pick the number of labels: n ~ Poisson(n_labels)
+- n times, choose a class c: c ~ Multinomial(theta)
+- pick the document length: k ~ Poisson(length)
+- k times, choose a word: w ~ Multinomial(theta_c)
+
+In the above process, rejection sampling is used to make sure that
+n is never zero or more than 2, and that the document length
+is never zero. Likewise, we reject classes which have already been chosen.
+The documents that are assigned to both classes are plotted surrounded by
+two colored circles.
+
+The classification is performed by projecting to the first two principal
+components for visualisation purposes, followed by using the
+:class:`sklearn.multiclass.OneVsRestClassifier` metaclassifier using two SVCs
+with linear kernels to learn a discriminative model for each class.
+"""
+print __doc__
+
+import numpy as np
+import matplotlib.pylab as pl
+
+from sklearn.datasets import make_multilabel_classification
+from sklearn.multiclass import OneVsRestClassifier
+from sklearn.svm import SVC
+from sklearn.decomposition import PCA
+
+
+def plot_hyperplane(clf, min_x, max_x, linestyle, label):
+    # get the separating hyperplane
+    w = clf.coef_[0]
+    a = -w[0] / w[1]
+    xx = np.linspace(min_x, max_x)
+    yy = a * xx - (clf.intercept_[0]) / w[1]
+    pl.plot(xx, yy, linestyle, label=label)
+
+
+X, Y = make_multilabel_classification(n_classes=2, n_labels=1, random_state=42)
+X = PCA(n_components=2).fit_transform(X)
+min_x = np.min(X[:, 0])
+max_x = np.max(X[:, 0])
+
+classif = OneVsRestClassifier(SVC(kernel='linear'))
+classif.fit(X, Y)
+
+pl.figure()
+pl.title('Multilabel classification example')
+pl.xlabel('First principal component')
+pl.ylabel('Second principal component')
+
+zero_class = np.where([0 in y for y in Y])
+one_class = np.where([1 in y for y in Y])
+pl.scatter(X[:, 0], X[:, 1], s=40, c='gray')
+pl.scatter(X[zero_class, 0], X[zero_class, 1], s=160, edgecolors='b',
+           facecolors='none', linewidths=2, label='Class 1')
+pl.scatter(X[one_class, 0], X[one_class, 1], s=80, edgecolors='orange',
+           facecolors='none', linewidths=2, label='Class 2')
+pl.axis('tight')
+
+plot_hyperplane(classif.estimators_[0], min_x, max_x, 'k--',
+                'Boundary\nfor class 1')
+plot_hyperplane(classif.estimators_[1], min_x, max_x, 'k-.',
+                'Boundary\nfor class 2')
+pl.xticks(())
+pl.yticks(())
+pl.legend()
+
+pl.show()
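
Note on the generative process: the docstring of examples/plot_multilabel.py describes the data generation in four steps with rejection sampling. The standard-library sketch below follows those steps as written. It is not the code of ``make_multilabel_classification``. The names ``sample_poisson`` and ``sample_document``, the parameter defaults, and the toy ``theta`` and ``theta_c`` values are assumptions made for illustration. The docstring also does not say which class supplies each word when a document has two labels, so picking uniformly among the document's labels is a further assumption.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_document(rng, theta, theta_c, n_labels=1, length=50, max_labels=2):
    """Sample (labels, word_counts) for one document.

    theta      class probabilities (assumed strictly positive)
    theta_c    one word distribution per class
    """
    n_classes = len(theta)
    upper = min(max_labels, n_classes)

    # n ~ Poisson(n_labels), rejected until 1 <= n <= upper.
    n = 0
    while n == 0 or n > upper:
        n = sample_poisson(rng, n_labels)

    # n times, c ~ Multinomial(theta); classes already chosen are rejected.
    labels = []
    while len(labels) < n:
        c = rng.choices(range(n_classes), weights=theta)[0]
        if c not in labels:
            labels.append(c)

    # k ~ Poisson(length), rejected while the document would be empty.
    k = 0
    while k == 0:
        k = sample_poisson(rng, length)

    # k times, w ~ Multinomial(theta_c).
    # Assumption: the class for each word is drawn uniformly from the labels.
    vocabulary_size = len(theta_c[0])
    counts = [0] * vocabulary_size
    for _ in range(k):
        c = rng.choice(labels)
        w = rng.choices(range(vocabulary_size), weights=theta_c[c])[0]
        counts[w] += 1

    return tuple(sorted(labels)), counts


rng = random.Random(42)
theta = [0.5, 0.5]                             # toy class prior
theta_c = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]   # toy per-class word distributions
docs = [sample_document(rng, theta, theta_c) for _ in range(5)]
for labels, counts in docs:
    print(labels, counts)
```

Rejection sampling keeps the code short at the cost of redrawing: with ``n_labels=1`` a noticeable share of the Poisson draws for n are zero or above the cap and get thrown away, which is harmless at this scale.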

sklearn/datasets/__init__.py (+1)

@@ -24,6 +24,7 @@
 from .twenty_newsgroups import load_20newsgroups
 from .mldata import fetch_mldata, mldata_filename
 from .samples_generator import make_classification
+from .samples_generator import make_multilabel_classification
 from .samples_generator import make_regression
 from .samples_generator import make_blobs
 from .samples_generator import make_friedman1
