Commit d82a1a8

1 parent df90ce6 commit d82a1a8

68 files changed

+35473
-0
lines changed

Diff for: .nojekyll

Whitespace-only changes.

Diff for: master/.buildinfo

+4
@@ -0,0 +1,4 @@
1+
# Sphinx build info version 1
2+
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3+
config: 95645162aad93a65d770e04aecce6f55
4+
tags: 645f666f9bcd5a90fca523b33c5a78b7

Diff for: master/.doctrees/cleanlab.doctree

714 KB
Binary file not shown.

Diff for: master/.doctrees/cleanlab.models.doctree

749 KB
Binary file not shown.

Diff for: master/.doctrees/cleanlab.models/cifar_cnn.doctree

344 KB
Binary file not shown.

Diff for: master/.doctrees/cleanlab.models/index.doctree

2.48 KB
Binary file not shown.
405 KB
Binary file not shown.

Diff for: master/.doctrees/environment.pickle

285 KB
Binary file not shown.

Diff for: master/.doctrees/index.doctree

15.9 KB
Binary file not shown.

Diff for: master/_modules/cleanlab/baseline_methods.html

+466
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/classification.html

+817
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/coteaching.html

+593
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/latent_algebra.html

+676
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/latent_estimation.html

+1,359
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/models/cifar_cnn.html

+470
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/models/mnist_pytorch.html

+727
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/noise_generation.html

+867
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/polyplex.html

+462
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/pruning.html

+960
Large diffs are not rendered by default.

Diff for: master/_modules/cleanlab/util.html

+835
Large diffs are not rendered by default.

Diff for: master/_modules/index.html

+368
Large diffs are not rendered by default.

Diff for: master/_sources/cleanlab.models.rst

+19
@@ -0,0 +1,19 @@
1+
:hide-toc:
2+
3+
CIFAR CNN
4+
=========
5+
6+
.. automodule:: cleanlab.models.cifar_cnn
7+
:autosummary:
8+
:members:
9+
:undoc-members:
10+
:show-inheritance:
11+
12+
MNIST PyTorch
13+
=============
14+
15+
.. automodule:: cleanlab.models.mnist_pytorch
16+
:autosummary:
17+
:members:
18+
:undoc-members:
19+
:show-inheritance:

Diff for: master/_sources/cleanlab.models/cifar_cnn.rst

+8
@@ -0,0 +1,8 @@
1+
CIFAR CNN
2+
=========
3+
4+
.. automodule:: cleanlab.models.cifar_cnn
5+
:autosummary:
6+
:members:
7+
:undoc-members:
8+
:show-inheritance:

Diff for: master/_sources/cleanlab.models/index.rst

+4
@@ -0,0 +1,4 @@
1+
.. toctree::
2+
3+
cifar_cnn
4+
mnist_pytorch

Diff for: master/_sources/cleanlab.models/mnist_pytorch.rst

+8
@@ -0,0 +1,8 @@
1+
MNIST PyTorch
2+
=============
3+
4+
.. automodule:: cleanlab.models.mnist_pytorch
5+
:autosummary:
6+
:members:
7+
:undoc-members:
8+
:show-inheritance:

Diff for: master/_sources/cleanlab.rst

+90
@@ -0,0 +1,90 @@
1+
:hide-toc:
2+
3+
Classification
4+
==============
5+
6+
.. automodule:: cleanlab.classification
7+
:autosummary:
8+
:members:
9+
:undoc-members:
10+
:show-inheritance:
11+
12+
Latent Estimation
13+
=================
14+
15+
.. automodule:: cleanlab.latent_estimation
16+
:autosummary:
17+
:members:
18+
:undoc-members:
19+
:show-inheritance:
20+
21+
Noise Generation
22+
================
23+
24+
.. automodule:: cleanlab.noise_generation
25+
:autosummary:
26+
:members:
27+
:undoc-members:
28+
:show-inheritance:
29+
30+
Baseline Methods
31+
================
32+
33+
.. automodule:: cleanlab.baseline_methods
34+
:autosummary:
35+
:members:
36+
:undoc-members:
37+
:show-inheritance:
38+
39+
Co-Teaching
40+
===========
41+
42+
.. automodule:: cleanlab.coteaching
43+
:autosummary:
44+
:members:
45+
:undoc-members:
46+
:show-inheritance:
47+
48+
Latent Algebra
49+
==============
50+
51+
.. automodule:: cleanlab.latent_algebra
52+
:autosummary:
53+
:members:
54+
:undoc-members:
55+
:show-inheritance:
56+
57+
Pruning
58+
=======
59+
60+
.. automodule:: cleanlab.pruning
61+
:autosummary:
62+
:members:
63+
:undoc-members:
64+
:show-inheritance:
65+
66+
Utilities
67+
=========
68+
69+
.. automodule:: cleanlab.util
70+
:autosummary:
71+
:members:
72+
:undoc-members:
73+
:show-inheritance:
74+
75+
Polyplex
76+
========
77+
78+
.. automodule:: cleanlab.polyplex
79+
:autosummary:
80+
:members:
81+
:undoc-members:
82+
:show-inheritance:
83+
84+
Models
85+
======
86+
87+
.. toctree::
88+
89+
cleanlab.models
90+

Diff for: master/_sources/index.rst

+83
@@ -0,0 +1,83 @@
1+
Introduction
2+
============
3+
4+
**cleanlab automatically finds and fixes errors in your ML datasets.**
5+
6+
| This reduces manual work needed to fix data issues and helps train reliable ML models on partially mislabeled datasets. ``cleanlab`` has already found thousands of `label errors <https://labelerrors.com>`_ in ImageNet, MNIST, and other popular ML benchmarking datasets, so let's get started with yours!
7+
8+
Quickstart
9+
==========
10+
11+
1. Install ``cleanlab``.
12+
------------------------
13+
14+
.. tabs::
15+
16+
.. tab:: pip
17+
18+
.. code-block:: bash
19+
20+
pip install cleanlab
21+
22+
.. tab:: conda
23+
24+
.. code-block:: bash
25+
26+
conda install -c conda-forge cleanlab
27+
28+
.. tab:: source
29+
30+
.. code-block:: bash
31+
32+
pip install git+https://github.com/cleanlab/cleanlab.git
33+
34+
35+
2. Find label errors with ``get_noise_indices``.
36+
------------------------------------------------
37+
38+
``cleanlab``'s ``get_noise_indices`` function tells you which examples in your dataset are likely mislabeled. At a minimum, it expects two inputs: your data's given labels, ``y``, and predicted probabilities, ``pyx``, from a trained model. These must be out-of-sample predictions, meaning each data point was held out from the model during training; they can be obtained via cross-validation.
39+
40+
Setting ``sorted_index_method`` instructs ``cleanlab`` to return the indices of potentially mislabeled examples, ordered by estimated likelihood of label error. With ``'prob_given_label'``, the ordering uses each example's ``prob_given_label`` score (the predicted probability of its given label according to the model).
41+
42+
.. code-block:: python
43+
44+
from cleanlab.pruning import get_noise_indices
45+
46+
ordered_label_errors = get_noise_indices(
47+
s=y,
48+
psx=pyx,
49+
sorted_index_method='prob_given_label')
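
To make the ``prob_given_label`` score concrete, the following is a minimal NumPy sketch of the idea, not ``cleanlab``'s implementation. The helper name ``rank_by_prob_given_label`` and the toy arrays are assumptions made for illustration, and it assumes labels are integers ``0..K-1`` that index the columns of ``pyx``. Unlike ``get_noise_indices``, which returns only the examples it flags, this sketch ranks every example.

```python
import numpy as np


def rank_by_prob_given_label(labels, pred_probs):
    """Hypothetical helper: rank all examples, least-confident given label first."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs)
    # Probability the model assigns to each example's *given* label.
    prob_given_label = pred_probs[np.arange(len(labels)), labels]
    # Ascending order: a low probability for the given label is more suspicious.
    # A stable sort keeps tied examples in their original order.
    return np.argsort(prob_given_label, kind="stable")


# Toy data (made up for illustration): four examples, two classes.
y = np.array([0, 1, 1, 0])
pyx = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],  # given label 1 only receives probability 0.3
    [0.6, 0.4],
])

ranking = rank_by_prob_given_label(y, pyx)
print(ranking)
```

Here the third example (index 2) is ranked first because the model gives its given label the lowest probability.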
50+
51+
.. important::
52+
The predicted probabilities, ``pyx``, from your model **must be out-of-sample**! Never provide predictions on the same data points used to train the model: such predictions reflect an overfitted model and are unsuitable for finding label errors. To compute out-of-sample predicted probabilities for the entire dataset, you can use cross-validation.
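
One possible way to obtain out-of-sample predicted probabilities with cross-validation is scikit-learn's ``cross_val_predict``. This is a sketch under assumptions: the synthetic dataset, the logistic regression model, and the choice of 5 folds are illustrative and do not come from the ``cleanlab`` documentation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for your own features X and given labels y.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_classes=2, random_state=0
)

# Each row of pyx is predicted by a model that never saw that row in training:
# the data is split into 5 folds and every fold is predicted by a model
# fitted on the other 4.
pyx = cross_val_predict(
    LogisticRegression(), X, y, cv=5, method="predict_proba"
)

print(pyx.shape)  # one row per example, one column per class
```

The resulting ``pyx`` array can then be passed as the predicted probabilities argument described above.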
53+
54+
..
55+
todo - include the url for tf and torch beginner tutorials
56+
57+
3. Train robust models with noisy labels using ``LearningWithNoisyLabels``.
58+
---------------------------------------------------------------------------
59+
60+
``cleanlab``'s ``LearningWithNoisyLabels`` adapts any classification model, ``clf``, to a more reliable one by allowing it to train directly on partially mislabeled datasets.
61+
62+
When ``.fit()`` is called, it automatically identifies the examples in the provided dataset that are deemed "noisy", removes them, and trains the final model on the remaining data.
63+
64+
.. code-block:: python
65+
66+
from sklearn.linear_model import LogisticRegression
67+
from cleanlab.classification import LearningWithNoisyLabels
68+
69+
clf = LogisticRegression() # Here we've used sklearn's Logistic Regression model, but this can be any classifier that implements sklearn's API.
70+
lnl = LearningWithNoisyLabels(clf=clf)
71+
lnl.fit(X=X, s=y)
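
The identify, remove, and retrain flow can be illustrated with a deliberately simplified stand-in. This is not the algorithm ``LearningWithNoisyLabels`` uses: the helper ``fit_after_dropping_suspects`` and its ``drop_fraction`` parameter are hypothetical, and dropping a fixed fraction of examples replaces ``cleanlab``'s own estimate of which examples are noisy. It assumes integer labels ``0..K-1``.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def fit_after_dropping_suspects(clf, X, s, drop_fraction=0.1, cv=5):
    """Hypothetical helper: drop the least-confident examples, then refit."""
    # Step 1: out-of-sample predicted probabilities via cross-validation.
    pyx = cross_val_predict(clone(clf), X, s, cv=cv, method="predict_proba")
    # Step 2: score each example by the probability of its given label.
    prob_given_label = pyx[np.arange(len(s)), s]
    # Step 3: mark the lowest-scoring fraction as suspects (simplification).
    n_drop = int(drop_fraction * len(s))
    suspects = np.argsort(prob_given_label)[:n_drop]
    keep = np.ones(len(s), dtype=bool)
    keep[suspects] = False
    # Step 4: train a fresh copy of the classifier on the remaining data.
    model = clone(clf).fit(X[keep], s[keep])
    return model, keep


# Synthetic stand-in data (illustrative only).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_classes=2, random_state=0
)
model, keep = fit_after_dropping_suspects(LogisticRegression(), X, y)
print(keep.sum(), "examples kept out of", len(y))
```

The returned ``model`` is an ordinary fitted scikit-learn classifier, so ``model.predict`` works as usual.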
72+
73+
.. toctree::
74+
:hidden:
75+
:caption: Get Started
76+
77+
Quickstart <self>
78+
79+
.. toctree::
80+
:caption: API Reference
81+
:hidden:
82+
83+
cleanlab
