diff --git a/README.md b/README.md deleted file mode 100644 index ed79b0043..000000000 --- a/README.md +++ /dev/null @@ -1,67 +0,0 @@ -# scikit-matter - -[![Test](https://github.com/lab-cosmo/scikit-matter/workflows/Test/badge.svg)](https://github.com/lab-cosmo/scikit-matter/actions?query=workflow%3ATest) -[![codecov](https://codecov.io/gh/lab-cosmo/scikit-matter/branch/main/graph/badge.svg?token=UZJPJG34SM)](https://codecov.io/gh/lab-cosmo/scikit-matter/) -[![pypi](https://img.shields.io/pypi/v/skmatter.svg)](https://pypi.org/project/skmatter) -[![conda](https://anaconda.org/conda-forge/skmatter/badges/version.svg)](https://anaconda.org/conda-forge/skmatter) -[![documentation](https://img.shields.io/badge/documentation-latest-sucess)](https://scikit-matter.readthedocs.io) - -A collection of scikit-learn compatible utilities that implement methods -born out of the materials science and chemistry communities. - -## Installation - -You can install *scikit-matter* either via pip using - -```bash -pip install skmatter -``` - -or conda - -```bash -conda install -c conda-forge skmatter -``` - -You can then `import skmatter` in your code! - -## Developing the package - -Start by installing the development dependencies: - -```bash -pip install tox black flake8 -``` - -Then this package itself - -```bash -git clone https://github.com/lab-cosmo/scikit-matter -cd scikit-matter -pip install -e . -``` - -This install the package in development mode, making is `import`able globally -and allowing you to edit the code and directly use the updated version. - -### Running the tests - -```bash -cd -# run unit tests -tox -# run the code formatter -black --check . -# run the linter -flake8 -``` - -You may want to setup your editor to automatically apply the -[black](https://black.readthedocs.io/en/stable/) code formatter when saving your -files, there are plugins to do this with [all major -editors](https://black.readthedocs.io/en/stable/editor_integration.html). - -## License and developers - -This project is distributed under the BSD-3-Clauses license. By contributing to -it you agree to distribute your changes under the same license. diff --git a/README.rst b/README.rst new file mode 100644 index 000000000..342f5d00e --- /dev/null +++ b/README.rst @@ -0,0 +1,94 @@ +scikit-matter +============= + +|tests| |codecov| |pypi| |conda| |docs| + +A collection of scikit-learn compatible utilities that implement methods born out of the +materials science and chemistry communities. + +Installation +------------ + +You can install *scikit-matter* either via pip using + +.. code-block:: bash + + pip install skmatter + + +or conda + +.. code-block:: bash + + conda install -c conda-forge skmatter + + +You can then ``import skmatter`` in your code! + +Developing the package +---------------------- + +Start by installing the development dependencies: + +.. code-block:: bash + + pip install tox black flake8 + + +Then this package itself + +.. code-block:: bash + + git clone https://github.com/lab-cosmo/scikit-matter + cd scikit-matter + pip install -e . + + +This installs the package in development mode, making it importable globally and +allowing you to edit the code and directly use the updated version. + +Running the tests +^^^^^^^^^^^^^^^^^ + +.. code-block:: bash + + cd + # run unit tests + tox + # run the code formatter + black --check .
+ # run the linter + flake8 + + You may want to set up your editor to automatically apply the `black`_ code formatter +when saving your files; there are plugins to do this with `all major editors`_. + +License and developers +---------------------- + +This project is distributed under the BSD-3-Clause license. By contributing to it you +agree to distribute your changes under the same license. + +.. _`black`: https://black.readthedocs.io/en/stable/ +.. _`all major editors`: https://black.readthedocs.io/en/stable/editor_integration.html + +.. |tests| image:: https://github.com/lab-cosmo/scikit-matter/workflows/Test/badge.svg + :alt: Github Actions Tests Job Status + :target: https://github.com/lab-cosmo/scikit-matter/actions?query=workflow%3ATests + +.. |codecov| image:: https://codecov.io/gh/lab-cosmo/scikit-matter/branch/main/graph/badge.svg?token=UZJPJG34SM + :alt: Code coverage + :target: https://codecov.io/gh/lab-cosmo/scikit-matter/ + +.. |pypi| image:: https://img.shields.io/pypi/v/skmatter.svg + :alt: Latest PYPI version + :target: https://pypi.org/project/skmatter + +.. |conda| image:: https://anaconda.org/conda-forge/skmatter/badges/version.svg + :alt: Latest conda version + :target: https://anaconda.org/conda-forge/skmatter + +.. |docs| image:: https://img.shields.io/badge/documentation-latest-sucess + :alt: Documentation + :target: https://scikit-matter.readthedocs.io diff --git a/docs/src/bibliography.rst b/docs/src/bibliography.rst index 4af27a247..428925508 100644 --- a/docs/src/bibliography.rst +++ b/docs/src/bibliography.rst @@ -3,42 +3,39 @@ References .. [deJong1992] - S. de Jong, H.A.L. Kiers, - "Principal covariates regression: Part I. Theory", - Chemom. intell. lab. syst. 14 (1992) 155-164 - https://doi.org/10.1016/0169-7439(92)80100-I + "Principal covariates regression: Part I. Theory", Chemom. intell. lab. syst. 14 + (1992) 155-164 https://doi.org/10.1016/0169-7439(92)80100-I .. [Imbalzano2018] - Giulio Imbalzano, Andrea Anelli, Daniele Giofré, Sinja Klees, Jörg Behler, and Michele Ceriotti, - “Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials.” - The Journal of chemical physics 148 24 (2018): 241730. - https://aip.scitation.org/doi/10.1063/1.5024611. + Giulio Imbalzano, Andrea Anelli, Daniele Giofré, Sinja Klees, Jörg Behler, and + Michele Ceriotti, “Automatic selection of atomic fingerprints and reference + configurations for machine-learning potentials.” The Journal of chemical physics 148 + 24 (2018): 241730. https://aip.scitation.org/doi/10.1063/1.5024611. .. [Ceriotti2019] - Michele Ceriotti, Lyndon Emsley, Federico Paruzzo, Albert Hofstetter, Félix Musil, Sandip De, Edgar A. Engel, and Andrea Anelli. - "Chemical Shifts in Molecular Solids by Machine Learning Datasets", - Materials Cloud Archive 2019.0023/v2 (2019), + Michele Ceriotti, Lyndon Emsley, Federico Paruzzo, Albert Hofstetter, Félix Musil, + Sandip De, Edgar A. Engel, and Andrea Anelli. "Chemical Shifts in Molecular Solids + by Machine Learning Datasets", Materials Cloud Archive 2019.0023/v2 (2019), https://doi.org/10.24435/materialscloud:2019.0023/v2. .. [Helfrecht2020] Benjamin A Helfrecht, Rose K Cersonsky, Guillaume Fraux, and Michele Ceriotti, - "Structure-property maps with Kernel principal covariates regression." - 2020 Mach. Learn.: Sci. Technol. 1 045021. + "Structure-property maps with Kernel principal covariates regression." 2020 Mach. + Learn.: Sci. Technol. 1 045021. https://iopscience.iop.org/article/10.1088/2632-2153/aba9ef. ..
[Pozdnyakov2020] - Pozdnyakov, S. N., Willatt, M. J., Bartók, A. P., Ortner, C., Csányi, G., & Ceriotti, M. (2020). - "Incompleteness of Atomic Structure Representations." - Physical Review Letters, 125(16). - https://doi.org/10.1103/physrevlett.125.166001 + Pozdnyakov, S. N., Willatt, M. J., Bartók, A. P., Ortner, C., Csányi, G., & + Ceriotti, M. (2020). "Incompleteness of Atomic Structure Representations." Physical + Review Letters, 125(16). https://doi.org/10.1103/physrevlett.125.166001 .. [Goscinski2021] - Alexander Goscinski, Guillaume Fraux, Giulio Imbalzano, and Michele Ceriotti, - "The role of feature space in atomistic learning." - 2021 Mach. Learn.: Sci. Technol. 2 025028. - https://iopscience.iop.org/article/10.1088/2632-2153/abdaf7. + Alexander Goscinski, Guillaume Fraux, Giulio Imbalzano, and Michele Ceriotti, "The + role of feature space in atomistic learning." 2021 Mach. Learn.: Sci. Technol. 2 + 025028. https://iopscience.iop.org/article/10.1088/2632-2153/abdaf7. .. [Cersonsky2021] - Rose K Cersonsky, Benjamin A Helfrecht, Edgar A. Engel, Sergei Kliavinek, and Michele Ceriotti, - "Improving Sample and Feature Selection with Principal Covariates Regression" - 2021 Mach. Learn.: Sci. Technol. 2 035038. + Rose K Cersonsky, Benjamin A Helfrecht, Edgar A. Engel, Sergei Kliavinek, and + Michele Ceriotti, "Improving Sample and Feature Selection with Principal Covariates + Regression" 2021 Mach. Learn.: Sci. Technol. 2 035038. https://iopscience.iop.org/article/10.1088/2632-2153/abfe7c. diff --git a/docs/src/contributing.rst b/docs/src/contributing.rst index a1d4e5073..ef0716e15 100644 --- a/docs/src/contributing.rst +++ b/docs/src/contributing.rst @@ -18,14 +18,14 @@ Then this package itself cd scikit-matter pip install -e . -This install the package in development mode, making it importable globally -and allowing you to edit the code and directly use the updated version. +This installs the package in development mode, making it importable globally and allowing +you to edit the code and directly use the updated version. Running the tests ################# -The testsuite is implemented using Python's `unittest`_ framework and should be set-up and -run in an isolated virtual environment with `tox`_. All tests can be run with +The testsuite is implemented using Python's `unittest`_ framework and should be set up +and run in an isolated virtual environment with `tox`_. All tests can be run with .. code-block:: bash @@ -40,11 +40,11 @@ If you wish to test only specific functionalities, for example: tox -e examples # test the examples -You can also use ``tox -e format`` to use tox to do actual formatting instead -of just testing it. Also, you may want to setup your editor to automatically apply the -`black <https://black.readthedocs.io/en/stable/>`_ code formatter when saving your -files, there are plugins to do this with `all major -editors <https://black.readthedocs.io/en/stable/editor_integration.html>`_. +You can also use ``tox -e format`` to use tox to do actual formatting instead of just +testing it. Also, you may want to set up your editor to automatically apply the `black +<https://black.readthedocs.io/en/stable/>`_ code formatter when saving your files; there +are plugins to do this with `all major editors +<https://black.readthedocs.io/en/stable/editor_integration.html>`_. .. _unittest: https://docs.python.org/3/library/unittest.html .. _tox: https://tox.readthedocs.io/en/latest @@ -60,9 +60,8 @@ machine as described above. Then, build the documentation with tox -e docs -You can then visualize the local documentation with your favorite browser using -the following command (or open the :file:`docs/build/html/index.html` file -manually).
+You can then visualize the local documentation with your favorite browser using the +following command (or open the :file:`docs/build/html/index.html` file manually). .. code-block:: bash @@ -172,8 +171,8 @@ Then, show ``scikit-matter`` how to load your data by adding a loader function t Add this function to ``src/skmatter/datasets/__init__.py``. -Finally, add a test to ``tests/test_datasets.py`` to see that your dataset -loads properly. It should look something like this: +Finally, add a test to ``tests/test_datasets.py`` to see that your dataset loads +properly. It should look something like this: .. code-block:: python @@ -190,7 +189,8 @@ loads properly. It should look something like this: self.my_data.DESCR -You're good to go! Time to submit a `pull request. `_ +You're good to go! Time to submit a `pull request. +`_ License diff --git a/docs/src/datasets.rst b/docs/src/datasets.rst index 162683323..cd3c368fd 100644 --- a/docs/src/datasets.rst +++ b/docs/src/datasets.rst @@ -7,5 +7,4 @@ Datasets .. include:: ../../src/skmatter/datasets/descr/nice_dataset.rst -.. include:: ../../src/skmatter/datasets/descr/who_dataset.rst - +.. include:: ../../src/skmatter/datasets/descr/who_dataset.rst diff --git a/docs/src/gfrm.rst b/docs/src/gfrm.rst index 1330d4f8f..a7d1f5a6a 100644 --- a/docs/src/gfrm.rst +++ b/docs/src/gfrm.rst @@ -11,12 +11,12 @@ Reconstruction Measures Global Reconstruction Error ########################### -.. autofunction:: pointwise_global_reconstruction_error -.. autofunction:: global_reconstruction_error +.. autofunction:: pointwise_global_reconstruction_error +.. autofunction:: global_reconstruction_error .. _GRD-api: -Global Reconstruction Distortion +Global Reconstruction Distortion ################################ .. autofunction:: pointwise_global_reconstruction_distortion diff --git a/docs/src/index.rst b/docs/src/index.rst index 106b89a70..186faff2b 100644 --- a/docs/src/index.rst +++ b/docs/src/index.rst @@ -1,23 +1,23 @@ scikit-matter documentation =========================== -``scikit-matter`` is a collection of `scikit-learn `_ -compatible utilities that implement methods born out of the materials science -and chemistry communities. +``scikit-matter`` is a collection of `scikit-learn `_ compatible +utilities that implement methods born out of the materials science and chemistry +communities. -Convenient-to-use libraries such as scikit-learn have accelerated the adoption and application -of machine learning (ML) workflows and data-driven methods. Such libraries have gained great -popularity partly because the implemented methods are generally applicable in multiple domains. -While developments in the atomistic learning community have put forward general-use machine -learning methods, their deployment is commonly entangled with domain-specific functionalities, -preventing access to a wider audience. +Convenient-to-use libraries such as scikit-learn have accelerated the adoption and +application of machine learning (ML) workflows and data-driven methods. Such libraries +have gained great popularity partly because the implemented methods are generally +applicable in multiple domains. While developments in the atomistic learning community +have put forward general-use machine learning methods, their deployment is commonly +entangled with domain-specific functionalities, preventing access to a wider audience. 
scikit-matter targets domain-agnostic implementations of methods developed in the -computational chemical and materials science community, following the -scikit-learn API and coding guidelines to promote usability and interoperability -with existing workflows. scikit-matter contains a toolbox of methods for -unsupervised and supervised analysis of ML datasets, including the comparison, -decomposition, and selection of features and samples. +computational chemical and materials science community, following the scikit-learn API +and coding guidelines to promote usability and interoperability with existing workflows. +scikit-matter contains a toolbox of methods for unsupervised and supervised analysis of +ML datasets, including the comparison, decomposition, and selection of features and +samples. .. toctree:: :maxdepth: 1 diff --git a/docs/src/intro.rst b/docs/src/intro.rst index 642fd443f..85a3e7630 100644 --- a/docs/src/intro.rst +++ b/docs/src/intro.rst @@ -1,44 +1,68 @@ What's in scikit-matter? ======================== -``scikit-matter`` is a collection of `scikit-learn `_ -compatible utilities that implement methods born out of the materials science -and chemistry communities. - -This package serves two purposes: 1) as a development ground for models and patches that may ultimately be suitable for inclusion -in sklearn, and 2) to coalesce field-specific sklearn-like routines and models in -a well-documented and standardized repository. - -Currently, scikit-matter contains models described in [Imbalzano2018]_, [Helfrecht2020]_, [Goscinski2021]_ and [Cersonsky2021]_, as well -as some modifications to sklearn functionalities and minimal datasets that are useful within the field -of computational materials science and chemistry. +``scikit-matter`` is a collection of `scikit-learn `_ compatible +utilities that implement methods born out of the materials science and chemistry +communities. +This package serves two purposes: 1) as a development ground for models and patches that +may ultimately be suitable for inclusion in sklearn, and 2) to coalesce field-specific +sklearn-like routines and models in a well-documented and standardized repository. +Currently, scikit-matter contains models described in [Imbalzano2018]_, +[Helfrecht2020]_, [Goscinski2021]_ and [Cersonsky2021]_, as well as some modifications +to sklearn functionalities and minimal datasets that are useful within the field of +computational materials science and chemistry. - Fingerprint Selection: - Multiple data sub-selection modules, for selecting the most relevant features and samples out of a large set of candidates [Imbalzano2018]_, [Helfrecht2020]_ and [Cersonsky2021]_. + Multiple data sub-selection modules, for selecting the most relevant features and + samples out of a large set of candidates [Imbalzano2018]_, [Helfrecht2020]_ and + [Cersonsky2021]_. - * :ref:`CUR-api` decomposition: an iterative feature selection method based upon the singular value decoposition. - * :ref:`PCov-CUR-api` decomposition extends upon CUR by using augmented right or left singular vectors inspired by Principal Covariates Regression. - * :ref:`FPS-api`: a common selection technique intended to exploit the diversity of the input space. The selection of the first point is made at random or by a separate metric. + * :ref:`CUR-api` decomposition: an iterative feature selection method based upon the + singular value decoposition. 
+ * :ref:`PCov-CUR-api` decomposition extends upon CUR by using augmented right or left + singular vectors inspired by Principal Covariates Regression. + * :ref:`FPS-api`: a common selection technique intended to exploit the diversity of + the input space. The selection of the first point is made at random or by a + separate metric. * :ref:`PCov-FPS-api` extends upon FPS much like PCov-CUR does to CUR. - * :ref:`Voronoi-FPS-api`: conduct FPS selection, taking advantage of Voronoi tessellations to accelerate selection. - * :ref:`DCH-api`: selects samples by constructing a directional convex hull and determining which samples lie on the bounding surface. + * :ref:`Voronoi-FPS-api`: conduct FPS selection, taking advantage of Voronoi + tessellations to accelerate selection. + * :ref:`DCH-api`: selects samples by constructing a directional convex hull and + determining which samples lie on the bounding surface. - Reconstruction Measures: - A set of easily-interpretable error measures of the relative information capacity of feature space `F` with respect to feature space `F'`. - The methods returns a value between 0 and 1, where 0 means that `F` and `F'` are completey distinct in terms of linearly-decodable information, and where 1 means that `F'` is contained in `F`. - All methods are implemented as the root mean-square error for the regression of the feature matrix `X_F'` (or sometimes called `Y` in the doc) from `X_F` (or sometimes called `X` in the doc) for transformations with different constraints (linear, orthogonal, locally-linear). - By default a custom 2-fold cross-validation :py:class:`skosmo.linear_model.RidgeRegression2FoldCV` is used to ensure the generalization of the transformation and efficiency of the computation, since we deal with a multi-target regression problem. - Methods were applied to compare different forms of featurizations through different hyperparameters and induced metrics and kernels [Goscinski2021]_ . + A set of easily-interpretable error measures of the relative information capacity of + feature space `F` with respect to feature space `F'`. The methods return a value + between 0 and 1, where 0 means that `F` and `F'` are completely distinct in terms of + linearly-decodable information, and where 1 means that `F'` is contained in `F`. All + methods are implemented as the root mean-square error for the regression of the + feature matrix `X_F'` (or sometimes called `Y` in the doc) from `X_F` (or sometimes + called `X` in the doc) for transformations with different constraints (linear, + orthogonal, locally-linear). By default a custom 2-fold cross-validation + :py:class:`skmatter.linear_model.RidgeRegression2FoldCV` is used to ensure the + generalization of the transformation and efficiency of the computation, since we deal + with a multi-target regression problem. Methods were applied to compare different + forms of featurizations through different hyperparameters and induced metrics and + kernels [Goscinski2021]_. + * :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information + recovered through a global linear reconstruction.
+ * :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear + reconstruction. + * :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through + a local linear reconstruction for the k-nearest neighborhood of each sample. - Principal Covariates Regression - * PCovR: the standard Principal Covariates Regression [deJong1992]_. Utilises a combination between a PCA-like and an LR-like loss, and therefore attempts to find a low-dimensional projection of the feature vectors that simultaneously minimises information loss and error in predicting the target properties using only the latent space vectors $\mathbf{T}$ :ref:`PCovR-api`. - * Kernel Principal Covariates Regression (KPCovR) a kernel-based variation on the original PCovR method, proposed in [Helfrecht2020]_ :ref:`KPCovR-api`. - -If you would like to contribute to scikit-matter, check out our :ref:`contributing` page! + * PCovR: the standard Principal Covariates Regression [deJong1992]_. Utilises a + combination between a PCA-like and an LR-like loss, and therefore attempts to find + a low-dimensional projection of the feature vectors that simultaneously minimises + information loss and error in predicting the target properties using only the + latent space vectors $\mathbf{T}$ :ref:`PCovR-api`. + * Kernel Principal Covariates Regression (KPCovR) a kernel-based variation on the + original PCovR method, proposed in [Helfrecht2020]_ :ref:`KPCovR-api`. + +If you would like to contribute to scikit-matter, check out our :ref:`contributing` +page! diff --git a/docs/src/linear_models.rst b/docs/src/linear_models.rst index ed9ef3b99..4833c844d 100644 --- a/docs/src/linear_models.rst +++ b/docs/src/linear_models.rst @@ -6,7 +6,7 @@ Linear Models Orthogonal Regression ##################### -.. autoclass:: OrthogonalRegression +.. autoclass:: OrthogonalRegression .. currentmodule:: skmatter.linear_model._ridge diff --git a/docs/src/preprocessing.rst b/docs/src/preprocessing.rst index addad1659..4baaeabd7 100644 --- a/docs/src/preprocessing.rst +++ b/docs/src/preprocessing.rst @@ -1,5 +1,5 @@ Preprocessing -============================= +============= .. automodule:: skmatter.preprocessing :members: diff --git a/docs/src/reference.rst b/docs/src/reference.rst index 5c34960d5..ed2f3d070 100644 --- a/docs/src/reference.rst +++ b/docs/src/reference.rst @@ -3,7 +3,6 @@ API Reference ============= - .. toctree:: :maxdepth: 1 :caption: Contents: diff --git a/docs/src/selection.rst b/docs/src/selection.rst index dcab88bd7..fa45d8dab 100644 --- a/docs/src/selection.rst +++ b/docs/src/selection.rst @@ -40,7 +40,8 @@ This can be executed using: Xr = selector.transform(X) -where `Selector` is one of the classes below that overwrites the method :py:func:`score`. +where `Selector` is one of the classes below that overwrites the method +:py:func:`score`. From :py:class:`GreedySelector`, selectors inherit these public methods: @@ -58,29 +59,30 @@ CUR ### -CUR decomposition begins by approximating a matrix :math:`{\mathbf{X}}` using a subset of columns and rows +CUR decomposition begins by approximating a matrix :math:`{\mathbf{X}}` using a subset +of columns and rows .. math:: - \mathbf{\hat{X}} \approx \mathbf{X}_\mathbf{c} \left(\mathbf{X}_\mathbf{c}^- \mathbf{X} \mathbf{X}_\mathbf{r}^-\right) \mathbf{X}_\mathbf{r}. + \mathbf{\hat{X}} \approx \mathbf{X}_\mathbf{c} \left(\mathbf{X}_\mathbf{c}^- + \mathbf{X} \mathbf{X}_\mathbf{r}^-\right) \mathbf{X}_\mathbf{r}. 
These subsets of rows and columns, denoted :math:`\mathbf{X}_\mathbf{r}` and -:math:`\mathbf{X}_\mathbf{c}`, respectively, can be determined by iterative -maximization of a leverage score :math:`\pi`, representative of the relative -importance of each column or row. From hereon, we will call selection methods -which are derived off of the CUR decomposition "CUR" as a shorthand for -"CUR-derived selection". In each iteration of CUR, we select the column or row -that maximizes :math:`\pi` and orthogonalize the remaining columns or rows. -These steps are iterated until a sufficient number of features has been selected. -This iterative approach, albeit comparatively time consuming, is the most -deterministic and efficient route in reducing the number of features needed to -approximate :math:`\mathbf{X}` when compared to selecting all features in a -single iteration based upon the relative :math:`\pi` importance. - -The feature and sample selection versions of CUR differ only in the computation -of :math:`\pi`. In sample selection :math:`\pi` is computed using the left -singular vectors, versus in feature selection, :math:`\pi` is computed using the -right singular vectors. In addition to :py:class:`GreedySelector`, both instances -of CUR selection build off of :py:class:`skmatter._selection._cur._CUR`, and inherit +:math:`\mathbf{X}_\mathbf{c}`, respectively, can be determined by iterative maximization +of a leverage score :math:`\pi`, representative of the relative importance of each +column or row. From hereon, we will call selection methods which are derived off of the +CUR decomposition "CUR" as a shorthand for "CUR-derived selection". In each iteration of +CUR, we select the column or row that maximizes :math:`\pi` and orthogonalize the +remaining columns or rows. These steps are iterated until a sufficient number of +features has been selected. This iterative approach, albeit comparatively time +consuming, is the most deterministic and efficient route in reducing the number of +features needed to approximate :math:`\mathbf{X}` when compared to selecting all +features in a single iteration based upon the relative :math:`\pi` importance. + +The feature and sample selection versions of CUR differ only in the computation of +:math:`\pi`. In sample selection :math:`\pi` is computed using the left singular +vectors, versus in feature selection, :math:`\pi` is computed using the right singular +vectors. In addition to :py:class:`GreedySelector`, both instances of CUR selection +build off of :py:class:`skmatter._selection._cur._CUR`, and inherit .. currentmodule:: skmatter._selection @@ -88,7 +90,8 @@ of CUR selection build off of :py:class:`skmatter._selection._cur._CUR`, and inh .. automethod:: _CUR._compute_pi They are instantiated using -:py:class:`skmatter.feature_selection.CUR` and :py:class:`skmatter.sample_selection.CUR`, e.g. +:py:class:`skmatter.feature_selection.CUR` and +:py:class:`skmatter.sample_selection.CUR`, e.g. .. code-block:: python @@ -117,14 +120,15 @@ They are instantiated using PCov-CUR ######## -PCov-CUR extends upon CUR by using augmented right or left singular vectors -inspired by Principal Covariates Regression, as demonstrated in [Cersonsky2021]_. -These methods employ the modified kernel and covariance matrices introduced in :ref:`PCovR-api` -and available via the Utility Classes. +PCov-CUR extends upon CUR by using augmented right or left singular vectors inspired by +Principal Covariates Regression, as demonstrated in [Cersonsky2021]_. 
These methods +employ the modified kernel and covariance matrices introduced in :ref:`PCovR-api` and +available via the Utility Classes. -Again, the feature and sample selection versions of PCov-CUR differ only in the computation -of :math:`\pi`. So, in addition to :py:class:`GreedySelector`, both instances -of PCov-CUR selection build off of :py:class:`skmatter._selection._cur._PCovCUR`, inheriting +Again, the feature and sample selection versions of PCov-CUR differ only in the +computation of :math:`\pi`. So, in addition to :py:class:`GreedySelector`, both +instances of PCov-CUR selection build off of +:py:class:`skmatter._selection._cur._PCovCUR`, inheriting .. currentmodule:: skmatter._selection @@ -168,15 +172,15 @@ Farthest Point-Sampling (FPS) Farthest Point Sampling is a common selection technique intended to exploit the diversity of the input space. -In FPS, the selection of the first point is made at random or by a separate metric. -Each subsequent selection is made to maximize the Haussdorf distance, -i.e. the minimum distance between a point and all previous selections. -It is common to use the Euclidean distance, however other distance metrics may be employed. +In FPS, the selection of the first point is made at random or by a separate metric. Each +subsequent selection is made to maximize the Haussdorf distance, i.e. the minimum +distance between a point and all previous selections. It is common to use the Euclidean +distance, however other distance metrics may be employed. Similar to CUR, the feature and selection versions of FPS differ only in the way -distance is computed (feature selection does so column-wise, sample selection does -so row-wise), and are built off of the same base class, :py:class:`skmatter._selection._fps._FPS`, -in addition to GreedySelector, and inherit +distance is computed (feature selection does so column-wise, sample selection does so +row-wise), and are built off of the same base class, +:py:class:`skmatter._selection._fps._FPS`, in addition to GreedySelector, and inherit .. currentmodule:: skmatter._selection @@ -184,8 +188,8 @@ in addition to GreedySelector, and inherit .. automethod:: _FPS.get_distance .. automethod:: _FPS.get_select_distance -These selectors can be instantiated using -:py:class:`skmatter.feature_selection.FPS` and :py:class:`skmatter.sample_selection.FPS`. +These selectors can be instantiated using :py:class:`skmatter.feature_selection.FPS` and +:py:class:`skmatter.sample_selection.FPS`. .. code-block:: python @@ -209,13 +213,14 @@ These selectors can be instantiated using PCov-FPS ######## -PCov-FPS extends upon FPS much like PCov-CUR does to CUR. Instead of using the -Euclidean distance solely in the space of :math:`\mathbf{X}`, we use a combined -distance in terms of :math:`\mathbf{X}` and :math:`\mathbf{y}`. -Again, the feature and sample selection versions of PCov-FPS differ only in -computing the distances. So, in addition to :py:class:`GreedySelector`, both instances -of PCov-FPS selection build off of :py:class:`skmatter._selection._fps._PCovFPS`, and inherit +PCov-FPS extends upon FPS much like PCov-CUR does to CUR. Instead of using the Euclidean +distance solely in the space of :math:`\mathbf{X}`, we use a combined distance in terms +of :math:`\mathbf{X}` and :math:`\mathbf{y}`. + +Again, the feature and sample selection versions of PCov-FPS differ only in computing +the distances. 
So, in addition to :py:class:`GreedySelector`, both instances of PCov-FPS +selection build off of :py:class:`skmatter._selection._fps._PCovFPS`, and inherit .. currentmodule:: skmatter._selection @@ -259,7 +264,8 @@ Voronoi FPS .. autoclass :: VoronoiFPS -These selectors can be instantiated using :py:class:`skmatter.sample_selection.VoronoiFPS`. +These selectors can be instantiated using +:py:class:`skmatter.sample_selection.VoronoiFPS`. .. code-block:: python @@ -285,13 +291,12 @@ These selectors can be instantiated using :py:class:`skmatter.sample_selection.V When *Not* to Use Voronoi FPS ----------------------------- -In many cases, this algorithm may not increase upon the efficiency. For example, -for simple metrics (such as Euclidean distance), Voronoi FPS will likely not -accelerate, and may decelerate, computations when compared to FPS. The sweet -spot for Voronoi FPS is when the number of selectable samples is already enough -to divide the space with Voronoi polyhedrons, but not yet comparable to the total -number of samples, when the cost of bookkeeping significantly degrades the speed -of work compared to FPS. +In many cases, this algorithm may not increase upon the efficiency. For example, for +simple metrics (such as Euclidean distance), Voronoi FPS will likely not accelerate, and +may decelerate, computations when compared to FPS. The sweet spot for Voronoi FPS is +when the number of selectable samples is already enough to divide the space with Voronoi +polyhedrons, but not yet comparable to the total number of samples, when the cost of +bookkeeping significantly degrades the speed of work compared to FPS. .. _DCH-api: @@ -301,7 +306,8 @@ Directional Convex Hull (DCH) .. autoclass :: DirectionalConvexHull -This selector can be instantiated using `skmatter.sample_selection.DirectionalConvexHull`. +This selector can be instantiated using +:class:`skmatter.sample_selection.DirectionalConvexHull`. .. code-block:: python diff --git a/docs/src/tutorials.rst b/docs/src/tutorials.rst index ba52579ab..2b592007c 100644 --- a/docs/src/tutorials.rst +++ b/docs/src/tutorials.rst @@ -2,7 +2,8 @@ Examples ######## For a thorough tutorial of the methods introduced in `scikit-matter`, we suggest you -check out the pedagogic notebooks in our companion project `kernel-tutorials `_. +check out the pedagogic notebooks in our companion project `kernel-tutorials +`_. .. toctree:: :glob: diff --git a/docs/src/utils.rst b/docs/src/utils.rst index b53fbb136..bae996748 100644 --- a/docs/src/utils.rst +++ b/docs/src/utils.rst @@ -21,7 +21,9 @@ Orthogonalizers for CUR .. currentmodule:: skmatter.utils._orthogonalizers -When computing non-iterative CUR, it is necessary to orthogonalize the input matrices after each selection. For this, we have supplied a feature and a sample orthogonalizer for feature and sample selection. +When computing non-iterative CUR, it is necessary to orthogonalize the input matrices +after each selection. For this, we have supplied a feature and a sample orthogonalizer +for feature and sample selection. .. autofunction:: X_orthogonalizer .. autofunction:: Y_feature_orthogonalizer diff --git a/pyproject.toml b/pyproject.toml index fc6ca12aa..7870f7300 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -17,7 +17,7 @@ authors = [ {name = "Victor P. 
Principe"}, {name = "Michele Ceriotti"} ] -readme = "README.md" +readme = "README.rst" requires-python = ">=3.8" license = {text = "BSD-3-Clause"} classifiers = [ diff --git a/src/skmatter/datasets/descr/csd-1000r.rst b/src/skmatter/datasets/descr/csd-1000r.rst index d97dc21f3..8fa9b55ff 100644 --- a/src/skmatter/datasets/descr/csd-1000r.rst +++ b/src/skmatter/datasets/descr/csd-1000r.rst @@ -3,9 +3,9 @@ CSD-1000R ######### -This dataset, intended for model testing, contains the SOAP power spectrum -features and local NMR chemical shieldings for 100 environments selected -from CSD-1000r, originally published in [Ceriotti2019]_. +This dataset, intended for model testing, contains the SOAP power spectrum features and +local NMR chemical shieldings for 100 environments selected from CSD-1000r, originally +published in [Ceriotti2019]_. Function Call ------------- @@ -15,33 +15,33 @@ Function Call Data Set Characteristics ------------------------ - :Number of Instances: Each representation 100 +:Number of Instances: Each representation 100 - :Number of Features: Each representation 100 +:Number of Features: Each representation 100 - The representations were computed with [C1]_ using the hyperparameters: +The representations were computed with [C1]_ using the hyperparameters: - :rascal hyperparameters: +:rascal hyperparameters: - +---------------------------+------------+ - | key | value | - +---------------------------+------------+ - | interaction_cutoff: | 3.5 | - +---------------------------+------------+ - | max_radial: | 6 | - +---------------------------+------------+ - | max_angular: | 6 | - +---------------------------+------------+ - | gaussian_sigma_constant": | 0.4 | - +---------------------------+------------+ - | gaussian_sigma_type: | "Constant"| - +---------------------------+------------+ - | cutoff_smooth_width: | 0.5 | - +---------------------------+------------+ - | normalize: | True | - +---------------------------+------------+ ++---------------------------+------------+ +| key | value | ++---------------------------+------------+ +| interaction_cutoff: | 3.5 | ++---------------------------+------------+ +| max_radial: | 6 | ++---------------------------+------------+ +| max_angular: | 6 | ++---------------------------+------------+ +| gaussian_sigma_constant": | 0.4 | ++---------------------------+------------+ +| gaussian_sigma_type: | "Constant"| ++---------------------------+------------+ +| cutoff_smooth_width: | 0.5 | ++---------------------------+------------+ +| normalize: | True | ++---------------------------+------------+ - Of the 2'520 resulting features, 100 were selected via FPS using [C2]_. +Of the 2'520 resulting features, 100 were selected via FPS using [C2]_. 
References ---------- @@ -57,7 +57,7 @@ Reference Code from skmatter.feature_selection import CUR from skmatter.preprocessing import StandardFlexibleScaler from skmatter.sample_selection import FPS - + # read all of the frames and book-keep the centers and species filename = "/path/to/CSD-1000R.xyz" frames = np.asarray( diff --git a/src/skmatter/datasets/descr/degenerate_CH4_manifold.rst b/src/skmatter/datasets/descr/degenerate_CH4_manifold.rst index 306974a9e..07d5b59af 100644 --- a/src/skmatter/datasets/descr/degenerate_CH4_manifold.rst +++ b/src/skmatter/datasets/descr/degenerate_CH4_manifold.rst @@ -3,10 +3,14 @@ Degenerate CH4 manifold ####################### -The dataset contains two representations (SOAP power spectrum and bispectrum) of the two manifolds spanned by the carbon atoms of two times 81 methane structures. -The SOAP power spectrum representation the two manifolds intersect creating a degenerate manifold/line for which the representation remains the same. -In contrast for higher body order representations as the (SOAP) bispectrum the carbon atoms can be uniquely represented and do not create a degenerate manifold. -Following the naming convention of [Pozdnyakov2020]_ for each representation the first 81 samples correspond to the X minus manifold and the second 81 samples contain the X plus manifold +The dataset contains two representations (SOAP power spectrum and bispectrum) of the two +manifolds spanned by the carbon atoms of two times 81 methane structures. In the SOAP +power spectrum representation the two manifolds intersect, creating a degenerate +manifold/line for which the representation remains the same. In contrast, for higher +body order representations such as the (SOAP) bispectrum, the carbon atoms can be +uniquely represented and do not create a degenerate manifold. Following the naming +convention of [Pozdnyakov2020]_, for each representation the first 81 samples correspond +to the X minus manifold and the second 81 samples contain the X plus manifold. Function Call ------------- @@ -16,40 +20,39 @@ Function Call Data Set Characteristics ------------------------ - :Number of Instances: Each representation 162 - - :Number of Features: Each representation 12 - - The representations were computed with [D1]_ using the hyperparameters: - - :rascal hyperparameters: - - +---------------------------+------------+ - | key | value | - +===========================+============+ - | radial_basis: | "GTO" | - +---------------------------+------------+ - | interaction_cutoff: | 4 | - +---------------------------+------------+ - | max_radial: | 2 | - +---------------------------+------------+ - | max_angular: | 2 | - +---------------------------+------------+ - | gaussian_sigma_constant": | 0.5 | - +---------------------------+------------+ - | gaussian_sigma_type: | "Constant"| - +---------------------------+------------+ - | cutoff_smooth_width: | 0.5 | - +---------------------------+------------+ - | normalize: | False | - +---------------------------+------------+ - -The SOAP bispectrum features were in addition reduced to 12 features with principal component analysis (PCA) [D2]_.
+:Number of Instances: Each representation 162 + +:Number of Features: Each representation 12 + +The representations were computed with [D1]_ using the hyperparameters: + +:rascal hyperparameters: + ++---------------------------+------------+ +| key | value | ++===========================+============+ +| radial_basis: | "GTO" | ++---------------------------+------------+ +| interaction_cutoff: | 4 | ++---------------------------+------------+ +| max_radial: | 2 | ++---------------------------+------------+ +| max_angular: | 2 | ++---------------------------+------------+ +| gaussian_sigma_constant": | 0.5 | ++---------------------------+------------+ +| gaussian_sigma_type: | "Constant"| ++---------------------------+------------+ +| cutoff_smooth_width: | 0.5 | ++---------------------------+------------+ +| normalize: | False | ++---------------------------+------------+ + +The SOAP bispectrum features were in addition reduced to 12 features with principal +component analysis (PCA) [D2]_. References ---------- .. [D1] https://github.com/lab-cosmo/librascal commit 8d9ad7a .. [D2] https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html - -======= diff --git a/src/skmatter/datasets/descr/nice_dataset.rst b/src/skmatter/datasets/descr/nice_dataset.rst index 20b3c35e8..23d733755 100644 --- a/src/skmatter/datasets/descr/nice_dataset.rst +++ b/src/skmatter/datasets/descr/nice_dataset.rst @@ -3,19 +3,24 @@ NICE dataset ############ -This is a toy dataset containing NICE[1, 4](N-body Iterative Contraction of Equivariants) features for first 500 configurations of the dataset[2, 3] with randomly displaced methane configurations. +This is a toy dataset containing NICE[1, 4](N-body Iterative Contraction of +Equivariants) features for first 500 configurations of the dataset[2, 3] with randomly +displaced methane configurations. Function Call ------------- + .. function:: skmatter.datasets.load_nice_dataset Data Set Characteristics ------------------------ :Number of Instances: 500 + :Number of Features: 160 -The representations were computed using the NICE package[4] using the following definition of the NICE calculator: +The representations were computed using the NICE package[4] using the following +definition of the NICE calculator: .. code-block:: python @@ -52,13 +57,18 @@ The representations were computed using the NICE package[4] using the following References ---------- -[1] Jigyasa Nigam, Sergey Pozdnyakov, and Michele Ceriotti. "Recursive evaluation and iterative contraction of N-body equivariant features." The Journal of Chemical Physics 153.12 (2020): 121101. + +[1] Jigyasa Nigam, Sergey Pozdnyakov, and Michele Ceriotti. "Recursive evaluation and + iterative contraction of N-body equivariant features." The Journal of Chemical + Physics 153.12 (2020): 121101. [2] Incompleteness of Atomic Structure Representations -Sergey N. Pozdnyakov, Michael J. Willatt, Albert P. Bartók, Christoph Ortner, Gábor Csányi, and Michele Ceriotti + Sergey N. Pozdnyakov, Michael J. Willatt, Albert P. Bartók, Christoph Ortner, + Gábor Csányi, and Michele Ceriotti [3] https://archive.materialscloud.org/record/2020.110 Reference Code -------------- + [4] https://github.com/lab-cosmo/nice diff --git a/src/skmatter/datasets/descr/who_dataset.rst b/src/skmatter/datasets/descr/who_dataset.rst index 4aaf6dd05..b794a70b6 100644 --- a/src/skmatter/datasets/descr/who_dataset.rst +++ b/src/skmatter/datasets/descr/who_dataset.rst @@ -42,12 +42,13 @@ References .. 
[8] https://data.worldbank.org/indicator/SN.ITK.DEFC.ZS .. [9] https://data.worldbank.org/indicator/SP.DYN.LE00.IN .. [10] https://data.worldbank.org/indicator/SP.POP.TOTL - + Reference Code -------------- -and compiled through the following script, where the datasets have been placed in a folder named `who_data`: +and compiled through the following script, where the datasets have been placed in a +folder named ``who_data``: .. code-block:: python @@ -68,7 +69,7 @@ and compiled through the following script, where the datasets have been placed i sheet_name="Data", index_col=0, ) - + indicator = data["Indicator Code"].values[0] indicator_codes[indicator] = data["Indicator Name"].values[0] diff --git a/tox.ini b/tox.ini index 09cfcbdec..52d8a4d8d 100644 --- a/tox.ini +++ b/tox.ini @@ -4,7 +4,7 @@ envlist = tests examples -lint_folders = {toxinidir}/src {toxinidir}/tests +lint_folders = "{toxinidir}/src" "{toxinidir}/tests" "{toxinidir}/docs/src/" [testenv:tests] @@ -42,10 +42,12 @@ deps = flake8-bugbear flake8-sphinx-links isort + sphinx-lint commands = flake8 {[tox]lint_folders} black --check --diff {[tox]lint_folders} isort --check-only --diff {[tox]lint_folders} + sphinx-lint --enable line-too-long --max-line-length 88 {[tox]lint_folders} "{toxinidir}/README.rst" [testenv:format] # Abuse tox to do actual formatting. Users can call `tox -e format` to run