diff --git a/docs/source/configuration.rst b/docs/source/configuration.rst
index 78c0bfbb..496922ae 100644
--- a/docs/source/configuration.rst
+++ b/docs/source/configuration.rst
@@ -6,52 +6,6 @@ Some more details on stability report settings, in particular how to set:
 the reference dataset, binning specifications, monitoring rules, and where to plot boundaries.
 
 
-Reference types
----------------
-
-When generating a report from a DataFrame, the reference type can be set with the option ``reference_type``,
-in four different ways:
-
-1. Using the DataFrame on which the stability report is built as a self-reference. This reference method is static: each time slot is compared to all the slots in the DataFrame (all included in one distribution). This is the default reference setting.
-
-    .. code-block:: python
-
-      # generate stability report with specific monitoring rules
-      report = df.pm_stability_report(reference_type="self")
-
-2. Using an external reference DataFrame or set of histograms. This is also a static method: each time slot is compared to all the time slots in the reference data.
-
-    .. code-block:: python
-
-      # generate stability report with specific monitoring rules
-      report = df.pm_stability_report(reference_type="external", reference=reference)
-
-3. Using a rolling window within the same DataFrame as reference. This method is dynamic: we can set the size of the window and the shift from the current time slot. By default the 10 preceding time slots are used as reference (shift=1, window_size=10).
-
-    .. code-block:: python
-
-      settings = Settings()
-      settings.comparison.window = 10
-      settings.comparison.shift = 1
-
-      # generate stability report with specific monitoring rules
-      report = df.pm_stability_report(reference_type="rolling", settings=settings)
-
-4. Using an expanding window on all preceding time slots within the same DataFrame. This is also a dynamic method, with variable window size. All the available previous time slots are used. For example, if we have 2 time slots available and shift=1, window size will be 1 (so the previous slot is the reference), while if we have 10 time slots and shift=1, window size will be 9 (and all previous time slots are reference).
-
-    .. code-block:: python
-
-      settings = Settings()
-      settings.comparison.shift = 1
-
-      # generate stability report with specific monitoring rules
-      report = df.pm_stability_report(reference_type="expanding", settings=settings)
-
-Note that, by default, popmon also performs a rolling comparison of the histograms in each time period with those in the
-previous time period. The results of these comparisons contain the term "prev1", and are found in the comparisons section
-of a report.
-
-
 Binning specifications
 ----------------------
 
@@ -277,6 +231,7 @@ Now that spark is installed, restart the runtime.
       .config("spark.sql.session.timeZone", "GMT")
       .getOrCreate()
   )
+
 Troubleshooting Spark
 ~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
index e05d870e..626a89df 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -14,6 +14,7 @@ Contents
    :maxdepth: 2
 
    introduction
+   reference_types
    profiles
    comparisons
    tutorials
diff --git a/docs/source/introduction.rst b/docs/source/introduction.rst
index 0c5ef249..72cf7c52 100644
--- a/docs/source/introduction.rst
+++ b/docs/source/introduction.rst
@@ -9,7 +9,8 @@ And you probably want to know this, as you might want to retrain your model.
  
 To monitor the stability over time, we have developed popmon (**pop**\ ulation shift **mon**\ itor). Popmon takes as input a DataFrame (either pandas or Spark), of which one of the columns should represent the date, and will then produce a report that indicates how stable all other columns are over time.
  
-For each column, the stability is determined by taking a reference (for example the data on which you have trained your classifier) and contrasting each time slot to this reference. This can be done in various ways:
+For each column, the stability is determined by taking a :doc:`reference <reference_types>` (for example the data on which you have trained your classifier) and contrasting each time slot to this reference.
+This can be done in various ways:
 
 * :doc:`Profiles <profiles>`: for example tracking the mean over time and contrasting this to the reference data. Similar analyses can be done with other summary statistics, such as median, min, max or quartiles.
 * :doc:`Comparisons <comparisons>`: statistically comparing each time slot to the reference data (for example using Kolmogorov-Smirnov, chi-squared, or Pearson correlation).
@@ -52,4 +53,6 @@ Of course, the exact thresholds (four and seven standard deviations) can be conf
    
    Illustration of how traffic light bounds are determined using reference data.
 
-For speed of processing, the data is converted into histograms prior to the comparisons. This greatly simplifies comparisons of large amounts of data with each other, which is especially beneficial for Spark DataFrames. In addition, it enables you to store the histograms together with the report (since the histograms are just a fraction of the size of the original data), making it easy to go back to a previous report and investigate what happened.
+For speed of processing, the data is converted into histograms prior to the comparisons.
+This greatly simplifies comparisons of large amounts of data with each other, which is especially beneficial for Spark DataFrames.
+In addition, it enables you to store the histograms together with the report (since the histograms are just a fraction of the size of the original data), making it easy to go back to a previous report and investigate what happened.
diff --git a/docs/source/popmon.pipeline.rst b/docs/source/popmon.pipeline.rst
index d294e08c..30f34fc9 100644
--- a/docs/source/popmon.pipeline.rst
+++ b/docs/source/popmon.pipeline.rst
@@ -12,6 +12,14 @@ popmon.pipeline.amazing\_pipeline module
    :undoc-members:
    :show-inheritance:
 
+popmon.pipeline.dataset\_splitter module
+----------------------------------------
+
+.. automodule:: popmon.pipeline.dataset_splitter
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 popmon.pipeline.metrics module
 ------------------------------
 
diff --git a/docs/source/reference_types.rst b/docs/source/reference_types.rst
new file mode 100644
index 00000000..ba832de3
--- /dev/null
+++ b/docs/source/reference_types.rst
@@ -0,0 +1,86 @@
+Reference types
+===============
+
+When generating a report from a DataFrame, the reference is selected with the ``reference_type`` option,
+which accepts one of four values:
+
++-----------------+
+| Reference Type  |
++=================+
+| Self            |
++-----------------+
+| External        |
++-----------------+
+| Rolling         |
++-----------------+
+| Expanding       |
++-----------------+
+
+Note that, by default, ``popmon`` also performs a rolling comparison of the histograms in each time period with those in the
+previous time period. The results of these comparisons contain the term "prev1", and are found in the comparisons section
+of a report.
+
+Self reference
+--------------
+
+The DataFrame on which the stability report is built serves as its own reference. This method is static: each time slot is compared to all the slots in the DataFrame, combined into one distribution. This is the default reference setting.
+
+    .. code-block:: python
+
+      # generate stability report using the full DataFrame as reference
+      report = df.pm_stability_report(reference_type="self")
+
+
+The self-reference compares against the full dataset by default.
+A subset taken from the beginning of the data can also be used as the reference, e.g. the training data for a model.
+The size of this subset is determined by the ``split`` parameter.
+``split`` accepts (1) a number of samples (integer), (2) a fraction of the dataset (float) or (3) a condition (string).
+
+    .. code-block:: python
+
+      # use the first 1000 rows as reference
+      report = df.pm_stability_report(reference_type="self", split=1000)
+
+
+External reference
+------------------
+
+An external reference DataFrame or set of histograms is used as reference. This method is also static: each time slot is compared to all the time slots in the reference data.
+
+    .. code-block:: python
+
+      # generate stability report against an external reference
+      report = df.pm_stability_report(reference_type="external", reference=reference)
+
+
+Rolling reference
+-----------------
+
+A rolling window within the same DataFrame is used as reference. This method is dynamic: both the size of the window and its shift from the current time slot can be set. By default the 10 preceding time slots are used as reference (``shift=1``, ``window=10``).
+
+    .. code-block:: python
+
+      # when a Settings object is provided, reference_type should be set on it
+      settings = Settings(reference_type="rolling")
+      settings.comparison.window = 10
+      settings.comparison.shift = 1
+
+      # alternatively, set it after constructing the settings
+      settings.reference_type = "rolling"
+
+      # generate stability report with a rolling reference
+      report = df.pm_stability_report(settings=settings)
+
+
+Expanding reference
+-------------------
+
+An expanding window over all preceding time slots within the same DataFrame is used as reference. This method is also dynamic, with a variable window size: all available previous time slots are used. For example, with 2 time slots available and ``shift=1`` the window size is 1 (the previous slot is the reference), while with 10 time slots and ``shift=1`` the window size is 9 (all previous time slots are the reference).
+
+    .. code-block:: python
+
+      settings = Settings(reference_type="expanding")
+      settings.comparison.shift = 1
+
+      # generate stability report with an expanding reference
+      report = df.pm_stability_report(settings=settings)