sdv-dev · katxiao · Feb 22, 2022 · Feb 3, 2022 · Feb 11, 2022 · Feb 11, 2022
diff --git a/docs/user_guides/single_table/copulagan.rst b/docs/user_guides/single_table/copulagan.rst
@@ -688,19 +688,23 @@ Conditional Sampling
 
 As the name implies, conditional sampling allows us to sample from a conditional
 distribution using the ``CopulaGAN`` model, which means we can generate only values that
-satisfy certain conditions. These conditional values can be passed to the ``conditions``
-parameter in the ``sample`` method either as a dataframe or a dictionary.
+satisfy certain conditions. These conditional values can be passed to the ``sample_conditions``
+method as a list of ``sdv.sampling.Condition`` objects or to the ``sample_remaining_columns`` method
+as a dataframe.
 
-In case a dictionary is passed, the model will generate as many rows as requested,
-all of which will satisfy the specified conditions, such as ``gender = M``.
+When specifying a ``sdv.sampling.Condition`` object, we can pass in the desired conditions
+as a dictionary, as well as specify the number of desired rows for that condition.
 
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    from sdv.sampling import Condition
+
+    condition = Condition({
         'gender': 'M'
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 It's also possible to condition on multiple columns, such as
@@ -709,14 +713,16 @@ It's also possible to condition on multiple columns, such as
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'gender': 'M',
         'experience_years': 0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
 
+    model.sample_conditions(conditions=[condition])
 
-The ``conditions`` can also be passed as a dataframe. In that case, the model
+
+The ``sample_remaining_columns`` method instead takes the known column
+values as a dataframe (its ``known_columns`` argument). In that case, the model
 will generate one sample for each row of the dataframe, sorted in the same
 order. Since the model already knows how many samples to generate, passing
 it as a parameter is unnecessary. For example, if we want to generate three
@@ -731,7 +737,7 @@ following:
     conditions = pd.DataFrame({
         'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
     })
-    model.sample(conditions=conditions)
+    model.sample_remaining_columns(conditions)
 
 
 ``CopulaGAN`` also supports conditioning on continuous values, as long as the values
@@ -741,10 +747,11 @@ dataset are within 0 and 1, ``CopulaGAN`` will not be able to set this value to
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'degree_perc': 70.0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 .. note::

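The relationship between the two documented entry points can be pictured with a small plain-Python sketch. It is not SDV's implementation: `SketchCondition` and `expand_conditions` are hypothetical names standing in for `sdv.sampling.Condition` and for whatever expansion the library performs internally. The sketch only shows how a list of conditions, each with its own `num_rows`, corresponds to the one-row-per-sample table that `sample_remaining_columns` accepts.

```python
from dataclasses import dataclass


@dataclass
class SketchCondition:
    """Hypothetical stand-in for sdv.sampling.Condition (illustration only)."""
    column_values: dict
    num_rows: int = 1


def expand_conditions(conditions):
    """Return one dict of known column values per requested row."""
    rows = []
    for condition in conditions:
        for _ in range(condition.num_rows):
            # Copy the dict so that each row can be modified independently.
            rows.append(dict(condition.column_values))
    return rows


conditions = [
    SketchCondition({'gender': 'M'}, num_rows=3),
    SketchCondition({'gender': 'F'}, num_rows=3),
]
known_rows = expand_conditions(conditions)
genders = [row['gender'] for row in known_rows]
print(genders)
```

Under these assumptions, the two conditions expand to the same six `gender` values as the dataframe used in the `sample_remaining_columns` example of the docs.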
diff --git a/docs/user_guides/single_table/ctgan.rst b/docs/user_guides/single_table/ctgan.rst
@@ -499,19 +499,23 @@ Conditional Sampling
 
 As the name implies, conditional sampling allows us to sample from a conditional
 distribution using the ``CTGAN`` model, which means we can generate only values that
-satisfy certain conditions. These conditional values can be passed to the ``conditions``
-parameter in the ``sample`` method either as a dataframe or a dictionary.
+satisfy certain conditions. These conditional values can be passed to the ``sample_conditions``
+method as a list of ``sdv.sampling.Condition`` objects or to the ``sample_remaining_columns``
+method as a dataframe.
 
-In case a dictionary is passed, the model will generate as many rows as requested,
-all of which will satisfy the specified conditions, such as ``gender = M``.
+When specifying a ``sdv.sampling.Condition`` object, we can pass in the desired conditions
+as a dictionary, as well as specify the number of desired rows for that condition.
 
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    from sdv.sampling import Condition
+
+    condition = Condition({
         'gender': 'M'
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 It's also possible to condition on multiple columns, such as
@@ -520,14 +524,16 @@ It's also possible to condition on multiple columns, such as
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'gender': 'M',
         'experience_years': 0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
 
+    model.sample_conditions(conditions=[condition])
 
-The ``conditions`` can also be passed as a dataframe. In that case, the model
+
+The ``sample_remaining_columns`` method instead takes the known column
+values as a dataframe (its ``known_columns`` argument). In that case, the model
 will generate one sample for each row of the dataframe, sorted in the same
 order. Since the model already knows how many samples to generate, passing
 it as a parameter is unnecessary. For example, if we want to generate three
@@ -542,7 +548,7 @@ following:
     conditions = pd.DataFrame({
         'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
     })
-    model.sample(conditions=conditions)
+    model.sample_remaining_columns(conditions)
 
 
 ``CTGAN`` also supports conditioning on continuous values, as long as the values
@@ -552,10 +558,11 @@ dataset are within 0 and 1, ``CTGAN`` will not be able to set this value to 1000
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'degree_perc': 70.0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 .. note::

diff --git a/docs/user_guides/single_table/gaussian_copula.rst b/docs/user_guides/single_table/gaussian_copula.rst
@@ -648,19 +648,23 @@ Conditional Sampling
 
 As the name implies, conditional sampling allows us to sample from a conditional
 distribution using the ``GaussianCopula`` model, which means we can generate only values that
-satisfy certain conditions. These conditional values can be passed to the ``conditions``
-parameter in the ``sample`` method either as a dataframe or a dictionary.
+satisfy certain conditions. These conditional values can be passed to the ``sample_conditions``
+method as a list of ``sdv.sampling.Condition`` objects or to the ``sample_remaining_columns``
+method as a dataframe.
 
-In case a dictionary is passed, the model will generate as many rows as requested,
-all of which will satisfy the specified conditions, such as ``gender = M``.
+When specifying a ``sdv.sampling.Condition`` object, we can pass in the desired conditions
+as a dictionary, as well as specify the number of desired rows for that condition.
 
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    from sdv.sampling import Condition
+
+    condition = Condition({
         'gender': 'M'
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 It's also possible to condition on multiple columns, such as
@@ -669,14 +673,16 @@ It's also possible to condition on multiple columns, such as
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'gender': 'M',
         'experience_years': 0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
 
+    model.sample_conditions(conditions=[condition])
 
-The ``conditions`` can also be passed as a dataframe. In that case, the model
+
+The ``sample_remaining_columns`` method instead takes the known column
+values as a dataframe (its ``known_columns`` argument). In that case, the model
 will generate one sample for each row of the dataframe, sorted in the same
 order. Since the model already knows how many samples to generate, passing
 it as a parameter is unnecessary. For example, if we want to generate three
@@ -691,7 +697,7 @@ following:
     conditions = pd.DataFrame({
         'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
     })
-    model.sample(conditions=conditions)
+    model.sample_remaining_columns(conditions)
 
 
 ``GaussianCopula`` also supports conditioning on continuous values, as long as the values
@@ -701,10 +707,11 @@ dataset are within 0 and 1, ``GaussianCopula`` will not be able to set this valu
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'degree_perc': 70.0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 .. note::

diff --git a/docs/user_guides/single_table/tvae.rst b/docs/user_guides/single_table/tvae.rst
@@ -484,19 +484,22 @@ Conditional Sampling
 
 As the name implies, conditional sampling allows us to sample from a conditional
 distribution using the ``TVAE`` model, which means we can generate only values that
-satisfy certain conditions. These conditional values can be passed to the ``conditions``
-parameter in the ``sample`` method either as a dataframe or a dictionary.
+satisfy certain conditions. These conditional values can be passed to the ``sample_conditions``
+method as a list of ``sdv.sampling.Condition`` objects or to the ``sample_remaining_columns``
+method as a dataframe.
 
-In case a dictionary is passed, the model will generate as many rows as requested,
-all of which will satisfy the specified conditions, such as ``gender = M``.
+When specifying a ``sdv.sampling.Condition`` object, we can pass in the desired conditions as a dictionary, as well as specify the number of desired rows for that condition.
 
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    from sdv.sampling import Condition
+
+    condition = Condition({
         'gender': 'M'
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 It's also possible to condition on multiple columns, such as
@@ -505,14 +508,16 @@ It's also possible to condition on multiple columns, such as
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'gender': 'M',
         'experience_years': 0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
 
+    model.sample_conditions(conditions=[condition])
 
-The ``conditions`` can also be passed as a dataframe. In that case, the model
+
+The ``sample_remaining_columns`` method instead takes the known column
+values as a dataframe (its ``known_columns`` argument). In that case, the model
 will generate one sample for each row of the dataframe, sorted in the same
 order. Since the model already knows how many samples to generate, passing
 it as a parameter is unnecessary. For example, if we want to generate three
@@ -527,7 +532,7 @@ following:
     conditions = pd.DataFrame({
         'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
     })
-    model.sample(conditions=conditions)
+    model.sample_remaining_columns(conditions)
 
 
 ``TVAE`` also supports conditioning on continuous values, as long as the values
@@ -537,10 +542,11 @@ dataset are within 0 and 1, ``TVAE`` will not be able to set this value to 1000.
 .. ipython:: python
     :okwarning:
 
-    conditions = {
+    condition = Condition({
         'degree_perc': 70.0
-    }
-    model.sample(5, conditions=conditions)
+    }, num_rows=5)
+
+    model.sample_conditions(conditions=[condition])
 
 
 .. note::

diff --git a/sdv/tabular/base.py b/sdv/tabular/base.py
@@ -409,7 +409,7 @@ def sample(self, num_rows, randomize_samples=True):
             num_rows (int):
                 Number of rows to sample. This parameter is required.
             randomize_samples (bool):
-                Whether or not to use a a fixed seed when sampling. Defaults
+                Whether or not to use a fixed seed when sampling. Defaults
                 to True.
 
         Returns:
@@ -443,13 +443,11 @@ def _sample_with_conditions(self, conditions, max_tries, batch_size_per_try):
             ValueError:
                 If any of the following happens:
                     * any of the conditions' columns are not valid.
-                    * `graceful_reject_sampling` is `False` and not enough valid rows could be
-                      sampled within `max_tries` trials.
                     * no rows could be generated.
         """
         for column in conditions.columns:
             if column not in self._metadata.get_fields():
-                raise ValueError(f'Error: Unexpected column name `{column}`. '
+                raise ValueError(f'Unexpected column name `{column}`. '
                                  f'Use a column name that was present in the original data.')
 
         try:
@@ -524,7 +522,7 @@ def sample_conditions(self, conditions, max_tries=100, batch_size_per_try=None,
                 The batch size to use per attempt at sampling. Defaults to 10 times
                 the number of rows.
             randomize_samples (bool):
-                Whether or not to use a a fixed seed when sampling. Defaults
+                Whether or not to use a fixed seed when sampling. Defaults
                 to True.
 
         Returns:
@@ -549,6 +547,38 @@ def sample_conditions(self, conditions, max_tries=100, batch_size_per_try=None,
 
         return sampled
 
+    def sample_remaining_columns(self, known_columns, max_tries=100, batch_size_per_try=None,
+                                 randomize_samples=True):
+        """Sample the remaining columns, conditioned on known column values.
+
+        Args:
+            known_columns (pandas.DataFrame):
+                A pandas.DataFrame with the columns that are already known. The output
+                is a DataFrame such that each row in the output is sampled
+                conditionally on the corresponding row in the input.
+            max_tries (int):
+                Number of times to try sampling discarded rows. Defaults to 100.
+            batch_size_per_try (int):
+                The batch size to use per attempt at sampling. Defaults to 10 times
+                the number of rows.
+            randomize_samples (bool):
+                Whether or not to use a fixed seed when sampling. Defaults
+                to True.
+
+        Returns:
+            pandas.DataFrame:
+                Sampled data.
+
+        Raises:
+            ConstraintsNotMetError:
+                If the known column values are not valid for the given constraints.
+            ValueError:
+                If any of the following happens:
+                    * any of the known columns are not present in the original data.
+                    * no rows could be generated.
+        """
+        return self._sample_with_conditions(known_columns, max_tries, batch_size_per_try)
+
     def _get_parameters(self):
         raise NonParametricError()
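
The `max_tries` and `batch_size_per_try` arguments documented above suggest a reject-sampling loop. The following is a minimal sketch of that idea in plain Python, assuming the model draws unconditional rows and discards those that do not match the known values. It is not the body of `_sample_with_conditions`: `sample_matching_rows`, `draw_row`, and the error message are hypothetical names and text chosen for illustration. Only the default of 10 times the number of rows and the `ValueError` when no rows could be generated are taken from the docstrings in this diff.

```python
import random


def sample_matching_rows(draw_row, known, num_rows, max_tries=100,
                         batch_size_per_try=None):
    """Reject-sampling sketch: keep only drawn rows that agree with `known`."""
    if batch_size_per_try is None:
        # Mirrors the documented default: 10 times the number of rows.
        batch_size_per_try = num_rows * 10

    accepted = []
    for _ in range(max_tries):
        for _ in range(batch_size_per_try):
            row = draw_row()
            if all(row[column] == value for column, value in known.items()):
                accepted.append(row)

        if len(accepted) >= num_rows:
            break

    if not accepted:
        # Illustrative message, not the library's wording.
        raise ValueError('No valid rows could be generated.')

    # May hold fewer than num_rows rows if the tries ran out.
    return accepted[:num_rows]


rng = random.Random(0)


def draw_row():
    # Hypothetical unconditional sampler standing in for a fitted model.
    return {
        'gender': rng.choice(['M', 'F']),
        'experience_years': rng.randint(0, 3),
    }


rows = sample_matching_rows(draw_row, {'gender': 'M'}, num_rows=5)
print(len(rows))
```

In this sketch, a condition that the sampler can never satisfy (for example `{'gender': 'X'}`) exhausts `max_tries` and raises `ValueError`, which corresponds to the "no rows could be generated" case listed in the docstring.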