However, when sampling with a conditions dict WITHOUT giving num_rows, the sample call crashes.
In [3]: gc.sample(conditions={'gender': 'M'})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/sdv/tabular/base.py in _make_conditions_df(self, conditions, num_rows)
    345             try:
--> 346                 conditions = pd.DataFrame(conditions)
    347             except ValueError:

/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    467         elif isinstance(data, dict):
--> 468             mgr = init_dict(data, index, columns, dtype=dtype)
    469         elif isinstance(data, ma.MaskedArray):

/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    282         ]
--> 283     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    284

/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
     77         if index is None:
---> 78             index = extract_index(arrays)
     79         else:

/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/pandas/core/internals/construction.py in extract_index(data)
    386         if not indexes and not raw_lengths:
--> 387             raise ValueError("If using all scalar values, you must pass an index")
    388

ValueError: If using all scalar values, you must pass an index

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-d253bdf9697f> in <module>
----> 1 gc.sample(conditions={'gender': 'M'})

/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/sdv/tabular/base.py in sample(self, num_rows, max_retries, max_rows_multiplier, conditions, float_rtol, graceful_reject_sampling)
    443
    444         # convert conditions to dataframe
--> 445         conditions = self._make_conditions_df(conditions, num_rows)
    446
    447         # validate columns

/home/ubuntu/.virtualenvs/sdv-test/lib/python3.7/site-packages/sdv/tabular/base.py in _make_conditions_df(self, conditions, num_rows)
    346                 conditions = pd.DataFrame(conditions)
    347             except ValueError:
--> 348                 conditions = pd.DataFrame([conditions] * num_rows)
    349
    350         elif not isinstance(conditions, pd.DataFrame):

TypeError: can't multiply sequence by non-int of type 'NoneType'
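Both exceptions in the traceback can be reproduced with pandas alone, without SDV. The snippet below is a minimal sketch that mirrors the two statements shown at lines 346 and 348 of base.py; the function name reproduce is illustrative and not part of SDV.

```python
import pandas as pd


def reproduce(conditions, num_rows=None):
    """Mirror the try/except from the traceback and list the errors raised."""
    errors = []
    try:
        pd.DataFrame(conditions)
    except ValueError as first:
        # A dict of scalars has no implied length, so pandas asks for an index.
        errors.append(type(first).__name__)
        try:
            pd.DataFrame([conditions] * num_rows)
        except TypeError as second:
            # num_rows is None, and a list cannot be multiplied by None.
            errors.append(type(second).__name__)
    return errors


print(reproduce({'gender': 'M'}))              # both errors, as in the report
print(reproduce({'gender': 'M'}, num_rows=3))  # only the first, which is handled
```

The second call shows why passing num_rows hides the problem: the fallback branch then multiplies the list by an integer and succeeds.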
Expected Behavior
Since sampling without any arguments produces the same number of rows as seen in the input data, one would expect that passing a conditions dict without num_rows would achieve the same result.
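One way to get this behavior would be to fall back to the number of rows seen during fit whenever num_rows is None. The helper below is a sketch under assumptions, not SDV's implementation: make_conditions_df and default_num_rows are hypothetical names, with default_num_rows standing in for whatever row count the model stored at fit time.

```python
import pandas as pd


def make_conditions_df(conditions, num_rows=None, default_num_rows=1):
    """Turn conditions into a DataFrame with one row per sample to generate.

    Hypothetical helper: default_num_rows stands in for the row count the
    model saw during fit; it is not an SDV parameter.
    """
    if isinstance(conditions, pd.DataFrame):
        return conditions.copy()

    if not isinstance(conditions, dict):
        raise TypeError('conditions must be a dict or a pandas.DataFrame')

    try:
        # Succeeds when the dict values are list-like and of equal length.
        return pd.DataFrame(conditions)
    except ValueError:
        # Typically reached when every value is a scalar: repeat the dict
        # once per requested row, using the default when num_rows is None.
        rows = default_num_rows if num_rows is None else num_rows
        return pd.DataFrame([conditions] * rows)


frame = make_conditions_df({'gender': 'M'}, default_num_rows=4)
print(frame)
```

An explicit num_rows still takes precedence over the default, so existing calls that pass both arguments would behave as before.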
Environment Details
Error Description
When calling the sample method of a tabular model without passing any arguments, the model produces as many rows as it saw during the fit phase.
When sampling conditionally, passing a dict AND a number of rows to sample, everything works as expected.
However, when sampling with a conditions dict WITHOUT giving num_rows, the sample call crashes.