You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
After adding the framework in #715, we should add a configuration that allows the user to optimize a tabular model for speed.
Expected behavior
A speed optimized tabular model will:
Use the GaussianCopula
Not round numerical data
Enforce min/max values on numerical data
Use the CategoricalTransformer (to be renamed: FrequencyEncoder) on any categorical columns
Do not model null values. Instead, randomly generate null values based on observed proportions
Default all univariate distributions to gaussian instead of performing a search
API
The TabularPreset will accept 2 parameters:
(required) optimize_for, the name of the preset. We'll add "SPEED"
metadata a dictionary or TableMetadata object, similar to tabular models today
It will print out hard-coded info about the configuration & benchmarks.
>>>fromsdv.liteimportTabularPreset# the PresetModel will not allow too many extra settings, making it simple to use>>>model=TabularPreset(optimize_for='SPEED', metadata=my_metadata)
Info: Thisconfigoptimizesthemodelingspeedaboveallelse.
Yourexactruntimeisdependentonthedata. Benchmarks:
100Krowsand100columnsmaytakearound1minute.
1Mrowsand250columnsmaytakearound30minutes.
# these work like any other tabular model>>>model.fit(my_data)
>>>model.sample(my_data)
Edge Cases
If the user does not pass in metadata, we should still run the model, but we should provide a warning:
Problem Description
After adding the framework in #715, we should add a configuration that allows the user to optimize a tabular model for speed.
Expected behavior
A speed optimized tabular model will:
GaussianCopula
CategoricalTransformer
(to be renamed:FrequencyEncoder
) on any categorical columnsgaussian
instead of performing a searchAPI
The
TabularPreset
will accept 2 parameters:optimize_for
, the name of the preset. We'll add"SPEED"
metadata
a dictionary orTableMetadata
object, similar to tabular models todayIt will print out hard-coded info about the configuration & benchmarks.
Edge Cases
If the user does not pass in metadata, we should still run the model, but we should provide a warning:
If the user does not pass in the required optimization parameter, throw an error.
The text was updated successfully, but these errors were encountered: