Update README.md
Enhance README
minhlong94 authored and Optimox committed Nov 7, 2020
1 parent 1656c2b commit 2a9757d
Showing 1 changed file with 40 additions and 38 deletions: README.md
If you want to use it locally within a docker container:

TabNet is now scikit-compatible; training a TabNetClassifier or TabNetRegressor is straightforward.

```python
from pytorch_tabnet.tab_model import TabNetClassifier, TabNetRegressor

clf = TabNetClassifier()  # or TabNetRegressor()
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)]
)
preds = clf.predict(X_test)
```

or, for TabNetMultiTaskClassifier:

```python
from pytorch_tabnet.multitask import TabNetMultiTaskClassifier

clf = TabNetMultiTaskClassifier()
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)]
)
preds = clf.predict(X_test)
```

### Custom early_stopping_metrics

```python
from pytorch_tabnet.metrics import Metric
from sklearn.metrics import roc_auc_score
```
A specific customization example notebook is available here: https://github.com
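The snippet above only shows the imports. Below is a minimal sketch of what a custom early-stopping metric could look like, assuming the `Metric` base class expects a `_name` attribute, a `_maximize` flag, and a `__call__(y_true, y_score)` method; the `Gini` class and the placeholder arrays `X_train`, `y_train`, `X_valid`, `y_valid` are illustrative:

```python
from pytorch_tabnet.metrics import Metric
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.metrics import roc_auc_score


class Gini(Metric):
    """Gini coefficient derived from ROC AUC (higher is better)."""

    def __init__(self):
        self._name = "gini"     # name shown in the training logs
        self._maximize = True   # early stopping keeps the highest value

    def __call__(self, y_true, y_score):
        auc = roc_auc_score(y_true, y_score[:, 1])
        return max(2 * auc - 1, 0.0)


clf = TabNetClassifier()
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=["auc", Gini],  # the last metric is used for early stopping
)
```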

## Model parameters

- `n_d` : int (default=8)

Width of the decision prediction layer. Bigger values give the model more capacity, at the risk of overfitting.
Values typically range from 8 to 64.

- `n_a` : int (default=8)

Width of the attention embedding for each mask.
According to the paper, `n_d = n_a` is usually a good choice.

- `n_steps` : int (default=3)

Number of steps in the architecture (usually between 3 and 10)

- `gamma` : float (default=1.3)

This is the coefficient for feature reuse in the masks.
A value close to 1 makes mask selection less correlated between steps (each feature tends to be used at only one step).
Values range from 1.0 to 2.0.

- `cat_idxs` : list of int (default =[])

List of categorical feature indices.

- `cat_emb_dim` : list of int

List of embedding sizes, one per categorical feature. (default=1)

- `n_independent` : int (default=2)

Number of independent Gated Linear Unit (GLU) layers at each step.
Usual values range from 1 to 5.

- `n_shared` : int (default=2)

Number of shared Gated Linear Units at each step.
Usual values range from 1 to 5.

- `epsilon` : float (default 1e-15)

Should be left untouched.

- `seed` : int (default=0)

Random seed for reproducibility

- `momentum` : float

Momentum for batch normalization, typically ranges from 0.01 to 0.4 (default=0.02)

- `clip_value` : float (default None)

If a float is given, gradients will be clipped at `clip_value`.

- `lambda_sparse` : float (default = 1e-3)

This is the extra sparsity loss coefficient as proposed in the original paper. The bigger this coefficient is, the sparser your model will be in terms of feature selection. Depending on the difficulty of your problem, reducing this value could help.

- `optimizer_fn` : torch.optim (default=torch.optim.Adam)

Pytorch optimizer function

- `optimizer_params`: dict (default=dict(lr=2e-2))

Parameters compatible with optimizer_fn, used to initialize the optimizer. Since Adam is the default optimizer, this is where the initial learning rate is set. As mentioned in the original paper, a large initial learning rate of `0.02` with decay is a good option.

- `scheduler_fn` : torch.optim.lr_scheduler (default=None)

Pytorch Scheduler to change learning rates during training.

- `scheduler_params` : dict

Dictionary of parameters to pass to scheduler_fn. Example: `{"gamma": 0.95, "step_size": 10}`

- `model_name` : str (default = 'DreamQuarkTabNet')

Name used when saving the model to disk; customize this to easily retrieve and reuse your trained models.

- `saving_path` : str (default = './')

Path defining where to save models.

- `verbose` : int (default=1)

Verbosity for notebook plots: set to 1 to print progress at every epoch, 0 to silence it.

- `device_name` : str (default='auto')

'cpu' for cpu training, 'gpu' for gpu training, 'auto' to automatically detect gpu.

- `mask_type` : str (default='sparsemax')

Either "sparsemax" or "entmax": the masking function to use for feature selection.
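To illustrate how these constructor parameters combine, here is a hedged sketch (not part of the original README) that assumes all the names above are accepted as keyword arguments of `TabNetClassifier`; the values are illustrative, not tuned recommendations:

```python
import torch
from pytorch_tabnet.tab_model import TabNetClassifier

clf = TabNetClassifier(
    n_d=16, n_a=16,                 # the paper suggests keeping n_d equal to n_a
    n_steps=5,                      # number of decision steps
    gamma=1.5,                      # feature reuse coefficient
    lambda_sparse=1e-3,             # sparsity regularization strength
    momentum=0.02,                  # batch-norm momentum
    clip_value=2.0,                 # clip gradients at this value
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    scheduler_params={"gamma": 0.95, "step_size": 10},
    mask_type="entmax",             # or "sparsemax" (the default)
    seed=0,
    verbose=1,
    device_name="auto",
)
```

Any parameter left out keeps the default value listed above.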

## Fit parameters

- `X_train` : np.array

Training features

- `y_train` : np.array

Training targets

- `eval_set`: list of tuple

List of evaluation tuples (X, y).
The last one is used for early stopping.

- `eval_name` : list of str

List of eval set names.

- `eval_metric` : list of str

List of evaluation metrics.
The last metric is used for early stopping.

- `max_epochs` : int (default = 200)

Maximum number of epochs for training.

- `patience` : int (default = 15)

Number of consecutive epochs without improvement before performing early stopping.

1 : automated sampling with inverse class occurrences
dict : keys are classes, values are weights for each class

- `loss_fn` : torch.loss or list of torch.loss

Loss function for training (defaults to MSE for regression and cross-entropy for classification).
When using TabNetMultiTaskClassifier you can pass a list with one loss per task;
each task will be assigned its own loss function.

- `batch_size` : int (default=1024)

Number of examples per batch; large batch sizes are recommended.

- `virtual_batch_size` : int (default=128)

Size of the mini-batches used for "Ghost Batch Normalization".

- `num_workers` : int (default=0)

Number of workers used in torch.utils.data.DataLoader

- `drop_last` : bool (default=False)

Whether to drop the last batch during training if it is incomplete.

- `callbacks` : list of callback function

List of custom callbacks.
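For orientation, here is a hedged sketch (not part of the original README) of a `fit` call combining the parameters above, reusing the placeholder arrays from the earlier snippets; the numeric values simply repeat the documented defaults, and the `"auc"` metric assumes a classification task:

```python
clf.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],  # last tuple drives early stopping
    eval_name=["train", "valid"],
    eval_metric=["auc"],            # the last metric drives early stopping
    max_epochs=200,
    patience=15,
    batch_size=1024,
    virtual_batch_size=128,
    num_workers=0,
    drop_last=False,
)
```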
