
default argparser #916

Closed · williamFalcon opened this issue Feb 23, 2020 · 8 comments

Labels: feature (Is an improvement or enhancement), good first issue (Good for newcomers)

@williamFalcon (Contributor) commented Feb 23, 2020

🚀 Feature

Create a default argparser with all the properties that can go into a Trainer.

Motivation

People already do this themselves pretty often, so we might as well make it easy for them.

williamFalcon added the feature (Is an improvement or enhancement) and help wanted (Open to be worked on) labels on Feb 23, 2020
Borda added the good first issue (Good for newcomers) label on Feb 23, 2020
@mtnwni (Contributor) commented Feb 24, 2020

    from argparse import ArgumentParser


    def str2bool(value):
        # argparse calls type() on the raw string, and bool('False') is True,
        # so boolean flags need an explicit string-to-bool conversion
        return str(value).lower() in ('1', 'true', 'yes', 'y')


    def add_default_args(parent_parser):
        parser = ArgumentParser(parents=[parent_parser])

        # training, test, val check intervals
        parser.add_argument('--max_epochs', default=1000, type=int, help='maximum number of epochs')
        parser.add_argument('--min_epochs', default=1, type=int, help='minimum number of epochs')
        parser.add_argument('--max_steps', default=None, type=int,
                            help='stop training after this number of steps')
        parser.add_argument('--min_steps', default=None, type=int,
                            help='force training for at least this number of steps')
        parser.add_argument('--check_val_every_n_epoch', default=1, type=int, help='check val every n epochs')
        parser.add_argument('--accumulate_grad_batches', default=1, type=int,
                            help='accumulates gradients k times before applying update.'
                                 ' Simulates huge batch size. (The Trainer also accepts a'
                                 ' Dict[int, int] schedule, which cannot be expressed on the CLI.)')
        parser.add_argument('--train_percent_check', default=1.0, type=float,
                            help='how much of training set to check')
        parser.add_argument('--val_percent_check', default=1.0, type=float,
                            help='how much of val set to check')
        parser.add_argument('--test_percent_check', default=1.0, type=float,
                            help='how much of test set to check')

        parser.add_argument('--val_check_interval', default=1.0, type=float,
                            help='how often within 1 epoch to check val')
        parser.add_argument('--log_save_interval', default=100, type=int,
                            help='how many batches between log saves')

        # early stopping (the Trainer also accepts an EarlyStopping instance,
        # but only the bool form is expressible on the CLI)
        parser.add_argument('--early_stop_callback', default=None, type=str2bool)

        # gradient handling
        parser.add_argument('--gradient_clip_val', default=0, type=float)
        parser.add_argument('--track_grad_norm', default=-1, type=int,
                            help='if > 0, will track this grad norm')
        parser.add_argument('--print_nan_grads', default=False, type=str2bool,
                            help='prints gradients with nan values')

        # model (checkpoint_callback also accepts a ModelCheckpoint instance and
        # profiler a BaseProfiler instance; only the bool forms work on the CLI)
        parser.add_argument('--resume_from_checkpoint', default=None, type=str,
                            help='resumes training from a checkpoint')
        parser.add_argument('--checkpoint_callback', default=True, type=str2bool,
                            help='callback for checkpointing')
        parser.add_argument('--truncated_bptt_steps', default=None, type=int,
                            help='truncated backprop through time: performs backprop every k steps')
        parser.add_argument('--num_sanity_val_steps', default=5, type=int,
                            help='runs n batches of val as a sanity check before starting training')
        parser.add_argument('--process_position', default=0, type=int,
                            help='orders the tqdm bar')
        parser.add_argument('--show_progress_bar', default=True, type=str2bool,
                            help='if true shows tqdm progress bar')
        parser.add_argument('--distributed_backend', default=None, type=str,
                            help='the distributed backend to use')
        parser.add_argument('--weights_summary', default='full', type=str,
                            help='prints a summary of the weights when training begins')
        parser.add_argument('--profiler', default=None, type=str2bool,
                            help='to profile individual steps during training and assist in'
                                 ' identifying bottlenecks')

        # model path
        parser.add_argument('--default_save_path', default=None, type=str,
                            help='default path for logs and weights')
        parser.add_argument('--weights_save_path', default=None, type=str,
                            help='where to save weights if specified')

        # GPU ('--gpus' is passed as a string, e.g. '2' or '0,1';
        # the Trainer accepts int, str, or list forms)
        parser.add_argument('--gpus', default=None, type=str)
        parser.add_argument('--num_nodes', default=1, type=int)
        parser.add_argument('--num_tpu_cores', default=None, type=int)
        parser.add_argument('--use_amp', default=False, type=str2bool)
        parser.add_argument('--check_grad_nans', action='store_true')

        # fast training
        parser.add_argument('--fast_dev_run', default=False, type=str2bool,
                            help='runs validation after 1 training step')
        parser.add_argument('--overfit_pct', default=0.0, type=float,
                            help='%% of dataset to use with this option. float, or -1 for none')

        # log (logger also accepts a LightningLoggerBase instance;
        # only the bool form works on the CLI)
        parser.add_argument('--logger', default=True, type=str2bool)
        parser.add_argument('--log_gpu_memory', default=None, type=str)
        parser.add_argument('--row_log_interval', default=10, type=int,
                            help='add log every k batches')

        return parser

Would something like this work as a static method of Trainer, or as a util?
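
For reference, a minimal usage sketch under the assumption that add_default_args is exposed as a util; the model_parser name and its --learning_rate flag are hypothetical:

    from argparse import ArgumentParser

    # model-specific flags live on a parent parser created with add_help=False,
    # so the child parser built by add_default_args can add its own -h/--help
    model_parser = ArgumentParser(add_help=False)
    model_parser.add_argument('--learning_rate', default=1e-3, type=float)

    parser = add_default_args(model_parser)
    args = parser.parse_args()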

@Borda (Member) commented Feb 24, 2020

@skepticleo looks good, could you send a PR? 🤖

@XDynames (Contributor) commented Feb 24, 2020

This would go really nicely with a classmethod/overload that constructs the Trainer from the passed arguments.

@classmethod
def from_default_args(cls, args):
    # unpack the parsed Namespace into constructor keyword arguments
    return cls(**vars(args))

Then the user's pattern becomes:

args = parser.parse_args()
trainer = Trainer.from_default_args(args)

Borda removed the help wanted (Open to be worked on) label on Feb 24, 2020
@XDynames (Contributor) commented Mar 2, 2020

Some of the default values in the arguments are set to None to comply with the Trainer's constructor. In my case I often pass and save my args into the LightningModule as self.hparams. When an argument in the Namespace object has a value of None, TensorBoard will raise a ValueError, since None is not an int, float, str, bool, or torch.Tensor.

By modifying the defaults to acceptable alternatives (e.g. default=0 for --gpus) this issue can be avoided, assuming a workaround currently exists for each Trainer parameter.
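
As a sketch of that defaults change (only --gpus shown; type=int here assumes the single-integer form is sufficient):

    # TensorBoard-safe default: 0 instead of None, i.e. CPU unless overridden
    parser.add_argument('--gpus', default=0, type=int,
                        help='number of GPUs to train on')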

@Borda (Member) commented Mar 3, 2020

@XDynames could you check #1023 and send a PR with adjustments?

@XDynames (Contributor) commented Mar 4, 2020

@Borda Honestly, I wasn't sure which way would be best to address it. Locally I have just added a dictionary to catch the cases that cause the issue, but that is hard to maintain.

Ultimately it would be better to adjust the default arguments in the constructor from None to some other value, but I don't have enough scope to fully understand the implications of that for legacy support or the constructor code.

I also had a pending pull request on @skepticleo's fork that parsed the constructor docstring to extract one-line help messages for the arguments. Should I look to include this as well?

@Borda (Member) commented Mar 4, 2020

You can convert the params to a dictionary (temporarily) and filter out the items that are None...
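
A minimal sketch of that filter, assuming args is the Namespace returned by parser.parse_args():

    from argparse import Namespace

    # go through a dict temporarily, drop the None entries, rebuild the Namespace
    hparams = Namespace(**{k: v for k, v in vars(args).items() if v is not None})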

@XDynames (Contributor) commented Mar 4, 2020

Sure, but then the user can't use some of the arguments, like --gpus.
If we actually deal with them case by case, we end up having to store a mapping between None arguments and their None-equivalent defaults (sketched below). Some examples of this would be:
gpus = 0
profiler = False
default_save_path = os.getcwd()

And then in some cases there are explicit checks for a NoneType in the constructor, like for distributed_backend, so there is no good default mapping for them other than None.

So you end up either filtering them before saving as hparams but after instantiating the Trainer (which causes you to lose some information about your settings), or not including them in the default argparser, which removes a large amount of common use from it: multi-GPU settings, save paths, etc.

I can't see a good band-aid here; it requires removing NoneType defaults from the constructor's signature and repairing whatever that impacts.
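
For illustration, the kind of case-by-case mapping described above (hypothetical, and hard to maintain for exactly the reasons given):

    import os

    # stand-ins for Trainer arguments whose default is None;
    # distributed_backend has no safe equivalent, so it stays unmapped
    NONE_EQUIVALENTS = {
        'gpus': 0,
        'profiler': False,
        'default_save_path': os.getcwd(),
    }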
