Currently, FSDP only supports a fixed learning rate. It does not support learning rate schedules such as warmup or cosine decay, as Megatron does.
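
For illustration, a linear warmup followed by cosine decay could be sketched roughly as below. This is a minimal sketch, not the project's or Megatron's implementation: the function name `warmup_cosine_factor` and the parameters `warmup_steps`, `total_steps`, and `min_ratio` are hypothetical, and the linear warmup shape is an assumption. The function returns a multiplier on the base learning rate rather than the learning rate itself.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_ratio=0.0):
    """Return a multiplier for the base learning rate at a given step.

    Hypothetical helper (all names are assumptions):
      step         -- zero-based optimizer step
      warmup_steps -- number of linear warmup steps
      total_steps  -- step at which decay ends
      min_ratio    -- floor of the multiplier after decay
    """
    # Linear warmup: ramps from 1/warmup_steps up to 1.0.
    if warmup_steps > 0 and step < warmup_steps:
        return (step + 1) / warmup_steps
    # Past the end of the schedule, hold at the floor.
    if step >= total_steps:
        return min_ratio
    # Cosine decay from 1.0 down to min_ratio over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_ratio + (1.0 - min_ratio) * cosine


if __name__ == "__main__":
    # Print the multiplier at a few steps of a 110-step schedule.
    for s in (0, 5, 9, 10, 60, 109, 110):
        print(s, round(warmup_cosine_factor(s, 10, 110, min_ratio=0.1), 4))
```

Because the result is a multiplier, a function like this could be bound with `functools.partial` and passed as `lr_lambda` to PyTorch's `torch.optim.lr_scheduler.LambdaLR`, which scales the optimizer's initial learning rate by the returned value. Whether the FSDP training path here exposes a place to attach such a scheduler is not something the note above confirms.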