I cannot reproduce "imagenet 18" with fp32 #21

Open
Yancey1989 opened this issue Nov 9, 2018 · 1 comment

Yancey1989 commented Nov 9, 2018

Hi @yaroslavvb

I tried to reproduce "imagenet 18" on my host. It works well with fp16 (top-1 accuracy reaches 75.776% at the 27th epoch), but with fp32 it only reaches 51.018% top-1 accuracy at the 27th epoch.


The entrypoint is as follows. I reduced the batch size to avoid OOM with fp32, and used the same arguments for fp16 in my experiment:

```shell
PYTHONPATH=/imagenet18 \
NCCL_DEBUG=VERSION \
stdbuf -oL nohup python -m torch.distributed.launch \
--nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
--master_addr=127.0.0.1 \
--master_port=6010 \
training/train_imagenet_nv.py /data/imagenet \
--logdir /imagenet18/log_small_bs \
--distributed \
--init-bn0 \
--no-bn-wd \
--phases '[{"ep": 0, "sz": 128, "bs": 224, "trndir": "/sz/160"}, {"ep": (0, 7), "lr": (1.0, 2.0)}, {"ep": (7, 13), "lr": (2.0, 0.25)}, {"ep": 13, "sz": 224, "bs": 96, "trndir": "/sz/352", "min_scale": 0.087}, {"ep": (13, 22), "lr": (0.42857142857142855, 0.04285714285714286)}, {"ep": (22, 25), "lr": (0.04285714285714286, 0.004285714285714286)}, {"ep": 25, "sz": 288, "bs": 50, "min_scale": 0.5, "rect_val": True}, {"ep": (25, 28), "lr": (0.0022321428571428575, 0.00022321428571428573)}]' 2>&1 > local_train.log &
```
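The `--phases` argument above packs the whole schedule into one string. Entries containing `bs` switch the image size and per-GPU batch size, and entries containing `lr` give a (start, end) learning rate over an epoch range. As a rough sketch, the schedule can be tabulated together with the global batch size (per-GPU `bs` times the 8 processes from `--nproc_per_node=8`). This assumes the string is read as a Python literal, since the tuples and `True` are not valid JSON. `PHASES`, `NUM_GPUS` and `summarize` are hypothetical helpers, not part of the training script.

```python
import ast

# The --phases value from the command above (Python literal: tuples and True are not JSON).
PHASES = """[
    {"ep": 0, "sz": 128, "bs": 224, "trndir": "/sz/160"},
    {"ep": (0, 7), "lr": (1.0, 2.0)},
    {"ep": (7, 13), "lr": (2.0, 0.25)},
    {"ep": 13, "sz": 224, "bs": 96, "trndir": "/sz/352", "min_scale": 0.087},
    {"ep": (13, 22), "lr": (0.42857142857142855, 0.04285714285714286)},
    {"ep": (22, 25), "lr": (0.04285714285714286, 0.004285714285714286)},
    {"ep": 25, "sz": 288, "bs": 50, "min_scale": 0.5, "rect_val": True},
    {"ep": (25, 28), "lr": (0.0022321428571428575, 0.00022321428571428573)},
]"""

NUM_GPUS = 8  # --nproc_per_node=8 on a single node (--nnodes=1)


def summarize(phases_str, num_gpus):
    """Pair each lr phase with the data phase (image size, batch size) in effect."""
    phases = ast.literal_eval(phases_str)
    rows = []
    data = None
    for phase in phases:
        if "bs" in phase:    # data phase: switches image size and per-GPU batch size
            data = phase
        elif "lr" in phase:  # lr phase: (start, end) learning rate over an epoch range
            start, end = phase["ep"]
            rows.append((start, end, data["sz"], data["bs"],
                         data["bs"] * num_gpus, phase["lr"]))
    return rows


for start, end, sz, bs, global_bs, (lr0, lr1) in summarize(PHASES, NUM_GPUS):
    print(f"epochs {start:2d}-{end:2d}  sz={sz}  bs/gpu={bs:3d}  "
          f"global bs={global_bs:4d}  lr {lr0:.4g} -> {lr1:.4g}")
```

With these values the global batch is 1792, 768 and 400 images for the 128, 224 and 288 px phases respectively.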

Have you reproduced this result with fp32?

@yaroslavvb
Collaborator

It should work with fp32. Note that the learning rates are tuned for specific batch sizes; if you modify the batch size, you should modify the learning rate as well.
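One common way to follow this advice is the linear scaling rule popularized by Goyal et al. (2017): when the batch size changes by a factor k, multiply the learning rate by k. The sketch below applies that rule to a parsed `--phases` list. It is an illustration under assumptions, not the repository's own method. `rescale_lr` and its `ref_bs` argument (the per-GPU batch sizes the schedule was originally tuned for, which this thread does not state) are hypothetical. The scalar `lr` form is also an assumption. Linear scaling is a heuristic and may still need tuning. The factor uses per-GPU batch sizes, so it is only equivalent to the global ratio if the GPU count stays the same.

```python
import copy


def rescale_lr(phases, ref_bs):
    """Return a copy of `phases` with each lr scaled by (new bs / reference bs).

    phases: parsed --phases list; dicts with "bs" start a data phase,
            dicts with "lr" hold a (start, end) pair or a single value.
    ref_bs: hypothetical per-GPU batch sizes the schedule was tuned for,
            one per data phase, in order of appearance.
    """
    out = copy.deepcopy(phases)  # leave the caller's schedule untouched
    refs = iter(ref_bs)
    factor = None
    for phase in out:
        if "bs" in phase:
            # Linear scaling rule: learning rate proportional to batch size.
            factor = phase["bs"] / next(refs)
        elif "lr" in phase:
            if factor is None:
                raise ValueError("lr phase appears before any batch-size phase")
            lr = phase["lr"]
            phase["lr"] = (tuple(x * factor for x in lr)
                           if isinstance(lr, tuple) else lr * factor)
    return out


# Toy example: halving the per-GPU batch from a hypothetical 256 to 128 halves the lr.
toy = [{"ep": 0, "sz": 128, "bs": 128}, {"ep": (0, 7), "lr": (1.0, 2.0)}]
print(rescale_lr(toy, ref_bs=[256])[1]["lr"])  # (0.5, 1.0)
```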
