feat: add support for train on windows #37

Open · Wang-zipeng wants to merge 5 commits into master
Conversation

Wang-zipeng

Implements training on Windows.
Compile steps (Visual Studio 2017 required):

  1. Set up the compile environment:
     run "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" in a Windows cmd prompt.
  2. Run "set DISTUTILS_USE_SDK=1".
  3. Compile:
     enter the code folder and run "python setup.py develop".

Train steps:

  1. Enter the "playground\centernet.res18.coco.512size" folder and run "python train_net.py", or run it from an IDE such as PyCharm. If you use PyCharm, make sure python.exe's directory is in the PATH environment variable.
  2. To train resnet50/101, just copy train_net.py to "playground\centernet.res50.coco.512size" or "playground\centernet.res101.coco.512size".

Other notes:

  1. "os.statvfs" is unavailable on Windows, so I removed the disk check; I think it doesn't matter (a portable alternative is sketched below).
  2. "os.getuid()" is unavailable on Windows, so I used a constant named "User_name"; I think it is impossible to train on a Windows cluster anyway.
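For what it's worth, the disk check need not be dropped entirely: `shutil.disk_usage` from the standard library is a cross-platform replacement for `os.statvfs`. A minimal sketch (the GB conversion mirrors the `free_space_Gb` naming quoted later in this thread):

```python
import shutil

def free_space_gb(path: str = ".") -> float:
    # shutil.disk_usage works on Windows, Linux, and macOS,
    # unlike os.statvfs, which is POSIX-only.
    return shutil.disk_usage(path).free / 1024 ** 3
```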

```diff
@@ -66,7 +66,7 @@ def default_argument_parser():
     # PyTorch still may leave orphan processes in multi-gpu training.
     # Therefore we use a deterministic way to obtain port,
     # so that users are aware of orphan processes by seeing the port occupied.
-    port = 2 ** 15 + 2 ** 14 + hash(os.getuid()) % 2 ** 14
+    port = 2 ** 15 + 2 ** 14 + hash("User_name") % 2 ** 14
```
Owner

hash("User_name") is a fix value, please don't do that.

Author

I know it's a fixed value, but I think it is impossible to train on an 8-GPU Windows machine. I will find a way to get a uid on Windows.
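For reference, `getpass.getuser()` is the usual portable way to get the user name; a later commit in this PR adopts it:

```python
import getpass

# getpass.getuser() works on both POSIX and Windows: it checks the
# LOGNAME, USER, LNAME, and USERNAME environment variables, then
# falls back to the pwd database on POSIX.
port = 2 ** 15 + 2 ** 14 + hash(getpass.getuser()) % 2 ** 14
```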

```diff
@@ -334,7 +338,7 @@ at::Tensor ROIAlign_forward_cuda(
   auto output_size = num_rois * pooled_height * pooled_width * channels;
   cudaStream_t stream = at::cuda::getCurrentCUDAStream();

-  dim3 grid(std::min(at::cuda::ATenCeilDiv(output_size, 512L), 4096L));
+  dim3 grid(std::min(ceil_div((int)output_size, 512), 4096));
```
Owner

at::cuda::ATenCeilDiv works on all platforms; the real reason this fails on Windows is the 'L' suffix: long is only 32 bits under MSVC, so the 512L/4096L literals don't match the 64-bit output_size and the template argument can't be deduced.

Author

I will change it and try to recompile.

Author

If I remove the "L", will this function still run correctly on Linux? Can I simply drop the "L"?

```diff
@@ -390,7 +394,7 @@ at::Tensor ROIAlign_backward_cuda(

   cudaStream_t stream = at::cuda::getCurrentCUDAStream();

-  dim3 grid(std::min(at::cuda::ATenCeilDiv(grad.numel(), 512L), 4096L));
+  dim3 grid(std::min(ceil_div((int)grad.numel(), 512), 4096));
```
Owner

ditto

Author

Same as last one.

```diff
@@ -52,7 +52,7 @@
     SOLVER=dict(
         OPTIMIZER=dict(
             NAME="SGD",
-            BASE_LR=0.02,
+            BASE_LR=0.002,
```
Owner

please do not change this, thanks.

Author

0.02 is too big for one GPU; I will change it back.
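Context for the disagreement: under the linear scaling rule, a learning rate tuned for multi-GPU training is divided by the GPU-count ratio rather than hard-coded. A hypothetical sketch (the 8-GPU reference is an assumption based on the 8-GPU machine mentioned above; the names are illustrative, not from the repo):

```python
# Linear scaling rule: LR scales with total batch size, i.e. with
# the number of GPUs when the per-GPU batch size is fixed.
REFERENCE_GPUS = 8                           # assumed setup the 0.02 LR was tuned for
num_gpus = 1                                 # single-GPU Windows training
base_lr = 0.02 * num_gpus / REFERENCE_GPUS   # -> 0.0025
```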

```diff
@@ -0,0 +1,126 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
```
Owner

Such a file duplicates tools/train_net.py; you should consider combining them.

Author

OK, I will try to use the same training entry point as on Linux.
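One hypothetical way to avoid the duplication, assuming tools/train_net.py exposes a main() entry point (not verified against the repo):

```python
# playground/.../train_net.py: a thin wrapper instead of a full copy.
from tools.train_net import main  # assumed entry point, not verified

if __name__ == "__main__":
    main()
```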

Owner

FateScript left a comment

@Wang-zipeng
Author

> PTAL @Wang-zipeng

I just searched Google for what PTAL means ("please take a look").

```diff
@@ -66,7 +67,7 @@ def default_argument_parser():
     # PyTorch still may leave orphan processes in multi-gpu training.
     # Therefore we use a deterministic way to obtain port,
     # so that users are aware of orphan processes by seeing the port occupied.
-    port = 2 ** 15 + 2 ** 14 + hash(os.getuid()) % 2 ** 14
+    port = 2 ** 15 + 2 ** 14 + hash(getuser()) % 2 ** 14
```
Owner

Suggested change:

```diff
-    port = 2 ** 15 + 2 ** 14 + hash(getuser()) % 2 ** 14
+    port = 2 ** 15 + 2 ** 14 + hash(os.getuid() if sys.platform != "win32" else 1) % 2 ** 14
```
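One reason to prefer the integer fallback over hash(getuser()): since Python 3.3, str hashes are randomized per process (PYTHONHASHSEED), so a string-derived port is not actually deterministic across runs, whereas small non-negative ints hash to themselves. A sketch of the suggestion as a standalone helper (the function name is mine, not the repo's):

```python
import os
import sys

def default_port() -> int:
    # os.getuid() is POSIX-only; fall back to a constant uid on Windows.
    # hash(n) == n for small non-negative ints, so the port is stable
    # across runs, unlike hash() of a string.
    uid = os.getuid() if sys.platform != "win32" else 1
    return 2 ** 15 + 2 ** 14 + hash(uid) % 2 ** 14
```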

```diff
@@ -334,7 +338,7 @@ at::Tensor ROIAlign_forward_cuda(
   auto output_size = num_rois * pooled_height * pooled_width * channels;
   cudaStream_t stream = at::cuda::getCurrentCUDAStream();

-  dim3 grid(std::min(at::cuda::ATenCeilDiv(output_size, 512L), 4096L));
+  dim3 grid(std::min(at::cuda::ATenCeilDiv(static_cast<int64_t>(output_size), static_cast<int64_t>(512)), static_cast<int64_t>(4096)));
```
Owner

It's better to break this long line of code.

```diff
@@ -390,7 +394,7 @@ at::Tensor ROIAlign_backward_cuda(

   cudaStream_t stream = at::cuda::getCurrentCUDAStream();

-  dim3 grid(std::min(at::cuda::ATenCeilDiv(grad.numel(), 512L), 4096L));
+  dim3 grid(std::min(at::cuda::ATenCeilDiv(static_cast<int64_t>(grad.numel()), static_cast<int64_t>(512)), static_cast<int64_t>(4096)));
```
Owner

ditto.

setup.py (outdated)

```diff
@@ -39,6 +41,8 @@ def get_extensions():
         "-D__CUDA_NO_HALF_CONVERSIONS__",
         "-D__CUDA_NO_HALF2_OPERATORS__",
     ]
+    if "Windows" == os_name:
```
Owner

Is sys.platform suitable for your case?
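For context, the two checks differ slightly; a minimal sketch of the idiom suggested elsewhere in this review (the branch body is hypothetical):

```python
import sys

# sys.platform is a constant set when the interpreter is built:
# "win32" on Windows (even 64-bit), "linux" on Linux, "darwin" on macOS.
# platform.system() queries the OS at runtime and returns "Windows"/"Linux"/...
if sys.platform == "win32":
    extra_compile_args = []  # hypothetical: MSVC-specific flags would go here
```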

```python
        if eval_space_Gb > free_space_Gb:
            logger.warning(f"{Fore.RED}Remaining space({free_space_Gb}GB) "
                           f"is less than ({eval_space_Gb}GB){Style.RESET_ALL}")
        if "Linux" == platform.system():
```
Owner

Suggested change:

```diff
-        if "Linux" == platform.system():
+        if sys.platform == "linux":
```

Owner

FateScript left a comment

Remember that Python is not C++; code like `if a = 1` is invalid, since plain `=` assignment is a statement, not an expression, in Python.
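A quick illustration of the distinction (the walrus form requires Python 3.8+):

```python
a = 1
if a == 1:           # comparison: valid
    print("equal")
# if a = 1:          # SyntaxError: plain assignment is a statement
if (b := a) == 1:    # assignment expression (Python 3.8+): valid
    print(b)
```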
