auto set cpu env when mkldnn or mklml enabled #5671

Merged: 5 commits merged into PaddlePaddle:develop from tensor-tang:autocpu, Nov 16, 2017

Conversation

@tensor-tang (Contributor) commented Nov 15, 2017

related #5280

When MKL-DNN or MKLML is enabled:
auto set OMP_DYNAMIC and KMP_AFFINITY according to the hyper-threading (HT) status;
auto set OMP_NUM_THREADS and MKL_NUM_THREADS according to the total number of available processors and trainer_count.

  • V1 API
  • V2 API
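
A rough sketch of the computation this PR automates (illustrative only: the real implementation lives in the V1 and V2 API code paths, applies values only when the user has not already exported them, and the exact mapping of HT status to affinity values is inferred from the diff excerpts quoted below):

import multiprocessing

def compute_cpu_env(trainer_count=1, ht_enabled=False):
    # Affinity and dynamic-thread policy follow the hyper-threading status.
    if ht_enabled:
        env = {"OMP_DYNAMIC": "true",
               "KMP_AFFINITY": "granularity=fine,compact,1,0"}
    else:
        env = {"OMP_DYNAMIC": "FALSE",
               "KMP_AFFINITY": "granularity=fine,compact,0,0"}
    # Threads per trainer: logical processors divided by trainer_count,
    # clamped to at least 1 so an oversized trainer_count cannot yield 0.
    threads = max(1, multiprocessing.cpu_count() // trainer_count)
    env["OMP_NUM_THREADS"] = str(threads)
    env["MKL_NUM_THREADS"] = str(threads)
    return env

print(compute_cpu_env(trainer_count=2, ht_enabled=True))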

@tensor-tang tensor-tang changed the title auto set cpu env when mkldnn or mklml enabled for V1 API auto set cpu env when mkldnn or mklml enabled Nov 15, 2017
@tensor-tang tensor-tang requested a review from luotao1 November 15, 2017 12:54
@luotao1 (Contributor) left a comment:


I pulled this PR down and ran the run_mkldnn.sh script locally.
Then I did echo ${OMP_NUM_THREADS}, and the printed value was empty. May I ask why that is?

processors = int(processors.read())
trainers = kwargs.get('trainer_count', 1)
threads = processors / trainers
threads = '1' if threads < 1 else str(threads)
luotao1 (Contributor):

  • Can trainer_count ever be missing?
  • threads < 1 would mean processors is 0? Can that case actually happen?

tensor-tang (Contributor, author):

  • The case of a missing trainer_count is already handled here: kwargs.get('trainer_count', 1) defaults it to 1.

  • If a user sets trainer_count to 8 without knowing the system only has 4 processors, the computed threads would be 0, which is wrong. So a minimum of 1 is enforced to guard against bad user input (see the small worked example below).
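
A small worked example of that clamping (made-up values; // is used so the arithmetic matches Python 2's integer / on Python 3 as well):

processors = 4        # suppose the machine exposes 4 logical processors
trainers = 8          # user mistakenly sets trainer_count=8
threads = processors // trainers                  # 4 // 8 == 0
threads = '1' if threads < 1 else str(threads)    # clamped to '1'
print(threads)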

fi
if [ -z "$OMP_DYNAMIC" ]; then
export OMP_DYNAMIC="FALSE"
fi
luotao1 (Contributor):

Could lines 54-58 be written as below, without the guard? The V2 API side just sets them directly:

export KMP_AFFINITY="granularity=fine,compact,0,0"
export OMP_DYNAMIC="FALSE"

Same for lines 61-65.

tensor-tang (Contributor, author):

There is a reason for the guard: if the values were hardcoded, values set in the outer environment could not be passed through.
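
For illustration, the same only-set-when-unset idea expressed in Python (a sketch, not code from this PR):

import os

# setdefault leaves an externally exported value untouched and only fills
# in the default when the variable is absent; an unconditional export in
# the script would clobber whatever the user set outside.
os.environ.setdefault("OMP_DYNAMIC", "FALSE")
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,0,0")
print(os.environ["OMP_DYNAMIC"])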

@@ -14,8 +12,6 @@ function train() {
elif [ $4 == "False" ]; then
luotao1 (Contributor):

When using MKL-DNN, are the two variables below also set this way? From the V1 and V2 settings it looks like they have been unified.

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

tensor-tang (Contributor, author), Nov 16, 2017:

When using MKL-DNN, setting these is now handled internally in a unified way, but the value is not 1.

luotao1 (Contributor):

Where is this setting implemented?

tensor-tang (Contributor, author):

unset OMP_NUM_THREADS MKL_NUM_THREADS
export OMP_DYNAMIC="FALSE"
export KMP_AFFINITY="granularity=fine,compact,0,0"
unset OMP_NUM_THREADS MKL_NUM_THREADS OMP_DYNAMIC KMP_AFFINITY
topology=$1
layer_num=$2
bs=$3
luotao1 (Contributor):

Judging from the run_mkldnn.sh script, when using MKL-DNN it is enough to leave trainer_count at 1? Shouldn't we tell users this as well?

tensor-tang (Contributor, author):

We can tell users that with MKL-DNN there is no need to set trainer_count; the default of 1 is fine.

luotao1 (Contributor):

Then is this going into the documentation, or into the code?

tensor-tang (Contributor, author):

It has not been added to the code yet, but we can add a reminder for users.

@tensor-tang (Contributor, author) left a comment:

Yes, it is empty: the variable's lifetime is limited to the paddle run itself, so after the run finishes the value can no longer be printed; it only shows a value if printed while the run is in progress.

If you want to see the value, you can add an echo inside the paddle launch script. But ordinary users probably don't need to see it, so I didn't add an echo.
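
This is plain environment-variable scoping: values set inside the launch script (or via os.environ in the Python process) are visible only to that process and its children, not to the shell you return to afterwards. A minimal illustration (not PaddlePaddle code):

import os
import subprocess

child_env = dict(os.environ, OMP_NUM_THREADS="4")   # set only for the child
subprocess.run(["sh", "-c", "echo child sees: $OMP_NUM_THREADS"], env=child_env)

# Back in the parent the variable was never exported, so this prints nothing.
print("parent sees:", os.environ.get("OMP_NUM_THREADS", ""))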

set_env("KMP_AFFINITY", "granularity=fine,compact,0,0")
else:
set_env("OMP_DYNAMIC", "true")
set_env("KMP_AFFINITY", "granularity=fine,compact,1,0")
luotao1 (Contributor):

So the values are not hardcoded in the V1 API, but are hardcoded in the V2 API?

tensor-tang (Contributor, author):

They are not hardcoded in V2 either; a set_env helper is defined for this.
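
The set_env helper itself is not quoted in this thread; based on the discussion, it presumably looks something like the sketch below (an assumption, not the PR's verbatim code):

import os

def set_env(key, value):
    # Apply the computed default only when the user has not already
    # exported the variable, so external overrides still take effect.
    if key not in os.environ:
        os.environ[key] = value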

@luotao1 (Contributor) commented Nov 16, 2017

> If you want to see the value, you can add an echo inside the paddle launch script. But ordinary users probably don't need to see it, so I didn't add an echo.

Could you add a print function for this? Leaving it commented out is fine.

@tensor-tang (Contributor, author):
Sure. I'll add a print of the complete resulting configuration; let's put it in the V1 API for now.

@tensor-tang (Contributor, author):
Done.

Just remove the comment before compiling.
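
A rough idea of the kind of commented-out diagnostic being discussed (hypothetical helper name; the actual print was added to the V1 API code, which is not quoted in this thread):

import os

def dump_cpu_env():
    # Debug helper: print the auto-configured values; left commented out
    # by default and re-enabled before compiling when needed.
    for key in ("OMP_DYNAMIC", "KMP_AFFINITY",
                "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        print("%s=%s" % (key, os.environ.get(key, "")))

# dump_cpu_env()   # uncomment to inspect the configuration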

@luotao1 (Contributor) left a comment:

LGTM. In the next PR, remind users that trainer_count defaults to 1 when MKLDNN is used.

@luotao1 luotao1 merged commit 6cf7f1e into PaddlePaddle:develop Nov 16, 2017
@tensor-tang tensor-tang deleted the autocpu branch November 16, 2017 05:28