refine gpu kernel config for Paddle #28085
Thanks for your contribution!
@@ -1,49 +1,103 @@
/* Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
2020
done
This file was not added by this PR, so the copyright year should not need to change, right? Also, please do not change the file permissions.
return config;
}

// 3D will add later
Please add a TODO here with your GitHub ID.
Added.
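Both the block count and the thread count may change. Please describe the specific changes in the PR description, and re-measure the performance of all affected ops.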
KeBicubicInterpFw<
T><<<config.blocks, 512, 0, ctx.cuda_device_context().stream()>>>(
KeBicubicInterpFw<T><<<config.block_per_grid, 512, 0,
ctx.cuda_device_context().stream()>>>(
The thread count is adjusted here. It would be best to use op benchmark to measure the impact on this op's performance.
The logic here is the same as before.
Then this logic is a bit odd. Also, could it be changed to 1024?
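To make the 512 versus 1024 question concrete, here is a small host-side sketch. It is not Paddle's code: `DivUp` and `BlocksFor` are hypothetical helpers, and the sketch assumes the simplest scheme of one thread per element, with the block count obtained by ceiling division. Under that assumption, doubling the threads per block roughly halves the number of blocks, which is why such a change is worth benchmarking.

```cpp
#include <cassert>

// Hypothetical helper (not from the PR): ceiling division for integers.
inline int DivUp(int a, int b) {
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

// Hypothetical helper (not from the PR): number of blocks needed so that
// blocks * threads_per_block >= element_count, i.e. one thread per element.
inline int BlocksFor(int element_count, int threads_per_block) {
  return DivUp(element_count, threads_per_block);
}
```

For example, `BlocksFor(1000000, 512)` gives 1954 blocks, while `BlocksFor(1000000, 1024)` gives 977.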
PR types
Performance optimization
PR changes
APIs
Describe
Unify the GPU block and grid settings (two different ways of configuring them currently exist).
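As a rough illustration of what a single, shared 1D launch configuration could look like, here is a minimal host-side sketch. It is an assumption-laden example, not the implementation in this PR: `DeviceLimits`, `LaunchConfig1D`, and `MakeLaunchConfig1D` are hypothetical names, the device limits are passed in rather than queried from the CUDA runtime, and capping the block count at the SM count is one possible policy that only works if the kernel iterates with a grid-stride loop. Only the field name `block_per_grid` is taken from the diff above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical device limits; real code would query these from the device.
struct DeviceLimits {
  int max_threads_per_block;  // commonly 1024 on recent GPUs
  int max_physical_threads;   // SM count * max resident threads per SM
  int sm_count;               // number of streaming multiprocessors
};

// Hypothetical unified config: every kernel launch reads both values from here
// instead of hard-coding a thread count at the call site.
struct LaunchConfig1D {
  int thread_per_block;
  int block_per_grid;
};

// Sketch of one possible policy (an assumption, not the PR's code):
// use up to 1024 threads per block, never request more threads than the
// device can keep resident, and cap the block count at the SM count.
// Kernels launched this way must use a grid-stride loop to cover all elements.
inline LaunchConfig1D MakeLaunchConfig1D(const DeviceLimits& dev,
                                         int element_count) {
  assert(element_count > 0);
  const int threads = std::min(1024, dev.max_threads_per_block);
  const int physical = std::min(dev.max_physical_threads, element_count);
  const int blocks = std::min((physical + threads - 1) / threads, dev.sm_count);
  return {threads, std::max(blocks, 1)};
}
```

With a hypothetical device of 80 SMs and 2048 resident threads per SM, 1000 elements yield 1 block of 1024 threads, and 1048576 elements yield 80 blocks of 1024 threads. Keeping both numbers in one struct means a call site can pass `config.block_per_grid` and `config.thread_per_block` together, so the two can no longer drift apart.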