using unified external error message for cufft api #36114

Merged (1 commit) on Oct 11, 2021

Conversation

cxxly
Contributor

@cxxly cxxly commented Sep 26, 2021

PR types

Others

PR changes

APIs

Describe

Use a unified external error message for the CUFFT API, the same as is already done for CURAND, CUDNN, CUBLAS, CUSOLVER, and NCCL. Related PR:
#33003

Example error message:

OSError: (External) CUFFT error(4). 
  [Hint: 'CUFFT_INVALID_VALUE'. User specified an invalid pointer or parameter] (at /***/workspace/paddle/paddle/fluid/operators/spectral_op.cu:86)
  [operator < fft_c2r > error]

The error messages are crawled from https://docs.nvidia.com/cuda/cufft/index.html#cufftresult. If crawling fails, the exception shows the raw documentation link instead:

OSError: (External) CUFFT error(4). 
  [Hint: Please search for the error code(4) on website (https://docs.nvidia.com/cuda/cufft/index.html#cufftresult) to get Nvidia's official solution and advice about CUFFT Error.] (at /***/workspace/paddle/paddle/fluid/operators/spectral_op.cu:86)
  [operator < fft_c2r > error]
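
The lookup-with-fallback behaviour described above can be sketched as follows. This is a minimal illustration only: `build_cufft_error` and `CUFFT_HINTS` are hypothetical names, not Paddle's actual implementation (which reads the crawled messages from externalErrorMsg.pb), and the table holds only the one entry quoted in this PR.

```python
# Hypothetical sketch; these names are NOT part of Paddle's code base.
CUFFT_DOC_URL = "https://docs.nvidia.com/cuda/cufft/index.html#cufftresult"

# Only the entry quoted in this PR. A real table would come from the crawler.
CUFFT_HINTS = {
    4: ("CUFFT_INVALID_VALUE", "User specified an invalid pointer or parameter"),
}


def build_cufft_error(code, hints=CUFFT_HINTS):
    """Return the message text for a CUFFT error code.

    Uses the crawled hint when one exists; otherwise falls back to
    pointing the user at the documentation URL.
    """
    header = f"(External) CUFFT error({code}). "
    entry = hints.get(code)
    if entry is not None:
        name, description = entry
        hint = f"'{name}'. {description}"
    else:
        hint = (
            f"Please search for the error code({code}) on website "
            f"({CUFFT_DOC_URL}) to get Nvidia's official solution and "
            "advice about CUFFT Error."
        )
    return f"{header}\n  [Hint: {hint}]"
```

Calling `build_cufft_error(4)` yields the first form shown above, while `build_cufft_error(4, hints={})` simulates a failed crawl and yields the link-only form.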

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zhwesky2010
zhwesky2010 previously approved these changes Sep 26, 2021
Contributor

@zhwesky2010 zhwesky2010 left a comment

LGTM

iclementine
iclementine previously approved these changes Sep 26, 2021

@iclementine iclementine left a comment

LGTM

@zhwesky2010
Contributor

zhwesky2010 commented Sep 27, 2021

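Modifying externalErrorMsg.tar.gz carries some risk, and the documentation I wrote earlier was not detailed enough.

Please also make the following changes in this PR:

  • In start.sh, automatically append the current date as a suffix when compressing, for example: tar czvf externalErrorMsg_09272021.tar.gz externalErrorMsg.pb

  • In README.md, add the update steps, roughly as follows:

  1. Modify spider.py and regenerate the externalErrorMsg_09272021.tar.gz archive to obtain the new MD5 value.
  2. Upload the new externalErrorMsg_09272021.tar.gz archive to the paddlepaddledeps bucket at https://paddlepaddledeps.bj.bcebos.com/. Do not delete the old externalErrorMsg.tar.gz file.
  3. Submit a PR that also updates the URL and MD5 value in file_download_and_uncompress(${URL} "externalError") in third_party.cmake.
  4. Once the PR passes CI and is merged, the error messages are updated to the latest version.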
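
The date-suffixed packaging requested above could look roughly like the sketch below. It is written in Python for illustration, whereas the request concerns start.sh; `package_error_messages` is a hypothetical helper, and the MMDDYYYY suffix format is inferred from the example name externalErrorMsg_09272021.tar.gz.

```python
import hashlib
import tarfile
from datetime import date
from pathlib import Path


def package_error_messages(pb_path, out_dir, today=None):
    """Pack pb_path into externalErrorMsg_<MMDDYYYY>.tar.gz and return
    the archive path together with its MD5 hex digest.

    Hypothetical helper: the name and signature are assumptions.
    """
    today = today or date.today()
    suffix = today.strftime("%m%d%Y")  # Sep 27, 2021 -> "09272021"
    archive = Path(out_dir) / f"externalErrorMsg_{suffix}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store only the file name, not the full source path.
        tar.add(pb_path, arcname=Path(pb_path).name)
    md5 = hashlib.md5(archive.read_bytes()).hexdigest()
    return archive, md5
```

Because each regenerated archive gets a new name and a new MD5, the old file can stay in the bucket and older revisions that still reference it keep building.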

@cxxly cxxly force-pushed the unify-error-message branch 2 times, most recently from 8e81136 to 414981a Compare September 28, 2021 03:37

@iclementine iclementine left a comment

LGTM

Contributor

@Xreki Xreki left a comment

LGTM for op benchmark ci

@iclementine iclementine merged commit 642aaa2 into PaddlePaddle:develop Oct 11, 2021
@ZHUI
Collaborator

ZHUI commented Oct 19, 2021

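Quoting the earlier comment:

Modifying externalErrorMsg.tar.gz carries some risk, and the documentation I wrote earlier was not detailed enough.

Please also make the following changes in this PR:

  • In start.sh, automatically append the current date as a suffix when compressing, for example: tar czvf externalErrorMsg_09272021.tar.gz externalErrorMsg.pb
  • In README.md, add the update steps, roughly as follows:
  1. Modify spider.py and regenerate the externalErrorMsg_09272021.tar.gz archive to obtain the new MD5 value.
  2. Upload the new externalErrorMsg_09272021.tar.gz archive to the paddlepaddledeps bucket at https://paddlepaddledeps.bj.bcebos.com/. Do not delete the old externalErrorMsg.tar.gz file.
  3. Submit a PR that also updates the URL and MD5 value in file_download_and_uncompress(${URL} "externalError") in third_party.cmake.
  4. Once the PR passes CI and is merged, the error messages are updated to the latest version.

We have already run into this problem: code at revisions between PR #36126 and this PR cannot be built.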

-- verifying file...
       file='/home/workspaces/Paddle/build/third_party/externalError/data//externalErrorMsg.tar.gz'
-- MD5 hash of
    /home/workspaces/Paddle/build/third_party/externalError/data//externalErrorMsg.tar.gz
  does not match expected value
    expected: 'c0749523ebb536eb7382487d645d9cd4'
      actual: '061f3b7895aadcbe2c3ed592590f8b10'
-- Hash mismatch, removing...
CMake Error at src/download_externalError-stamp/download-download_externalError.cmake:159 (message):
  Each download failed!
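
The check that fails in the log above compares the downloaded archive's MD5 with the value pinned in the build files. A minimal Python sketch of that comparison follows; `md5_of` and `verify_md5` are hypothetical helpers, not CMake's or Paddle's code.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return (matches, actual) for the file at path.

    Hypothetical helper mirroring the 'expected' / 'actual' lines in the log.
    """
    actual = md5_of(path)
    return actual == expected.lower(), actual
```

If the file behind a fixed URL is overwritten, every revision that still pins the old MD5 fails this check, which is why the steps quoted above ask for a new dated file name and for the old archive to be kept.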

cxxly added a commit to cxxly/Paddle that referenced this pull request Oct 19, 2021
XiaoguangHu01 pushed a commit that referenced this pull request Oct 28, 2021
* update fft api path (#36219)

* update fft api path
* add sample code for ihfft2

Co-authored-by: chenfeiyu <chenfeiyu@baidu.com>

* fix fft axis (#36321)

fix: `-1` is used when fft's axis is `0`

* use unified external error message for cufft api (#36114)

* fft: modify sample code result (#36325)

* dynamic load mkl as a fft backend when it is available and requested (#36414)

* add rocm support for fft api (#36415)

* move signal apis

* move fft and signal API path (#2)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos in signal.py (#3)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos

* disable Cache when CUFFT_VERSION >= 10200 (#4)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos

* Add LRUCache for fft plans

* add LRUCache for cufft and hipfft (#5)

* move signal apis

* move fft.py and signal.py to paddle/, fix typos

* fix relative imports from fft.py and signal.py

* fix typos

* WIP: add cache

* delete move constructor and operator= for CuFFTHandle and FFTConfig

* remove log from CuFFTHandle and FFTConfig

* add lrucache for fft rocm backend

* disable LRUCache when CUFFT_VERSION >= 10200

* disable copy and move for hipFFTHandle; format code

Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>

* remove debug message of cufftHandler

* roll_op: support Tensor as input for shifts (#36727)

* fix fftshift/ifftshift on static mode

* update roll_op version

* add more test cases for fftshift/ifftshift

Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com>
Co-authored-by: chenfeiyu <chenfeiyu@baidu.com>
Co-authored-by: LJQ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
5 participants