[Add] Introduce xdoctest into the PaddlePaddle framework workflow #540
Also:
I would appreciate your guidance. Thanks!
Great RFC! A few details need minor adjustments, though.
|Author | megemini (柳顺) |
|Submission date | 2023-05-21 |
|Version | V1.0 |
|Required PaddlePaddle version | paddlepaddle>2.4 |
This should be developed on the develop branch.
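### 2.1 Documentation

Update the following document in the Paddle contribution guide: [开发 API Python 端 (Developing the API Python side)](https://www.paddlepaddle.org.cn/documentation/docs/zh/dev_guides/api_contributing_guides/new_python_api_cn.html#api-python). This sets the standard for subsequent code development.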
The API documentation writing guidelines (API 文档书写规范) should be updated in sync as well.
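Add writing requirements for `Example` sample code: it must follow the `google` style in `xdoctest`, that is, code in an `Example` section must start with `>>>`. The current `code-block` directive is kept so that generation of the Chinese documentation is not affected.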
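As a minimal sketch of the format described above, a docstring might look like the following. The function `scale` is hypothetical and exists only for illustration, and the standard-library `doctest` module stands in for `xdoctest` so that the snippet runs without extra dependencies.

```python
import doctest


def scale(x, factor=2):
    """Multiply ``x`` by ``factor`` (hypothetical function, illustration only).

    Examples:
        .. code-block:: python

            >>> scale(3)
            6
            >>> scale(3, factor=10)
            30
    """
    return x * factor


# Assumption: stdlib doctest is used here as a stand-in for xdoctest.
# Parse the ``>>>`` statements out of the docstring and run them.
parsed = doctest.DocTestParser().get_doctest(
    scale.__doc__, globs={"scale": scale}, name="scale", filename=None, lineno=0
)
runner = doctest.DocTestRunner(verbose=False)
result = runner.run(parsed)
print("examples:", len(parsed.examples), "failed:", result.failed)
```

The `.. code-block:: python` line stays in the docstring for the documentation build, while the `>>>` lines are what the test runner picks up.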
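> code in an `Example` section must start with `>>>`. The current `code-block` directive is kept

A nice approach. One thing to confirm, though: is the `code-block` form compatible with xdoctest?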
I have only skimmed the xdoctest source in parser.py, but our current `.. code-block:: python` appears to be treated as TEXT by xdoctest, so it should have no effect.
A simple example can verify this:

```python
def test(a):
    """this is docstring...
    Examples:
        .. code-block:: python
            this is a test...
            >>> a = 3
            >>> print(a)
            3
    """
    pass
```

The result shows that it works:
$ xdoctest --style=google test_simple.py
=====================================
_ _ ___ ____ ____ ___ ____ ____ ___
\/ | \ | | | | |___ [__ |
_/\_ |__/ |__| |___ | |___ ___] |
=====================================
Start doctest_module('test_simple.py')
Listing tests
gathering tests
running 1 test(s)
====== <exec> ======
* DOCTEST : test_simple.py::test:0, line 5 <- wrt source file
DOCTEST SOURCE
6 >>> a = 3
7 >>> print(a)
3
DOCTEST STDOUT/STDERR
3
DOCTEST RESULT
* SUCCESS: test_simple.py::test:0
====== </exec> ======
============
=== 1 passed in 0.09 seconds ===
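The same idea can be sketched with the standard-library `doctest` parser, which likewise splits a docstring into plain text and `>>>` examples. This is only an analogy under that assumption: it does not exercise xdoctest's own parser.py, whose behavior is shown by the run above.

```python
import doctest

docstring = '''this is docstring...
Examples:
    .. code-block:: python
        this is a test...
        >>> a = 3
        >>> print(a)
        3
'''

# DocTestParser.parse returns a list that alternates between plain
# strings (text) and doctest.Example objects (``>>>`` statements).
pieces = doctest.DocTestParser().parse(docstring)
text = "".join(p for p in pieces if isinstance(p, str))
examples = [p for p in pieces if isinstance(p, doctest.Example)]

# The directive line lands in the text part, not in any example.
directive_is_text = ".. code-block:: python" in text
sources = [e.source.strip() for e in examples]
print(directive_is_text, sources)
```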
The CI pipeline tools for the Paddle code live in the [Paddle/tools/](https://github.com/PaddlePaddle/Paddle/tree/develop/tools) directory.

Currently, Python sample code is checked mainly by [Paddle/tools/codestyle/docstring_checker.py](https://github.com/PaddlePaddle/Paddle/blob/develop/tools/codestyle/docstring_checker.py).
Shouldn't this be paddle/tools/sampcd_processor.py? As for docstring_checker, it is a tool that is not actually in effect; see PaddlePaddle/Paddle#47821.
Got it. I will look into this in more detail and update it.
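```python
print("Sample code check is successful!")
```

This approach has a number of problems. For example, it cannot verify that the code's output matches the result shown in the example, and it cannot handle sample code that is expected to raise an error.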
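The first limitation can be illustrated with a small sketch. Simply executing a sample cannot tell a correct documented output from a stale one, whereas a doctest-style runner compares what the code prints with the expected text. The helper `run_examples` is hypothetical, and the standard-library `doctest` module is used as a stand-in for xdoctest.

```python
import doctest


def run_examples(docstring):
    """Run the ``>>>`` examples in ``docstring``; return (failed, attempted)."""
    parsed = doctest.DocTestParser().get_doctest(
        docstring, globs={}, name="example", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; only the counts are used here.
    result = runner.run(parsed, out=lambda text: None)
    return result.failed, result.attempted


matching = """
>>> a = 3
>>> print(a)
3
"""

# This code runs without raising, but the documented output is wrong.
stale = """
>>> a = 3
>>> print(a)
4
"""

matching_result = run_examples(matching)
stale_result = run_examples(stale)
print(matching_result, stale_result)
```

A checker that only executes the code would accept both samples; the comparison step is what catches the second one.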
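> it cannot handle sample code that is expected to raise an error

What does this refer to? Sample code that raises an error should already fail in CI at this stage.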
xdoctest can capture the Error output and check it:

```python
def test(a):
    """this is docstring...
    Examples:
        .. code-block:: python
            this is a test...
            >>> raise ValueError
            Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
            ValueError
    """
    pass
```

Running xdoctest:
$ xdoctest --style=google test_error.py
=====================================
_ _ ___ ____ ____ ___ ____ ____ ___
\/ | \ | | | | |___ [__ |
_/\_ |__/ |__| |___ | |___ ___] |
=====================================
Start doctest_module('test_error.py')
Listing tests
gathering tests
running 1 test(s)
====== <exec> ======
* DOCTEST : test_error.py::test:0, line 5 <- wrt source file
DOCTEST SOURCE
6 >>> raise ValueError
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError
DOCTEST STDOUT/STDERR
DOCTEST RESULT
* SUCCESS: test_error.py::test:0
====== </exec> ======
============
=== 1 passed in 0.09 seconds ===
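For comparison, the standard-library `doctest` module supports the same pattern: an example may list a traceback header and the final exception line as its expected output. The sketch below uses stdlib `doctest` only, as an assumption-labelled stand-in, and says nothing about how xdoctest implements the check internally.

```python
import doctest

docstring = """
>>> raise ValueError("bad input")
Traceback (most recent call last):
    ...
ValueError: bad input
>>> int("3")
3
"""

parsed = doctest.DocTestParser().get_doctest(
    docstring, globs={}, name="expects_error", filename=None, lineno=0
)
runner = doctest.DocTestRunner(verbose=False)
# The raised ValueError is matched against the expected traceback,
# so the example counts as a pass rather than as a crash.
result = runner.run(parsed, out=lambda text: None)
print("failed:", result.failed, "attempted:", result.attempted)
```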
Understood.
Currently, the Python code in Paddle lives mainly in the [Paddle/python/paddle/](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle) directory.

It contains `2334` Python files, including `341` sample code snippets (commit `8acbf10bd51026c0a41423c2826b7cc886ad1e74`).
> including `341` sample code snippets

Where does this count come from? Are there really only 341 sample code snippets?
I made a quick modification to docs/ci_scripts/chinese_samplecode_processor.py to do the counting:
```python
import math
import os
import pickle
import shutil
import subprocess
import multiprocessing
import sys
import glob


def remove_desc_code(srcls, filename):
    if filename == 'fluid_cn/one_hot_cn.rst':
        srcls.pop(13)
        srcls.pop(28)
        srcls.pop(44)
    if filename == 'layers_cn/one_hot_cn.rst':
        srcls.pop(15)
        srcls.pop(30)
        srcls.pop(46)
    if filename == 'profiler_cn/profiler_cn.rst':
        srcls.pop(41)
    if filename == 'layers_cn/natural_exp_decay_cn.rst':
        srcls.pop(13)
    if filename == 'layers_cn/transpose_cn.rst':
        srcls.pop(20)
    if filename == 'layers_cn/array_length_cn.rst':
        srcls.pop(36)
    if filename == 'layers_cn/inverse_time_decay_cn.rst':
        srcls.pop(13)
    if filename == 'layers_cn/stack_cn.rst':
        srcls.pop(12)
        srcls.pop(33)
    if filename == 'layers_cn/sums_cn.rst':
        srcls.pop(11)
    if filename == 'layers_cn/sum_cn.rst':
        for i in range(len(srcls) - 1, 61, -1):
            srcls.pop(i)
    if filename == 'layers_cn/softmax_cn.rst':
        srcls.pop(30)
        srcls.pop(57)
    if filename == 'layers_cn/array_write_cn.rst':
        srcls.pop(37)
    if filename == 'layers_cn/lod_append_cn.rst':
        srcls.pop(11)
    if filename == 'layers_cn/reorder_lod_tensor_by_rank_cn.rst':
        srcls.pop(25)
    if filename == 'layers_cn/round_cn.rst':
        srcls.pop(10)
    if filename == 'layers_cn/squeeze_cn.rst':
        srcls.pop(11)
        srcls.pop(19)
        srcls.pop(27)
    if filename == 'layers_cn/unsqueeze_cn.rst':
        srcls.pop(11)
    if filename == 'layers_cn/array_read_cn.rst':
        srcls.pop(51)
    if filename == 'layers_cn/scatter_cn.rst':
        srcls.pop(9)
    if filename == 'layers_cn/topk_cn.rst':
        srcls.pop(11)
    if filename == 'optimizer_cn/ModelAverage_cn.rst':
        srcls.pop(15)
    return srcls


def check_indent(code_line):
    indent = ""
    for c in code_line:
        if c == '\t':
            indent += '    '
        elif c == ' ':
            indent += ' '
        if c != ' ' and c != '\t':
            break
    return indent


def find_all(src_str, substr):
    indices = []
    get_one = src_str.find(substr)
    while get_one != -1:
        indices.append(get_one)
        get_one = src_str.find(substr, get_one + 1)
    return indices


def extract_sample_code(srcfile, status_all):
    content = ""
    filename = srcfile.name
    srcc = srcfile.read()
    srcfile.seek(0, 0)
    srcls = srcfile.readlines()
    srcls = remove_desc_code(
        srcls, filename
    )  # remove description info for samplecode
    status = []
    sample_code_begins = find_all(srcc, " code-block:: python")
    if len(sample_code_begins) == 0:
        status.append(-1)
    else:
        for i in range(0, len(srcls)):
            if srcls[i].find(".. code-block:: python") != -1:
                content = ""
                start = i
                blank_line = 1
                while srcls[start + blank_line].strip() == '':
                    blank_line += 1
                startindent = ""
                # remove indent error
                if srcls[start + blank_line].find("from") != -1:
                    startindent += srcls[start + blank_line][
                        : srcls[start + blank_line].find("from")
                    ]
                elif srcls[start + blank_line].find("import") != -1:
                    startindent += srcls[start + blank_line][
                        : srcls[start + blank_line].find("import")
                    ]
                else:
                    startindent += check_indent(srcls[start + blank_line])
                content += srcls[start + blank_line][len(startindent) :]
                for j in range(start + blank_line + 1, len(srcls)):
                    # planish a blank line
                    if (
                        not srcls[j].startswith(startindent)
                        and srcls[j] != '\n'
                    ):
                        break
                    if srcls[j].find(" code-block:: python") != -1:
                        break
                    content += srcls[j].replace(startindent, "", 1)
                status.append(run_sample_code(content, filename))
    status_all[filename] = status
    # `content` is reset for every code block, so only the last
    # block extracted from the file is returned here.
    return status_all, content


def run_sample_code(content, filename):
    # Stub: nothing is executed; this script only extracts and counts.
    return 0


def test(file):
    temp = []
    src = open(file, 'r')
    status_all = {}
    _, content = extract_sample_code(src, status_all)
    temp.append(status_all)
    src.close()
    return temp, content


if __name__ == '__main__':
    with open('codes.txt', 'w') as f_codes:
        codes = []
        count = 0
        count_codes = 0
        for root, dirs, files in os.walk('/home/shun/Documents/Projects/paddle_xdoctest/Paddle-develop/python/paddle'):
            # print("current directory:", root)
            # print("subdirectories:", dirs)
            # print("files:", files)
            for f in files:
                if f.endswith('.py'):
                    count += 1
                    filename = os.path.join(root, f)
                    _, _codes = test(filename)
                    # Incremented once per file that yielded sample code.
                    if _codes:
                        count_codes += 1
                        f_codes.write('-'*30 + str(count_codes))
                        f_codes.write('\n')
                        f_codes.write(filename + '\t' + '-'*30)
                        f_codes.write('\n')
                        f_codes.write(_codes)
                        f_codes.write('\n')
    print('total...', count)
    print('total code...', count_codes)
```
That is all it extracted. It seems a bit low to me too, but the number of Python files looked right, so I did not dig further.
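One possible reason for the low number is that the script above appears to increment `count_codes` once per file rather than once per code block. A more compact counter that reports both figures could look like the sketch below. The function `count_code_blocks` and the regular expression are illustrative assumptions, not part of any Paddle tool, and the demo runs on a temporary directory rather than the Paddle tree.

```python
import os
import re
import tempfile

# Matches one reST directive line such as ".. code-block:: python".
CODE_BLOCK = re.compile(
    r"^[ \t]*\.\.[ \t]+code-block::[ \t]+python[ \t]*$", re.MULTILINE
)


def count_code_blocks(root):
    """Return (python_files, files_with_blocks, total_blocks) under ``root``."""
    py_files = files_with_blocks = total_blocks = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            py_files += 1
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                blocks = len(CODE_BLOCK.findall(f.read()))
            if blocks:
                files_with_blocks += 1
                total_blocks += blocks
    return py_files, files_with_blocks, total_blocks


# Tiny self-contained demo: one file with two blocks, one file with none.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.py"), "w", encoding="utf-8") as f:
        f.write(
            '"""\n.. code-block:: python\n\n    x = 1\n\n'
            '.. code-block:: python\n\n    y = 2\n"""\n'
        )
    with open(os.path.join(tmp, "b.py"), "w", encoding="utf-8") as f:
        f.write("z = 3\n")
    demo_counts = count_code_blocks(tmp)

print(demo_counts)
```

Reporting files and blocks separately makes it easier to compare the result with whatever another tool counts.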
You could try the script in Paddle: paddle/tools/sampcd_processor.py.
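3. Final wrap-up phase: switch the pipeline over to the Paddle code; the code check in Paddle docs can then be removed.
   - Feature update for the Chinese and English [API documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html#api): code examples with the `>>>` prompt can be copied, including code and comments but excluding output.
   - Code check handover (optional): move all code checking from Paddle docs to the CI pipeline of the Paddle code.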
As mentioned above, both Paddle and docs contain code checks, so some of the wording here needs to be revised.
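> Code check handover (optional)

I think "optional" can be dropped. Running two tools for the same check only adds maintenance cost, so the original code check can be removed at this stage.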
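> code examples with the `>>>` prompt can be copied, including code and comments but excluding output.

How is code copying handled in the early and middle phases? Does the code that users see and copy during those phases include `>>>` and comments?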
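This part was not written in enough detail.
The docs are currently built with sphinx, right? Are the templates under templates_path = ["/templates"]?
I have never actually built documentation with sphinx, so I am not sure what the code looks like when it is viewed and copied in the early and middle phases. I pulled this feature out as a separate item precisely so that it can be tracked.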
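- A subsequent line that does not start with `>>>` is treated as output; the line before it must start with `>>>`.
- A blank line marks the start of a new code segment.

However, since `xdoctest` itself currently has no such mandatory format check, this design item is optional.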
Could `.. code-block::` and its indentation be removed at this stage?
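Sure. If we confirm that `.. code-block::` is not needed, the sample code extraction in both the Paddle code and Paddle docs has to be changed accordingly.
In that case, I suggest splitting it out as a separate feature.
One thing still needs confirming, though: since xdoctest is "compatible" with the current sample code, meaning it skips it automatically, do we need to enforce a check on this format later? That is why I listed "2.3 No longer compatible with the old format (optional)" as optional.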
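> One thing still needs confirming, though: since xdoctest is "compatible" with the current sample code, meaning it skips it automatically, do we need to enforce a check on this format later? That is why I listed "2.3 No longer compatible with the old format (optional)" as optional.

Without such a check, developers who use the old format would have their examples skipped, so errors in that code would go undetected. That is not really acceptable, so I do recommend having such a check.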
Agreed! :)
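The check agreed on in this thread could be sketched roughly as follows: flag every `.. code-block:: python` directive whose body contains no `>>>` line, so that old-format examples are reported instead of silently skipped. The function `find_legacy_blocks` is hypothetical and only an illustration of the idea, not an existing Paddle or xdoctest feature.

```python
import re

DIRECTIVE = re.compile(
    r"^(?P<indent>[ \t]*)\.\.[ \t]+code-block::[ \t]+python[ \t]*$"
)


def find_legacy_blocks(docstring):
    """Return 1-based line numbers of code-block directives without ``>>>``."""
    lines = docstring.expandtabs(4).splitlines()
    legacy = []
    for i, line in enumerate(lines):
        match = DIRECTIVE.match(line)
        if not match:
            continue
        indent = len(match.group("indent"))
        has_prompt = False
        for body_line in lines[i + 1:]:
            stripped = body_line.strip()
            if not stripped:
                continue  # blank lines do not end the directive body
            if len(body_line) - len(body_line.lstrip()) <= indent:
                break  # dedent: the directive body is over
            if stripped.startswith(">>>"):
                has_prompt = True
                break
        if not has_prompt:
            legacy.append(i + 1)
    return legacy


old_style = """Examples:
    .. code-block:: python

        import paddle
        x = paddle.to_tensor([1, 2, 3])
"""

new_style = """Examples:
    .. code-block:: python

        >>> a = 3
        >>> print(a)
        3
"""

old_hits = find_legacy_blocks(old_style)
new_hits = find_legacy_blocks(new_style)
print(old_hits, new_hits)
```

A CI step could fail when the returned list is non-empty, which turns a silent skip into a visible error.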
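- Affects the CI pipelines of the Paddle code and Paddle docs
- Affects how sample code for Python APIs is currently written
- Affects the page display of the `开发 API Python 端` (Developing the API Python side) document
- Affects how sample code is displayed and copied in the Chinese and English API documentation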
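Could you describe these by category, following https://github.com/PaddlePaddle/community/blob/master/rfcs/design_template.md#%E4%B8%83%E5%BD%B1%E5%93%8D%E9%9D%A2 ? It would help to expand a little on how large each impact is and whether it is controllable.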
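In addition, there is no special handling for examples whose output cannot be checked for consistency (random distributions) or that need a special environment (such as a GPU or file storage).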
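Doctest-style tools usually address such cases with inline directives. The sketch below uses the standard-library `doctest` directives `+SKIP` and `+ELLIPSIS` as a stand-in; xdoctest's documentation describes its own comparable directives (for example `# xdoctest: +SKIP` and `+REQUIRES(...)`), whose exact behavior should be checked there rather than inferred from this example.

```python
import doctest

docstring = """
>>> import random
>>> random.random()  # doctest: +SKIP
0.8444218515250481
>>> print(list(range(20)))  # doctest: +ELLIPSIS
[0, 1, ..., 19]
"""

parsed = doctest.DocTestParser().get_doctest(
    docstring, globs={}, name="special_cases", filename=None, lineno=0
)
runner = doctest.DocTestRunner(verbose=False)
# The random value is never compared because that line is skipped, and
# the long list is matched loosely through the ``...`` placeholder.
result = runner.run(parsed, out=lambda text: None)
print("failed:", result.failed)
```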
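It would also be good to mention how Paddle's existing code check tool works: does it extract docstrings at runtime, or does it use static code analysis? And how does xdoctest extract them?
Note that runtime extraction has one advantage: even docstrings defined in C++ code can be extracted correctly, which is hard to do with static code analysis. This should be confirmed.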
I do not quite follow. Do you mean using xdoctest to extract examples from C++?
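For example, take
which is an API exposed through pybind11; for its generated documentation, see
Can the existing sample code check tool check the sample code of this API? Can xdoctest? A comparison is needed.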
Yes, xdoctest supports dynamic analysis:

> analysis (str, default='auto'): if 'static', only static analysis is used to parse call definitions. If 'auto', uses dynamic analysis for compiled python extensions, but static analysis elsewhere, if 'dynamic', then dynamic analysis is used to parse all calldefs.

```python
def parse_dynamic_calldefs(modpath_or_module):
    ...
    if getattr(module, '__doc__'):
        calldefs['__doc__'] = static.CallDefNode(
            callname='__doc__',
            docstr=module.__doc__,
            lineno=0,
            doclineno=1,
            doclineno_end=1,
            args=None
        )
    ...
```

`paddle.device.cuda.Stream.__doc__`
I can see that it is extracted correctly, but exactly how xdoctest handles it is something to watch when doing the actual work. I will split this out as a separate feature to track. :)
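The difference between the two extraction strategies discussed above can be sketched with the standard library alone. `math.sqrt` stands in for a compiled API such as one exposed through pybind11, and the helper names `static_docstrings` and `runtime_docstring` are hypothetical.

```python
import ast
import inspect
import math


def static_docstrings(source):
    """Static analysis: parse source text and read docstrings from the AST."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_docstring(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    }


def runtime_docstring(obj):
    """Runtime extraction: read the docstring from the live object."""
    return inspect.getdoc(obj)


# Static analysis works when Python source text is available.
source = 'def f():\n    """Docstring of f."""\n'
static_result = static_docstrings(source)

# ``math.sqrt`` is implemented in C, so there is no Python source to parse.
try:
    inspect.getsource(math.sqrt)
    has_source = True
except (OSError, TypeError):
    has_source = False

# Its docstring is still reachable at runtime.
compiled_doc = runtime_docstring(math.sqrt)
print(static_result, has_source, bool(compiled_doc))
```

This is why a tool that relies purely on static analysis would need a fallback for compiled extensions.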
PR types
New features
PR changes
Docs
Describe
[used AI Studio]
China Software Open Source Innovation Competition: PaddlePaddle Framework Task Challenge
@SigureMo @Ligoml
Please review. Thanks!