Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] 添加监测中文 apilabel 的 ci #6226

Merged
merged 29 commits into from
Oct 22, 2023

Conversation

ooooo-create
Copy link
Collaborator

link #6170

make a try
感觉像是一坨,先提一下

  1. api 文档本身 api_label 监测
  • check_api_label (判断是否正确)
  • test_ornot(是否是api文档)
  1. 引用的 api_label 正确性
  • traverse_api_label(遍历doc/api,寻找所有 api_label )
  • get_api_label_list (寻找一个文档中 的 api_label)
  1. 共同
  • pipline
  • en_label_creater

@sunzhongkai588 @SigureMo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉像是一坨,先提一下

没关系,咱代码风格可以慢慢锻炼,这个多写写就好了,而且还有我来把控~而且比某些写了一坨还不自知的人强太多了(这个 repo 能成这样,所有代码 reviewer 都有责任)

ci_scripts/check_api_label_cn.py Outdated Show resolved Hide resolved
ci_scripts/check_api_label_cn.py Outdated Show resolved Hide resolved
Comment on lines 17 to 19
result = re.sub("api/", "", file)
result = re.sub("_cn.rst", "", result)
result = re.sub('/', "_", result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不是很建议使用 re.sub,更建议直接分别使用 removeprefix、removesuffix、replace 这三个,因为 sub 的话无法确定删除的是什么位置的,前两个函数好像 3.10 才有,需要确认 CI 是 3.10 环境(我记得之前改成 3.10 了)

result = re.sub("api/", "", file)
result = re.sub("_cn.rst", "", result)
result = re.sub('/', "_", result)
result = '.. _cn_' + result + ':'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = '.. _cn_' + result + ':'
result = f'.. _cn_{result}:'

字符串操作使用 f-string 可读性会非常好~

if (
file.endswith("_cn.rst")
and (file not in ["Overview_cn.rst", "index_cn.rst"])
and file.startswith("api")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

代码里若干处硬编码了 "api" 这个字符串,其实可以考虑怎么优化下,是否传递参数时不要 api,或者整个常量什么的

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

图片
传递这两个参数到文件,然后就定义一个常量 API 表示APIROOT相对DOCROOT目录的位置,可以嘛

and file.startswith("api")
):
return True
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-def fooooo():
-    if xxx:
-        return True
-    return False
+    return xxx

直接 return 就好呀

with open(rootdir + file, 'r', encoding='utf-8') as f:
pattern = f.read()
matches = re.findall(r":ref:`([^`]+)`", pattern)
if matches:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个判断是没必要的,因为没有结果的话 matches 是 [],对空 list 遍历就直接跳过了,可以减少一个缩进

Comment on lines 75 to 82
if match.startswith('cn'):
if match not in list:
print(rootdir + file, 'ref' + match, 'error')
else:
out = re.search("<(.*?)>", match)
if out and out.group(1).startswith("cn"):
if out.group(1) not in list:
print(rootdir + file, 'ref' + match, 'error')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这两个分支是可以合并的,其实主要就是从

:ref:`xxx`
:ref:`aaa <xxx>`

两种情况提取 xxx

可以

api_label = # 前面的 match
if api_label_match := re.match(r".+<(?P<api_label>.+?)>", api_label):
    api_label = api_label_match.group("api_label")
if api_label.startswith("cn_") and api_label not in valid_api_labels:
    logger.error(f"Found api label {api_label} in {rootdir}/{file}, but it is not a valid api label, please re-check it!")
    sys.exit(1)

整体缩进层级会少很多~



if __name__ == "__main__":
pipline(sys.argv[1], sys.argv[2:])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

虽然整体很简单,但还是建议使用 argparse,参数语义会更清晰,可读性会更强

ci_scripts/check_api_label_cn.py Outdated Show resolved Hide resolved
ci_scripts/check_api_label_cn.py Outdated Show resolved Hide resolved
Comment on lines 77 to 81
if should_test(file):
if check_api_label(rootdir, file):
pass
else:
print("error:", file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if should_test(file):
if check_api_label(rootdir, file):
pass
else:
print("error:", file)
if should_test(file) and not check_api_label(rootdir, file):
sys.exit(1) # 这里是不是应该报错?而不是仅仅 print?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

新commit加上了logger.error

Comment on lines 122 to 125
if len(sys.argv) == 2:
parser.print_help()
sys.exit(1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用了 argparse,这块是不是就可以不用了?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我觉得这块是不用,当时是按照那个改的,现在看来两种情况不一样,要删掉
图片

@paddle-bot
Copy link

paddle-bot bot commented Oct 16, 2023

感谢你贡献飞桨文档,文档预览构建中,Docs-New 跑完后即可预览,预览链接:http://preview-pr-6226.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
预览工具的更多说明,请参考:飞桨文档预览工具

@SigureMo
Copy link
Member

image

@ooooo-create 可以触发一下测试一下,现在 CI 好像没有跑?测试如果没啥问题(包括正确 case 正确通过和错误 case 正常报错),我觉得就没问题了

@ooooo-create
Copy link
Collaborator Author

image

@ooooo-create 可以触发一下测试一下,现在 CI 好像没有跑?测试如果没啥问题(包括正确 case 正确通过和错误 case 正常报错),我觉得就没问题了

好的,是在这个 pr 里,通过 修改文件 来测试吗

@SigureMo
Copy link
Member

好的,是在这个 pr 里,通过 修改文件 来测试吗

对的

@ooooo-create
Copy link
Collaborator Author

ooooo-create commented Oct 17, 2023

发现几个问题

  1. (已提)这个函数判定有问题,要判断是否以 ["Overview_cn.rst", "index_cn.rst"] 结尾
    1697533569871
  2. (已提)api_label存在数字,所以正则要进行修改
    1697535528337
  3. need_check_cn_doc_files 传入的文件都是 api目录下的,其他目录文件引用的 api_labl 就没有进行引用正确性检查
    是由这个函数获得的
    1697536352607

@@ -67,7 +67,8 @@ def find_api_labels_in_one_file(file_path):
def should_test(file):
return (
file.endswith("_cn.rst")
and (file not in ["Overview_cn.rst", "index_cn.rst"])
and not file.endswith("Overview_cn.rst")
and not file.endswith("index_cn.rs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥是 .rs?rust 代码么 [doge]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[doge] t 走丢了,过会就回来

@ooooo-create
Copy link
Collaborator Author

@SigureMo 一师傅,t 回来了😎,然后我又改了一下别的~主要是 引用 api_lael 正确性包含了所有修改的文件

@SigureMo
Copy link
Member

主要是 引用 api_lael 正确性包含了所有修改的文件

然后这句话的 b 丢了 [doge]

你先测试着,如果测试都没问题了叫我

@ooooo-create
Copy link
Collaborator Author

@SigureMo 测试没有问题啦

  1. api 文件本身 api_label 错误会报错
  2. api 文件和其他目录的 .rst 文件引用 的 api_label 错误会报错
  3. 如果都没问题会提示 All api_label check success in PR !
  4. 主要commit修改下面两个

[doge] 这次应该没有字母丢😎😎

Comment on lines 27 to 28
print(first_line)
print(generate_en_label_by_path(file))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这两行是用来 debug 的么?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的~

)


def pipline(rootdir, files):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline?

@@ -46,7 +46,7 @@
二、动态图操作实践
---------------------------

使用飞桨框架提供的 API:\ ``paddle.amp.auto_cast``\ 和\ ``paddle.amp.GradScaler``\ 能够实现动态图的自动混合精度训练,即在相关 OP 的计算中,自动选择 FP16 或 FP32 格式计算。开启 AMP 模式后,使用 FP16 与 FP32 进行计算的 OP 列表可以参见 :ref:`cn_overview_amp` 。
使用飞桨框架提供的 API:\ ``paddle.amp.auto_cast``\ 和\ ``paddle.amp.GradScaler``\ 能够实现动态图的自动混合精度训练,即在相关 OP 的计算中,自动选择 FP16 或 FP32 格式计算。开启 AMP 模式后,使用 FP16 与 FP32 进行计算的 OP 列表可以参见 :ref:`cn_api_paddle_geometric_reindex_grap` 。
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个文件是用来测试还是?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

测试,没改回去~

@@ -74,7 +72,7 @@ def should_test(file):
)


def pipline(rootdir, files):
def run_cn_api_lable_checking(rootdir, files):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label?:joy:

Copy link
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMeow 🐾

@SigureMo SigureMo merged commit f6a8d37 into PaddlePaddle:develop Oct 22, 2023
3 checks passed
@SigureMo SigureMo added the HappyOpenSource 快乐开源活动issue与PR label Oct 22, 2023
@ooooo-create ooooo-create deleted the ooooo/cn_api_ci branch October 25, 2023 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor HappyOpenSource 快乐开源活动issue与PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants