Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tempfile for storing temporary checkpoints/params/models #42681

Closed
zlsh80826 opened this issue May 11, 2022 · 16 comments
Closed

Use tempfile for storing temporary checkpoints/params/models #42681

zlsh80826 opened this issue May 11, 2022 · 16 comments
Assignees
Labels

Comments

@zlsh80826
Copy link
Collaborator

需求描述 Feature Description

目前 Paddle 實現的多個單測中, 會有儲存 checkpoints 或 inference model 後再讀取的測試, 這類型的測試會將模型儲存, 並在完成單測後刪除。但是多個單測會有暫存檔案同名的問題, 會導致單測同時運行時互相覆蓋掉其他單測的暫存檔案, 導致單測會出現隨機性的失敗。例如: test_datasettest_dataset_consistency_inspection 會使用相同 test_run_with_dump_a.txt 作為暫存檔案,又比如 test_resnettest_build_strategy 都使用相同的 resnet.dygraph.params 作為模型暫存檔。

在撰寫單測時應該注重單測的孤立性, 單測前後系統的狀態需要盡量保持相同, 最大限度地減少生成檔案或是測試之間的依賴關係, 如果有需要使用暫存檔案, 可以使用 mktemp 或是 python 模塊的 tempfile, 並在單測的 tearDown 函式中釋放從系統中取得的資源。

替代实现 Alternatives

有用到暫存檔案的單測應該參考類似 test_pass_builder.py 的寫法, 使用 tempfile 模塊底下的函式, 照單測的需求使用對應建立暫存檔案或目錄的 API 來減少單測間的影響。

@paddle-bot-old
Copy link

您好,我们已经收到了您的问题,会安排技术人员尽快解答您的问题,请耐心等待。请您再次检查是否提供了清晰的问题描述、复现代码、环境&版本、报错信息等。同时,您也可以通过查看官网API文档常见问题历史IssueAI社区来寻求解答。祝您生活愉快~

Hi! We've received your issue and please be patient to get responded. We will arrange technicians to answer your questions as soon as possible. Please make sure that you have posted enough message to demo your request. You may also check out the APIFAQGithub Issue and AI community to get the answer.Have a nice day!

@zlsh80826
Copy link
Collaborator Author

#42626 fixes part of the unit tests, but there are still many unit tests should be refactored.

@jzhang533
Copy link
Contributor

cc: @kolinwei

@Aurelius84
Copy link
Contributor

dygraph_to_static目录下的所有单测均已替换为tmpfile来管理,已cherry-pick到 2.3 分支,见:#42860

@zlsh80826
Copy link
Collaborator Author

Thank @Aurelius84, we will check it ASAP.

@luotao1
Copy link
Contributor

luotao1 commented Jul 29, 2022

we will check it ASAP.

@zlsh80826 How about the check result, could we close this issue?

@zlsh80826
Copy link
Collaborator Author

@luotao1
Copy link
Contributor

luotao1 commented Dec 7, 2022

@jiweibo Could you fix the test_convert_to_mixed_precision.py?

@jiweibo
Copy link
Contributor

jiweibo commented Dec 7, 2022

@jiweibo Could you fix the test_convert_to_mixed_precision.py?

#48819 will fix the problem.

@zlsh80826
Copy link
Collaborator Author

Hi @luotao1,
dist_pass_test_base.py doesn't use tempfile for temporary files, too. Besides, there is a flaw in dist_pass_test_base.py#L193, the mode argument of os.makedirs should be represent as oct format (reference), which should be 0o777 instead of 777. The incorrect mode=777 will be interpreted as 0o1411 and it will cause permission denied for non-root user. Thanks

@sneaxiy
Copy link
Collaborator

sneaxiy commented Dec 7, 2022

Hi @luotao1, dist_pass_test_base.py doesn't use tempfile for temporary files, too. Besides, there is a flaw in dist_pass_test_base.py#L193, the mode argument of os.makedirs should be represent as oct format (reference), which should be 0o777 instead of 777. The incorrect mode=777 will be interpreted as 0o1411 and it will cause permission denied for non-root user. Thanks

@zlsh80826 Thanks for pointing out. We would fix this issue.

@zlsh80826
Copy link
Collaborator Author

Thank all of you,
Once test_dist_pass_test_base and test_convert_to_mixed_precision are solved, I will close the issue.

@sneaxiy
Copy link
Collaborator

sneaxiy commented Dec 8, 2022

Hi @luotao1, dist_pass_test_base.py doesn't use tempfile for temporary files, too. Besides, there is a flaw in dist_pass_test_base.py#L193, the mode argument of os.makedirs should be represent as oct format (reference), which should be 0o777 instead of 777. The incorrect mode=777 will be interpreted as 0o1411 and it will cause permission denied for non-root user. Thanks

@zlsh80826 Thanks for pointing out. We would fix this issue.

@zlsh80826 This is fixed by #48863 .

@zlsh80826
Copy link
Collaborator Author

@sneaxiy appreciate your help!

@luotao1
Copy link
Contributor

luotao1 commented Dec 12, 2022

Once test_dist_pass_test_base and test_convert_to_mixed_precision are solved, I will close the issue.

@zlsh80826
Copy link
Collaborator Author

Sure, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants