[Feature] Adding test & train API to be used directly in code #1138

Merged: 10 commits, Sep 29, 2022


@wybryan wybryan commented Jul 5, 2022

This change adds the capability to invoke testing and training directly
in code, e.g., inside a Jupyter notebook cell.

The change also ensures the original command-line usage remains the
same.


Motivation

Data scientists very commonly use Jupyter notebooks to author modelling work. It is desirable that experiments such as training and/or testing can be launched from code instead of being typed into a command-line terminal. This PR adds that capability without affecting the existing command-line usage.

Modification

The modification is made with minimal changes in mind: it only adds a class that assembles the arguments as a list of strings, which is then passed to the argument parser. The design ensures the following:

  1. the existing command-line use case remains as is.
  2. when a user starts training/testing from code, the parameter parsing is IDENTICAL to the command-line use case.
  3. an example of how to use the new interface is shown below:

from mmocr.tools.train import TrainArg, parse_args, run_train_cmd
args = TrainArg(config='/path/to/config.py')
args.add_arg('--work-dir', '/path/to/dir')
args = parse_args(args.arg_list)
run_train_cmd(args)
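
As a rough illustration of the idea described above (assembling a list of strings and handing it to the same argument parser the command line uses), here is a minimal sketch. It is not the code merged in this PR: the `TrainArg` and `parse_args` bodies below are assumptions written for illustration, and the parser only declares a small, hypothetical subset of the options.

```python
import argparse


class TrainArg:
    """Hypothetical stand-in for the PR's helper: collects CLI-style tokens."""

    def __init__(self, config):
        # The config path is the positional argument, so it goes first.
        self.arg_list = [str(config)]

    def add_arg(self, flag, value=None):
        # Append a flag and, if given, its value, as they would be typed in a shell.
        self.arg_list.append(flag)
        if value is not None:
            self.arg_list.append(str(value))


def parse_args(arg_list=None):
    # With arg_list=None, argparse falls back to sys.argv[1:], so the
    # command-line entry point can keep calling parse_args() unchanged.
    parser = argparse.ArgumentParser(description='Train a model (sketch)')
    parser.add_argument('config', help='path to the config file')
    parser.add_argument('--work-dir', help='directory for logs and checkpoints')
    parser.add_argument('--no-validate', action='store_true')
    return parser.parse_args(arg_list)


train_arg = TrainArg(config='/path/to/config.py')
train_arg.add_arg('--work-dir', '/path/to/dir')
train_arg.add_arg('--no-validate')
args = parse_args(train_arg.arg_list)
print(args.config, args.work_dir, args.no_validate)
```

Because both entry points go through one parser, defaults, type conversion and validation stay identical whether the arguments come from a terminal or from a notebook cell.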

BC-breaking (Optional)

No, it remains 100% backward compatible.

Use cases (Optional)

Allows training and testing experiments to be started directly from code, for example in a Jupyter notebook.

Checklist

Before PR:

  • I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
  • New functionalities are covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

@CLAassistant

CLAassistant commented Jul 5, 2022

CLA assistant check
All committers have signed the CLA.

@gaotongxiao
Collaborator

Thanks for your contribution! It is an appealing feature and is likely to be applied to all OpenMMLab projects. I'm looping in other colleagues to have some discussion on it.

@gaotongxiao
Collaborator

And, please install precommit hooks following https://github.com/open-mmlab/mmocr/blob/main/.github/CONTRIBUTING.md#installing-pre-commit-hooks and format your code with pre-commit run --all-files to pass our lint tests.

@wybryan
Author

wybryan commented Jul 7, 2022

And, please install precommit hooks following https://github.com/open-mmlab/mmocr/blob/main/.github/CONTRIBUTING.md#installing-pre-commit-hooks and format your code with pre-commit run --all-files to pass our lint tests.

thanks for the info, I've fixed the linting issue now.

Author

@wybryan wybryan left a comment

lint has passed.

@gaotongxiao
Collaborator

gaotongxiao commented Jul 18, 2022

This is a nice suggestion, but the design still has some room for improvement.

  1. This design is not straightforward for multiple arguments. Users would have to call add_arg() multiple times.
  2. It unnecessarily exposes an intermediate step (parse_args) to users and requires them to make at least three calls in sequence to train/test a model.

As a result, the design gives up some user-friendliness. Taking MMOCR() as a reference, where the CLI and the object share exactly the same group of arguments, we can tidy up the interface a bit. Consider this example:

from mmocr.tools.train import Trainer
trainer = Trainer(config='xxx', work_dir='xxx', no_validate=True)  # accept same group of arguments as CLI
trainer.add_args(launcher='pytorch', diff_seed=100)  # Optional, but useful sometimes
trainer.train() 

Does it look better?
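
One way such a keyword-argument interface could still reuse the CLI parser is to translate keyword arguments into flags before parsing. The sketch below is only an assumption about how that might look, not an existing MMOCR API: `Trainer`, `kwargs_to_argv`, `build_parser`, the `parse` method and the `--seed` option are hypothetical names, and the parser covers only a few illustrative options.

```python
import argparse


def build_parser():
    # Hypothetical subset of training options, for illustration only.
    parser = argparse.ArgumentParser(description='Train a model (sketch)')
    parser.add_argument('config')
    parser.add_argument('--work-dir')
    parser.add_argument('--no-validate', action='store_true')
    parser.add_argument('--launcher', default='none',
                        choices=['none', 'pytorch', 'slurm', 'mpi'])
    parser.add_argument('--seed', type=int)
    return parser


def kwargs_to_argv(**kwargs):
    """Turn work_dir='x', no_validate=True into ['--work-dir', 'x', '--no-validate']."""
    argv = []
    for name, value in kwargs.items():
        flag = '--' + name.replace('_', '-')
        if value is True:
            argv.append(flag)              # boolean switch, no value
        elif value is False or value is None:
            continue                       # switch left off
        else:
            argv.extend([flag, str(value)])
    return argv


class Trainer:
    def __init__(self, config, **kwargs):
        self._config = str(config)
        self._kwargs = dict(kwargs)

    def add_args(self, **kwargs):
        # Later values override earlier ones for the same option.
        self._kwargs.update(kwargs)

    def parse(self):
        # Same parser as the CLI, so validation and defaults are shared.
        argv = [self._config] + kwargs_to_argv(**self._kwargs)
        return build_parser().parse_args(argv)


trainer = Trainer(config='cfg.py', work_dir='out', no_validate=True)
trainer.add_args(launcher='pytorch', seed=100)
parsed = trainer.parse()
```

A `train()` method would then pass the parsed namespace to the existing training routine, which keeps the parsing step hidden from notebook users.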

@gaotongxiao
Collaborator

BTW, this PR is inspiring and prompts us to design a better interface for notebook users. I'd like to share one potentially even better and unified proposal for some feedback though it's a bit out of this PR's scope. Currently MMOCR() object is just for demonstration. How about integrating every fundamental API into it:

from mmocr import MMOCR
mmocr = MMOCR(...)
mmocr.train(...)
mmocr.test(...)
mmocr.inference(...)  # alias of readtext()

Also seems applicable to all other OpenMMLab projects.

@wybryan
Author

wybryan commented Jul 18, 2022

BTW, this PR is inspiring and prompts us to design a better interface for notebook users. I'd like to share one potentially even better and unified proposal for some feedback though it's a bit out of this PR's scope. Currently MMOCR() object is just for demonstration. How about integrating every fundamental API into it:

from mmocr import MMOCR
mmocr = MMOCR(...)
mmocr.train(...)
mmocr.test(...)
mmocr.inference(...)  # alias of readtext()

Also seems applicable to all other OpenMMLab projects.

I agree this is better API.

@wybryan
Author

wybryan commented Jul 18, 2022

BTW, this PR is inspiring and prompts us to design a better interface for notebook users. I'd like to share one potentially even better and unified proposal for some feedback though it's a bit out of this PR's scope. Currently MMOCR() object is just for demonstration. How about integrating every fundamental API into it:

from mmocr import MMOCR
mmocr = MMOCR(...)
mmocr.train(...)
mmocr.test(...)
mmocr.inference(...)  # alias of readtext()

Also seems applicable to all other OpenMMLab projects.

I agree this is better API.

Hopefully I can contribute to this proposal. I don't know how the MMLab projects coordinate with each other; maybe we can implement this API in MMOCR first as a 'pilot' example.

Alternatively, a grander design could implement such an API in the mmcv project, which is the parent project of all the other mmlab subprojects, but I guess this would need more syncing across the subprojects so that they all conform to the API.

@gaotongxiao
Collaborator

hopefully I can contribute to this proposal. I don't know how MMLab projects operate with each other, maybe we can implement this API with MMOCR first as a 'pilot' example.

Alternatively, a grand design can be carried out by implementing such API into mmcv project, which is the mother project for all other mmlab sub-project, but I guess this would need more sync with each other subprojects to conform with API.

Great to hear that! Could you send an email to mmocr@openmmlab.com to join our Slack group? We can discuss more details there.

@wybryan
Author

wybryan commented Jul 21, 2022

hopefully I can contribute to this proposal. I don't know how MMLab projects operate with each other, maybe we can implement this API with MMOCR first as a 'pilot' example.
Alternatively, a grand design can be carried out by implementing such API into mmcv project, which is the mother project for all other mmlab sub-project, but I guess this would need more sync with each other subprojects to conform with API.

Great to hear that! Could you send an email to mmocr@openmmlab.com to join our Slack group? We can discuss more details there.

cool, email sent, cheers.

@gaotongxiao
Collaborator

Hi, sorry for coming back late. Now we finally have time to proceed with this PR after the release of 1.0.0rc0. Could you clean up your code a little bit and leave only the train&test API part in this PR?

@wybryan
Author

wybryan commented Sep 26, 2022

Hi, sorry for coming back late. Now we finally have time to proceed with this PR after the release of 1.0.0rc0. Could you clean up your code a little bit and leave only the train&test API part in this PR?

what do you mean? you mean only keep changes made in test.py & train.py?

@gaotongxiao
Collaborator

@wybryan Right, the changes of a PR should be kept within the scope as claimed in the title.

@wybryan
Author

wybryan commented Sep 28, 2022

@wybryan Right, the changes of a PR should be kept within the scope as claimed in the title.

sure, I'll revert other changes, just keeping changes in train.py & test.py.

@gaotongxiao gaotongxiao merged commit b422ded into open-mmlab:main Sep 29, 2022
@yaqi0510

yaqi0510 commented Apr 3, 2023

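Hello wybryan! The PR you contributed to the MMOCR project is very important. Thank you for giving your personal time to help improve this open-source project; we believe many developers will benefit from your PR.
We very much look forward to continuing to work with you. OpenMMLab has set up MMSIG, an organization for contributors, which offers open-source certificates, an honors system, and exclusive gifts. You can contact us by adding our WeChat account openmmlabwx (please include "mmsig + GitHub id" in the note). We sincerely hope you will join!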

Hi @wybryan! First of all, we want to express our gratitude for your significant PR in the MMOCR project. Your contribution is highly appreciated, and we are grateful for your efforts in helping improve this open-source project during your personal time. We believe that many developers will benefit from your PR.

We would also like to invite you to join our Special Interest Group (SIG) private channel on Discord, where you can share your experiences and ideas and build connections with like-minded peers. To join the SIG channel, simply message the moderator, OpenMMLab, on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/raweFPmdzG

If you have a WeChat account, you are welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + GitHub ID" as a remark when adding it as a friend :)
Thank you again for your contribution ❤
