How can I run ``code documentation'' reported in the paper? #34

Closed
chenqiuyuan opened this issue Jan 14, 2020 · 5 comments

Comments

@chenqiuyuan

Hi, I want to use the model for a "Code Documentation" task, which requires pairs of source code and a brief sentence (instead of the subtokens of the method name). However, I find it hard to modify the preprocessing pipeline. Could you help me achieve this? In my opinion, just replacing the method name with the sentence should be enough.

@urialon
Contributor

urialon commented Jan 14, 2020

Hi,
Yes, code documentation is possible if you just replace the method name with the sentence.
Sentences are typically longer than method names, so you might also want to try the hyperparameters here: #17 (comment).

See also: https://github.com/tech-srl/code2seq#extending-to-other-languages for an explanation about the required format.

Best,
Uri
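
To illustrate the replacement described above, here is a minimal sketch. It assumes the format explained in the linked README section: one example per line, where the first space-delimited field is the target (subtokens joined by `|`) and the remaining fields are contexts. The helper names `sentence_to_target` and `replace_target`, the tokenization rule, and the sample contexts are hypothetical and not part of code2seq.

```python
import re


def sentence_to_target(sentence, max_tokens=None):
    """Lowercase a summary sentence and join its word tokens with '|'.

    The tokenization (runs of letters and digits) is an assumption;
    it is chosen so the target contains no spaces or commas.
    """
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    if max_tokens is not None:
        tokens = tokens[:max_tokens]
    return "|".join(tokens)


def replace_target(c2s_line, sentence):
    """Swap the first field (the target) of one example line for the sentence."""
    line = c2s_line.rstrip("\n")
    _old_target, sep, contexts = line.partition(" ")
    new_target = sentence_to_target(sentence)
    if not new_target:
        # Keep the original line if the sentence has no usable tokens.
        return line
    return new_target + sep + contexts


# Hypothetical example line: target "get|name" followed by two placeholder contexts.
example_line = "get|name this,PATH1,name string,PATH2,name"
new_line = replace_target(example_line, "Returns the name of the user.")
```

Because sentences are usually longer than method names, the `max_tokens` argument is only a convenience for truncating targets; the model's own target-length hyperparameters still need to be adjusted as suggested above.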

@chenqiuyuan
Author

Thanks for your prompt reply! I found a workaround, since modifying preprocess.sh and preprocess.py is hard for me. My approach is as follows: I run preprocess.sh to get train.c2s, then use each method name to look up the method and its corresponding comment in the source, and finally replace the method name with the comment (i.e., the summary). Is this approach correct?

The difficulty is locating the right method, since many method names are duplicated, so the replacement may be inaccurate. I'll try it first.
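
The ambiguity described above can be made concrete with a small sketch. It assumes the method names and comments have already been collected from the source as `(name, comment)` pairs; the function names and the sample data are hypothetical. Only names that occur exactly once can be matched safely, and every duplicated name has to be skipped.

```python
from collections import defaultdict


def build_comment_index(methods):
    """Map each method name to the list of all comments seen for it."""
    index = defaultdict(list)
    for name, comment in methods:
        index[name].append(comment)
    return index


def unambiguous_comments(methods):
    """Split names into safely matchable ones and ambiguous duplicates."""
    index = build_comment_index(methods)
    # A name seen once maps to exactly one comment.
    unique = {name: comments[0]
              for name, comments in index.items() if len(comments) == 1}
    # A name seen several times cannot be tied to a single comment.
    ambiguous = sorted(name for name, comments in index.items()
                       if len(comments) > 1)
    return unique, ambiguous


# Hypothetical data: "close" appears twice with different comments.
methods = [
    ("getName", "Returns the name."),
    ("close", "Closes the stream."),
    ("close", "Closes the socket."),
]
unique, ambiguous = unambiguous_comments(methods)
```

With common names such as `close` or `get`, a large share of the examples may end up in the ambiguous list, which is the weakness of matching by name alone.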

@urialon
Contributor

urialon commented Jan 14, 2020

That's right, it won't work if you have duplicate method names.
I recommend changing the JavaExtractor to output documentation instead of method names; this way you won't need to change preprocess.{sh,py} at all.
This is a good place to start: https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/JPredict/src/main/java/JavaExtractor/Visitors/FunctionVisitor.java#L29

@chenqiuyuan
Author

Thanks very much! I will try it.
