Hi, how could I reproduce results for code documentation as described in the paper #45
Comments
Hi @rishab-32,
Hi @urialon, Yes, it is the same. I have been able to get the results for the code captioning task, thanks for your help. However, I am now trying to get the results for code documentation. From your previous comments, what I understand is:
I am confused about step 3: do I have to run the script in step 3 on the train, test, and valid directories (containing .java files and align.txt), or on the c2s files generated by preprocess.sh? And what is the next step I should follow after this? Thanks for your help, Best,
Hi @rishab-32, I didn't upload this pipeline officially because that dataset was problematic (if you take a look, you'll see that the NL labels contain many: " " ), and the paper's results were far from accurate. If I remember correctly, step 3 should be used here: https://github.com/tech-srl/code2seq/blob/master/preprocess.sh#L53
Afterward, run the following lines: https://github.com/tech-srl/code2seq/blob/master/preprocess.sh#L58 Best,
Hi @urialon, thanks a lot for being so helpful and cooperative. Yes, you are right; I have also seen different results reported for the same dataset. One last question: the c2s files are arranged in the form "method_name context", right? Here the method name acts as the target sequence during training. If I replace the method_name (non-duplicates) with the natural language summary, would this be the right approach to train the model and report results? I am using code2seq as one of the baselines to compare with my approach for the code documentation task.
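For readers following along, the replacement described in this comment could be sketched roughly as below. This is only an illustration under assumptions, not the authors' pipeline: it assumes each example line is a space-delimited list whose first field is the target, that the first field holds a unique method ID, and that target subtokens are joined with "|". The function names and the `id_to_summary` dictionary are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def summary_to_target(summary: str) -> str:
    """Turn a natural language summary into a '|'-joined target field.

    Assumption: target subtokens are lowercased and separated by '|'.
    """
    return "|".join(summary.lower().split())


def rewrite_examples(lines: Iterable[str],
                     id_to_summary: Dict[str, str]) -> Iterator[str]:
    """Replace the first field of each example line with its summary.

    Lines whose first field has no known summary, or that carry no
    contexts at all, are skipped rather than guessed at.
    """
    for line in lines:
        # Split off the first field (the target) from the contexts.
        target, _, contexts = line.rstrip("\n").partition(" ")
        summary = id_to_summary.get(target)
        if summary is None or not contexts:
            continue
        yield f"{summary_to_target(summary)} {contexts}"
```

A summary that itself contains "|" or "," would need extra cleaning before use, since those characters act as delimiters in this assumed format.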
Sorry, I meant that many NL labels in the dataset contain:
Regarding your question: once you manage to train a model on this dataset, I recommend using these hyperparameters, which are better suited for these long documentations (the default hyperparams are optimized for short method names):
Or use the same sizes/layers as your model. Additionally, set the max documentation length Best,
Thanks a lot, @urialon, for clearing things up. Sure, I will use the recommended parameters mentioned above. Best,
Hi @urialon, while I was running train.sh, I encountered the following error. Could you suggest any change I need to make?
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Original stack trace for 'IteratorGetNext':
I think that didn't run the
Thanks, now everything is working fine.
Hi, in step 3, in order to insert the summary sequences in place of the method names, how can I map each method name to its summary using the data from step 1 and step 2 (for example, where can I get the method_id to do this matching)? I'm a little confused about it. Thanks. -------------------- Update -----------------------
Hi, is this the right form? Thanks
Hi @yingdehuijin! If there are other questions, please create a new issue.
Sorry for bothering you again. After calling the JavaExtractor, it generates the raw.txt files for training/valid/test. The files are arranged in the following form:
I'm not sure I understand.
I used Hu's newest dataset, which contains code and NL txt files. I used your script https://gist.github.com/urialon/6ce2ffab7b675d9437b730246dc07827 to generate the *.java files and align.txt files. However, the method names inside the generated *.java files are not replaced; only the *.java file names changed. Do you mean I should manually replace each method name with a unique ID?
If I understand correctly - yes, I think you need to make sure that the unique IDs are the same. So you can either modify them in the original raw |
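One way to check that the unique IDs really are the same on both sides is sketched below. It is a sanity check under stated assumptions, not part of the original scripts: the alignment file format used here (one "id<TAB>summary" pair per line) is hypothetical, as are the function names, and the raw extractor output is assumed to carry the ID as the first whitespace-delimited field of each line.

```python
from typing import Dict, Iterable, Set, Tuple


def load_alignment(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical 'id<TAB>summary' lines into a dictionary."""
    mapping: Dict[str, str] = {}
    for line in lines:
        method_id, sep, summary = line.rstrip("\n").partition("\t")
        if sep and method_id:  # ignore blank or malformed lines
            mapping[method_id] = summary
    return mapping


def unmatched_ids(raw_lines: Iterable[str],
                  alignment: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs in the raw output with no summary, IDs never extracted)."""
    raw_ids = {line.split(None, 1)[0] for line in raw_lines if line.strip()}
    aligned_ids = set(alignment)
    return raw_ids - aligned_ids, aligned_ids - raw_ids
```

If either returned set is non-empty, the IDs in the raw files and in the alignment data have drifted apart and need fixing before the summaries are substituted in.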
Sorry to bother you again. I want to know how could I run code2seq for code documentation. I am trying to test code2seq on my dataset.