Hi,

I am trying to run the ADAMO model (https://arxiv.org/pdf/2201.05222.pdf) on my own dataset. The ADAMO authors use your dataset and your preprocessing, and as far as I can tell ADAMO only needs your *.token.code and *.token.nl files.

I tried to preprocess my dataset the same way, but I am confused about one point. You mentioned in #2 that you use the tokenizer from NeuralCodeSum, yet when I run that tokenizer, the output does not match the structure of the processed dataset in the Google Drive link you provided. So I don't understand which preprocessing steps produce the dataset in the Google Drive.

Could you explain how you preprocessed the raw Java code to obtain the final *.token.code files?

Thank you a lot!
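For context, here is a rough sketch of what I tried myself before asking: a naive subtokenizer that splits Java code on punctuation and then breaks identifiers at camelCase/snake_case boundaries, lowercasing everything. This is just my own guess at the format, not your pipeline, so please correct me if your preprocessing differs:

```python
import re

def tokenize_java(code):
    """Naive Java tokenizer: split punctuation, then subtokenize identifiers."""
    # Grab identifiers, numbers, and single punctuation characters.
    raw = re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S", code)
    tokens = []
    for tok in raw:
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", tok):
            # Split on underscores and lower->Upper camelCase boundaries.
            parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", tok)
            tokens.extend(p.lower() for p in parts if p)
        else:
            tokens.append(tok)
    return tokens

# One space-joined line per method, which is what I assume *.token.code holds.
print(" ".join(tokenize_java("public int getMaxValue(int[] arr) { return max; }")))
```

With this I get one space-separated token line per method, but the vocabulary and token boundaries still differ from your Google Drive files, which is why I suspect I am missing a step.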