Can you tell me how you pre-processed your Java dataset from the raw data? #8

hungkien05 · 2023-04-13T16:50:04Z

Hi,

I am trying to run the ADAMO model (https://arxiv.org/pdf/2201.05222.pdf) on my own datasets. The ADAMO authors use your datasets and your preprocessing; however, ADAMO seems to need only your *.token.code and *.token.nl files.

I tried to pre-process my dataset the same way you did, but I am confused. You mentioned in #2 that you use the tokenizer from NeuralCodeSum. However, when I run that tokenizer, my output does not have the same structure as the processed dataset in the Google Drive link you provided. I don't understand which pre-processing steps you used to produce the dataset in the Google Drive.
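To make the mismatch concrete, here is a minimal sketch of the kind of output I am comparing against. It assumes that each line of a *.token.code file holds one Java snippet as whitespace-separated tokens. That assumption, the regex, the `tokenize_java` name, and the optional camelCase/snake_case splitting are all my own guesses. This is not your pipeline and not the NeuralCodeSum tokenizer.

```python
import re

# Hypothetical Java token pattern (an assumption, not the repo's tokenizer):
# string/char literals, identifiers/keywords, simplified numbers, operators.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'            # string literal
    r"|'(?:\\.|[^'\\])*'"           # char literal
    r"|[A-Za-z_$][A-Za-z0-9_$]*"    # identifier or keyword
    r"|\d+(?:\.\d+)?[fFdDlL]?"      # numeric literal (simplified)
    r"|>>>=|<<=|>>=|>>>|\+\+|--|&&|\|\||==|!=|<=|>=|\+=|-=|\*=|/=|->|::|<<|>>"
    r"|[{}()\[\];,.<>=!~?:+\-*/&|^%@]"  # single-character operators/punctuation
)
# Naive comment stripping: does not handle "//" or "/*" inside string literals.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def split_subtokens(token):
    """Split camelCase / snake_case identifiers into lowercase subtokens."""
    if not re.match(r"[A-Za-z_$]", token):
        return [token]  # literals and operators are kept as-is
    parts = []
    for piece in token.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", piece))
    return [p.lower() for p in parts] or [token]


def tokenize_java(source, split_identifiers=False):
    """Return one whitespace-separated token line for a Java snippet."""
    source = COMMENT_RE.sub(" ", source)
    tokens = TOKEN_RE.findall(source)  # whitespace is skipped implicitly
    if split_identifiers:
        tokens = [sub for tok in tokens for sub in split_subtokens(tok)]
    return " ".join(tokens)


if __name__ == "__main__":
    print(tokenize_java("public int getMaxValue(int[] arr) { return arr[0]; }"))
    # public int getMaxValue ( int [ ] arr ) { return arr [ 0 ] ; }
```

Whether identifiers are split into subtokens, lowercased, or left whole is exactly the detail I cannot infer from the files in the Google Drive.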

Could you explain how you pre-processed the raw Java code dataset to get the final *.token.code files?

Thanks a lot!
