How to get original functions of tokenized function codes that are present in data.zip? #6

Open
anks12297 opened this issue Dec 22, 2021 · 3 comments

Comments

@anks12297

Hi, thanks for sharing your great work!

I have downloaded your dataset (the data.zip file). I believe the train.token.code files for Java and Python contain code tokenized based on the AST. Links to the original Java and Python datasets are provided on the Readme page, but how can I map the tokenized function code in train.token.code back to the original functions?

It would be very helpful if you could provide some guidelines for obtaining pairs of original and tokenized function code.
Thank you!

@gingasan
Owner

gingasan commented Dec 22, 2021 via email

Thank you for your interest in my work. I still have the original Python corpus (link below), but the Java corpus was lost by accident. I am sorry about that.

https://drive.google.com/file/d/1Zbven9w21UMs3SIg_CtYGgiOdF9BSP9H/view?usp=sharing

Best,
Hongqiu

@Li-Lixuan

This does not appear to be the dataset with the original split. Could you provide the original corpus with the original split? It would be very helpful. Thanks a lot!
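
Until a corpus with the original split is available, one way to pair each tokenized line with its source function might be to match on a normalized form of the code. The sketch below is only an illustration and not the preprocessing used for this dataset. It assumes that each line of a file such as train.token.code holds one tokenized function, and that tokenization only re-spaces or splits tokens without renaming identifiers or replacing literals. The names `normalize`, `build_index`, and `match` are hypothetical helpers.

```python
import re
from collections import defaultdict


def normalize(code: str) -> str:
    """Reduce code to a lowercase string of its letters and digits only.

    Assumption: tokenization changes spacing, punctuation placement, and
    sub-token splits (e.g. get_name -> get name), but not the characters.
    """
    return "".join(re.findall(r"[A-Za-z0-9]+", code)).lower()


def build_index(originals):
    """Map each normalized key to the positions of matching original functions."""
    index = defaultdict(list)
    for position, source in enumerate(originals):
        index[normalize(source)].append(position)
    return index


def match(tokenized_lines, originals):
    """For each tokenized line, return the list of candidate original positions."""
    index = build_index(originals)
    return [index.get(normalize(line), []) for line in tokenized_lines]


if __name__ == "__main__":
    originals = [
        "def add(a, b):\n    return a + b",
        "def get_name(self):\n    return self.name",
    ]
    tokenized = [
        "def get name ( self ) : return self . name",
        "def add ( a , b ) : return a + b",
    ]
    print(match(tokenized, originals))
```

An empty list means no original matched, and a list with several positions means the normalized form is ambiguous. If the tokenized files drop comments or docstrings, or replace string and number literals with placeholders, exact matching like this would miss those functions and a fuzzier comparison would be needed. Running the same matching over the train, valid, and test token files would also indicate which split each original function belongs to.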

@gingasan
Owner

gingasan commented Jul 19, 2022 via email
