
Is all the data supposed to be in a single file without any \n characters? #16

Open
deshwalmahesh opened this issue Feb 2, 2022 · 0 comments

Comments

@deshwalmahesh

Your example_text_file is long and contains no newline characters. I get the idea for MLM, since you could split the data into equal-length sentences (if that is what you are doing), but what do you do for NSP? How do you detect that a new paper/article has started? Doesn't that mean Sent_N and Sent_N+1 could come from different articles? Could you please shed some light on how the MLM and NSP training was done?
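For context, Google's original BERT pretraining data script expects one sentence per line, with a blank line between documents, so that NSP pairs never treat a sentence from another document as the "next" one. The sketch below assumes that convention. It is not this repository's method, and how example_text_file is actually consumed here is unknown. The helper names `split_documents` and `make_nsp_pairs` are hypothetical.

```python
import random
from typing import List, Tuple

def split_documents(text: str) -> List[List[str]]:
    """Assumed format: one sentence per line, blank lines separate documents."""
    docs: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)       # a sentence in the current document
        elif current:
            docs.append(current)       # blank line closes the current document
            current = []
    if current:
        docs.append(current)           # last document may lack a trailing blank line
    return docs

def make_nsp_pairs(docs: List[List[str]], rng: random.Random) -> List[Tuple[str, str, int]]:
    """Label 1 = IsNext (consecutive sentences in the same document),
    label 0 = NotNext (second sentence drawn from a different document)."""
    pairs: List[Tuple[str, str, int]] = []
    for d_idx, doc in enumerate(docs):
        for i in range(len(doc) - 1):
            if len(docs) < 2 or rng.random() < 0.5:
                pairs.append((doc[i], doc[i + 1], 1))
            else:
                # Pick a random sentence from any *other* document.
                other = rng.choice([j for j in range(len(docs)) if j != d_idx])
                pairs.append((doc[i], rng.choice(docs[other]), 0))
    return pairs
```

Under this convention, a file with no newline characters parses as a single document containing a single "sentence", so no NSP pairs can be formed without some extra sentence-splitting step. That step would also lose any article boundaries, which is exactly the concern raised above.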
