Elementary discourse unit segmentation #225

Closed
bact opened this issue May 19, 2019 · 5 comments
Labels
corpus corpus/dataset-related issues enhancement enhance functionalities help wanted no contributor yet

@bact
Member

bact commented May 19, 2019

An elementary discourse unit (EDU) is a linguistic unit that is larger than a word and smaller than a sentence. It contains one piece of information. EDUs and their relationships are the basis for constructing discourse structure.

Listed here are papers that discuss how to do Thai EDU segmentation computationally:

We may start this by compiling a list of Thai discourse markers.
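
To illustrate how such a list could be used, here is a minimal sketch of marker-based segmentation. It assumes the text has already been split into words, and it simply starts a new unit whenever a token is in the marker set. The marker list and the name `segment_edus` are hypothetical and for illustration only; they are not part of PyThaiNLP, and real EDU boundaries will need more than a lookup like this.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical, illustrative marker set (but, because, therefore, if).
# This is an assumption for the sketch, not a vetted linguistic resource.
DISCOURSE_MARKERS: FrozenSet[str] = frozenset({"แต่", "เพราะ", "ดังนั้น", "ถ้า"})


def segment_edus(
    tokens: Iterable[str],
    markers: FrozenSet[str] = DISCOURSE_MARKERS,
) -> List[List[str]]:
    """Group word tokens into candidate units, opening a new unit at each marker."""
    segments: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        # A marker closes the unit collected so far and begins the next one.
        if token in markers and current:
            segments.append(current)
            current = []
        current.append(token)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Pre-tokenized input: "I went to the market, but it rained."
    words = ["ฉัน", "ไป", "ตลาด", "แต่", "ฝน", "ตก"]
    for unit in segment_edus(words):
        print(" ".join(unit))
```

A lookup like this would over-segment wherever a marker word is used in a non-discourse sense, which is one reason a curated list would probably serve better as a feature for a model than as a rule on its own.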

Related to #73 (Sentence tokenizer for Thai)

@bact bact added the enhancement enhance functionalities label May 19, 2019
@wannaphong wannaphong added the help wanted no contributor yet label Sep 7, 2019
@bact bact added this to the Future milestone Oct 7, 2019
@wannaphong wannaphong added the corpus corpus/dataset-related issues label Oct 11, 2019
@cstorm125
Member

We should compile these markers. It'd be useful for #337 also.

@bact
Member Author

bact commented Dec 20, 2019

> We should compile these markers. It'd be useful for #337 also.

Agreed. I have added a few more words to STARTERS and ENDERS, but I think that will require us to retrain the model?

@wannaphong
Member

PyThaiNLP now has clause segmentation from the LST20 corpus. I think we should close this issue.

@bact
Member Author

bact commented Jan 31, 2021

A discourse unit is not necessarily equal to a clause.
But we can close this for now.

@wannaphong
Member

> A discourse unit is not necessarily equal to a clause.
> But we can close this for now.

OK
