
Add wtpsplit to sentence segmentation & paragraph segmentation #804

Merged: @wannaphong merged 4 commits into dev from add-wtpsplit on Jun 6, 2023


@wannaphong (Member) commented Jun 5, 2023

Add sentence segmentation with 'wtpsplit' #803

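```python
from pythainlp.tokenize import sent_tokenize, paragraph_tokenize

# Thai sample text, roughly: "(1) The author synthesized this article from
# past research without undertaking any extensive new study, and therefore
# apologizes here for all shortcomings."
sent = (
    "(1) บทความนี้ผู้เขียนสังเคราะห์ขึ้นมาจากผลงานวิจัยที่เคยทำมาในอดีต"
    + "  มิได้ทำการศึกษาค้นคว้าใหม่อย่างกว้างขวางแต่อย่างใด"
    + " จึงใคร่ขออภัยในความบกพร่องทั้งปวงมา ณ ที่นี้"
)

# Sentence segmentation with the new wtpsplit-based engine
print(sent_tokenize(sent, engine="wtp"))

# Paragraph segmentation (also added in this PR)
print(paragraph_tokenize(sent))
```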
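To illustrate how a threshold-based segmenter turns model scores into nested paragraphs and sentences: as we understand it, wtpsplit predicts for each character the probability that a boundary follows it and cuts wherever that probability exceeds a threshold. The pure-Python sketch below mimics only that final cutting step, using hand-made probabilities. The function `nested_split`, its parameters, and the toy input are hypothetical illustrations, not PyThaiNLP or wtpsplit code. It returns a list of paragraphs, each a list of sentences, which is the nested shape we expect `paragraph_tokenize` to produce.

```python
from typing import List, Sequence


def nested_split(
    text: str,
    sent_probs: Sequence[float],
    para_probs: Sequence[float],
    sent_threshold: float = 0.5,
    para_threshold: float = 0.5,
) -> List[List[str]]:
    """Hypothetical helper: split `text` into paragraphs of sentences.

    sent_probs[i] / para_probs[i] are made-up probabilities that a
    sentence / paragraph boundary follows character i.
    """
    if not (len(text) == len(sent_probs) == len(para_probs)):
        raise ValueError("need one probability per character")
    paragraphs: List[List[str]] = []
    current: List[str] = []  # sentences of the paragraph being built
    start = 0  # index where the current sentence begins
    for i in range(len(text)):
        para_end = para_probs[i] > para_threshold
        # A paragraph boundary always closes the current sentence too.
        if para_end or sent_probs[i] > sent_threshold:
            piece = text[start : i + 1].strip()
            if piece:
                current.append(piece)
            start = i + 1
        if para_end and current:
            paragraphs.append(current)
            current = []
    # Whatever follows the last boundary becomes the final sentence.
    tail = text[start:].strip()
    if tail:
        current.append(tail)
    if current:
        paragraphs.append(current)
    return paragraphs


text = "ab cd ef"
sent_probs = [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
para_probs = [0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
print(nested_split(text, sent_probs, para_probs))
# → [['ab', 'cd'], ['ef']]
print(nested_split(text, sent_probs, para_probs, para_threshold=0.9))
# → [['ab', 'cd ef']]
```

Raising `para_threshold` above the 0.8 score suppresses the paragraph break, so "cd ef" ends up as a single sentence in one paragraph. With a real model, the threshold trades missed boundaries against spurious ones.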

Your checklist for this pull request

🚨 Please review the guidelines for contributing to this repository.

  • Passed code style and structure checks
  • Passed code linting checks and unit tests

@wannaphong linked an issue on Jun 5, 2023 that may be closed by this pull request

@coveralls commented Jun 5, 2023

Coverage Status: coverage 91.564% (+34.8%) from 56.777% when pulling 95f4ea7 on add-wtpsplit into 3600236 on dev.

Tokenizes text into paragraphs.
@wannaphong changed the title from "Add wtpsplit to sentence segmentation" to "Add wtpsplit to sentence segmentation & paragraph segmentation" on Jun 5, 2023
@wannaphong marked this pull request as ready for review on June 5, 2023 20:16

@sonarqubecloud bot commented Jun 6, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: A (0 Bugs)
  • Vulnerabilities: A (0 Vulnerabilities)
  • Security Hotspots: A (0 Security Hotspots)
  • Code Smells: A (2 Code Smells)
  • Coverage: no coverage information
  • Duplication: 0.0%

@wannaphong added the enhancement (enhance functionalities) label on Jun 6, 2023
@wannaphong added this to the 4.1 milestone on Jun 6, 2023
@wannaphong self-assigned this on Jun 6, 2023
@wannaphong merged commit d9887d0 into dev on Jun 6, 2023
@wannaphong deleted the add-wtpsplit branch on July 19, 2023 16:54
@wannaphong mentioned this pull request on Jul 23, 2023
Labels: enhancement (enhance functionalities)
Projects: None yet
Development: Successfully merging this pull request may close these issues:
  • Add sentence segmentation with 'wtpsplit' (#803)
2 participants