Evaluation Dataset for Japanese Lexical Simplification

Notes:

The sentences are selected from BCCWJ, so they are not published here.
Instead, the program that extracts the sentences is published.
The program is written in Python 2.7.

If you use this, please cite the following paper:

@inproceedings{kodaira-etal-2016-controlled,
    title = "Controlled and Balanced Dataset for {J}apanese Lexical Simplification",
    author = "Kodaira, Tomonori  and
      Kajiwara, Tomoyuki  and
      Komachi, Mamoru",
    booktitle = "Proceedings of the {ACL} 2016 Student Research Workshop",
    year = "2016",
    pages = "1--7",
}

Procedure:

git clone https://github.com/KodairaTomonori/EvaluationDataset
cd EvaluationDataset/Script
python get_sent_from_BCCWJ.py xxxx/BCCWJ/SUW/
python extract_sentence_from_location.py

Other Notes:

The substitution rankings are in the substitutes folder.
subs.csv: target word list
ave_rank.csv and mle_rank.csv: substitutes in these files are sorted by average score and MLE score, respectively.
A comma separates substitutes of different ranks, and a space separates substitutes of the same rank.
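
The comma/space convention above can be read with a few lines of Python. This is only a minimal sketch: the function names `parse_ranking` and `rank_of` and the example string are hypothetical, and the sketch assumes the ranking is available as a single string. How that string sits inside each row of ave_rank.csv or mle_rank.csv is not described here, so check the files before relying on it.

```python
def parse_ranking(ranking):
    """Split a ranking string into groups, in the order they appear.

    Commas separate different ranks; spaces separate substitutes
    that share the same rank.
    """
    tiers = []
    for group in ranking.split(","):
        words = group.split()
        if words:  # skip empty groups caused by stray commas
            tiers.append(words)
    return tiers


def rank_of(ranking):
    """Map each substitute to its 1-based rank; tied substitutes share a rank."""
    ranks = {}
    for rank, tier in enumerate(parse_ranking(ranking), start=1):
        for word in tier:
            ranks[word] = rank
    return ranks


# Hypothetical ranking string, not taken from the dataset.
example = "easy simple,plain,basic elementary"
print(parse_ranking(example))
# [['easy', 'simple'], ['plain'], ['basic', 'elementary']]
```

Here "easy" and "simple" are tied in the first group, "plain" stands alone in the second, and "basic" and "elementary" are tied in the third.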

Affiliation:
Tokyo Metropolitan University
System Design - Komachi Lab
Name: Kodaira Tomonori
E-mail: kodaira-tomonori-at-ed.tmu.ac.jp
