
undertheseanlp/chunking


Underthesea Chunking

This repository contains experiments on Vietnamese chunking. It is part of the underthesea project.

Corpus Summary

Sentences    : 7855
Unique words : 14245
Top words    : ,, ., ", của, là, và, có, một, người, được, không, đã, những, cho, :, ..., ở, trong, với, đến
POS Tags (28): A, Ab, C, CH, Cb, Cc, E, Eb, I, L, M, Mb, N, Nb, Nc, Np, Nu, Ny, P, Pb, R, T, V, Vb, Vy, X, Y, Z
Chunking Tags (21): B-AP, B-MP, B-NP, B-PP, B-QP, B-TP, B-VP, B-WH, B-WP, B-XP, I-AP, I-MP, I-NP, I-PP, I-QP, I-VP, I-WH, I-WP, I-XP, N-NP, O
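Most of the chunking tags above follow the usual IOB convention: a B- tag marks the first token of a chunk of that type, an I- tag marks a token inside the same chunk, and O marks a token outside any chunk. The Python sketch below shows how a corpus summary like the one above could be computed and how such tags decode into chunk spans. It assumes a hypothetical tab-separated "word, POS, chunk" file layout with a blank line between sentences; this README does not document the real corpus format. The helper names read_sentences, summarize and chunk_spans are illustrative only and are not part of this repository or of underthesea. The meaning of N-NP is not documented here, so the sketch simply treats it as outside any chunk.

```python
from collections import Counter

# ASSUMPTION: hypothetical corpus layout, one token per line as
# "word<TAB>POS<TAB>chunk", with a blank line between sentences.


def read_sentences(lines):
    """Group tab-separated token lines into sentences of (word, pos, chunk)."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk = line.split("\t")
        current.append((word, pos, chunk))
    if current:
        sentences.append(current)
    return sentences


def summarize(sentences, top_n=20):
    """Compute the same kinds of counts as the Corpus Summary section."""
    words = Counter(w for s in sentences for w, _, _ in s)
    return {
        "sentences": len(sentences),
        "unique_words": len(words),
        "top_words": [w for w, _ in words.most_common(top_n)],
        "pos_tags": sorted({p for s in sentences for _, p, _ in s}),
        "chunk_tags": sorted({c for s in sentences for _, _, c in s}),
    }


def chunk_spans(tags):
    """Decode B-/I-/O tags into (type, start, end) spans, end exclusive.

    Any tag without a B- or I- prefix (O, and by assumption N-NP)
    is treated as outside a chunk.
    """
    spans = []
    kind, start = None, 0
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and kind == label:
            continue  # the token extends the currently open chunk
        if kind is not None:
            spans.append((kind, start, i))  # close the open chunk
            kind = None
        if prefix in ("B", "I"):
            # B- opens a chunk; a stray I- also opens one (lenient decoding)
            kind, start = label, i
    if kind is not None:
        spans.append((kind, start, len(tags)))
    return spans
```

For example, chunk_spans(["B-NP", "I-NP", "B-VP", "O", "B-NP"]) returns [("NP", 0, 2), ("VP", 2, 3), ("NP", 4, 5)]: the first two tokens form one noun phrase, and the O token ends the verb phrase.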

Usage

Setup Environment

# clone project
$ git clone git@github.com:magizbox/underthesea.chunking.git

# create environment and install dependencies into it
$ cd underthesea.chunking
$ conda create -n uts.chunking python=3.4
$ source activate uts.chunking
$ pip install -r requirement.txt

Run Experiments

$ cd underthesea.chunking
$ source activate uts.chunking
$ python main.py

Related Works

Last update: October 2017