ekorpkit 【iːkɔːkɪt】 : eKonomic Research Python Toolkit

eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by Hydra.

Warning: This is a work in progress

This project is still under development. The API is subject to change. Until the first stable release, the version number will be 0.x.x. Please use it at your own risk. If you have any questions or suggestions, please feel free to contact me.

In particular, some core configuration interface parts of the package will be carved out and moved to a separate package named hyfi (Hydra Fast Interface). Image generation and visualization will also be moved to a separate package, named ekaros (from Íkaros, or Icarus, in Greek mythology).

Key features

Easy Configuration

You can compose your configuration dynamically, so it is easy to get the right configuration for each research project.
You can override everything from the command line, which makes experimentation fast and removes the need to maintain multiple similar configuration files.
With the help of the eKonf class, it is also easy to compose configurations in a Jupyter notebook environment.
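As an illustration only, here is a minimal, standard-library sketch of the two ideas above: layering configuration fragments on top of each other, then applying `key=value` overrides in the dotted style Hydra uses on the command line. It is not eKorpkit's `eKonf` API or Hydra's implementation; `compose`, `merge`, `apply_override`, and `parse_value` are hypothetical names chosen for this example.

```python
import copy
from typing import Any, Sequence


def merge(base: dict, override: dict) -> dict:
    """Return a deep copy of `base` with `override` merged in recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def parse_value(raw: str) -> Any:
    """Minimal literal parsing: int, float, bool, otherwise the raw string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def apply_override(config: dict, override: str) -> None:
    """Apply a dotted 'section.key=value' override to `config` in place."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)


def compose(*fragments: dict, overrides: Sequence[str] = ()) -> dict:
    """Merge fragments left to right, then apply command-line style overrides."""
    config: dict = {}
    for fragment in fragments:
        config = merge(config, fragment)
    for override in overrides:
        apply_override(config, override)
    return config


defaults = {"model": {"name": "bert", "lr": 1e-4}, "data": {"path": "corpus"}}
experiment = {"model": {"lr": 3e-5}}
cfg = compose(defaults, experiment, overrides=["model.epochs=3", "data.shuffle=true"])
print(cfg)
# {'model': {'name': 'bert', 'lr': 3e-05, 'epochs': 3}, 'data': {'path': 'corpus', 'shuffle': True}}
```

Later fragments win over earlier ones, and overrides win over everything. One deliberate simplification: Hydra rejects a plain `key=value` override for a key that does not already exist in the config (it expects `+key=value`), whereas this sketch creates missing keys silently.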

No Boilerplate

eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code such as command-line flags, configuration file loading, logging, etc.

Workflows

A workflow is a configurable automated process that will run one or more jobs.
You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
You can have multiple workflows, each of which can perform a different set of tasks.
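eKorpkit defines its workflows through configuration, and the sketch below does not reproduce that API. It is a hypothetical, plain-Python model of the concept, assuming each job is a function that reads a shared state dictionary and returns new entries to add to it. `Workflow`, `extract`, `tokenize`, and `count` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A job takes the shared state and returns new entries to add to it.
Job = Callable[[dict], dict]


@dataclass
class Workflow:
    """A named, ordered list of jobs that share one state dictionary."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def add(self, job: Job) -> "Workflow":
        self.jobs.append(job)
        return self  # returning self allows chaining

    def run(self, config: dict) -> dict:
        state = {"config": config}
        for job in self.jobs:
            state.update(job(state))
        return state


def extract(state: dict) -> dict:
    # Stand-in for loading raw documents from a corpus
    return {"docs": ["Rates rose.", "Inflation fell."]}


def tokenize(state: dict) -> dict:
    lowercase = state["config"].get("lowercase", True)
    docs = [d.lower() if lowercase else d for d in state["docs"]]
    return {"tokens": [d.rstrip(".").split() for d in docs]}


def count(state: dict) -> dict:
    return {"n_tokens": sum(len(tokens) for tokens in state["tokens"])}


# Two workflows reuse the same unit jobs for different sets of tasks.
preprocess = Workflow("preprocess").add(extract).add(tokenize)
analyze = Workflow("analyze").add(extract).add(tokenize).add(count)

print(preprocess.run({"lowercase": False})["tokens"])
# [['Rates', 'rose'], ['Inflation', 'fell']]
print(analyze.run({})["n_tokens"])  # 4
```

Because jobs are ordinary functions, the same `extract` and `tokenize` steps serve both workflows, and each workflow's behavior is adjusted through its config rather than by editing the jobs.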

Sharable and Reproducible

With eKorpkit, you can easily share your datasets and models.
Sharing configs along with datasets and models makes every research reproducible.
You can share individual unit jobs or an entire workflow.
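The snippet below sketches the reproducibility idea under stated assumptions: configs are JSON-serializable, and each output is stored in a directory named after a hash of the config that produced it, with that config saved alongside. This is not eKorpkit's sharing mechanism; `config_fingerprint`, `save_run`, and `load_config` are hypothetical helpers.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_fingerprint(config: dict) -> str:
    """Short, stable hash of a config: identical settings give identical hashes."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def save_run(output_dir: Path, config: dict, artifact: str) -> Path:
    """Store an artifact next to the exact config that produced it."""
    run_dir = output_dir / config_fingerprint(config)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(
        json.dumps(config, indent=2, sort_keys=True), encoding="utf-8"
    )
    (run_dir / "artifact.txt").write_text(artifact, encoding="utf-8")
    return run_dir


def load_config(run_dir: Path) -> dict:
    """Recover the config that produced a shared run directory."""
    return json.loads((run_dir / "config.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    cfg = {"model": {"name": "bert", "lr": 3e-5}, "seed": 42}
    run_dir = save_run(Path(tmp), cfg, "placeholder for model weights")
    assert load_config(run_dir) == cfg
    # Key order does not affect the fingerprint.
    reordered = {"seed": 42, "model": {"lr": 3e-5, "name": "bert"}}
    assert config_fingerprint(reordered) == run_dir.name
```

Hashing a canonical JSON form (sorted keys, fixed separators) gives two configs with the same settings the same fingerprint regardless of key order, so a shared directory records exactly how its contents were produced.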

Pluggable Architecture

eKorpkit has a pluggable architecture, so you can combine it with your own implementations.
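Hydra, which backs eKorpkit's configuration, supports a `_target_` convention: a config node names a class or function by its dotted import path, and the remaining keys become its arguments. A plausible way to plug in your own implementation is therefore to point `_target_` at your class. The simplified stand-in below is a hypothetical `instantiate` helper, not `hydra.utils.instantiate` (which handles many more cases); it shows the mechanism using only the standard library, with standard-library classes standing in for user plugins.

```python
import importlib
from typing import Any


def instantiate(spec: dict) -> Any:
    """Build an object from a config dict with a dotted `_target_` path.

    The remaining keys become keyword arguments; nested dicts that carry
    their own `_target_` are instantiated first (depth-first).
    """
    spec = dict(spec)  # shallow copy so the caller's config is not mutated
    module_path, _, attr_name = spec.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    kwargs = {
        key: instantiate(value) if isinstance(value, dict) and "_target_" in value else value
        for key, value in spec.items()
    }
    return target(**kwargs)


# Usage: standard-library classes stand in for user-supplied plugins.
counter = instantiate({"_target_": "collections.Counter", "a": 2, "b": 1})
print(counter)  # Counter({'a': 2, 'b': 1})

ordered = instantiate({
    "_target_": "collections.OrderedDict",
    "half": {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2},
})
print(ordered["half"])  # 1/2
```

Swapping in a custom component would then mean changing only the `_target_` string, for example to a hypothetical `my_package.tokenizers.MyTokenizer`, with no change to the code that calls `instantiate`.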

Tutorials

Tutorials for the ekorpkit package can be found at https://ekorpkit.entelecheia.ai.

Installation

Install the latest version of ekorpkit:

pip install ekorpkit

To install all extra dependencies (quoting the argument keeps shells such as zsh from expanding the square brackets):

pip install "ekorpkit[all]"

The eKorpkit Corpus

The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.

Citation

@software{lee_2022_6497226,
  author       = {Young Joon Lee},
  title        = {eKorpkit: eKonomic Research Python Toolkit},
  month        = apr,
  year         = 2022,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.6497226},
  url          = {https://doi.org/10.5281/zenodo.6497226}
}

@software{lee_2022_ekorpkit,
  author       = {Young Joon Lee},
  title        = {eKorpkit: eKonomic Research Python Toolkit},
  month        = apr,
  year         = 2022,
  publisher    = {GitHub},
  url          = {https://github.com/entelecheia/ekorpkit}
}

Changelog

See the CHANGELOG for more information.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

License

This project is released under the MIT License.
Each corpus adheres to its own license policy. Please check the license of the corpus before using it!
