To install the unreleased unihan-etl version, see developmental releases.
pip:
$ pip install --user --upgrade --pre unihan-etl
pipx:
$ pipx install --suffix=@next unihan-etl --pip-args '\--pre' --force
// Usage: unihan-etl@next
Maintenance release: No bug fixes or new features.
uv is the new package and project manager for the project, replacing Poetry.
Build system moved from poetry to hatchling.
These changes align with Unicode Technical Report #38's 37th revision and are part of ongoing improvements to Unihan data handling.
Adds support for the kFanqie and kZhuang fields.
See also:
Unihan_IRGSources: Updated kRSUnicode for apostrophes.
See also:
kFrequency: Removed from Unihan_DictionaryLikeData, constants, and datapackage.json.
See also:
- https://www.unicode.org/L2/L2024/24006.htm#178-C17
- https://www.unicode.org/reports/tr38/#ChronologicalListing
- Added tests for simplified expansions to ensure correctness of kFanqie and kZhuang.
- Automatically linkify links that were previously only text.
- poetry: 1.8.1 -> 1.8.2
See also: https://github.com/python-poetry/poetry/blob/1.8.2/CHANGELOG.md
- Code quality: Use f-strings in more places via ruff 0.4.2 (#320)
- Aggressive automated lint fixes via ruff (#317)
Via ruff v0.3.4, all automated lint fixes, including unsafe and preview fixes, were applied:
ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; ruff format .
Branches were treated with:
git rebase \
    --strategy-option=theirs \
    --exec 'poetry run ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; poetry run ruff format .; git add src tests; git commit --amend --no-edit' \
    origin/master
- poetry: 1.7.1 -> 1.8.1
See also: https://github.com/python-poetry/poetry/blob/1.8.1/CHANGELOG.md
- ruff 0.2.2 -> 0.3.0 (#316)
Related formatting changes. Update CI to use "ruff check ." instead of "ruff .".
See also: https://github.com/astral-sh/ruff/blob/v0.3.0/CHANGELOG.md
Maintenance release: No bug fixes or new features.
- README: Rewrite introduction, note updated UNIHAN compatibility information.
- Link to UNIHAN release in v0.31.0's changelog notes.
Maintenance release: No bug fixes or new features.
CsvLexer: Fix quoted items (#314)
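The CsvLexer fix concerns quoted items, where a comma inside a quoted field must not be treated as a delimiter. The sketch below is a hypothetical regex-based field splitter that shows only the tokenization rule. It assumes RFC 4180-style doubled quotes ("") for escaping. split_csv_fields is not part of unihan-etl, and the project's CsvLexer, which is presumably a Pygments lexer, works differently.

```python
import re
import typing as t

# One field: either a double-quoted string (with "" as an escaped quote)
# or a run of characters that contains no comma.
_FIELD = re.compile(r'"(?:[^"]|"")*"|[^,]*')


def split_csv_fields(line: str) -> t.List[str]:
    """Split one CSV line into raw fields, keeping commas inside quotes intact."""
    fields: t.List[str] = []
    pos = 0
    while True:
        match = _FIELD.match(line, pos)
        # _FIELD can always match the empty string, so a match always exists.
        assert match is not None
        fields.append(match.group(0))
        pos = match.end()
        if pos >= len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"unexpected character at {pos}: {line[pos]!r}")
        pos += 1  # skip the delimiter
```

Quoted fields come back raw, with their quotes, which suits highlighting. For actual parsing, Python's csv module is the better tool.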
- Strengthen linting (#313)
- Add flake8-commas (COM)
- Add flake8-builtins (A)
- Add flake8-errmsg (EM)
- Highlighting for CSV and TSV examples (#253)
- Typing fixes and additional doctest for kTGHZ2013 (#312)
- Added types-pygments package (#253)
- Added some manual type stubs for pygments' Lexer (#253)
- pytest-watcher: Silence *.py.*py reruns (#312)
Bump UNIHAN compatibility from 11.0.0 to 15.1.0 (released 2023-09-01, revision 35).
Removed fields:
- 15.1.0: kHKSCS, kIRGDaiKanwaZiten, kKPS0, kKPS1, kKSC0, kKSC1, kRSKangXi
- 13.0.0: kRSJapanese, kRSKanWa, kRSKorean
- 12.0.0: kDefaultSortKey (private property)
New fields:
- 15.1.0: kJapanese, kMojiJoho, kSMSZD2003Index, kSMSZD2003Readings, kVietnameseNumeric, kZhuangNumeric
- 15.0.0: kAlternateTotalStrokes
- 14.0.0: kStrange
- 13.0.0: kIRG_SSource, kIRG_UKSource, kSpoofingVariant, kTGHZ2013, kUnihanCore2020
- Quiet pytest tracebacks (#310)
- Relax pytest plugin assertions in regards to zip / export file size (#310)
- Expansions: Fix loading of double apostrophe values in kRSUnicode via kRSGeneric (#304)
- Move CodeQL from advanced configuration file to GitHub's default
- Typo fixes
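kRSUnicode values have the form radical number, optional apostrophes, a period, and additional strokes. The double-apostrophe fix above concerns values with two apostrophes, such as a hypothetical 213''.0. The parser below is a sketch that counts apostrophes instead of accepting at most one. Its grammar is an assumption, including the optional minus sign on strokes, and should be checked against TR38. parse_rs_value and RadicalStroke are illustrative names, not unihan-etl's kRSGeneric expansion.

```python
import re
import typing as t

# Assumed grammar: radical number, zero or more apostrophes, a period,
# then a (possibly negative) additional-stroke count.
RS_PATTERN = re.compile(
    r"^(?P<radical>[1-9][0-9]{0,2})(?P<apostrophes>'*)\.(?P<strokes>-?[0-9]{1,2})$"
)


class RadicalStroke(t.NamedTuple):
    radical: int
    apostrophes: int  # count of apostrophes after the radical number
    strokes: int


def parse_rs_value(value: str) -> RadicalStroke:
    """Parse a single radical-stroke value."""
    match = RS_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a radical-stroke value: {value!r}")
    return RadicalStroke(
        radical=int(match["radical"]),
        apostrophes=len(match["apostrophes"]),
        strokes=int(match["strokes"]),
    )


def parse_rs_field(raw: str) -> t.List[RadicalStroke]:
    """A field may hold several space-separated values."""
    return [parse_rs_value(value) for value in raw.split()]
```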
Maintenance only, no bug fixes or new features
- ci: Add pydocstyle rule to ruff (#303)
- Add docstrings to functions, methods, classes, and packages (#303)
Maintenance only, no bug fixes or new features
- Move pytest configuration to pyproject.toml (#299)
- Add Python 3.12 to trove classifiers
- Per Poetry's docs on managing dependencies and poetry check, we had it wrong: instead of using extras, we should create dependency groups like this:
[tool.poetry.group.group-name.dependencies]
dev-dependency = "1.0.0"
We now do so.
- Poetry: 1.6.1 -> 1.7.0
See also: https://github.com/python-poetry/poetry/blob/1.7.0/CHANGELOG.md
- Move formatting from black to ruff format (#302)
This retains the same formatting style as black while eliminating a dev dependency by using our existing Rust-based ruff linter.
- CI: Update action packages to fix warnings
- dorny/paths-filter: 2.7.0 -> 2.11.1
SPACE_DELIMITED_LIST_FIELDS: Fix field name kAccountingNumeric, found during an automated sweep for typos.
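A misspelled entry in a set of field names fails silently. No error is raised, and the intended field is left unsplit. The minimal sketch below shows that failure mode. The one-entry sets, the misspelling, the split_field helper, and the "1 2" value are hypothetical placeholders, not unihan-etl's actual constants or data.

```python
import typing as t

# Illustrative one-entry sets; the real constant covers many more fields.
SPACE_DELIMITED_LIST_FIELDS = frozenset({"kAccountingNumeric"})
MISSPELLED_FIELDS = frozenset({"kAcountingNumeric"})  # hypothetical typo


def split_field(
    field: str, value: str, list_fields: t.AbstractSet[str]
) -> t.Union[str, t.List[str]]:
    """Split value on spaces only when field is registered as a list field."""
    if field in list_fields:
        return value.split(" ")
    return value
```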
- Typo fixes
typos --format brief --write-changes
One of these typos was for kAccountingNumeric in SPACE_DELIMITED_LIST_FIELDS.
- ruff: Remove ERA / eradicate plugin
This rule had too many false positives to trust. Other ruff rules have been beneficial.
- All pytest plugin fixtures are now prefixed unihan_, e.g.:
- quick_unihan_path -> unihan_quick_path
- quick_unihan_options -> unihan_quick_options
- quick_unihan_packager -> unihan_quick_packager
- ensure_quick_unihan -> unihan_ensure_quick
- mock_zip -> unihan_mock_zip
- columns -> unihan_quick_columns
- TestPackager fixture has been removed
This fixture was made redundant by the unihan_quick_* and unihan_full_* fixtures.
- pytest plugin (unihan_zshrc): Fix skipif condition to run if shell uses zsh(1)
- "quick" fixtures:
- Data has been moved from tests/fixtures to src/unihan_etl/data_files/quick
- Fixtures prefixed by sample_ in the name have been renamed to quick_
- "quick" and "full" fixtures: Fixed ability to access data files from outside the unihan_etl package
- ruff: Code quality tweaks (#295)
- pytest plugin: Add cached fixtures for UNIHAN (#291)
After the initial download of UNIHAN.zip, an 11-second run of unihan-etl's test suite can drop to 1.5 seconds by eliminating re-downloading and extraction.
- pytest plugin: Revert fix of zshrc fixture's skipif condition (#293)
It was fine as-is.
Rolled back
- pytest plugin: Fix zshrc fixture's skipif condition (#292)
Maintenance only, no bug fixes or new features
- ruff: Add additional linters, apply code fixes automatically and by hand (#290)
- Typings: Extract LogLevel and UnihanFormats
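Extracting LogLevel and UnihanFormats as named types lets a set of allowed strings be declared once and reused. The sketch below shows that pattern with typing.Literal. Its members are assumptions, and the actual aliases in unihan-etl may list different values. ensure_format is a hypothetical helper.

```python
import typing as t

# Hypothetical members; check unihan_etl's own type definitions for the real ones.
LogLevel = t.Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
UnihanFormats = t.Literal["csv", "json", "yaml"]


def ensure_format(value: str) -> UnihanFormats:
    """Narrow an arbitrary string (e.g. from the CLI) to an allowed format."""
    if value not in t.get_args(UnihanFormats):
        raise ValueError(f"unsupported format: {value!r}")
    return t.cast(UnihanFormats, value)
```

t.get_args reads the allowed values back out of the alias, so the runtime check and the static type share one definition.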
Maintenance only, no bug fixes or new features
- zhon: 1.1.5 -> 2.0.0 (#289, fixes #282)
Fixes pytest warning related to regular expressions.
Maintenance only, no bug fixes or new features
- unihan_etl._internal.app_dirs improvements (#287)
- Breaking: app_dirs moved
- Before 0.23.x: unihan_etl.app_dirs
- After 0.23.x: unihan_etl._internal.app_dirs
- New feature: Override directories on a one-off basis
- New feature: Template replacement of environment variables via os.path.expandvars + os.path.expanduser
- doctests: See the above in action thanks to doctests
- Dedicated tests via pytest
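The app_dirs template replacement and one-off overrides can be approximated with the two standard-library calls named above. In the sketch below, expand_dir, resolve_dirs, the cache_dir/data_dir keys, and the $UNIHAN_DEMO_ROOT variable are all hypothetical. unihan_etl._internal.app_dirs is an internal module, and its actual interface may differ.

```python
import os
import pathlib
import typing as t


def expand_dir(template: str) -> pathlib.Path:
    """Expand $VARS / ${VARS}, then a leading ~, into a concrete path."""
    return pathlib.Path(os.path.expanduser(os.path.expandvars(template)))


def resolve_dirs(
    defaults: t.Mapping[str, str],
    overrides: t.Optional[t.Mapping[str, str]] = None,
) -> t.Dict[str, pathlib.Path]:
    """Apply one-off overrides on top of defaults, then expand every entry."""
    merged = {**defaults, **(overrides or {})}
    return {name: expand_dir(template) for name, template in merged.items()}
```

Expanding variables before ~ means a variable whose value starts with ~ is also expanded.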
- API docs (#288):
- Limit depth of table of contents to one
- Fix section heading
- Fix comment in AppDirs
- Fix for destination of files not replacing file extension correctly (#285)
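Destination paths support an {ext} template variable, and this fix concerns making the file extension match the output format. resolve_destination is a hypothetical helper, not unihan-etl's API. It sketches both cases: filling in {ext}, or swapping the final suffix with pathlib's with_suffix.

```python
import pathlib


def resolve_destination(destination: str, fmt: str) -> pathlib.PurePosixPath:
    """Fill in {ext}, or else force the suffix to match the output format."""
    if "{ext}" in destination:
        # str.replace avoids KeyErrors from any other braces in the path.
        return pathlib.PurePosixPath(destination.replace("{ext}", fmt))
    # with_suffix replaces only the final suffix (e.g. .csv -> .json)
    # instead of appending a second one, and adds one if none exists.
    return pathlib.PurePosixPath(destination).with_suffix(f".{fmt}")
```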
This module has been renamed.
Before 0.22.x, unihan_etl's configuration was done through a dict object.
As of 0.22.0, settings are configurable via a dataclasses.dataclass object: unihan_etl.options.Options
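Moving from a dict to a dataclass trades free-form keys for declared, typed fields. The minimal sketch below uses a hypothetical ExampleOptions class with made-up fields. It does not reproduce the real unihan_etl.options.Options.

```python
import dataclasses


@dataclasses.dataclass
class ExampleOptions:
    """Hypothetical stand-in for an options dataclass; fields are illustrative."""

    destination: str = "unihan.{ext}"
    format: str = "csv"
    expand: bool = True


# Before: a plain dict, where a misspelled key ("fromat") goes unnoticed.
legacy_config = {"destination": "out.{ext}", "fromat": "json"}

# After: fields are declared and typed; an unknown keyword argument such as
# ExampleOptions(fromat="json") raises TypeError when the object is built.
options = ExampleOptions(destination="out.{ext}", format="json")

# dataclasses.replace() returns a modified copy and leaves the original untouched.
yaml_options = dataclasses.replace(options, format="yaml")
```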
- Add doctest support (#274)
- Initial doctest example added to README.md, test.py, and util.py.
- Stub out initial pytest plugin (#274)
- Split API docs into multiple files (#283)
- Fix make start in docs/Makefile by fixing argument positions (#283)
- Fix for destination of files not replacing file extension correctly (#286)
Maintenance only, no bug fixes or features
- Move file locations to pathlib internally (#277)
- Improved typing of download's urlretrive_fn and reporthook via typing.Protocol (#277)
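typing.Protocol lets a parameter accept any callable with the right shape, such as urllib.request.urlretrieve or a network-free test double. The sketch below uses hypothetical names (ReportHook, UrlRetrieveFn, download, fake_urlretrieve) and is not necessarily how unihan-etl spells them. The three-integer hook signature matches what urllib.request.urlretrieve passes to its reporthook.

```python
import typing as t


class ReportHook(t.Protocol):
    """Progress callback: (block number, block size, total size in bytes)."""

    def __call__(self, block_num: int, block_size: int, total_size: int) -> object: ...


class UrlRetrieveFn(t.Protocol):
    """Subset of urllib.request.urlretrieve's signature that download() relies on."""

    def __call__(
        self,
        url: str,
        filename: t.Optional[str] = None,
        reporthook: t.Optional[ReportHook] = None,
    ) -> t.Tuple[str, t.Any]: ...


def download(
    url: str,
    dest: str,
    urlretrieve_fn: UrlRetrieveFn,
    reporthook: t.Optional[ReportHook] = None,
) -> str:
    """Hypothetical wrapper: any callable matching UrlRetrieveFn can be injected."""
    filename, _headers = urlretrieve_fn(url, dest, reporthook=reporthook)
    return filename


# Usage: a test double that satisfies UrlRetrieveFn structurally.
progress: t.List[int] = []


def fake_urlretrieve(
    url: str,
    filename: t.Optional[str] = None,
    reporthook: t.Optional[ReportHook] = None,
) -> t.Tuple[str, t.Any]:
    for block_num in range(2):  # pretend a 2048-byte file arrives in two blocks
        if reporthook is not None:
            reporthook(block_num, 1024, 2048)
    return filename or "download.tmp", {}


def record_progress(block_num: int, block_size: int, total_size: int) -> None:
    progress.append(block_num * block_size)


saved = download(
    "https://example.invalid/Unihan.zip", "Unihan.zip", fake_urlretrieve, record_progress
)
```

Because protocols match structurally, neither fake_urlretrieve nor record_progress needs to inherit from anything.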
Maintenance only, no bug fixes or features
- Python 3.7 dropped
Python 3.7 support has been dropped (#272)
Its end-of-life is June 27th, 2023, and Python 3.8 supports typing's TypedDict and Protocol out of the box, without needing typing_extensions.
- Typings:
- Import typing as a namespace, e.g. import typing as t (#276)
- Use typing for TypedDict and Literal (#276)
- Use typing_extensions' TypeAlias for repeated types, such as in test_expansions (#276)
Maintenance only, no bug fixes or features
- Add back black for formatting
This is still necessary to accompany ruff, until it replaces black.
Maintenance only, no bug fixes or features
- Move formatting, import sorting, and linting to ruff.
This Rust-based checker has dramatically improved performance: linting and formatting can be done almost instantly.
This change replaces black, isort, flake8, and flake8 plugins.
- poetry: 1.4.0 -> 1.5.0
See also: https://github.com/python-poetry/poetry/releases/tag/1.5.0
- pytest: Fix invalid escape sequence warning from zhon
merge_dict: Improve typing of generic params (#271)
- Add PyYAML dependency
- CI speedups (#267)
- Split out release to separate job so the PyPI Upload docker image isn't pulled on normal runs
- Clean up CodeQL
- Bump poetry 1.1.x to 1.2.x
- Move .coveragerc -> pyproject.toml (#268)
- Move to src/-layout structure (#266)
- Add flake8-bugbear (#263)
- Add flake8-comprehensions (#264)
- Render changelog in linkify_issues (#261, #265)
- Fix table of contents rendering with sphinx autodoc via sphinx_toctree_autodoc_fix (#265)
- Test doctests in our docs via pytest_doctest_docutils (built on doctest_docutils) (#265)
- Add vendorized, updated fork of sphinxcontrib-issuetracker, via #261.
- Remove sphinx-issues package
Follow-ups to #257.
merged_dict(): Fix merging edge case where destination key was missing
download(): Fix edge case when "downloading" file from local path
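The merged_dict() fix addresses a classic edge case for recursive merges: a nested key that is present in the incoming dict but absent from the destination. The minimal sketch below handles that case. merge_dicts is a hypothetical name, and unihan-etl's own helper may differ in signature and copying behavior.

```python
import typing as t


def merge_dicts(
    base: t.Mapping[str, t.Any], additional: t.Mapping[str, t.Any]
) -> t.Dict[str, t.Any]:
    """Return a new dict with additional merged into base, recursing into nested dicts."""
    merged = dict(base)  # shallow copy: base itself is never mutated
    for key, value in additional.items():
        existing = merged.get(key)
        if isinstance(existing, dict) and isinstance(value, dict):
            merged[key] = merge_dicts(existing, value)
        else:
            # Also covers the edge case where key is missing from base entirely.
            merged[key] = value
    return merged
```

Nested dicts that are not merged are shared with the inputs rather than deep-copied. This is usually fine for read-only configuration.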
- mypy --strict annotations, via #257
- New option: --no-cache
Disregard cached .zip / extracted files, via #259.
- Add Python 3.8 and 3.9 to CI
This makes way for strict type annotations, as typings and generic behavior vary dramatically between 3.7 and 3.11.
- Python 2 compatibility module and imports removed. Python 2.x was officially dropped in 0.12.0 (2021-06-15) via #258
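The --no-cache option changes when downloading and extraction are skipped. The sketch below shows the decision logic. It assumes that "cached" means the zip and every extracted member already exist on disk. needs_download, extract_if_needed, and demo_cache_cycle are hypothetical helpers, not unihan-etl's download code.

```python
import pathlib
import tempfile
import typing as t
import zipfile


def needs_download(zip_path: pathlib.Path, no_cache: bool = False) -> bool:
    """Fetch again if caching is disabled or the archive is not on disk yet."""
    return no_cache or not zip_path.exists()


def extract_if_needed(
    zip_path: pathlib.Path, work_dir: pathlib.Path, no_cache: bool = False
) -> bool:
    """Extract unless every archive member already exists; return True if extracted."""
    with zipfile.ZipFile(zip_path) as archive:
        already_there = all((work_dir / name).exists() for name in archive.namelist())
        if already_there and not no_cache:
            return False
        archive.extractall(work_dir)  # creates work_dir if needed
    return True


def demo_cache_cycle() -> t.List[bool]:
    """Exercise the cache logic against a tiny throwaway archive."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        zip_path = root / "Unihan.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("Unihan_Readings.txt", "# placeholder\n")
        work_dir = root / "work"
        return [
            needs_download(zip_path),  # False: archive already exists
            extract_if_needed(zip_path, work_dir),  # True: first extraction
            extract_if_needed(zip_path, work_dir),  # False: reused from disk
            extract_if_needed(zip_path, work_dir, no_cache=True),  # True: forced
        ]
```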
load_data: Accept list of pathlib.Path in addition to list of str
- Add Python 3.10 (#248)
- Dropped Python 3.6 (#248)
Infrastructure updates for static type checking and doctest examples.
- Update poetry to 1.1
- CI: Use poetry 1.1.12 and install-poetry.py installer (#237 + #248)
- Relock poetry.lock at 1.1 (w/ 1.1.7's fix)
- Run pyupgrade for Python 3.7
- Tests: Move from tmpdir -> tmp_path
- Initial doctests support added, via #255
- Initial mypy validation, via #255
- CI (tests, docs): Improve caching of Python dependencies via actions/setup-python's v3/4 poetry caching, via #255
- CI (docs): Skip if no PUBLISH condition triggered, via #255
- Move to furo theme
- Add quickstart page
- Link to cihai's developer documentation: https://cihai.git-pull.com/contributing/
- #236: Convert to markdown
- Update black to 21.6b0
- Update trove classifiers to 3.9
- #235: Drop Python 2.7, 3.5. Remove Python 2 modesets and __future__
- #230 Move packaging / publishing to poetry
- #229 Self host docs
- #229 Add metadata / icons / etc. for doc site
- #229 Move travis -> github actions
- #229 Overhaul Makefiles
- Update CHANGES headings to produce working links
- Relax appdirs version constraint
- #228 Move from Pipfile to poetry
- Fix flicker in download progress bar
- Add project_urls to setup.py
- Use plain reStructuredText for CHANGES
- Use collections that's compatible with Python 2 and 3
- PEP8 tweaks
- Add code links in API
- Add __version__ to unihan_etl
- #91 New fields from UNIHAN Revision 25.
- kJinmeiyoKanji
- kJoyoKanji
- kKoreanEducationHanja
- kKoreanName
- kTGH
UNIHAN Revision 25 was released 2018-05-18 and issued for Unicode 11.0:
- Add tests and example corpus for kCCCII
- Add configuration / make tests for isort, flake8
- Switch tmuxp config to use pipenv
- Add Pipfile
- Add make sync_pipfile task to sync requirements/*.txt files with Pipfile
- Update and sync Pipfile
- Developer package updates (linting / docs / testing)
- isort 4.2.15 to 4.3.4
- flake8 3.3.0 to 3.5.0
- vulture 0.14 to 0.27
- sphinx 1.6.2 to 1.7.6
- alagitpull 0.0.12 to 0.0.21
- releases 1.3.1 to 1.6.1
- sphinx-argparse 0.2.1 to 1.6.2
- pytest 3.1.2 to 3.6.4
- Move documentation over to numpy-style
- Add sphinxcontrib-napoleon 0.6.1
- Update LICENSE New BSD to MIT
- All future commits and contributions are licensed to the cihai software foundation. This includes commits by Tony Narlock (creator).
- Enhance support for locations on kHDZRadBreak fields.
- Fix kIRG_GSource without location
- Fix kFenn output
- Fix kHanyuPinlu support output for n diacritics
- Add expansion for kIRGKangXi
- Normalize Radical-Stroke expansion for kRSUnicode
- Migrate more fields to regular expressions
- Normalize character field for kDaeJaweon, kHanyuPinyin, kCheungBauer, kFennIndex, kCheungBauerIndex, kIICore, and kIRGHanyuDaZidian
- Support for expanding kGSR
- Convert some field expansions to use regexes
- Fix bug where destination file was made into directory on first run
- Rename from unihan-tabular to unihan-etl
- Support for expanding multi-value fields
- Support for pruning empty fields
- Improve help dialog
- Added a page about UNIHAN and the project to documentation
- Split constant values into their own module
- Split functionality for expanding unstructured values into its own module
- Update to add kJa and adjust source file of kCompatibilityVariant per Unicode 8.0.0.
- Support for configuring logging via options and CLI
- Convert all print statements to use logger
- Allow for local / file system sources for Unihan.zip
- Only extract zip if unextracted
- Update package classifiers
- Add back datapackage
- Fix python 2 CSV output
- Default to CSV output
- Move unicodecsv module to dependency package
- Support for XDG directory specification
- Support for custom destination output, including replacing template variable {ext}
- Move about.py to module level
- Fix python package import
- Fix readme bug on pypi
- Support for exporting in YAML and JSON
- More internal factoring and simplification
- Return data as list
- Drop Python 3.3 and 3.4 support
- Rename from cihaidata_unihan to unihan_tabular
- Drop datapackages in favor of a universal JSON, YAML and CSV export.
- Only use UnicodeWriter in Python 2, fixes issue where Python would prepend b to values
- Rename scripts/ to cihaidata_unihan/
- Enable invoking tool via $ cihaidata_unihan
- Major internal refactor and simplification
- Convert to pytest assert statements
- Convert full test suite to pytest functions and fixtures
- Get CLI documentation up again
- Improve test coverage
- Lint code, remove unused imports
- Switch license BSD -> MIT
- Rebooted
- Modernize Makefile in docs
- Add Makefile to main project
- Modernize package metadata to use about.py
- Update requirements to use requirements/ folder for base, testing and doc dependencies.
- Update sphinx theme to alabaster with new logo.
- Update travis to use coveralls
- Update links on README to use https
- Update travis to test up to python 3.6
- Add support for pypy (why not)
- Lock base dependencies
- Add dev dependencies for isort, vulture and flake8