Wisdom in 4 Characters Or Less

Purpose

A project to generate 四字熟語 (yoji-jukugo, four-character Japanese idioms) using a sequential TensorFlow model.

The dataset used for the current project was scraped/pulled from the following:

The main report, compiled with Datapane and also available in HTML format
The full yoji_df dataframe describing the idioms, their constituent kanji, and all additional characteristics from the data linked above
List of generated idioms, sans definitions and readings
The same list, expanded out to a dataframe including readings and meanings of constituent characters and bigrams

After sharing the initial project with some coworkers, it was suggested (by @DC & @JZ) that I retrain the model on bigrams within each idiom, as this more closely aligns with how yoji-jukugo are semantically divided and understood. I've updated the report linked above with some additional thoughts on the new model and its results!
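
To make the difference between the two tokenizations concrete, here is a minimal sketch in plain Python. The function names `to_characters` and `to_bigrams` are hypothetical, chosen for illustration, and the project's own preprocessing may differ. It only shows how a single idiom splits into four single-kanji tokens versus two bigram tokens.

```python
def to_characters(idiom: str) -> list[str]:
    """Split an idiom into single kanji (character-level tokens)."""
    return list(idiom)


def to_bigrams(idiom: str) -> list[str]:
    """Split a four-character idiom into its two two-character halves."""
    if len(idiom) != 4:
        raise ValueError(f"expected 4 characters, got {len(idiom)}: {idiom!r}")
    return [idiom[:2], idiom[2:]]


# 一石二鳥 ("one stone, two birds") divides naturally into 一石 + 二鳥.
example = "一石二鳥"
char_tokens = to_characters(example)
bigram_tokens = to_bigrams(example)
print(char_tokens)
print(bigram_tokens)
```

With bigram tokens, a sequence model predicts the second half of an idiom from the first in a single step, rather than one kanji at a time. The trade-off is a larger and sparser vocabulary, since there are far more distinct bigrams than distinct kanji.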
