There are many datasets of (kanji image, text) pairs on Hugging Face now. Choose whichever you like, for example `epts/kanji-full`, the dataset created by enpitsu (the original author of Kanji Generation).
Or, if you would like to create a dataset on your own, check out the following useful links:
- KANJIDIC Project, from which you can find and download `kanjidic2.xml`. This XML file contains information on 13,108 kanji, such as their IDs and meanings in multiple languages.
- KanjiVG, from which you can view kanji using an online viewer. You can also find and download separate SVG files for all kanji from its release page.
Before building the dataset, we need to prepare `kanjidic2.xml` and a folder that contains the separate kanji SVG files (e.g., `kanjivg-20230110-main`).
```sh
cd data/
pip install -r requirements.txt
python prepare_svg.py
python build_dataset.py
```
Modify the hardcoded paths and `dataset_id` in these two `.py` files to match your case.
These two commands do the following, respectively:

- Parse the `kanjidic2.xml` file and remove all stroke order numbers from the original SVG files. This also creates a hashmap that maps each kanji ID to the English meanings of the corresponding kanji; the hashmap is saved in the `id_to_text.json` file. The SVG files without stroke order numbers are saved in a new folder named `kanjivg-20230110-main-wo-num` (a rough sketch of this step is given after this list).
- Build the image-text paired dataset and upload it to Hugging Face.
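For reference, here is a minimal sketch of what the first step might look like. It is not the actual `prepare_svg.py`; it assumes the standard KANJIDIC2 `codepoint`/`meaning` layout, uses the UCS codepoint as the kanji ID, and assumes the KanjiVG convention of keeping stroke order numbers in a group whose `id` starts with `kvg:StrokeNumbers`.

```python
# Sketch only: a rough reimplementation of what prepare_svg.py does.
# Paths follow the folder names mentioned above; adjust to your case.
import json
import os
import xml.etree.ElementTree as ET

KANJIDIC_PATH = "kanjidic2.xml"
SVG_DIR = "kanjivg-20230110-main"
OUT_SVG_DIR = "kanjivg-20230110-main-wo-num"
SVG_NS = "http://www.w3.org/2000/svg"

# 1) Map each kanji's UCS codepoint (used here as the kanji id) to a
#    comma-separated string of its English meanings.
id_to_text = {}
for char in ET.parse(KANJIDIC_PATH).getroot().iter("character"):
    ucs = char.find("codepoint/cp_value[@cp_type='ucs']")
    if ucs is None:
        continue
    # English meanings are the <meaning> elements without an m_lang attribute.
    meanings = [m.text for m in char.iter("meaning")
                if "m_lang" not in m.attrib and m.text]
    if meanings:
        id_to_text[ucs.text] = ", ".join(meanings)

with open("id_to_text.json", "w", encoding="utf-8") as f:
    json.dump(id_to_text, f, ensure_ascii=False, indent=2)

# 2) Copy every KanjiVG SVG into a new folder, dropping the group that
#    holds the stroke order numbers.
os.makedirs(OUT_SVG_DIR, exist_ok=True)
ET.register_namespace("", SVG_NS)
for name in os.listdir(SVG_DIR):
    if not name.endswith(".svg"):
        continue
    tree = ET.parse(os.path.join(SVG_DIR, name))
    root = tree.getroot()
    for group in list(root.findall(f"{{{SVG_NS}}}g")):
        if group.get("id", "").startswith("kvg:StrokeNumbers"):
            root.remove(group)
    tree.write(os.path.join(OUT_SVG_DIR, name), encoding="utf-8",
               xml_declaration=True)
```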
All the code for finetuning the LDM is in `train_text_to_image_lora.py`, which is a modified version of the diffusers example for LoRA training.
Modify `PROJECT_DIR` before running the following line:

```sh
sh scripts/run-lora.sh
```
You will be prompted to log into your Wandb account the very first time you run these `.sh` scripts.
With the hyperparameters in `run-lora.sh` unchanged (mainly `--resolution=128`, `--train_batch_size=64`, `--lora_rank=128`), training needs ~14 GB of GPU memory. Note that by default we disable mixed precision training by setting `--mixed_precision="no"`. This does not add much to the memory usage, but avoids a lot of unexpected errors.
Model checkpoints will be saved under the `./ckpt/` folder with the name `pytorch_lora_weights.safetensors`, and log files can be found under the `./wandb/` folder.
To further expedite image generation, we distill the finetuned LDM with `train_lcm_distill_lora_sd.py`, which is a modified version of a diffusers example.
```sh
sh scripts/run-lcm-lora.sh
```
With the hyperparameters in `run-lcm-lora.sh` unchanged (mainly `--resolution=128`, `--train_batch_size=64`, `--lora_rank=64`), distillation needs ~18 GB of GPU memory.
Follow the code blocks in `test.ipynb` to test your training results.
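As a rough reference, a test might look like the sketch below. The base model name, checkpoint path, and prompt are assumptions for illustration: substitute the base model you actually finetuned and the folder where your `pytorch_lora_weights.safetensors` was saved.

```python
# Sketch only: see test.ipynb for the actual testing code.
import torch
from diffusers import StableDiffusionPipeline

# Assumption: the LoRA was trained on top of a SD v1.5-style base model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the finetuned LoRA weights saved by run-lora.sh.
pipe.load_lora_weights("./ckpt", weight_name="pytorch_lora_weights.safetensors")

# Generate a "kanji" for a concept that has no real kanji.
image = pipe(
    "artificial intelligence",
    height=128,
    width=128,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("sample.png")
```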
Follow the code blocks in `stream.ipynb` to test streaming.
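For a feel of why the distilled model makes streaming feasible, here is a hedged sketch of few-step inference with the LCM-LoRA. The base model and the path of the distilled weights are assumptions; the actual streaming setup lives in `stream.ipynb` and may differ.

```python
# Sketch only: stream.ipynb contains the actual streaming code.
import torch
from diffusers import LCMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed base model
).to("cuda")

# Swap in the LCM scheduler and load the distilled LCM-LoRA weights
# (adjust the path to wherever run-lcm-lora.sh saved them).
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("./ckpt", weight_name="pytorch_lora_weights.safetensors")

# LCM needs only a handful of denoising steps per image.
for prompt in ["internet", "black hole", "singularity"]:
    image = pipe(
        prompt,
        height=128,
        width=128,
        num_inference_steps=4,
        guidance_scale=1.0,
    ).images[0]
    image.save(f"{prompt.replace(' ', '_')}.png")
```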