Formula recognition (LaTeX recognition) is the task of recognizing LaTeX-format text from a formula image. Unlike character recognition, formula recognition must learn ordering patterns not only left → right, but also top → bottom for multi-line structures.
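For instance, a one-line formula and a stacked (multi-line) structure decode into token sequences like these (the first pair also appears in `gt.txt` below; the fraction is an illustrative example):

```
image: 4 × 7 = 28           →  target tokens: 4 \times 7 = 2 8   (read left → right)
image: fraction a over b    →  target tokens: \frac { a } { b }  (read top → bottom)
```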
```
ocr_teamcode/
│
├── config/                       # train argument config file
│   ├── Attention.yaml
│   └── SATRN.yaml
│
├── data_tools/                   # utils for dataset
│   ├── download.sh               # dataset download script
│   ├── extract_tokens.py         # extract tokens from token.txt
│   ├── make_dataset.py           # sample dataset
│   ├── parse_upstage.py          # convert JSON ground truth file to ICDAR15 format
│   └── train_test_split.py       # split dataset into train and test dataset
│
├── networks/                     # network, loss
│   ├── Attention.py
│   ├── SATRN.py
│   ├── loss.py
│   └── spatial_transformation.py
│
├── checkpoint.py                 # save, load checkpoints
├── pre_processing.py             # preprocess images with OpenCV
├── custom_augment.py             # image augmentations
├── transform.py
├── dataset.py
├── flags.py                      # parse yaml to FLAG format
├── inference.py                  # inference
├── metrics.py                    # calculate evaluation metrics
├── scheduler.py                  # learning rate scheduler
├── train.py                      # train
└── utils.py                      # utils for training
```
```
input/data/train_dataset/
│
├── images/              # input image folder
│   ├── train_00000.jpg
│   ├── train_00001.jpg
│   ├── train_00002.jpg
│   └── ...
│
├── gt.txt               # input data
├── level.txt            # formula difficulty feature
├── source.txt           # printed output / handwritten feature
└── tokens.txt           # vocabulary for training
```
```
pip install -r requirements.txt
```
- scikit_image==0.14.1
- opencv_python==3.4.4.19
- tqdm==4.28.1
- torch==1.7.1+cu101
- torchvision==0.8.2+cu101
- scipy==1.2.0
- numpy==1.15.4
- pillow==8.2.0
- tensorboardX==1.5
- editdistance==0.5.3
- python-dotenv==0.17.1
- wandb==0.10.30
- adamp==0.3.0
```
sh filename.sh
```

Note: place the training data following the dataset folder structure above!
Note: a txt file with a single column separates records with `\n`, while a txt file with two or more columns separates columns with `\t` and records with `\n`.
The training data consists of an image folder plus four files: `tokens.txt`, `gt.txt`, `level.txt`, and `source.txt`. Of these, `tokens.txt` and `gt.txt` are required inputs for model training, while `level.txt` and `source.txt` are per-image metadata used when splitting the dataset.
- `tokens.txt` is the vocabulary file used for training; it defines the tokens the model needs to learn.

  ```
  O
  \prod
  \downarrow
  ...
  ```

- `gt.txt` is the file actually used for training; each row consists of an image path and the LaTeX ground truth.

  ```
  train_00000.jpg    4 \times 7 = 2 8
  train_00001.jpg    a ^ { x } > q
  train_00002.jpg    8 \times 9
  ...
  ```

- `level.txt` holds the difficulty of each formula; each row consists of an image path and a level. The numbers mean 1 (elementary), 2 (middle school), 3 (high school), 4 (university), and 5 (beyond university).

  ```
  train_00000.jpg    1
  train_00001.jpg    2
  train_00002.jpg    2
  ...
  ```

- `source.txt` holds the output form of each image; each row consists of an image path and a source. The numbers mean 0 (printed output) and 1 (handwritten).

  ```
  train_00000.jpg    1
  train_00001.jpg    0
  train_00002.jpg    0
  ```
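Since columns are tab-separated and the LaTeX tokens inside the ground truth are space-separated, a minimal parsing sketch could look like this (the helper name `load_gt` is an illustrative assumption, not the repo's actual loader):

```python
def load_gt(path):
    """Read a two-column txt file: image path, a tab, then space-separated LaTeX tokens."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            image_path, latex = line.split("\t", 1)       # columns are tab-separated
            samples.append((image_path, latex.split()))   # tokens are space-separated
    return samples

# e.g. [('train_00000.jpg', ['4', '\\times', '7', '=', '2', '8']), ...]
samples = load_gt("/opt/ml/input/data/train_dataset/gt.txt")
```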
When using wandb logging, define the arguments that need to be passed to wandb in a `.env` file:

```
PROJECT="[wandb project name]"
ENTITY="[wandb nickname]"
```
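For reference, a minimal sketch of how these values can be read with python-dotenv and handed to wandb (how `train.py` actually wires this up is an assumption here):

```python
import os

import wandb
from dotenv import load_dotenv

load_dotenv()  # loads PROJECT and ENTITY from the .env file into the environment

wandb.init(
    project=os.getenv("PROJECT"),  # "[wandb project name]"
    entity=os.getenv("ENTITY"),    # "[wandb nickname]"
    name="sample_run",             # matches wandb.run_name in the config below
)
```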
The config file used for training is a yaml file; set it up as follows according to your training goal.
```yaml
network: SATRN
input_size:                # resize image
  height: 48
  width: 192
SATRN:
  encoder:
    hidden_dim: 300
    filter_dim: 1200
    layer_num: 6
    head_num: 8
    shallower_cnn: True    # shallow CNN
    adaptive_gate: True    # A2DPE
    conv_ff: True          # locality-aware feedforward
    separable_ff: True     # only if conv_ff is True
  decoder:
    src_dim: 300
    hidden_dim: 300
    filter_dim: 1200
    layer_num: 3
    head_num: 8
checkpoint: ""             # load checkpoint
prefix: "./log/satrn"      # log folder name

data:
  train:                   # train dataset file path
    - "/opt/ml/input/data/train_dataset/gt.txt"
  test:                    # validation dataset file path
    -
  token_paths:             # token file path
    - "/opt/ml/input/data/train_dataset/tokens.txt"  # 241 tokens
  dataset_proportions:     # proportion of data to take from train (not test)
    - 1.0
  random_split: True       # if True, random split from train files
  test_proportions: 0.2    # only if random_split is True, create validation set
  crop: True               # center crop image
  rgb: 1                   # 3 for color, 1 for greyscale

batch_size: 16
num_workers: 8
num_epochs: 200
print_epochs: 1            # print interval
dropout_rate: 0.1
teacher_forcing_ratio: 0.5 # teacher forcing ratio
teacher_forcing_damp: 5e-3 # teacher forcing decay (0 to turn off)
max_grad_norm: 2.0         # gradient clipping
seed: 1234
optimizer:
  optimizer: AdamP
  lr: 5e-4
  weight_decay: 1e-4
  selective_weight_decay: True  # no decay in norm and bias
  is_cycle: True                # cyclic learning rate scheduler
label_smoothing: 0.2       # label smoothing factor (0 to off)
patience: 30               # stop train after waiting (-1 for off)
save_best_only: True       # save best model only
fp16: True                 # mixed precision
wandb:
  wandb: True              # wandb logging
  run_name: "sample_run"   # wandb project run name
```
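`flags.py` parses this yaml into FLAG format so nested keys read as attributes; a minimal sketch of that idea (the `Flags` class below is illustrative, not the repo's exact implementation):

```python
import yaml

class Flags:
    """Wrap nested config dicts so keys read as attributes."""
    def __init__(self, d):
        for key, value in d.items():
            setattr(self, key, Flags(value) if isinstance(value, dict) else value)

with open("./config/SATRN.yaml") as f:
    options = Flags(yaml.safe_load(f))

print(options.network)              # SATRN
print(options.data.token_paths[0])  # /opt/ml/input/data/train_dataset/tokens.txt
print(options.optimizer.lr)         # note: PyYAML loads bare 5e-4 as the string "5e-4"
```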
```
python train.py [--config_file]
```

- `--config_file`: path to the config file
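For example, to train with the SATRN config shipped in `config/`:

```
python train.py --config_file ./config/SATRN.yaml
```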
```
python inference.py [--checkpoint] [--max_sequence] [--batch_size] [--file_path] [--output_dir]
```

- `--checkpoint`: path to the checkpoint file
- `--max_sequence`: maximum sequence length at inference time
- `--batch_size`: batch size
- `--file_path`: path to the test dataset
- `--output_dir`: directory where inference results are saved
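For example (the checkpoint, test set, and output paths below are placeholders):

```
python inference.py --checkpoint ./log/satrn/best.pth --file_path ./input.txt --output_dir ./result
```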
- On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention, Lee et al., 2019
- Bag of Tricks for Image Classification with Convolutional Neural Networks, He et al., 2018
- Averaging Weights Leads to Wider Optima and Better Generalization, Izmailov et al., 2018
- CSTR: Revisiting Classification Perspective on Scene Text Recognition, Cai et al., 2021
- Improvement of End-to-End Offline Handwritten Mathematical Expression Recognition by Weakly Supervised Learning, Truong et al., 2020
- ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators, Clark et al., 2020
- SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition, Qiao et al., 2020
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition, Fang et al., 2021
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al., 2016
| κΉμ’ μ | λ―Όμ§μ | λ°μν | λ°°μλ―Ό | μ€μΈλ―Ό | μ΅μ¬ν |
| --- | --- | --- | --- | --- | --- |
Distributed under the MIT License. See `LICENSE` for more information.