milely/OCR_paper

OCR_paper

Papers in the field of OCR (continually updated)

Document Analysis with Multi-modal Large Language Models

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
DOCLLM: A LAYOUT-AWARE GENERATIVE LANGUAGE MODEL FOR MULTIMODAL DOCUMENT UNDERSTANDING

Text Detection

ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection(ADNet)(TMM)
CBNet: A Plug-and-Play Network for Segmentation-based Scene Text Detection(CBNet)
Zoom Text Detector
UNITS: UNSUPERVISED INTERMEDIATE TRAINING STAGE FOR SCENE TEXT DETECTION(ICME2022)
Vision-Language Pre-Training for Boosting Scene Text Detectors(ssl for text det CVPR2022)
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection(Transformer-based)
Kernel Proposal Network for Arbitrary Shape Text Detection(KPN)
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion(DBNet++)
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation(code)

Text Recognition

Masked and Permuted Implicit Context Learning for Scene Text Recognition(MM23)
Transferring General Multimodal Pretrained Models to Text Recognition(Pretrained Model)
Multi-Granularity Prediction for Scene Text Recognition(ECCV2022)
Levenshtein OCR(ECCV2022)
SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition(irregular text)
Scene Text Recognition with Permuted Autoregressive Sequence Models(ABI-based)
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition(SSL)
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining(SSL for encoder and decoder)
Multimodal Semi-Supervised Learning for Text Recognition(Multi-modal SSL)
SVTR: Scene Text Recognition with a Single Visual Model(Visual IJCAI2022)
Pushing the Performance Limit of Scene Text Recognizer without Human Annotation(Semi-supervised)
Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition(SSL)
Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement(ssl pretrain encoder for text recognition)
Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching(search training protocol)
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition(Textual reasoning by GCN)
Visual-Semantic Transformer for Scene Text Recognition(Multi-modal recognition)
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features(Multi-modal recognition)
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance(Transformer based recognizer)
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition(position enhance)
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation(KD)
Towards the Unseen: Iterative Text Recognition by Distilling from Errors(Feedback)
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition(Multi-Stage Decoder)

End-to-End text recognition(Text Spotting)

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Text Spotting
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting(CVPR2023)
Text Spotting Transformers(Transformer detect control points)
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text(PANNet for Text Spotting)
DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting(single point text spotting)
SPTS: Single-Point Text Spotting(single point text spotting)

Document layout analysis

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks(DOC2GRAPH)

Font Generation && Style Transfer

Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator

OCR Post Process(spell check)

General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

Paragraph Recognition

LexiconNet: An End-to-End Handwritten Paragraph Text Recognition System

Mathematical Expression Recognition

CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Table Related

Revisiting Table Detection Datasets for Visually Rich Documents
