Papers in the field of OCR(Continually updated)
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
DOCLLM: A LAYOUT-AWARE GENERATIVE LANGUAGE MODEL FOR MULTIMODAL DOCUMENT UNDERSTANDING
ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection(ADNet)(TMM)
CBNet: A Plug-and-Play Network for Segmentation-based Scene Text Detection(CBNet)
Zoom Text Detector
UNITS: UNSUPERVISED INTERMEDIATE TRAINING STAGE FOR SCENE TEXT DETECTION(ICME2022)
Vision-Language Pre-Training for Boosting Scene Text Detectors(ssl for text det CVPR2022)
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection(Transformer-based)
Kernel Proposal Network for Arbitrary Shape Text Detection(KPN)
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion(DBNet++)
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation(code)
Masked and Permuted Implicit Context Learning for Scene Text Recognition(MM23)
Transferring General Multimodal Pretrained Models to Text Recognition(Pretrained Model)
Multi-Granularity Prediction for Scene Text Recognition(ECCV2022)
Levenshtein OCR(ECCV2022)
SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition(irregular text)
Scene Text Recognition with Permuted Autoregressive Sequence Models(ABI-based)
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition(SSL)
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining(SSL for encoder and decoder)
Multimodal Semi-Supervised Learning for Text Recognition(Multi-modal SSL)
SVTR: Scene Text Recognition with a Single Visual Model(Visual IJCAI2022)
Pushing the Performance Limit of Scene Text Recognizer without Human Annotation(Semi-supervised)
Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition(SSL)
Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement(ssl pretrain encoder for text recognition)
Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching(search training protocol)
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition(Textual reasoning by GCN)
Visual-Semantic Transformer for Scene Text Recognition(Multi-modal recognition)
Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features(Multi-modal recognition)
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance(Transformer based recognizer)
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition(position enhance)
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation(KD)
Towards the Unseen: Iterative Text Recognition by Distilling from Errors(Feedback)
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition(Multi-Stage Decoder)
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting(CVPR2023)
Text Spotting Transformers(Transformer detect control points)
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text(PANNet for Text Spotting)
DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting(single point text spotting)
SPTS: Single-Point Text Spotting(single point text spotting)
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks(DOC2GRAPH)
Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator
General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining
LexiconNet: An End-to-End Handwritten Paragraph Text Recognition System
CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition
Revisiting Table Detection Datasets for Visually Rich Documents