The ultimate goal of our research is to build a system with high-level intelligence, i.e., the ability to read, think, and create, so advanced that it could one day even surpass human intelligence. We call such systems Advanced Literate Machinery (ALM).
This project is maintained by the 读光 OCR Team (读光-Du Guang means “Reading The Light”) in the Language Technology Lab, Alibaba DAMO Academy.
Visit our 读光-Du Guang Portal to experience online demos for OCR and Document Understanding.
2022.9 Release
- MGP-STR (ECCV 2022, paper): Building on ViT and a tailored Adaptive Addressing and Aggregation module, we explore an implicit way of incorporating linguistic knowledge: subword representations are introduced to enable multi-granularity prediction and fusion in scene text recognition.
- LevOCR (ECCV 2022, paper): Inspired by the Levenshtein Transformer, we cast scene text recognition as an iterative sequence refinement process, which allows for parallel decoding, dynamic length changes, and good interpretability.
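
To make the idea of iterative sequence refinement more concrete, here is a minimal toy sketch in plain Python. It is not the LevOCR implementation. It assumes the three edit operations of the Levenshtein Transformer (delete tokens, insert placeholders, fill placeholders), and it replaces the learned policies with a hypothetical oracle that already knows the target string, so it converges in a single pass. All function names and the `<plh>` token are illustrative.

```python
from difflib import SequenceMatcher

PLH = "<plh>"  # hypothetical placeholder token


def delete_step(current, target):
    """Oracle deletion: keep only characters that align with the target.

    A real model would predict a keep/delete flag for every position in
    parallel; here the alignment comes from difflib instead.
    """
    matcher = SequenceMatcher(None, current, target, autojunk=False)
    kept = []
    for a, _b, size in matcher.get_matching_blocks():
        kept.extend(current[a:a + size])
    return kept


def insert_placeholders(kept, target):
    """Oracle insertion: open one placeholder per missing target character.

    `kept` is a subsequence of `target`, so a greedy left-to-right match
    always succeeds. This is where the sequence length can change.
    """
    canvas, t = [], 0
    for ch in kept:
        while target[t] != ch:
            canvas.append(PLH)
            t += 1
        canvas.append(ch)
        t += 1
    canvas.extend([PLH] * (len(target) - t))
    return canvas


def fill_placeholders(canvas, target):
    """Oracle token prediction: fill every placeholder at once."""
    return [target[i] if tok == PLH else tok for i, tok in enumerate(canvas)]


def refine(initial, target, max_iters=3):
    """Run delete -> insert -> fill until the sequence stops changing."""
    seq = list(initial)
    trace = ["".join(seq)]
    for _ in range(max_iters):
        if "".join(seq) == target:
            break
        kept = delete_step(seq, target)
        canvas = insert_placeholders(kept, target)
        seq = fill_placeholders(canvas, target)
        trace.append("".join(seq))
    return "".join(seq), trace


if __name__ == "__main__":
    final, trace = refine("hcuse", "house")
    print(final, trace)
```

Each of the three steps acts on all positions at once, which is what makes parallel decoding possible, and the recorded trace of intermediate strings hints at why such a process is easy to inspect. In a trained model the oracle would be replaced by predictions conditioned on the image and the current text.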