Perseo is a vision-transformer-based OCR model for the Spanish language.
The architecture is based on TrOCR. The model is trained on the Spanish Wikipedia dataset, with trdg used to render the sentences as images. The encoder is initialized with the small version of the encoder described in the TrOCR paper, while the decoder is initialized with the RoBERTa Spanish model available on Hugging Face.
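A minimal sketch of this kind of warm start, assuming the Hugging Face transformers library and PyTorch are installed. The function name and its two checkpoint arguments are hypothetical. The exact checkpoints used for Perseo are not named here, so the caller must supply them. The token-id wiring follows the usual convention for RoBERTa-style tokenizers and is an assumption, not Perseo's confirmed configuration.

```python
def build_warm_started_model(encoder_checkpoint: str, decoder_checkpoint: str):
    """Assemble an image-to-text model from two pretrained checkpoints.

    Both arguments are Hugging Face model ids or local paths chosen by the
    caller (hypothetical parameters, not Perseo's documented settings).
    """
    # Imported lazily so that merely defining the function does not
    # require transformers to be installed.
    from transformers import AutoTokenizer, VisionEncoderDecoderModel

    # Loads the vision encoder and the text decoder separately, then joins
    # them; the decoder is configured for cross-attention over image features.
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        encoder_checkpoint, decoder_checkpoint
    )
    tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)

    # Assumed convention: reuse the tokenizer's special tokens for generation.
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    model.config.eos_token_id = tokenizer.sep_token_id
    return model, tokenizer
```

Calling the function downloads both checkpoints, so it needs network access. The cross-attention layers that connect the decoder to the encoder are newly created and randomly initialized, which is why a model assembled this way has to be trained on image and text pairs before it can read anything.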
Version 0.0 is trained on machine-typed characters to evaluate the model's performance. Future versions will use handwritten characters.
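The metric used to evaluate Perseo is not specified here. A common choice for OCR is the character error rate (CER): the edit distance between the predicted and reference text, divided by the reference length. The sketch below is one way to compute it with the standard library only; the function names are hypothetical, and the NFC normalization step is an assumption added so that accented Spanish characters compare equal regardless of how they are encoded.

```python
import unicodedata


def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the normalized reference."""
    # NFC merges a base letter and a combining mark into one code point,
    # so "n" + combining tilde is treated the same as a precomposed "ñ".
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

A CER of 0.0 means the prediction matches the reference exactly. The value can exceed 1.0 when the prediction is much longer than the reference.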