This repository introduces the Brazilian Identity Document Dataset (BID Dataset), the first public dataset of Brazilian identification documents.
BID Dataset was presented in the paper "BID Dataset: a challenge dataset for document processing tasks" and targets three challenges in computer vision: (i) document image classification, (ii) text region segmentation, and (iii) optical character recognition (OCR). The dataset consists of images of Brazilian identification documents divided into eight classes: National Driver's License (CNH) front and back faces, CNH front face, CNH back face, Natural Persons Register (CPF) front face, CPF back face, General Registration (RG) front face, RG back face, and RG front and back faces.
BID Dataset is composed of 28,800 document images, with 3,600 samples for each class.
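As a quick sanity check after downloading, the class layout described above can be sketched in Python. This is only an illustrative sketch: the folder names in `BID_CLASSES`, the one-folder-per-class layout, and the image extensions are assumptions, not the documented structure of the archive, so adjust them to match the extracted files.

```python
from pathlib import Path

# The eight classes described above. The folder names (keys) are
# hypothetical placeholders; rename them to match the extracted archive.
BID_CLASSES = {
    "cnh_front_and_back": "CNH, front and back faces",
    "cnh_front": "CNH, front face",
    "cnh_back": "CNH, back face",
    "cpf_front": "CPF, front face",
    "cpf_back": "CPF, back face",
    "rg_front": "RG, front face",
    "rg_back": "RG, back face",
    "rg_front_and_back": "RG, front and back faces",
}

EXPECTED_PER_CLASS = 3_600
EXPECTED_TOTAL = EXPECTED_PER_CLASS * len(BID_CLASSES)  # 8 * 3,600 = 28,800

# Assumed image extensions; other files (for example annotations) are ignored.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Count image files under each (hypothetical) class folder in `root`."""
    root = Path(root)
    counts = {}
    for folder in BID_CLASSES:
        class_dir = root / folder
        if class_dir.is_dir():
            counts[folder] = sum(
                1
                for path in class_dir.rglob("*")
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            # A missing folder is reported as zero images rather than an error.
            counts[folder] = 0
    return counts
```

Calling `count_images_per_class("path/to/extracted/dataset")` and comparing each value with `EXPECTED_PER_CLASS` gives a simple way to confirm that a download of the full dataset is complete. The sample dataset is expected to contain fewer images.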
Sample Dataset: https://drive.google.com/file/d/144EqqmMtCziua9iYo-3afUEvZrJVxUXU/view?usp=sharing
Full Dataset: https://drive.google.com/file/d/1Oi88TRcpdjZmJ79WDLb9qFlBNG8q2De6/view?usp=sharing
@inproceedings{sibgrapi_estendido,
author = {Álysson Soares and Ricardo das Neves Junior and Byron Bezerra},
title = {BID Dataset: a challenge dataset for document processing tasks},
booktitle = {Anais Estendidos do XXXIII Conference on Graphics, Patterns and Images},
location = {Evento Online},
year = {2020},
pages = {143--146},
publisher = {SBC},
address = {Porto Alegre, RS, Brasil},
doi = {10.5753/sibgrapi.est.2020.12997},
url = {https://sol.sbc.org.br/index.php/sibgrapi_estendido/article/view/12997}
}
If you are interested in further datasets related to Brazilian identification documents and occlusion classification, see the SpotBID Set Dataset. It focuses on document images affected by the spotlight effect, which makes it a useful resource for studying occlusion classification in identification documents.
SpotBID Set Dataset: Explore the dataset