Data extraction with ML
Sparrow is an open-source solution for extracting and processing data from documents and images, including forms, invoices, receipts, and other unstructured sources. Its architecture is modular: OCR, Donut fine-tuning and inference, and a data labeling UI run as independent services.
- sparrow-data - Prepares data for fine-tuning the Donut ML model and integrates OCR.
- sparrow-ml - Runs Donut ML model fine-tuning and inference.
- sparrow-ui - A user interface for Donut ML model data labeling, with a dashboard.
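As a rough illustration of what a data preparation step for Donut can involve, the sketch below writes labeled fields to a `metadata.jsonl` file. It assumes the dataset convention used by the Donut project, where each line holds a `file_name` and a `ground_truth` JSON string wrapping a `gt_parse` object. The helper names `to_donut_record` and `write_metadata` and the sample field names are hypothetical and are not taken from sparrow-data.

```python
import json
from pathlib import Path


def to_donut_record(file_name: str, fields: dict) -> dict:
    """Build one metadata record for a labeled document image.

    Assumption: the Donut dataset convention, where ground_truth is a JSON
    *string* whose top-level key is "gt_parse".
    """
    return {
        "file_name": file_name,
        "ground_truth": json.dumps({"gt_parse": fields}),
    }


def write_metadata(records: list, out_dir: Path) -> Path:
    """Write one JSON object per line to <out_dir>/metadata.jsonl."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metadata.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps non-ASCII field values readable.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

A call such as `write_metadata([to_donut_record("invoice_001.png", {"invoice_no": "INV-001", "total": "120.50"})], Path("dataset/train"))` would produce a one-line `metadata.jsonl` next to the training images; the file name and fields here are illustrative only.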
Follow the install steps outlined here:
- Donut Data install steps
- Donut ML install steps
- Donut UI install steps
Follow the usage steps outlined here:
- Donut Data usage steps
- Donut ML usage steps
- Donut UI usage steps
[Screenshot: Sparrow UI]
Licensed under the Apache License, Version 2.0. Copyright 2020-2024 Katana ML, Andrej Baranovskij. Copy of the license.