diff --git a/docs/core-concepts/classification/mom.md b/docs/core-concepts/classification/mom.md
index 7a055cf..08c78b3 100644
--- a/docs/core-concepts/classification/mom.md
+++ b/docs/core-concepts/classification/mom.md
@@ -93,5 +93,3 @@ result = process.classify(
 - Set appropriate confidence thresholds based on your use case
 - Consider using different model providers for better diversity
 - Monitor and log classification results for each model
-
-For more examples and advanced usage, check out the [examples directory](examples/) in the repository.
\ No newline at end of file
diff --git a/docs/core-concepts/classification/vision.md b/docs/core-concepts/classification/vision.md
index 0fa94b7..3acdd5d 100644
--- a/docs/core-concepts/classification/vision.md
+++ b/docs/core-concepts/classification/vision.md
@@ -98,5 +98,3 @@ result = process.classify(
     image=True
 )
 ```
-
-For more examples and advanced usage, check out the [examples directory](examples/) in the repository.
\ No newline at end of file
diff --git a/docs/core-concepts/contracts/index.md b/docs/core-concepts/contracts/index.md
index 11edfa6..3bf5c6d 100644
--- a/docs/core-concepts/contracts/index.md
+++ b/docs/core-concepts/contracts/index.md
@@ -30,5 +30,3 @@ class InvoiceContract(Contract):
 ```python
 --8<-- "extract_thinker/models/contract.py"
 ```
-
-For more examples and advanced usage, check out the [examples directory](examples/) in the repository.
diff --git a/docs/core-concepts/document-loaders/aws-textract.md b/docs/core-concepts/document-loaders/aws-textract.md
index 266fb78..94c947d 100644
--- a/docs/core-concepts/document-loaders/aws-textract.md
+++ b/docs/core-concepts/document-loaders/aws-textract.md
@@ -68,4 +68,4 @@ The loader returns a dictionary with the following structure:
 - Process pages individually for large documents
 - Monitor API quotas and costs
 
-For more examples and implementation details, check out the [examples directory](examples/) in the repository.
\ No newline at end of file
+For more examples and implementation details, check out the [AWS Stack](../../examples/aws-textract) in the repository.
\ No newline at end of file
diff --git a/docs/core-concepts/document-loaders/azure-form.md b/docs/core-concepts/document-loaders/azure-form.md
index b2082d7..bcbd3c5 100644
--- a/docs/core-concepts/document-loaders/azure-form.md
+++ b/docs/core-concepts/document-loaders/azure-form.md
@@ -61,4 +61,4 @@ Document Intelligence supports `PDF`, `JPEG/JPG`, `PNG`, `BMP`, `TIFF`, `HEIF`,
 - Handle tables and paragraphs separately for better accuracy
 - Process documents page by page for large files
 
-For more examples and advanced usage, check out the [examples directory](examples/) in the repository.
\ No newline at end of file
+For more examples and implementation details, check out the [Azure Stack](../../examples/azure-form.md) in the repository.
\ No newline at end of file
diff --git a/docs/core-concepts/document-loaders/google-document-ai.md b/docs/core-concepts/document-loaders/google-document-ai.md
index 1c2e5ef..45c4378 100644
--- a/docs/core-concepts/document-loaders/google-document-ai.md
+++ b/docs/core-concepts/document-loaders/google-document-ai.md
@@ -79,4 +79,4 @@ with open("document.pdf", "rb") as f:
 
 Document AI supports PDF, TIFF, GIF, JPEG, PNG with a maximum file size of 20MB or 2000 pages.
 
-For more examples and implementation details, check out the [examples directory](examples/) in the repository.
\ No newline at end of file
+For more examples and implementation details, check out the [Google Stack](../../examples/google-document-ai) in the repository.
\ No newline at end of file
diff --git a/docs/core-concepts/document-loaders/tesseract.md b/docs/core-concepts/document-loaders/tesseract.md
index 29e5dca..15ac2c9 100644
--- a/docs/core-concepts/document-loaders/tesseract.md
+++ b/docs/core-concepts/document-loaders/tesseract.md
@@ -61,4 +61,4 @@ Document Intelligence supports `PDF`, `JPEG/JPG`, `PNG`, `BMP`, `TIFF`
 - Consider image preprocessing for better accuracy
 - Set appropriate page segmentation mode based on document layout
 
-For more examples and advanced usage, check out the [examples directory](examples/) in the repository.
+For more examples and advanced usage, check out the [Local Stack](../../examples/local-processing) in the repository.
diff --git a/docs/examples/local-processing.md b/docs/examples/local-processing.md
index 3b36817..b0101ce 100644
--- a/docs/examples/local-processing.md
+++ b/docs/examples/local-processing.md
@@ -96,16 +96,4 @@ result = extractor.extract("document.pdf", Contract)
     )
 except Exception as e:
     print(f"Processing error: {e}")
- ```
-
-## Performance Comparison
-
-| Aspect | Cloud | Local |
-|--------|-------|-------|
-| Speed | Faster | Depends on hardware |
-| Cost | Pay per use | Free |
-| Privacy | Data leaves network | Complete privacy |
-| Setup | Simple | More complex |
-| Maintenance | None | Required |
-
-For more examples and implementation details, check out the [examples directory](https://github.com/enoch3712/ExtractThinker/tree/main/examples) in the repository.
\ No newline at end of file
+ ```
\ No newline at end of file
diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md
index 9c10e85..28ac950 100644
--- a/docs/getting-started/index.md
+++ b/docs/getting-started/index.md
@@ -67,24 +67,22 @@ print(f"Total: ${result.total_amount}")
Extraction with Pydantic
Extract structured data from any document type using Pydantic models for validation, custom features, and prompt engineering capabilities.
- +Classification & Split
Intelligent document classification and splitting with support for consensus strategies, eager/lazy splitting, and confidence thresholds.
- +PII Detection
Automatically detect and handle sensitive personal information in documents with privacy-first approach and advanced validation.
- +LLM and OCR Agnostic
Freedom to choose and switch between different LLM providers and OCR engines based on your needs and cost requirements.
- +