Summary

This Google Cloud solution uses the Document AI Custom Document Classifier, OCR Processor, and Form Parser, together with the Google PaLM API, to extract information from corporate minute books: details about the corporate entity, its directors and officers, and clauses from the shareholders' agreement such as quorum rules, restrictions or provisions, and share classes.
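The extraction schema is not spelled out in this README. The Python sketch below is a rough illustration only. It models the kinds of entities listed above as dataclasses and serializes them to JSON. All class and field names (including `jurisdiction`) are hypothetical and are not taken from the solution's actual output format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Person:
    """A director or officer (hypothetical shape)."""
    name: str
    roles: List[str] = field(default_factory=list)  # e.g. ["director", "secretary"]


@dataclass
class ShareClass:
    """A share class described in the shareholders' agreement (hypothetical shape)."""
    name: str
    voting: Optional[bool] = None  # None when the documents do not say


@dataclass
class MinuteBookExtraction:
    """Hypothetical top-level record written as JSON for one minute book."""
    entity_name: Optional[str] = None
    jurisdiction: Optional[str] = None  # assumed field, for illustration only
    directors_and_officers: List[Person] = field(default_factory=list)
    quorum_rules: List[str] = field(default_factory=list)
    restrictions_or_provisions: List[str] = field(default_factory=list)
    share_classes: List[ShareClass] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses inside the lists
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    result = MinuteBookExtraction(entity_name="Example Holdings Inc.")
    result.directors_and_officers.append(Person("Jane Doe", ["director", "secretary"]))
    result.share_classes.append(ShareClass("Class A Common", voting=True))
    print(result.to_json())
```

Fields that are not found default to `None` or an empty list. This way every output file has the same keys, whichever pages were present in a given minute book.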

Solution Overview

Splits a multi-page PDF into single-page PDFs and saves them to Cloud Storage
Classifies each page with a Custom Document Classifier trained to distinguish four page types: dense-ocr, form-parser, certificate, and other
Parallelizes text extraction with Cloud Function instances that invoke Document AI Processors based on the page type
Augments OCR text with output returned from the Document AI Form Parser processor for form-parser pages
Steps through each page of OCR text to collect relevant entities into the extraction schema using heuristics and LLM prompts
Writes structured JSON output with the entities of interest to Cloud Storage
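The routing, augmentation, and heuristic-plus-LLM steps above can be sketched locally in Python. This is a minimal illustration under stated assumptions, not the solution's code. It makes no Cloud Function, Pub/Sub, Document AI, or PaLM calls. The processor names and the routes for certificate and other pages are hypothetical. Form Parser output is assumed to have already been flattened to a dict of key/value pairs. The LLM is passed in as a plain callable that stands in for a PaLM API request.

```python
import re
from typing import Callable, Dict, List

# Hypothetical mapping from classifier label to the processors run on that page.
# Only the dense-ocr and form-parser routes follow from this README; the
# certificate and other routes are assumptions for illustration.
PROCESSORS_BY_PAGE_TYPE: Dict[str, List[str]] = {
    "dense-ocr": ["ocr"],
    "form-parser": ["ocr", "form-parser"],
    "certificate": ["ocr"],  # assumption
    "other": [],  # assumption: page is skipped
}

# Hypothetical keyword heuristic: only pages that mention quorum reach the LLM.
QUORUM_PATTERN = re.compile(r"\bquorum\b", re.IGNORECASE)


def processors_for(page_type: str) -> List[str]:
    """Return the processors to invoke for a classified page (empty if unknown)."""
    return PROCESSORS_BY_PAGE_TYPE.get(page_type, [])


def merge_page_text(ocr_text: str, form_fields: Dict[str, str]) -> str:
    """Append Form Parser key/value pairs to the OCR text of a form-parser page."""
    if not form_fields:
        return ocr_text
    pairs = "\n".join(f"{key}: {value}" for key, value in form_fields.items())
    return f"{ocr_text}\n\nForm fields:\n{pairs}"


def collect_quorum_rules(pages: List[str], ask_llm: Callable[[str], str]) -> List[str]:
    """Step through page texts, prompting the LLM only when the heuristic matches."""
    rules: List[str] = []
    for text in pages:
        if not QUORUM_PATTERN.search(text):
            continue  # skip the LLM call for pages with no quorum keyword
        prompt = (
            "Extract the quorum rule from this shareholders' agreement text. "
            "Answer with one sentence, or NONE if there is no quorum rule.\n\n" + text
        )
        answer = ask_llm(prompt).strip()
        if answer and answer.upper() != "NONE":
            rules.append(answer)
    return rules


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stand-in for a real LLM request
        return "Two shareholders holding 51% of the voting shares."

    pages = [
        merge_page_text("Minutes of the annual meeting.", {"Date": "2020-01-15"}),
        "A quorum is two shareholders holding 51% of the voting shares.",
    ]
    print(processors_for("form-parser"))
    print(collect_quorum_rules(pages, fake_llm))
```

The LLM is injected as a callable so the heuristic loop can be tested offline. The keyword filter means a page that never mentions quorum does not trigger a prompt.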

Requirements

Google Cloud project with a Cloud Storage bucket, Document AI OCR Processor, Form Parser, and Custom Document Classifier
Terraform v1.4.5 to deploy the Cloud Functions and Pub/Sub topics
- Update terraform/modules/base/outputs.tf with your own instance IDs

drewgillson/googlepalm-minute-book-extraction
