PDF Parsy

Extract text and images from PDFs through an endpoint that runs as an AWS Lambda function built with the Serverless Framework.

Setup application

docker-compose build
docker-compose run --rm app pip install -r requirements_dev.txt
docker-compose run --rm -e S3_SAMPLE_OBJECT_KEY=my_s3_bucket -e S3_SAMPLE_IMAGES_FOLDER=my_s3_folder app python initial_setup.py

Invoke local

docker-compose run --rm -e S3_BUCKET=my_bucket -e S3_ACCESS_KEY_ID=my_access_key_id -e S3_SECRET_ACCESS_KEY=my_aws_secret_access_key app sls invoke local -f pdf_to_text -p fixtures/pdf_input.json

AWS credentials should have permissions to:

- Read from the provided S3 bucket that stores the PDF to be analyzed
- Write the images extracted from the PDF to the folder set by S3_SAMPLE_IMAGES_FOLDER in the setup step
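
The two permissions above could be expressed as an IAM policy along the following lines. This is a minimal sketch, not a policy shipped with this project: the helper name `build_s3_policy` and the sample bucket and folder names are hypothetical, and it assumes the PDFs and the extracted images live in the same bucket.

```python
import json


def build_s3_policy(bucket: str, images_folder: str) -> dict:
    """Build a minimal IAM policy document for reading PDFs and writing images.

    Hypothetical helper for illustration; adjust the resources to your layout.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read any object in the bucket that stores the PDFs.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Write extracted images only under the images folder.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{images_folder}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_s3_policy("my_bucket", "my_s3_folder"), indent=2))
```

Depending on how the application accesses S3, further actions (for example `s3:ListBucket`) may be needed; the two shown here only cover the read and write cases listed above.
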
Run the function:

docker-compose run --rm app sls invoke local -f pdf_to_text -p fixtures/pdf_input.json
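
The `-p` option passes the contents of a JSON file to the function as its event. The contents of fixtures/pdf_input.json are not reproduced here, so the sketch below is only an assumption about its shape: it supposes the handler reads `s3_pdf_key` from `queryStringParameters`, the field API Gateway uses for query strings in Lambda proxy events. The helper name `build_invoke_event` and the sample key are hypothetical.

```python
import json


def build_invoke_event(s3_pdf_key: str) -> str:
    """Return a JSON event carrying the PDF key as a query string parameter.

    Assumption: the handler reads event["queryStringParameters"]["s3_pdf_key"].
    """
    event = {"queryStringParameters": {"s3_pdf_key": s3_pdf_key}}
    return json.dumps(event, indent=2)


if __name__ == "__main__":
    # Redirect the output to a file and pass that file to `sls invoke local -p`.
    print(build_invoke_event("docs/sample.pdf"))
```

If the real fixture uses a different structure, follow the fixture rather than this sketch.
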

Deploy

docker-compose run --rm -e AWS_ACCESS_KEY_ID=my_access_key_id -e AWS_SECRET_ACCESS_KEY=my_aws_secret_access_key app sls deploy --verbose

Set the following environment variables in the AWS console for your newly created Lambda functions:
S3_BUCKET
S3_ACCESS_KEY_ID
S3_SECRET_ACCESS_KEY
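
A quick way to catch a missing variable is to check for all three before doing any work. The snippet below is a sketch of that idea, not code from this repository; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names.

```python
import os

# The three variables listed above.
REQUIRED_ENV_VARS = ("S3_BUCKET", "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY")


def missing_env_vars(environ=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```
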

Endpoints

GET /pdf_to_text?s3_pdf_key=key_to_object_on_s3
- Extracts text from the PDF and returns it in a JSON response
POST /pdf_images?s3_pdf_key=key_to_object_on_s3&s3_images_folder=s3_folder_to_store_images
- Extracts images from the PDF and uploads them to the specified s3_images_folder in the bucket configured in the application
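
Both endpoints take their arguments in the query string, so a client only has to URL-encode the parameters and pick the right HTTP method. The sketch below builds the two requests with the Python standard library. The base URL is a placeholder (the real one is printed by `sls deploy`), the helper names are hypothetical, and the structure of the JSON responses is not specified here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def pdf_to_text_request(base_url: str, s3_pdf_key: str) -> Request:
    """Build the GET request for /pdf_to_text."""
    query = urlencode({"s3_pdf_key": s3_pdf_key})
    return Request(f"{base_url}/pdf_to_text?{query}", method="GET")


def pdf_images_request(base_url: str, s3_pdf_key: str, s3_images_folder: str) -> Request:
    """Build the POST request for /pdf_images (parameters stay in the query string)."""
    query = urlencode({"s3_pdf_key": s3_pdf_key, "s3_images_folder": s3_images_folder})
    return Request(f"{base_url}/pdf_images?{query}", method="POST")


if __name__ == "__main__":
    # Placeholder base URL; replace it with the one reported by the deploy step.
    base = "https://example.invalid/dev"
    print(pdf_to_text_request(base, "docs/sample.pdf").full_url)
    print(pdf_images_request(base, "docs/sample.pdf", "images").full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` and decode the response body as JSON.
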

