Adding pdftotext for PDF support in Hypercube #77

Merged: 3 commits into dev from pdftotext on Sep 18, 2019

dannylamb
Contributor

GitHub Issue: Islandora/documentation#933

What does this Pull Request do?

Adds pdftotext support for Hypercube.

What's new?

Hypercube now checks the content type of the resource and uses pdftotext if it receives a PDF.
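The content-type check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP code in HypercubeController.php: the function name `build_command` and its defaults are hypothetical, and it assumes that anything other than a PDF is an image meant for tesseract. The `stdin`/`stdout` arguments to tesseract and the `- -` arguments to pdftotext are the standard ways to stream data through those tools.

```python
def build_command(content_type,
                  tesseract_executable="tesseract",
                  pdftotext_executable="pdftotext"):
    """Pick the text-extraction command for a resource (hypothetical helper)."""
    # Drop parameters such as "; charset=binary" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/pdf":
        # pdftotext: first "-" reads the PDF from stdin,
        # second "-" writes the extracted text to stdout.
        return [pdftotext_executable, "-", "-"]

    # Assumption: every other content type is an image for OCR.
    return [tesseract_executable, "stdin", "stdout"]
```

The two executable parameters mirror the tesseract_executable and pdftotext_executable config entries, so a caller could pass in whatever paths config.yaml provides.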

How should this be tested?

  • sudo apt-get install poppler-utils (the package that provides pdftotext on Debian/Ubuntu)
  • Pull this down into crayfish
  • Replace the executable entry in config.yaml with two entries, tesseract_executable and pdftotext_executable, as per config.example.yaml.
  • Upload a PDF
  • Hit hypercube with something like this: curl -H "Authorization: Bearer islandora" -H "Apix-Ldp-Resource: http://localhost:8080/fcrepo/rest/2019-09/my-file.pdf" localhost:8000/hypercube/
  • Watch the lovely extracted text get blurted out onto your console.
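For anyone who would rather script the check than use curl, here is a minimal Python sketch of the same request. The helper names `build_hypercube_request` and `fetch_extracted_text` are hypothetical; the URLs, the `Apix-Ldp-Resource` header, and the `Bearer islandora` token are taken from the curl command above and assume the same local setup.

```python
import urllib.request


def build_hypercube_request(resource_url,
                            hypercube_url="http://localhost:8000/hypercube/",
                            token="islandora"):
    """Build the GET request that the curl command above sends."""
    return urllib.request.Request(
        hypercube_url,
        headers={
            "Authorization": f"Bearer {token}",
            # Tells Hypercube which Fedora resource to extract text from.
            "Apix-Ldp-Resource": resource_url,
        },
        method="GET",
    )


def fetch_extracted_text(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_extracted_text(build_hypercube_request("http://localhost:8080/fcrepo/rest/2019-09/my-file.pdf"))` against a running instance should return the same text that curl prints.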

Additional Notes:

Updated ansible role coming soon.

Interested parties

@Islandora-CLAW/committers

@whikloj
Member

whikloj commented Sep 9, 2019

@dannylamb you need to update all the tests as the constructor now has additional required arguments.

@dannylamb
Contributor Author

Kk, no prawb

@codecov

codecov bot commented Sep 9, 2019

Codecov Report

Merging #77 into dev will increase coverage by 0.05%.
The diff coverage is 100%.

@@             Coverage Diff              @@
##                dev      #77      +/-   ##
============================================
+ Coverage     94.59%   94.64%   +0.05%     
- Complexity      159      160       +1     
============================================
  Files             9        9              
  Lines           647      654       +7     
============================================
+ Hits            612      619       +7     
  Misses           35       35

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| Hypercube/src/Controller/HypercubeController.php | 100% <100%> (ø) | 5 <1> (+1) ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ff58f34...27dd6fa.

@dannylamb
Contributor Author

@whikloj Travis is appeased.

@seth-shaw-unlv left a comment
Contributor

:shipit:

@seth-shaw-unlv merged commit 4173b13 into dev on Sep 18, 2019
@dannylamb deleted the pdftotext branch on January 29, 2020, 20:11