tesseract: 5.3.4 -> 5.4.1 #329842

Closed
@21CSM 21CSM commented Jul 25, 2024

Description of changes

Bumps tesseract to 5.4.1.

I am also personally invested in this package, so I am adding myself as a maintainer.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@ofborg ofborg bot requested a review from schuelermine July 25, 2024 08:01
@ofborg ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 11-100 10.rebuild-linux: 11-100 labels Jul 25, 2024

   src = fetchFromGitHub {
     owner = "tesseract-ocr";
     repo = "tesseract";
     rev = version;
-    sha256 = "sha256-IKxzDhSM+BPsKyQP3mADAkpRSGHs4OmdFIA+Txt084M=";
+    sha256 = "sha256-Yce9DVt1RJZkwN7ZlUE57eHm+cB9z7MbdFv8uCiGapo=";
Member

Suggested change
-    sha256 = "sha256-Yce9DVt1RJZkwN7ZlUE57eHm+cB9z7MbdFv8uCiGapo=";
+    hash = "sha256-Yce9DVt1RJZkwN7ZlUE57eHm+cB9z7MbdFv8uCiGapo=";

Member Author

Thanks. Addressed.
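
For readers unfamiliar with the distinction, here is a minimal sketch of the fetcher call after the suggestion is applied. The function header and the `let` binding are assumptions added only so the fragment evaluates on its own; the owner, repo, rev, and hash values are the ones quoted in the review. In nixpkgs, `hash` is the attribute intended for SRI-style strings (those starting with `sha256-`), which is presumably why the reviewer asked for the rename.

```nix
# Sketch only: the function header and `version` binding are assumed,
# not copied from the actual tesseract expression.
{ fetchFromGitHub }:

let
  version = "5.4.1";
in
fetchFromGitHub {
  owner = "tesseract-ocr";
  repo = "tesseract";
  rev = version;
  # `hash` takes the SRI string as-is; the algorithm is encoded in the prefix.
  hash = "sha256-Yce9DVt1RJZkwN7ZlUE57eHm+cB9z7MbdFv8uCiGapo=";
}
```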

@kirillrdy
Member

Please consider adding a link to the changelog and/or diff to the PR and commit message, something like 3e051ce.

@21CSM
Member Author

21CSM commented Jul 25, 2024

> please consider adding link to Changelog and/or Diff to PR and commit message something like 3e051ce

Thanks for the feedback, I have addressed this. Let me know if it looks how you expect.

Member

@Scrumplex Scrumplex left a comment

It seems like OpenCL support was dropped in tesseract-ocr/tesseract#4220. Removing opencl-headers from buildInputs still builds for me.

Edit: Building with and without opencl-headers seems to produce identical binaries, except for the different out path of course.

@21CSM
Member Author

21CSM commented Jul 25, 2024

> It seems like OpenCL support was dropped in tesseract-ocr/tesseract#4220. Removing opencl-headers from buildInputs still builds for me.
>
> Edit: Building with and without opencl-headers seems to produce identical binaries, except for the different out path of course.

Thanks. It built for me, and I was able to test it: it works fine without opencl-headers. Addressed, and fixed some formatting.
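
One possible way to reproduce this check locally, without editing the package expression, is an overlay that filters `opencl-headers` out of the inherited `buildInputs`. This is only a sketch: the attribute name `tesseractWithoutOpenCL` is hypothetical, and it assumes that `opencl-headers` is listed in the `buildInputs` of `tesseract.tesseractBase` (the attribute that the k2pdfopt expression also overrides). The PR itself simply drops the dependency from the expression.

```nix
# Overlay sketch; `tesseractWithoutOpenCL` is a hypothetical name.
final: prev: {
  tesseractWithoutOpenCL = prev.tesseract.override {
    tesseractBase = prev.tesseract.tesseractBase.overrideAttrs (old: {
      # Keep every inherited build input except opencl-headers.
      buildInputs = builtins.filter
        (dep: dep != prev.opencl-headers)
        (old.buildInputs or [ ]);
    });
  };
}
```

Building `tesseractWithoutOpenCL` next to the stock package and comparing the two results would be one way to check the "identical binaries" observation.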

@Scrumplex Scrumplex self-requested a review July 25, 2024 21:06
@Scrumplex
Member

Result of nixpkgs-review pr 329842 run on x86_64-linux

2 packages marked as broken and skipped:
  • khoj
  • khoj.dist
10 packages failed to build:
  • k2pdfopt
  • libsForQt5.pix (plasma5Packages.pix)
  • python311Packages.layoutparser
  • python311Packages.layoutparser.dist
  • python311Packages.pdf2docx
  • python311Packages.pdf2docx.dist
  • python312Packages.layoutparser
  • python312Packages.layoutparser.dist
  • python312Packages.pdf2docx
  • python312Packages.pdf2docx.dist
73 packages built:
  • almanah
  • arcan
  • arcan-all-wrapped
  • arcan-wrapped
  • arcan.dev
  • arcan.lib
  • arcan.man
  • cat9-wrapped
  • durden-wrapped
  • evolution
  • evolution-ews
  • evolutionWithPlugins
  • gImageReader
  • gnome-frog
  • gscan2pdf
  • gscan2pdf.man
  • invoice2data
  • invoice2data.dist
  • kdePackages.skanpage
  • kdePackages.skanpage.debug
  • kdePackages.skanpage.dev
  • kdePackages.skanpage.devtools
  • libsForQt5.mauikit-imagetools (plasma5Packages.mauikit-imagetools)
  • manga-cli
  • mcomix
  • mcomix.dist
  • obs-studio-plugins.advanced-scene-switcher
  • ocrmypdf (python312Packages.ocrmypdf)
  • ocrmypdf.dist (python312Packages.ocrmypdf.dist)
  • paperless-ngx
  • pdfsandwich
  • perl536Packages.ImageOCRTesseract
  • perl536Packages.ImageOCRTesseract.devdoc
  • perl538Packages.ImageOCRTesseract
  • perl538Packages.ImageOCRTesseract.devdoc
  • pipeworld-wrapped
  • prio-wrapped
  • python311Packages.llama-index
  • python311Packages.llama-index-readers-file
  • python311Packages.llama-index-readers-file.dist
  • python311Packages.llama-index-readers-s3
  • python311Packages.llama-index-readers-s3.dist
  • python311Packages.llama-index.dist
  • python311Packages.ocrmypdf
  • python311Packages.ocrmypdf.dist
  • python311Packages.private-gpt
  • python311Packages.private-gpt.dist
  • python311Packages.pymupdf
  • python311Packages.pymupdf.dist
  • python311Packages.pytesseract
  • python311Packages.pytesseract.dist
  • python311Packages.pytikz-allefeld
  • python311Packages.pytikz-allefeld.dist
  • python311Packages.videocr
  • python311Packages.videocr.dist
  • python312Packages.pymupdf
  • python312Packages.pymupdf.dist
  • python312Packages.pytesseract
  • python312Packages.pytesseract.dist
  • python312Packages.pytikz-allefeld
  • python312Packages.pytikz-allefeld.dist
  • python312Packages.videocr
  • python312Packages.videocr.dist
  • spamassassin
  • spamassassin.devdoc
  • termpdfpy
  • termpdfpy.dist
  • tesseract (tesseract5)
  • textsnatcher
  • tika
  • vimPlugins.openscad-nvim
  • xarcan
  • zathura

@Scrumplex
Member

> 10 packages failed to build:
>   • k2pdfopt
>   • libsForQt5.pix (plasma5Packages.pix)
>   • python311Packages.layoutparser
>   • python311Packages.layoutparser.dist
>   • python311Packages.pdf2docx
>   • python311Packages.pdf2docx.dist
>   • python312Packages.layoutparser
>   • python312Packages.layoutparser.dist
>   • python312Packages.pdf2docx
>   • python312Packages.pdf2docx.dist

  • libsForQt5.pix: fails on the parent commit
  • python311Packages.pdf2docx: fails on the parent commit
  • python312Packages.pdf2docx: fails on the parent commit

The remaining failures are indirect.

@Scrumplex
Member

This actually seems to break k2pdfopt as it's trying to build a patched version of this derivation and fails to apply its patch.

@Scrumplex
Member

Applying the following diff fixes k2pdfopt:

diff --git a/pkgs/applications/misc/k2pdfopt/default.nix b/pkgs/applications/misc/k2pdfopt/default.nix
index 32a0e31a315b..b0a8e5d17742 100644
--- a/pkgs/applications/misc/k2pdfopt/default.nix
+++ b/pkgs/applications/misc/k2pdfopt/default.nix
@@ -142,6 +142,9 @@ in stdenv.mkDerivation rec {
     };
     tesseract_modded = tesseract.override {
       tesseractBase = tesseract.tesseractBase.overrideAttrs ({ patches ? [], ... }: {
+        pname = "tesseract-k2pdfopt";
+        version = tesseract_patch.src.rev;
+        src = tesseract_patch.src;
         patches = patches ++ [ tesseract_patch ];
         # Additional compilation fixes
         postPatch = ''

CC @bosu @danielfullmer
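
Read without the diff markers, the override would look roughly like the sketch below. The function header is an assumption made so that the fragment stands alone; in the real file `tesseract_patch` is defined elsewhere in the same expression, and the `postPatch` fixes are omitted here. The likely reason it helps: by taking `src` and `version` from `tesseract_patch.src`, the patched build keeps using the tesseract source the patch was written against, rather than following whatever version the top-level `tesseract` package is bumped to.

```nix
# Sketch of the proposed k2pdfopt override; the function header is assumed.
{ tesseract, tesseract_patch }:

tesseract.override {
  tesseractBase = tesseract.tesseractBase.overrideAttrs ({ patches ? [ ], ... }: {
    # Distinguish the patched build from the stock package.
    pname = "tesseract-k2pdfopt";
    # Pin the source to the revision the patch targets.
    version = tesseract_patch.src.rev;
    src = tesseract_patch.src;
    patches = patches ++ [ tesseract_patch ];
  });
}
```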

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Aug 4, 2024
@21CSM 21CSM marked this pull request as draft August 9, 2024 03:11
@21CSM
Member Author

21CSM commented Aug 9, 2024

Converted to draft, @Scrumplex.
I will take a look at the issues presented when I get some free time.

@21CSM 21CSM closed this Sep 3, 2024
@21CSM 21CSM deleted the tesseract-bump branch September 3, 2024 03:54
@21CSM 21CSM restored the tesseract-bump branch September 3, 2024 03:54
@21CSM 21CSM deleted the tesseract-bump branch September 6, 2024 00:13
Labels
10.rebuild-darwin: 11-100 10.rebuild-linux: 11-100 11.by: package-maintainer This PR was created by the maintainer of the package it changes 12.approvals: 1 This PR was reviewed and approved by one reputable person
4 participants