Textract dependency issue; Wagtail version dependency #22
Comments
@DanAtShenTech Yes, I'm completely okay with removing that restriction. I'm not sure why it's in there; it may have been a conservative move, but there seems to be no reason for it now. We'd have to update the build matrix as well. I'd be happy to accept a PR.
I've submitted a PR. Would you be willing to update the install script to install from |
Hi Dan, That does not seem like the proper solution. But maybe you could document the issues you have with textract itself in the README, and show how users can install it directly from VCS to solve these issues?
OK Kees. As soon as you post to PyPI, I'll go through the whole process of installing and then provide a PR for an update to the README. |
I wanted to bring some information that I just discovered to the attention of anyone reading this issue. Back in 2016 @deanmalmgren called for someone to take over the Textract repo. He tweeted about this need as recently as April 9, 2019. A review of his commit history shows that his last commit to the Textract repo was in the summer of 2017. While I've been able to get document extraction to work somewhat well using wagtail_textract, it feels pretty brittle. I still haven't gotten OCR to work when uploading a file, and OCR'ed data is not saved with the PDF - see this issue. Also, I use pipenv and can't yet produce a Pipfile.lock to use in production because of dependency issues related to the repo not being kept up to date. I'm not at a point where I could take over maintenance of this repo, but I particularly wanted to point this problem out to @khink in case he is. One dependency that would be nice to update is Tesseract, moving from 3.x to the latest 4.x.
I’m working to set up Wagtail Textract. I use pipenv and was getting package mismatch errors because Textract on PyPI has not been updated to match the latest code in the repo at https://github.com/deanmalmgren/textract (there was a chardet dependency error). However, @deanmalmgren’s repo DOES have an updated chardet dependency (3.0.4, the latest at this point), so I was able to get around all but one of the errors by installing directly from the repo:
pip install git+https://github.com/deanmalmgren/textract.git --upgrade
One remaining error (I’m at the latest Wagtail, 2.4):
Would you be willing to remove the wagtail<2.2 dependency? If not, I could do a little testing for you by forking and removing that dependency and installing from my fork, but my testing wouldn’t be extensive. I would have around a hundred documents that I could run the transcription command on, but none of them would require OCR.
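To illustrate why a wagtail<2.2 pin conflicts with an installed Wagtail 2.4, here is a minimal sketch of an upper-bound version check. The function names are hypothetical and the comparison is deliberately simplified; pip and pipenv apply the full PEP 440 rules, which this does not reproduce.

```python
# Simplified illustration only: real installers follow PEP 440, which also
# handles pre-releases, post-releases, and other cases ignored here.

def parse_version(version):
    """Turn a dotted release string such as '2.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_upper_bound(installed, upper_bound):
    """Return True if `installed` satisfies a '<upper_bound' requirement."""
    return parse_version(installed) < parse_version(upper_bound)


# Hypothetical check mirroring the 'wagtail<2.2' pin discussed above.
print(allowed_by_upper_bound("2.4", "2.2"))  # False: 2.4 is rejected
print(allowed_by_upper_bound("2.1", "2.2"))  # True: 2.1 is accepted
```

With the pin in place, any resolver has to reject Wagtail 2.4, which is why removing or raising the upper bound is the change being requested.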
I would be willing to propose a re-write of your installation instructions based on the above (you could likely get rid of having to mention the statements about incompatibility errors).