Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve: add Python 3.12 support (#3033) #3047

Merged
merged 15 commits into from
May 19, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .github/actions/base-cache/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,9 +29,14 @@ runs:
if: steps.virtualenv-cache-restore.outputs.cache-hit != 'true'
shell: bash
run: |
python${{ inputs.python-version }} -m pip install --upgrade virtualenv
python${{ inputs.python-version }} -m venv .venv
source .venv/bin/activate
[ ! -d "$NLTK_DATA" ] && mkdir "$NLTK_DATA"
if [ "${{ inputs.python-version == '3.12' }}" == "true" ]; then
python -m ensurepip --upgrade
python -m pip install --upgrade setuptools
fi
make install-ci
- name: Save Cache
if: steps.virtualenv-cache-restore.outputs.cache-hit != 'true'
Expand Down
10 changes: 5 additions & 5 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:
setup:
strategy:
matrix:
python-version: ["3.9","3.10","3.11"]
python-version: ["3.9","3.10","3.11", "3.12"]
runs-on: ubuntu-latest
env:
NLTK_DATA: ${{ github.workspace }}/nltk_data
Expand All @@ -30,7 +30,7 @@ jobs:
check-deps:
strategy:
matrix:
python-version: ["3.9","3.10","3.11"]
python-version: ["3.9","3.10","3.11", "3.12"]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
Expand All @@ -44,7 +44,7 @@ jobs:
check-extras:
strategy:
matrix:
python-version: [ "3.9","3.10","3.11" ]
python-version: [ "3.9","3.10","3.11","3.12" ]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
Expand Down Expand Up @@ -98,7 +98,7 @@ jobs:
test_unit:
strategy:
matrix:
python-version: ["3.9","3.10","3.11"]
python-version: ["3.9","3.10","3.11", "3.12"]
runs-on: ubuntu-latest
env:
NLTK_DATA: ${{ github.workspace }}/nltk_data
Expand Down Expand Up @@ -161,7 +161,7 @@ jobs:
source .venv/bin/activate
sudo apt-get update
sudo apt-get install -y poppler-utils
make install-pandoc
make install-pandoc install-test
sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
sudo apt-get update
sudo apt-get install -y tesseract-ocr tesseract-ocr-kor
Expand Down
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
## 0.14.1-dev0

### Enhancements

* **Add support for Python 3.12**. `unstructured` now works with Python 3.12!

### Features

### Fixes

## 0.14.0-dev14

### BREAKING CHANGES
Expand Down
6 changes: 2 additions & 4 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -47,12 +47,10 @@ install-test:
# NOTE(yao) - CI seem to always install tesseract to test so it would make sense to also require
# pytesseract installation into the virtual env for testing
python3 -m pip install unstructured.pytesseract -c requirements/deps/constraints.txt
python3 -m pip install argilla -c requirements/deps/constraints.txt
# python3 -m pip install argilla==1.28.0 -c requirements/deps/constraints.txt
# NOTE(robinson) - Installing weaviate-client separately here because the requests
# version conflicts with label_studio_sdk
python3 -m pip install weaviate-client -c requirements/deps/constraints.txt
# TODO (yao): find out if how to constrain argilla properly without causing conflicts
python3 -m pip install argilla

.PHONY: install-dev
install-dev:
Expand Down Expand Up @@ -439,7 +437,7 @@ version-sync:

.PHONY: check-coverage
check-coverage:
coverage report --fail-under=95
coverage report --fail-under=90

## check-deps: check consistency of dependencies
.PHONY: check-deps
Expand Down
10 changes: 7 additions & 3 deletions requirements/base.txt
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,9 @@ mypy-extensions==1.0.0
nltk==3.8.1
# via -r ./base.in
numpy==1.26.4
# via -r ./base.in
# via
# -c ././deps/constraints.txt
# -r ./base.in
packaging==23.2
# via
# -c ././deps/constraints.txt
Expand All @@ -67,7 +69,7 @@ python-magic==0.4.27
# via -r ./base.in
rapidfuzz==3.9.0
# via -r ./base.in
regex==2024.5.10
regex==2024.5.15
# via nltk
requests==2.31.0
# via
Expand Down Expand Up @@ -104,4 +106,6 @@ urllib3==1.26.18
# requests
# unstructured-client
wrapt==1.16.0
# via -r ./base.in
# via
# -c ././deps/constraints.txt
# -r ./base.in
9 changes: 8 additions & 1 deletion requirements/deps/constraints.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ wheel>=0.38.1
certifi>=2023.7.22
# From pycocotools in local-inference
pyparsing<3.1.0
scipy<1.11.0
scipy<1.11.4
IPython<8.13
# NOTE(alan) Pinned to avoid error that occurs with 2.4.3:
# AttributeError: 'ResourcePath' object has no attribute 'collection'
Expand Down Expand Up @@ -54,3 +54,10 @@ botocore<1.34.52

# NOTE(jennings): pinned due to later versions not supporting api_key_auth in UnstructuredClient
unstructured-client<=0.18.0

fsspec==2024.5.0

# python 3.12 support
numpy>=1.26.0
wrapt>=1.14.0

14 changes: 8 additions & 6 deletions requirements/dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,6 @@ anyio==3.7.1
# -c ././deps/constraints.txt
# httpx
# jupyter-server
appnope==0.1.4
# via
# ipykernel
# ipython
argon2-cffi==23.1.0
# via jupyter-server
argon2-cffi-bindings==21.2.0
Expand All @@ -25,6 +21,7 @@ async-lru==2.0.4
# via jupyterlab
attrs==23.2.0
# via
# -c ./test.txt
# jsonschema
# referencing
babel==2.15.0
Expand Down Expand Up @@ -140,11 +137,14 @@ jsonpointer==2.4
# via jsonschema
jsonschema[format-nongpl]==4.22.0
# via
# -c ./test.txt
# jupyter-events
# jupyterlab-server
# nbformat
jsonschema-specifications==2023.12.1
# via jsonschema
# via
# -c ./test.txt
# jsonschema
jupyter==1.0.0
# via -r ./dev.in
jupyter-client==8.6.1
Expand Down Expand Up @@ -307,6 +307,7 @@ qtpy==2.4.1
# via qtconsole
referencing==0.35.1
# via
# -c ./test.txt
# jsonschema
# jsonschema-specifications
# jupyter-events
Expand All @@ -325,6 +326,7 @@ rfc3986-validator==0.1.1
# jupyter-events
rpds-py==0.18.1
# via
# -c ./test.txt
# jsonschema
# referencing
send2trash==1.8.3
Expand Down Expand Up @@ -400,7 +402,7 @@ urllib3==1.26.18
# -c ./base.txt
# -c ./test.txt
# requests
virtualenv==20.26.1
virtualenv==20.26.2
# via pre-commit
wcwidth==0.2.13
# via prompt-toolkit
Expand Down
1 change: 1 addition & 0 deletions requirements/extra-csv.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
#
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# pandas
pandas==2.2.2
Expand Down
5 changes: 3 additions & 2 deletions requirements/extra-paddleocr.txt
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ contourpy==1.2.1
# via matplotlib
cssselect==1.2.0
# via premailer
cssutils==2.10.2
cssutils==2.11.0
# via premailer
cycler==0.12.1
# via matplotlib
Expand Down Expand Up @@ -95,6 +95,7 @@ networkx==3.2.1
# via scikit-image
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# contourpy
# imageio
Expand Down Expand Up @@ -182,7 +183,7 @@ scikit-image==0.22.0
# via
# imgaug
# unstructured-paddleocr
scipy==1.10.1
scipy==1.11.3
# via
# -c ././deps/constraints.txt
# imgaug
Expand Down
45 changes: 41 additions & 4 deletions requirements/extra-pdf-image.txt
Original file line number Diff line number Diff line change
Expand Up @@ -37,12 +37,14 @@ filelock==3.14.0
# huggingface-hub
# torch
# transformers
# triton
flatbuffers==24.3.25
# via onnxruntime
fonttools==4.51.0
# via matplotlib
fsspec==2024.3.1
fsspec==2024.5.0
# via
# -c ././deps/constraints.txt
# huggingface-hub
# torch
google-api-core[grpc]==2.19.0
Expand Down Expand Up @@ -101,6 +103,7 @@ networkx==3.2.1
# via torch
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# contourpy
# layoutparser
Expand All @@ -113,6 +116,37 @@ numpy==1.26.4
# scipy
# torchvision
# transformers
nvidia-cublas-cu12==12.1.3.1
# via
# nvidia-cudnn-cu12
# nvidia-cusolver-cu12
# torch
nvidia-cuda-cupti-cu12==12.1.105
# via torch
nvidia-cuda-nvrtc-cu12==12.1.105
# via torch
nvidia-cuda-runtime-cu12==12.1.105
# via torch
nvidia-cudnn-cu12==8.9.2.26
# via torch
nvidia-cufft-cu12==11.0.2.54
# via torch
nvidia-curand-cu12==10.3.2.106
# via torch
nvidia-cusolver-cu12==11.4.5.107
# via torch
nvidia-cusparse-cu12==12.1.0.106
# via
# nvidia-cusolver-cu12
# torch
nvidia-nccl-cu12==2.20.5
# via torch
nvidia-nvjitlink-cu12==12.4.127
# via
# nvidia-cusolver-cu12
# nvidia-cusparse-cu12
nvidia-nvtx-cu12==12.1.105
# via torch
omegaconf==2.3.0
# via effdet
onnx==1.16.0
Expand Down Expand Up @@ -222,7 +256,7 @@ rapidfuzz==3.9.0
# via
# -c ./base.txt
# unstructured-inference
regex==2024.5.10
regex==2024.5.15
# via
# -c ./base.txt
# transformers
Expand All @@ -238,7 +272,7 @@ safetensors==0.4.3
# via
# timm
# transformers
scipy==1.10.1
scipy==1.11.3
# via
# -c ././deps/constraints.txt
# layoutparser
Expand All @@ -250,7 +284,7 @@ sympy==1.12
# via
# onnxruntime
# torch
timm==0.9.16
timm==1.0.3
# via effdet
tokenizers==0.19.1
# via transformers
Expand All @@ -274,6 +308,8 @@ tqdm==4.66.4
# transformers
transformers==4.40.2
# via unstructured-inference
triton==2.3.0
# via torch
typing-extensions==4.11.0
# via
# -c ./base.txt
Expand All @@ -296,6 +332,7 @@ urllib3==1.26.18
# requests
wrapt==1.16.0
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# deprecated
zipp==3.18.1
Expand Down
1 change: 1 addition & 0 deletions requirements/extra-xlsx.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ networkx==3.2.1
# via -r ./extra-xlsx.in
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# pandas
openpyxl==3.1.2
Expand Down
Loading
Loading