fix(NLTK): download NLTK data if NLTK is enabled #39

Merged
merged 1 commit into from
Mar 9, 2023
1 change: 1 addition & 0 deletions molecule/alternative_installation/converge.yml
@@ -18,3 +18,4 @@
 paperless_ngx_conf_port: 8001
 paperless_ngx_system_user_additional_groups:
   - root
+paperless_ngx_conf_enable_nltk: 1
@@ -1,4 +1,8 @@
 ---
+- name: Test NLTK
+  ansible.builtin.include_tasks:
+    file: test_nltk.yml
+
 - name: Check the activated settings
   block:
     - name: Read the active settings
@@ -0,0 +1,9 @@
+---
+- name: Check that NLTK installation was successful by executing the document classifier
+  become: true
+  become_user: "{{ paperless_ngx_system_user }}"
+  ansible.builtin.shell: |
+    {{ paperless_ngx_dir_virtualenv }}/bin/python3 {{ paperless_ngx_dir_installation }}/src/manage.py document_create_classifier
+  register: _classifier_output
+  changed_when: false
+  failed_when: '"Classifier error" in _classifier_output.stderr'
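The classifier run above exercises NLTK only indirectly. A more direct check, sketched here as an assumption rather than as part of this PR, could confirm that the downloaded packages are on disk. The subdirectory names (`stemmers/`, `corpora/`, `tokenizers/`) follow the layout the NLTK downloader normally uses and should be verified against the actual contents of `paperless_ngx_conf_nltk_dir`; the register name `_nltk_data_stat` is hypothetical.

```yaml
# Sketch only: the subdirectory layout is an assumption about the NLTK downloader.
- name: Look up the expected NLTK data directories
  become: true
  ansible.builtin.stat:
    path: "{{ paperless_ngx_conf_nltk_dir }}/{{ item }}"
  register: _nltk_data_stat  # hypothetical variable name
  loop:
    - stemmers/snowball_data
    - corpora/stopwords
    - tokenizers/punkt

- name: Assert that every NLTK data directory exists
  ansible.builtin.assert:
    that:
      - item.stat.exists
    fail_msg: "Missing NLTK data: {{ item.item }}"
  loop: "{{ _nltk_data_stat.results }}"
  loop_control:
    label: "{{ item.item }}"
```

Unlike the classifier check, this fails with the name of the missing package instead of relying on a string in stderr.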
7 changes: 7 additions & 0 deletions tasks/paperless_ngx/main.yml
@@ -34,6 +34,13 @@
       tags: venv
   tags: venv
 
+- name: Prepare NLTK environment
+  ansible.builtin.include_tasks:
+    file: nltk.yml
+    apply:
+      tags: nltk
+  tags: nltk
+
 - name: Configure paperless-ngx and dependencies
   ansible.builtin.include_tasks:
     file: configuration.yml
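The PR title ties the download to NLTK being enabled, while the include in this hunk carries no condition; whether one is applied elsewhere in the role is not visible here. If a guard were wanted at this level, one possible form, assuming `paperless_ngx_conf_enable_nltk` holds `1` or `0` as in the converge playbook, is:

```yaml
# Sketch only: the `when` line is an assumption and is not part of this PR.
- name: Prepare NLTK environment
  ansible.builtin.include_tasks:
    file: nltk.yml
    apply:
      tags: nltk
  # The bool filter turns 1/"1"/"true"/"yes" into true, so the include
  # is skipped when the variable is 0 or false.
  when: paperless_ngx_conf_enable_nltk | bool
  tags: nltk
```

Because `include_tasks` is dynamic, the condition applies to the include itself, so none of the tasks in `nltk.yml` are loaded when it is false.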
21 changes: 21 additions & 0 deletions tasks/paperless_ngx/nltk.yml
@@ -0,0 +1,21 @@
+---
+- name: Create NLTK dir
+  become: true
+  ansible.builtin.file:
+    path: "{{ paperless_ngx_conf_nltk_dir }}"
+    state: directory
+    owner: "{{ paperless_ngx_system_user }}"
+    group: "{{ paperless_ngx_system_group }}"
+    mode: "750"
+
+- name: Download NLTK data to NLTK dir
+  become: true
+  become_user: "{{ paperless_ngx_system_user }}"
+  ansible.builtin.shell: |
+    {{ paperless_ngx_dir_virtualenv }}/bin/python3 -W ignore::RuntimeWarning -m nltk.downloader -d "{{ paperless_ngx_conf_nltk_dir }}" {{ item }}
+  register: _nltk_download
+  changed_when: '"already up-to-date" not in _nltk_download.stderr'
+  loop:
+    - snowball_data
+    - stopwords
+    - punkt