Download nltk's punkt_tab in align_score Dockerfile #841

yonromai · 2024-11-05T01:24:29Z

Description

Fixes an issue where alignscore-server (defined in nemoguardrails/library/factchecking/align_score/Dockerfile) raises a LookupError (resource_not_found) at runtime. The error appears to be caused by a breaking change in nltk: the Punkt tokenizer now loads the punkt_tab resource (see the traceback below), which the image does not download. Please see the steps below to reproduce.
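The fix makes punkt_tab available inside the image at build time. The exact Dockerfile change is not reproduced on this page, so the following is only a minimal Python sketch of such a build step, assuming it runs as a script after the Python dependencies are installed. The helper name ensure_punkt_tab and its find/download parameters are hypothetical; they exist only so the logic can be exercised without nltk or network access. By default the sketch uses the real nltk.data.find and nltk.download calls and probes the same tokenizers/punkt_tab/english/ path that appears in the traceback.

```python
from typing import Callable, Optional

# Resource path NLTK probes when loading the English Punkt tokenizer
# (taken from the "Attempted to load" line of the server traceback).
PUNKT_TAB_ENGLISH = "tokenizers/punkt_tab/english/"


def ensure_punkt_tab(
    download_dir: Optional[str] = None,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[..., bool]] = None,
) -> bool:
    """Download punkt_tab only if NLTK cannot already find it.

    `find` and `download` are hypothetical injection points for testing;
    when omitted, the real nltk.data.find / nltk.download are used.
    Leaving `download_dir` as None lets NLTK pick its default location.
    Returns True if the resource is available afterwards.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without nltk

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        find(PUNKT_TAB_ENGLISH)
        return True  # already installed, nothing to download
    except LookupError:
        pass  # missing: fall through and fetch it

    # nltk.download returns True on success and False otherwise.
    return bool(download("punkt_tab", download_dir=download_dir, quiet=True))
```

In a Docker build, a RUN step could call ensure_punkt_tab() and fail the build when it returns False, so a missing resource surfaces at build time rather than as a 500 on the first request.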

cc: @drazvan

Steps to reproduce the error

Build the alignscore-server Docker image:

# from the root of NeMo-Guardrails
cd nemoguardrails/library/factchecking/align_score
docker build -t alignscore-server .

Run alignscore-server:

docker run -p 5123:5000 alignscore-server
# ...
# INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)

Send a request to alignscore-server (executed from a local notebook):

from nemoguardrails.library.factchecking.align_score.request import alignscore_request

await alignscore_request(
	api_url="http://localhost:5123/alignscore_base",
	evidence="Hello, world!",
	response="Hello, world!",
)
# Output: "AlignScore API request failed with status 500"

Note: I see similar behavior when LLMRails invokes the alignscore_check_facts action, which is how I encountered this issue in the first place.

Logs from alignscore-server:

INFO:     172.17.0.1:60146 - "POST /alignscore_base HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  [...]
  File "/usr/local/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
  File "/usr/local/lib/python3.10/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/root/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
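The "Searched in" list above is each NLTK data root with the resource path appended; note that /usr/local/share/nltk_data and /usr/local/lib/nltk_data each appear twice. The stdlib-only sketch below reproduces that check inside a running container to confirm whether punkt_tab was unpacked into one of those directories. The function name locate_resource is hypothetical, not an NLTK API, and the sketch only looks for an unpacked directory, whereas NLTK can also read resources from zip archives.

```python
import os
from typing import Iterable, List


def locate_resource(
    roots: Iterable[str], resource: str = "tokenizers/punkt_tab/english/"
) -> List[str]:
    """Return the paths under `roots` where `resource` exists as a directory.

    Each data root is joined with the resource path, mirroring the
    LookupError message. Duplicate roots are checked only once,
    keeping their original order.
    """
    hits = []
    for root in dict.fromkeys(roots):  # de-duplicate while preserving order
        candidate = os.path.join(root, resource)
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits


# Search roots copied verbatim from the server log above.
SEARCHED = [
    "/root/nltk_data",
    "/usr/local/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]
```

Running print(locate_resource(SEARCHED)) in the container before the fix should print an empty list, matching the LookupError; after the rebuild it should list at least one path.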

After this commit

Rebuild and rerun the container:

docker build -t alignscore-server .
docker run -p 5123:5000 alignscore-server

Request:

await alignscore_request(
	api_url="http://localhost:5123/alignscore_base",
	evidence="Hello, world!",
	response="Hello, world!",
)
# Output: 0.9991656541824341

Logs from alignscore-server:

INFO:     172.17.0.1:61214 - "POST /alignscore_base HTTP/1.1" 200 OK

Related Issue(s)

nltk/nltk#3293

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

Pouyanpi

Thank you @yonromai for catching this 👍🏻 LGTM!

Pouyanpi · 2024-11-06T18:04:34Z

reference to the issue nltk/nltk#3293

Pouyanpi · 2024-11-06T18:07:18Z

@yonromai would you please gpg sign your commits per CONTRIBUTING.md?

yonromai · 2024-11-06T19:43:18Z

@yonromai would you please gpg sign your commits per CONTRIBUTING.md?

Thank you for taking a look @Pouyanpi

I thought I signed the previous commit; wondering if something went wrong when rebasing/force-pushing. Anyway, I re-pushed and it should be good now:

Pouyanpi · 2024-11-06T20:29:20Z

@yonromai would you please gpg sign your commits per CONTRIBUTING.md?

Thank you for taking a look @Pouyanpi

I thought I signed the previous commit; wondering if something went wrong when rebasing/force-pushing. Anyway, I re-pushed and it should be good now:

My bad, you are right: the automatic branch update removed the signature! Thanks for signing it again.

@sklinglernv referenced this pull request in a later squashed commit. Among many other changes (its message ends with "chore: bump version to v0.11.1"), that commit lists this fix as "fix: download nltk's punkt_tab in align_score Dockerfile (NVIDIA-NeMo#841)" and includes "Co-authored-by: Romain Yon <yonromai@users.noreply.github.com>".

yonromai force-pushed the develop branch from 8f5c730 to 093cd6f November 5, 2024 01:39

Pouyanpi force-pushed the develop branch from 093cd6f to e946117 November 6, 2024 17:22

Pouyanpi self-requested a review November 6, 2024 17:58

Pouyanpi approved these changes Nov 6, 2024


Download nltk's punkt_tab in align_score Dockerfile (c7809a1)

yonromai force-pushed the develop branch from e946117 to c7809a1 November 6, 2024 19:40

Pouyanpi merged commit 4ce7daf into NVIDIA-NeMo:develop Nov 6, 2024
1 check passed

2 participants