@dmsuehir dmsuehir commented Jun 5, 2024

Description

This LLM fine-tuning workflow was originally published in the TLT repository, but it doesn't use the TLT API/CLI (it uses PyTorch/Hugging Face code). The workflow includes a Dockerfile, a Helm chart, and a few different Helm values files. The values files cover a few LLM fine-tuning use cases: 1) fine-tuning a financial chatbot with a dataset loaded from a file, 2) instruction tuning with a medical dataset from the Hugging Face Hub, and 3) a template values file for someone who wants to fine-tune an LLM with their own dataset/model.

The Docker image is already published at: intel/ai-workflows:torch-2.2.0-huggingface-multinode-py3.10. I've also tested this with 2.3 by building the PyTorch multinode base from the main branch, then building the LLM workflow container with the updated 2.3 base. I had to add extra ENV vars to the PyTorch multinode base in order for the distributed workflow to work in k8s for 2.3. These env vars would typically be set by the Torch CCL setvars.sh file, but that script doesn't get applied in k8s, so they need to be set as ENV vars in the Dockerfile.
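
One way to confirm that such variables actually reached the container is a small check run inside a pod. The following is a minimal shell sketch, not code from this PR: the function name check_env_vars is hypothetical, and the variable names in the usage comment (CCL_ROOT, LD_LIBRARY_PATH) are assumptions about what a setvars.sh script typically exports, not the list added to the Dockerfile here.

```shell
# Hypothetical helper: report which of the named environment variables
# are unset or empty in the current environment.
check_env_vars() {
  local missing=0 name
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child
    # process such as the training job would inherit.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (variable names are assumptions, adjust to your image):
# check_env_vars CCL_ROOT LD_LIBRARY_PATH || echo "Torch CCL env not set"
```

The function returns a non-zero status if any variable is missing, so it can gate a container entrypoint or a debugging session.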

The test loops, checking for an eval_results.json file in the mounted persistent volume claim; the presence of that file indicates that both training and evaluation have completed.
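
The polling check described above could look roughly like the following shell sketch. It is an illustration under stated assumptions, not the chart's actual test: the function name wait_for_file, the attempt count, the delay, and the mount path in the usage comment are all hypothetical.

```shell
# Hypothetical helper: poll for a file until it appears or the
# maximum number of attempts is reached.
wait_for_file() {
  local path="$1" max_attempts="${2:-60}" delay="${3:-10}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if [ -f "$path" ]; then
      echo "Found $path after $attempt check(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for $path" >&2
  return 1
}

# Example usage (the mount path is an assumption):
# wait_for_file /tmp/pvc-mount/output/eval_results.json 60 10
```

Bounding the number of attempts keeps a failed training job from hanging the test indefinitely.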

Changes Made

  • Added env vars to the PyTorch multinode Dockerfile for Torch CCL paths
  • Added a workflows docker-compose.yaml file
  • Added the LLM fine-tuning Helm chart in /workflows/charts/training/huggingface_llm
  • The code follows the project's coding standards.
  • No Intel Internal IP is present within the changes.
  • The documentation has been updated to reflect any changes in functionality.

Validation

The Helm chart can be tested using the tests/distilgpt2_values.yaml file, which fine-tunes distilgpt2 using the databricks-dolly-15k dataset for 5 steps and then evaluates the trained model with a subset of the dataset.

cd workflows/charts/training/huggingface_llm

helm install -f tests/distilgpt2_values.yaml llm-test .

helm test llm-test
  • I have tested any changes in container groups locally with test_runner.py with all existing tests passing, and I have added new tests where applicable.
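
The validation commands above can be wrapped in a small script so the release is cleaned up whether or not the test passes. This is a sketch only: the validate_chart and run helpers and the DRY_RUN switch are hypothetical, and the helm uninstall step is an addition that is not part of the commands listed in this PR.

```shell
# Hypothetical wrapper: print the command instead of running it when
# DRY_RUN=1, which allows a rehearsal without a cluster.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install the chart from the current directory, run its tests, then
# remove the release. Returns the status of the install/test steps.
validate_chart() {
  local release="$1" values="$2" status=0
  { run helm install -f "$values" "$release" . &&
    run helm test "$release"; } || status=$?
  run helm uninstall "$release"   # cleanup step, an assumption
  return "$status"
}

# Example usage, run from workflows/charts/training/huggingface_llm:
# validate_chart llm-test tests/distilgpt2_values.yaml
```

Setting DRY_RUN=1 before calling validate_chart prints the three helm commands without contacting a cluster.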

dmsuehir added 2 commits June 5, 2024 14:15
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>

github-actions bot commented Jun 5, 2024

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 3 package(s) with unknown licenses.
See the Details below.

License Issues

workflows/charts/huggingface-llm/requirements.txt

Package | Version | License | Issue Type
mkl-include | 2023.2.0 | Null | Unknown License
rouge_score | 0.1.2 | Null | Unknown License
mkl | 2023.2.0 | Null | Unknown License

OpenSSF Scorecard

Scorecard details
Package Version Score Details
pip/SentencePiece 0.2.0 🟢 7.5
Details
Check | Score | Reason
Code-Review | 🟢 5 | Found 6/11 approved changesets -- score normalized to 5
Maintained | 🟢 10 | 22 commit(s) and 17 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | 🟢 4 | 2 out of the last 5 releases have a total of 2 signed artifacts.
Packaging | ⚠️ -1 | packaging workflow not detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege
SAST | 🟢 5 | SAST tool is not run on all commits -- score normalized to 5
Fuzzing | 🟢 10 | project is fuzzed
Security-Policy | 🟢 10 | security policy file detected
Pinned-Dependencies | 🟢 8 | dependency not pinned by hash detected -- score normalized to 8
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
pip/accelerate 0.30.1 🟢 6.3
Details
Check | Score | Reason
Code-Review | 🟢 9 | Found 28/30 approved changesets -- score normalized to 9
Maintained | 🟢 10 | 30 commit(s) and 14 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Signed-Releases | ⚠️ -1 | no releases found
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Fuzzing | ⚠️ 0 | project is not fuzzed
Security-Policy | ⚠️ 0 | security policy file not detected
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Packaging | 🟢 10 | packaging workflow detected
SAST | 🟢 4 | SAST tool is not run on all commits -- score normalized to 4
pip/datasets 2.19.0 🟢 5.8
Details
Check | Score | Reason
Maintained | 🟢 10 | 30 commit(s) and 12 issue activity found in the last 90 days -- score normalized to 10
Code-Review | 🟢 3 | Found 9/30 approved changesets -- score normalized to 3
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Signed-Releases | ⚠️ -1 | no releases found
Packaging | ⚠️ -1 | packaging workflow not detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Security-Policy | 🟢 10 | security policy file detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Fuzzing | ⚠️ 0 | project is not fuzzed
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
pip/einops 0.7.0 🟢 5
Details
Check | Score | Reason
Code-Review | ⚠️ 2 | Found 4/20 approved changesets -- score normalized to 2
Maintained | 🟢 10 | 8 commit(s) and 8 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Packaging | ⚠️ -1 | packaging workflow not detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Fuzzing | ⚠️ 0 | project is not fuzzed
Security-Policy | ⚠️ 0 | security policy file not detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
pip/evaluate 0.4.2 🟢 5.4
Details
Check | Score | Reason
Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9
Maintained | 🟢 5 | 5 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Packaging | ⚠️ -1 | packaging workflow not detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Fuzzing | ⚠️ 0 | project is not fuzzed
Security-Policy | ⚠️ 0 | security policy file not detected
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
SAST | 🟢 3 | SAST tool is not run on all commits -- score normalized to 3
pip/mkl 2023.2.0 Unknown Unknown
pip/mkl-include 2023.2.0 Unknown Unknown
pip/nltk 3.8.1 🟢 5.2
Details
Check | Score | Reason
Code-Review | 🟢 10 | all changesets reviewed
Maintained | 🟢 3 | 2 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 3
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Security-Policy | 🟢 9 | security policy file detected
Packaging | ⚠️ -1 | packaging workflow not detected
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Fuzzing | ⚠️ 0 | project is not fuzzed
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
pip/onnxruntime 1.17.3 🟢 6.8
Details
Check | Score | Reason
Code-Review | 🟢 10 | all last 30 commits are reviewed through GitHub
Maintained | 🟢 10 | 30 commit(s) out of 30 and 8 issue activity out of 30 found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no badge detected
Vulnerabilities | 🟢 10 | no vulnerabilities detected
Signed-Releases | ⚠️ 0 | 0 out of 5 artifacts are signed or have provenance
Branch-Protection | 🟢 8 | branch protection is not maximal on development and all release branches
Security-Policy | 🟢 10 | security policy file detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Packaging | ⚠️ -1 | no published package detected
License | 🟢 10 | license file detected
Token-Permissions | ⚠️ 0 | non read-only tokens detected in GitHub workflows
Dependency-Update-Tool | 🟢 10 | update tool detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Fuzzing | ⚠️ 0 | project is not fuzzed
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
pip/onnxruntime-extensions 0.10.1 🟢 6.1
Details
Check | Score | Reason
Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9
Maintained | 🟢 10 | 30 commit(s) and 13 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Packaging | ⚠️ -1 | packaging workflow not detected
Security-Policy | 🟢 10 | security policy file detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Fuzzing | ⚠️ 0 | project is not fuzzed
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Binary-Artifacts | 🟢 7 | binaries present in source code
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
pip/peft 0.11.1 Unknown Unknown
pip/protobuf 4.24.4 🟢 7.1
Details
Check | Score | Reason
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
CI-Tests | 🟢 9 | 20 out of 21 merged PRs checked by a CI test -- score normalized to 9
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Code-Review | 🟢 4 | found 5 unreviewed changesets out of 9 -- score normalized to 4
Contributors | 🟢 10 | 13 different organizations found -- score normalized to 10
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Dependency-Update-Tool | 🟢 10 | update tool detected
Fuzzing | 🟢 10 | project is fuzzed
License | 🟢 9 | license file detected
Maintained | 🟢 10 | 30 commit(s) out of 30 and 9 issue activity out of 30 found in the last 90 days -- score normalized to 10
Packaging | ⚠️ -1 | no published package detected
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Security-Policy | 🟢 10 | security policy file detected
Signed-Releases | ⚠️ 0 | 0 out of 5 artifacts are signed or have provenance
Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege
Vulnerabilities | 🟢 7 | 3 existing vulnerabilities detected
pip/psutil 5.9.5 🟢 5.9
Details
Check | Score | Reason
Code-Review | 🟢 3 | Found 9/30 approved changesets -- score normalized to 3
Maintained | 🟢 10 | 28 commit(s) and 14 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
Security-Policy | 🟢 10 | security policy file detected
Packaging | ⚠️ -1 | packaging workflow not detected
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Fuzzing | 🟢 10 | project is fuzzed
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
pip/py-cpuinfo 9.0.0 🟢 3.8
Details
Check | Score | Reason
Code-Review | 🟢 4 | Found 7/17 approved changesets -- score normalized to 4
Maintained | ⚠️ 0 | 0 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 0
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Packaging | ⚠️ -1 | packaging workflow not detected
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected
Fuzzing | ⚠️ 0 | project is not fuzzed
Security-Policy | ⚠️ 0 | security policy file not detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
pip/rouge_score 0.1.2 Unknown Unknown
pip/tokenizers 0.19.1 🟢 5.5
Details
Check | Score | Reason
Code-Review | 🟢 6 | Found 19/28 approved changesets -- score normalized to 6
Maintained | 🟢 10 | 14 commit(s) and 10 issue activity found in the last 90 days -- score normalized to 10
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
License | 🟢 10 | license file detected
Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Signed-Releases | ⚠️ -1 | no releases found
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Security-Policy | ⚠️ 0 | security policy file not detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Fuzzing | ⚠️ 0 | project is not fuzzed
Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0
Packaging | 🟢 10 | packaging workflow detected
SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0
Vulnerabilities | 🟢 7 | 3 existing vulnerabilities detected

Scanned Manifest Files

workflows/charts/huggingface-llm/requirements.txt
  • SentencePiece@0.2.0
  • accelerate@0.30.1
  • datasets@2.19.0
  • einops@0.7.0
  • evaluate@0.4.2
  • mkl@2023.2.0
  • mkl-include@2023.2.0
  • nltk@3.8.1
  • onnxruntime@1.17.3
  • onnxruntime-extensions@0.10.1
  • peft@0.11.1
  • protobuf@4.24.4
  • psutil@5.9.5
  • py-cpuinfo@9.0.0
  • rouge_score@0.1.2
  • tokenizers@0.19.1

@tylertitsworth tylertitsworth left a comment

You need to:

  • Add an .actions.json file to enable build CI
  • Add a tests.yaml file for container tests
  • Run pre-commit over your code
  • Add yourself to CODEOWNERS under workflows/training
  • Fix any lint issues flagged

dmsuehir and others added 21 commits June 5, 2024 16:07
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
tylertitsworth previously approved these changes Jun 6, 2024

Do you want this README to be included in our repo website or uploaded to Docker Hub under intel/ai-workflows?

If you want to add configs now, we can do that, or we can do it in a future PR since you want to update the docs later.

@dmsuehir (Contributor Author) replied:

I think not on Docker Hub for now, because there are other container tags at intel/ai-workflows that we don't have in this table.

Not sure about the repo website; I will check with Ebi. If we do want to add it, we can do a follow-up PR.

@tylertitsworth tylertitsworth added this pull request to the merge queue Jun 6, 2024
@tylertitsworth tylertitsworth removed this pull request from the merge queue due to a manual request Jun 6, 2024

@tylertitsworth tylertitsworth left a comment

LGTM

@tylertitsworth tylertitsworth enabled auto-merge June 6, 2024 22:06
@tylertitsworth tylertitsworth added this pull request to the merge queue Jun 6, 2024
Merged via the queue into intel:main with commit fc9afb4 Jun 6, 2024