HuggingFaceModel #21

simple-easydev · 2024-04-11T15:00:54Z

Example schema:

HuggingFaceModel {
    repo_id: "TheBloke/CapybaraHermes-2.5-Mistral-7B-GPTQ",
    files: ["*.json", "model.safetensors"],
    inference_devices: ["cpu"], // gpu, tpu, etc.
    quantization: "GPTQ", // GPTQ, AWQ, GGUF_Q4_0, etc.
    runtime: "llama.cpp", // vLLM, pytorch, etc.
    prompt_template: "chatml", // chatml, llama-2, gemma, etc.
}
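
For illustration, the schema above could map onto a Python dataclass roughly like this. This is only a sketch: the class name HuggingFaceModelSpec, the default values, and the select_files helper are assumptions, not the code in this PR. It also assumes the entries in files are shell-style glob patterns, which the "*.json" entry suggests.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List


@dataclass
class HuggingFaceModelSpec:
    """Hypothetical container mirroring the fields of the example schema."""

    repo_id: str
    files: List[str]
    inference_devices: List[str] = field(default_factory=lambda: ["cpu"])
    quantization: str = ""
    runtime: str = ""
    prompt_template: str = ""

    def select_files(self, repo_files: List[str]) -> List[str]:
        """Return the repo files that match any glob pattern in `files`."""
        return [
            name
            for name in repo_files
            if any(fnmatch(name, pattern) for pattern in self.files)
        ]


# Values copied from the example schema above.
spec = HuggingFaceModelSpec(
    repo_id="TheBloke/CapybaraHermes-2.5-Mistral-7B-GPTQ",
    files=["*.json", "model.safetensors"],
    quantization="GPTQ",
    runtime="llama.cpp",
    prompt_template="chatml",
)
```

With a spec like this, select_files would narrow a repository listing down to the files worth downloading before any transfer starts.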

I added a list named SUPPORTED_MODELS_V2 to supported_model.py.

If the model name is in SUPPORTED_MODELS_V2, the new HuggingFaceModel handles the download. Otherwise, the old logic is used.
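
A minimal sketch of that fallback, assuming SUPPORTED_MODELS_V2 holds plain model-name strings. The list contents and the pick_download_path function are hypothetical and only show the membership check, not the actual download code.

```python
from typing import List

# Hypothetical contents; the real list lives in supported_model.py.
SUPPORTED_MODELS_V2: List[str] = [
    "TheBloke/CapybaraHermes-2.5-Mistral-7B-GPTQ",
]


def pick_download_path(model_name: str) -> str:
    """Return which download path a model name would take."""
    if model_name in SUPPORTED_MODELS_V2:
        # Registered in the new list: use HuggingFaceModel.
        return "huggingface"
    # Anything else keeps using the old logic.
    return "legacy"
```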

paka/utils.py

paka/kube_resources/model_group/models/abstract.py

jjleng · 2024-04-11T22:34:06Z

paka/kube_resources/model_group/models/abstract.py

+        logger.info(f"SHA256 hash of the file: {sha256_value}")
+        return upload_id, sha256_value
+
+    def upload_fs_to_s3(


This duplicates a lot of upload_to_s3 and needs consolidation.

upload_to_s3 is currently unused for Hugging Face models: it is designed specifically to transfer an HTTP stream produced by requests to an S3 bucket.
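
One possible shape for the consolidation, as a sketch only: both uploads could share a helper that hashes and forwards byte chunks, with the source (an HTTP stream or a local file) supplying the chunks. The names upload_chunks, read_in_chunks, and the upload_part callback are hypothetical; the callback stands in for whatever S3 multipart call the real methods make.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Iterable, Iterator


def upload_chunks(
    chunks: Iterable[bytes],
    upload_part: Callable[[int, bytes], None],
) -> str:
    """Hash and forward each chunk; return the hex SHA256 of all the data."""
    sha256 = hashlib.sha256()
    for part_number, chunk in enumerate(chunks, start=1):
        sha256.update(chunk)
        upload_part(part_number, chunk)  # placeholder for the S3 part upload
    return sha256.hexdigest()


def read_in_chunks(
    fileobj: BinaryIO, chunk_size: int = 8 * 1024 * 1024
) -> Iterator[bytes]:
    """Yield fixed-size chunks from a binary file object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return
        yield chunk


# Usage with an in-memory file; an HTTP response's chunk iterator
# could be passed to upload_chunks the same way.
uploaded = []
digest = upload_chunks(
    read_in_chunks(io.BytesIO(b"abcdef"), chunk_size=4),
    lambda part_number, chunk: uploaded.append((part_number, chunk)),
)
```

The two existing methods would then differ only in how they produce the chunk iterator.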


simple-easydev added 4 commits April 10, 2024 04:07

Draft models and tests

c582ac8

update huggleface model

e761491

Done first version of HuggingFaceModel

365927a

Fix tiny bugs

d9c19b7

simple-easydev self-assigned this Apr 11, 2024

simple-easydev requested a review from jjleng April 11, 2024 15:01

simple-easydev added the enhancement New feature or request label Apr 11, 2024

simple-easydev linked an issue Apr 11, 2024 that may be closed by this pull request

improve model abstraction and registry #20

Open

jjleng reviewed Apr 11, 2024


simple-easydev added 2 commits April 12, 2024 07:36

Fix feedbacks

ab22045

Fix missing feedback

4fc31b4

jjleng previously approved these changes Apr 12, 2024


jjleng and others added 15 commits April 13, 2024 00:07

[wip] gpu support

d40be9d

feat(gpu): run models on cuda GPUs

5af3e16

feat(gpu): make nvidia device plugin tolerate model group taints

615cc4d

feat(gpu): set n_gpu_layers to offload work to gpu for the llama.cpp runtime

8181314

feat(gpu): larger disk for gpu nodes

91e4571

feat(gpu): make model group node disk size configerable

28075b7

feat(gpu): be able to request a number of GPUs through config

ac8c726

docs: update README with the GPU support message

a945de8

docs: add llama2 chat template for the invoice extraction example

62e9a62

docs: README for the invoice extraction example

c842495

docs(invoice_extraction): gpu_cluster.yaml for GPU inferences

ed40b64

feat: remove finalizers before tearing down a cluster

0aadc74

chore: bump version

4e2bdf7

docs: instructions for installing the pack CLI

6f88d8a

update the progress status logging for downloading

c1bcd37

simple-easydev dismissed jjleng’s stale review via c1bcd37 April 14, 2024 13:03

jjleng and others added 2 commits April 14, 2024 21:08

docs: add pulumi CLI as a dependency

a0f0ad4

Fix test case for HuggingFaceModel.upload_file_to_s3

5863ad0

jjleng approved these changes Apr 14, 2024
