Feature: Support HuggingFace Text Generation including meta-llama2 model #266

kokokuo · 2023-08-02T07:58:01Z

Description

Text Generation is one of the Natural Language Processing tasks supported by Hugging Face.

VulcanSQL supports Text Generation through the huggingface_text_generation filter, which returns the generated text as a string.

📢 Notice: The default Text Generation model is gpt2. To use the Meta Llama 2 models, there are two options:

1. Subscribe to the Pro Account.

   Set the Meta Llama 2 model using the model keyword argument in huggingface_text_generation, e.g. meta-llama/Llama-2-13b-chat-hf.

2. Use an Inference Endpoint.

   Select one of the Meta Llama 2 models and deploy it to an Inference Endpoint.
   Set the endpoint URL using the endpoint keyword argument in huggingface_text_generation.
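
The choice between the two options can be pictured as a small URL-resolution step. The sketch below is illustrative only: `resolveInferenceUrl` and `TextGenerationOptions` are hypothetical names rather than the extension's actual code, and the hosted Inference API base URL and the precedence of `endpoint` over `model` are assumptions.

```typescript
// Hypothetical option shape; only `model` and `endpoint` come from the PR description.
interface TextGenerationOptions {
  model?: string;
  endpoint?: string;
}

// Assumed base URL of the hosted Hugging Face Inference API.
const HOSTED_API_BASE = 'https://api-inference.huggingface.co/models';
// Default model named in the PR description.
const DEFAULT_MODEL = 'gpt2';

function resolveInferenceUrl(options: TextGenerationOptions = {}): string {
  // Assumption: a dedicated Inference Endpoint, when given, takes precedence.
  if (options.endpoint) return options.endpoint;
  // Otherwise fall back to the hosted API with the chosen or default model.
  return `${HOSTED_API_BASE}/${options.model ?? DEFAULT_MODEL}`;
}
```

With no arguments this resolves to the gpt2 model on the hosted API; passing `model` switches the model, and passing `endpoint` returns that URL unchanged.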

Sample 1 - Subscribe to the Pro Account:

{% set data = [
  {
    "rank": 1,
    "institution": "Massachusetts Institute of Technology (MIT)",
    "location code":"US",
    "location":"United States"
  },
  {
    "rank": 2,
    "institution": "University of Cambridge",
    "location code":"UK",
    "location":"United Kingdom"
  },
  {
    "rank": 3,
    "institution": "Stanford University",
    "location code":"US",
    "location":"United States"
  }
  -- other universities.....
] %}

SELECT {{ data | huggingface_text_generation(query="Which university is the top-ranked university?", model="meta-llama/Llama-2-13b-chat-hf") }} as result

Sample 1 - Response:

[
  {
    "result": "Answer: Based on the provided list, the top-ranked university is Massachusetts Institute of Technology (MIT) with a rank of 1."
  }
]
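
For orientation, the hosted Text Generation task accepts a JSON body with an `inputs` string and returns an array of objects carrying `generated_text`. The sketch below shows one way a filter could turn rows plus a question into that payload and pull the string back out. The prompt layout and the helper names `buildPayload` and `extractGeneratedText` are assumptions, not the extension's implementation.

```typescript
type Row = Record<string, unknown>;

// Request body for the text generation task.
interface TextGenerationPayload {
  inputs: string;
}

// One element of the response array.
interface TextGenerationResult {
  generated_text: string;
}

// Combine the rows and the question into a single prompt (layout is an assumption).
function buildPayload(data: Row[], query: string): TextGenerationPayload {
  const context = JSON.stringify(data);
  return { inputs: `Context: ${context}\nQuestion: ${query}` };
}

// Return the first generated string, or an empty string when the array is empty.
function extractGeneratedText(response: TextGenerationResult[]): string {
  return response[0]?.generated_text ?? '';
}
```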

Sample 2 - Using Inference Endpoint:

{% req universities %}
 SELECT rank,institution,"location code", "location" FROM read_csv_auto('2023-QS-World-University-Rankings.csv') 
{% endreq %}

SELECT {{ universities.value() | huggingface_text_generation(query="Which university located in the UK is ranked at the top of the list?", endpoint='xxx.yyy.zzz.huggingface.cloud') }} as result

Sample 2 - Response:

[
  {
    "result": "Answer: Based on the list provided, the top-ranked university in the UK is the University of Cambridge, which is ranked at number 2."
  }
]

Screenshot

SQL and API Schema

Question 1 - Which university is the top-ranked university?

Question 2 - Which university located in the UK is ranked at the top of the list?

Additional Context

Refactor the original TableQuestionAnsweringFilter logic to keep it simple and readable.
Add an endpoint field so users can use their own HuggingFace Inference Endpoint with any huggingface_xxx filter.
Move the original request method to request.ts and add try-catch handling around the Axios error message.
Refactor the tests to move the sample data into the test-data folder for reuse, and use describe.
Create model.ts to define the common types and constant values.
Skip the test case for the llama2 model, because using it requires subscribing to a Pro Account at $9/month.
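
One reading of the request.ts change is a thin wrapper that catches a failed HTTP call and rethrows a shorter error. The sketch below assumes that reading; `requestWithCleanError` and `HttpErrorLike` are hypothetical names, the error text is invented for illustration, and the request itself is injected as a function so the example runs without Axios.

```typescript
// Minimal shape of the Axios error fields used here (`message`, `response.status`, `response.data`).
interface HttpErrorLike {
  message: string;
  response?: { status: number; data?: unknown };
}

// Hypothetical wrapper: run a request and rethrow any failure with a concise message.
async function requestWithCleanError<T>(send: () => Promise<T>): Promise<T> {
  try {
    return await send();
  } catch (error) {
    const err = error as HttpErrorLike;
    // Prefer the server's status and body when a response was received.
    const detail = err.response
      ? `status ${err.response.status}: ${JSON.stringify(err.response.data)}`
      : err.message;
    throw new Error(`Failed to get Hugging Face inference result, ${detail}`);
  }
}
```

A caller would pass the actual HTTP call as `send`, for example a function that posts the payload with Axios.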

vercel · 2023-08-02T07:58:06Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
vulcan-sql-document	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 2, 2023 2:25pm

kokokuo · 2023-08-02T08:48:20Z

@cyyeh, please help me review the documentation content for HuggingFace Text Generation. Thanks!

Btw, I added the endpoint field to the Table Question Answering filter to support changing the API endpoint when using a huggingface_xxxx filter.

onlyjackfrost

Apart from a few comments, LGTM.

onlyjackfrost · 2023-08-02T11:00:36Z

packages/extension-huggingface/src/lib/filters/tableQuestionAnswering.ts

  if (!(typeof args === 'object') || !has(args, 'query'))
    throw new InternalError('Must provide "query" keyword argument');
-  if (!args['query'])
-    throw new InternalError('The "query" argument must have value');


Curious about why we removed this query check.

Thanks for asking. I added back the logic that checks whether "query" has a value, along with test cases, in 5e22e51.
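
For reference, the two checks discussed here can be sketched as a standalone function. This is a minimal illustration, not the code in tableQuestionAnswering.ts: `validateQueryArgument` is a hypothetical name, the local `InternalError` class stands in for the one the extension imports, and a plain `in` check replaces the `has` helper used in the diff.

```typescript
// Stand-in for the InternalError class used by the extension.
class InternalError extends Error {}

// Hypothetical helper: ensure `query` is present and non-empty, then return it.
function validateQueryArgument(args: unknown): string {
  // First check: the keyword argument must exist.
  if (typeof args !== 'object' || args === null || !('query' in args))
    throw new InternalError('Must provide "query" keyword argument');
  // Second check (the one restored in this thread): it must have a value.
  const query = (args as Record<string, unknown>)['query'];
  if (!query) throw new InternalError('The "query" argument must have value');
  return String(query);
}
```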

onlyjackfrost · 2023-08-02T11:05:11Z

packages/extension-huggingface/README.md

+
+Using the `huggingface_text_generation` filter. The result will be a string from `huggingface_text_generation`.
+
+**Notice**: The **Text Generation** default model is **gpt2**, If you would like to use the [Meta LLama2](https://huggingface.co/meta-llama) models, you have two method to do:


"If you would like to use the [Meta LLama2] models, you have two method to do"
Check the grammar.

Thanks for catching the grammar issue. I changed "method" to "methods" in 5e22e51.

onlyjackfrost · 2023-08-02T11:09:25Z

packages/extension-huggingface/README.md

+2. Select one of the [Meta LLama2](https://huggingface.co/meta-llama) Models and deploy it to the [Inference Endpoint](https://huggingface.co/inference-endpoints). Set the endpoint URL using the `endpoint` keyword argument in `huggingface_text_generation`.
+
+```sql
+SELECT {{ data | huggingface_text_generation(query="Which university is the top-ranked university?", endpoint='xxx.yyy.zzz.huggingface.cloud') }} as result


Maybe we can merge these two code snippets and use comments to describe the details.
I think it will be more readable.

The marked code in this snippet is from an older version. After discussing with @onlyjackfrost and checking, no change is needed.

onlyjackfrost · 2023-08-02T11:13:37Z

packages/extension-huggingface/src/index.ts

 export default [
  HuggingFaceTableQuestionAnsweringFilterBuilder,
  HuggingFaceTableQuestionAnsweringFilterRunner,
+  TextGenerationFilterBuilder,
+  TextGenerationFilterRunner,


Please ensure that the naming is aligned with HuggingFace, either by using it as a prefix or without it.

Thanks for catching the naming issue. It has been fixed in 5e22e51.

onlyjackfrost · 2023-08-02T11:23:22Z

packages/extension-huggingface/test/textGeneration.spec.ts

+    100 * 1000
+  );
+
+  // Skip the test case because the "meta-llama/Llama-2-13b-chat-hf" model need to upgrade your huggingface account to Pro Account by paying $9 per month


Is there any model that is free and can be used for testing?
If the structure of API response payload is the same, I think it could be used for testing.

After discussion, I added a test case that uses the free model and renamed it to Should not throw when passing the "query" argument by dynamic parameter through HuggingFace default recommended "gpt2" model, in 5e22e51.

codecov-commenter · 2023-08-02T14:21:21Z

Codecov Report

Patch coverage: 80.76% and project coverage change: -0.03% ⚠️

Comparison is base (180f8a6) 90.25% compared to head (8c6a02c) 90.23%.

❗ Current head 8c6a02c differs from pull request most recent head 5e22e51. Consider uploading reports for the commit 5e22e51 to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #266      +/-   ##
===========================================
- Coverage    90.25%   90.23%   -0.03%     
===========================================
  Files          346      344       -2     
  Lines         5931     5722     -209     
  Branches       794      769      -25     
===========================================
- Hits          5353     5163     -190     
+ Misses         421      405      -16     
+ Partials       157      154       -3

Flag	Coverage Δ
extension-driver-ksqldb	`?`
extension-huggingface	`86.25% <80.76%> (+0.53%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
...ges/extension-huggingface/src/lib/utils/request.ts	`55.55% <55.55%> (ø)`
...gingface/src/lib/filters/tableQuestionAnswering.ts	`88.46% <81.81%> (+7.90%)`	⬆️
...sion-huggingface/src/lib/filters/textGeneration.ts	`85.71% <85.71%> (ø)`
packages/extension-huggingface/src/index.ts	`100.00% <100.00%> (ø)`
packages/extension-huggingface/src/lib/model.ts	`100.00% <100.00%> (ø)`
...kages/extension-huggingface/src/lib/utils/index.ts	`100.00% <100.00%> (ø)`
...tension-huggingface/test/test-data/repositories.ts	`100.00% <100.00%> (ø)`

... and 8 files with indirect coverage changes


onlyjackfrost

LGTM

kokokuo added 2 commits July 31, 2023 18:06

feat(extension-huggingface): refactor for reusing common feature and …

75b8625

…refactor huggingface filter logistic

feat(extension-huggingface): support text generation task

48c37cb

- Add the "TextGenerationFilter". - support huggingface filters could pass "endpoint" keyword arguments when using different filter task. - add test cases of "TextGenerationFilter".

kokokuo changed the title ~~Feature: Support Huggingface Text Generation ( including llama2 model )~~ Feature: Support HuggingFace Text Generation including meta-llama2 model Aug 2, 2023

chore(extension-huggingface): update README and add document

e192990

kokokuo force-pushed the feature/huggingface-text-generation branch from 7c0cc16 to e192990 Compare August 2, 2023 08:07

vercel bot deployed to Preview August 2, 2023 08:07 View deployment

kokokuo requested review from onlyjackfrost and cyyeh August 2, 2023 08:45

onlyjackfrost reviewed Aug 2, 2023

vercel bot deployed to Preview August 2, 2023 14:17 View deployment

kokokuo force-pushed the feature/huggingface-text-generation branch from 4c445cd to 8c6a02c Compare August 2, 2023 14:23

vercel bot deployed to Preview August 2, 2023 14:23 View deployment

chore(extension-huggingface): add logistic of check checking "query" …

5e22e51

…has value with test cases for huggingface filter - fix grammar in README. - fix the section of document . - add logistic for checking query has value with test cases

kokokuo force-pushed the feature/huggingface-text-generation branch from 8c6a02c to 5e22e51 Compare August 2, 2023 14:25

vercel bot deployed to Preview August 2, 2023 14:25 View deployment

onlyjackfrost approved these changes Aug 3, 2023

kokokuo merged commit 5df01ae into develop Aug 3, 2023

kokokuo mentioned this pull request Aug 7, 2023

Feature: add the example for using HuggingFace text generation filter Canner/vulcan-sql-examples#36

Merged

hanshino deleted the feature/huggingface-text-generation branch January 31, 2024 07:39
