Commit 0be3780

Merge pull request #281 from cyyeh/feature/update-doc
update docs
2 parents df1ba38 + 5a0eb4a commit 0be3780

13 files changed

+283
-1
lines changed

packages/doc/blog/powering-rapid-data-apps-with-vulcansql.mdx

+1-1
@@ -9,7 +9,7 @@ authors:
99

1010
# Powering Rapid Data Applications Using Your Data Warehouse With VulcanSQL
1111

12-
![cover](./static/cover-powering-rapid-data-apps-with-vulcansql.png)
12+
![cover](./static/cover-powering-rapid-data-apps-with-vulcansql.jpg)
1313

1414
Hello, data folks.
1515

@@ -0,0 +1,217 @@
1+
---
2+
authors:
3+
- name: Eason Kuo
4+
title: contributor of VulcanSQL
5+
url: https://github.com/kokokuo
6+
image_url: https://avatars.githubusercontent.com/u/5389253?v=4
7+
email: eason.kuo@cannerdata.com
8+
- name: Jimmy Yeh
9+
title: contributor of VulcanSQL
10+
url: https://github.com/cyyeh
11+
image_url: https://avatars.githubusercontent.com/u/11023068?v=4
12+
email: jimmy.yeh@cannerdata.com
13+
---
14+
15+
# Querying Your Data Easily and Smartly through Hugging Face
16+
17+
*TLDR: VulcanSQL, a free and open-source data API framework built specifically for data applications,
18+
empowers data professionals to generate and distribute data APIs quickly and effortlessly.
19+
It takes your SQL templates and transforms them into data APIs, with no backend expertise necessary.*
20+
21+
## Preface
22+
23+
Normally, in order to retrieve the data we need from a data source,
24+
we have to write SQL statements. However, this process can be **time-consuming**,
25+
especially when data consumers come up with different requirements within a short time.
26+
Now, to make this process easier and more flexible to data consumers,
27+
VulcanSQL has **integrated HuggingFace inference capabilities**. This allows us to **reduce
28+
the need for changing SQL templates** by simply allowing data consumers to ask questions
29+
and get the results they need.
30+
31+
<!--truncate-->
32+
33+
## VulcanSQL HuggingFace Filters
34+
35+
VulcanSQL leverages the Hugging Face Inference feature through the VulcanSQL
36+
[Filters](https://vulcansql.com/docs/develop/advanced#filters) statement.
37+
38+
### What is Hugging Face
39+
40+
Hugging Face is an AI community that builds tools to enable users to build, train,
41+
and deploy machine learning models. Hugging Face makes it easy to share tools, models,
42+
model weights, and datasets among other practitioners through its toolkit.
43+
44+
![img5](./static/querying-your-data-easily-and-smartly-through-huggingface/img5.png)
45+
46+
Hugging Face provides the [Inference API](https://huggingface.co/inference-api) feature
47+
that allows users to run pre-trained AI models for various natural language processing (NLP)
48+
tasks, making it easier to integrate powerful language models into applications and services.
49+
50+
![img6](./static/querying-your-data-easily-and-smartly-through-huggingface/img6.png)
51+
52+
### Table Question Answering Task Filter
53+
54+
"[Table Question Answering](https://huggingface.co/tasks/table-question-answering)" is one of
55+
the NLP (Natural Language Processing) tasks provided by Hugging Face. Table Question Answering
56+
involves answering a question about the information in a given table. It allows for simulating SQL
57+
execution by inputting a table through its model.
58+
59+
VulcanSQL currently integrates the table question answering feature by creating the filter named
60+
`huggingface_table_question_answering` and allows you to apply functions to variables using the
61+
pipe operator (`|`).
62+
63+
**Sample 1 - send the data from the variable [`set` tag](https://www.notion.so/VulcanSQL-edb87d04de074125ab19275e6f63d844?pvs=21):**
64+
65+
You can provide the dataset with the **[`set` tag](https://www.notion.so/VulcanSQL-edb87d04de074125ab19275e6f63d844?pvs=21)**
66+
and pass the question in the `query` field:
67+
68+
```sql
69+
{% set data = [
70+
{
71+
"repository": "vulcan-sql",
72+
"topic": ["analytics", "data-lake", "data-warehouse", "api-builder"],
73+
"description":"Create and share Data APIs fast! Data API framework for DuckDB, ClickHouse, Snowflake, BigQuery, PostgreSQL"
74+
},
75+
{
76+
"repository": "accio",
77+
"topic": ["data-analytics", "data-lake", "data-warehouse", "business-intelligence"],
78+
"description": "Query Your Data Warehouse Like Exploring One Big View."
79+
},
80+
{
81+
"repository": "hello-world",
82+
"topic": [],
83+
"description": "Sample repository for testing"
84+
}
85+
] %}
86+
87+
-- The source data for "huggingface_table_question_answering" needs to be an array of objects.
88+
SELECT {{ data | huggingface_table_question_answering(query="How many repositories related to data-lake topic?") }} as result
89+
```
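For orientation, the Hugging Face Inference API documents the table question answering input as a query plus a column-oriented table whose cells are strings. The Python sketch below shows one way an array of objects like `data` could be reshaped into that form. The helper name `to_column_table` and the choice to join list values with commas are assumptions made for illustration; this post does not describe how VulcanSQL performs the conversion internally.

```python
def to_column_table(rows):
    """Turn a list of row objects into {column: [cell, ...]} with string cells.

    Assumes every row has the same keys, so all columns end up the same length.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            if isinstance(value, list):
                # Assumption: flatten list values into one comma-separated cell.
                text = ", ".join(str(item) for item in value)
            else:
                text = str(value)
            columns.setdefault(key, []).append(text)
    return columns


rows = [
    {"repository": "vulcan-sql", "topic": ["analytics", "data-lake"]},
    {"repository": "hello-world", "topic": []},
]

# Payload shape: a question plus a column-oriented table.
payload = {
    "inputs": {
        "query": "How many repositories related to data-lake topic?",
        "table": to_column_table(rows),
    }
}
print(payload["inputs"]["table"])
```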
90+
91+
Here is a response returned by `huggingface_table_question_answering`:
92+
93+
```json
94+
[
95+
{
96+
"result": "{\"answer\":\"COUNT > vulcan-sql, accio\",\"coordinates\":[[0,0],[1,0]],\"cells\":[\"vulcan-sql\",\"accio\"],\"aggregator\":\"COUNT\"}"
97+
}
98+
]
99+
```
100+
101+
The result from `huggingface_table_question_answering` is returned as a JSON string.
102+
You can parse the JSON string and use the fields yourself.
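Because the filter returns its result as a JSON string, the individual fields can be recovered with any JSON parser. Here is a minimal Python sketch that uses the sample response shown above; the variable names are only illustrative.

```python
import json

# The sample response from above: "result" holds a JSON document encoded as a string.
response = [
    {
        "result": "{\"answer\":\"COUNT > vulcan-sql, accio\",\"coordinates\":[[0,0],[1,0]],\"cells\":[\"vulcan-sql\",\"accio\"],\"aggregator\":\"COUNT\"}"
    }
]

# Parse the inner JSON string into a dictionary.
parsed = json.loads(response[0]["result"])

# For a COUNT aggregator, the number of matched cells is the count.
print(parsed["aggregator"], len(parsed["cells"]))
```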
103+
104+
**Sample 2 - send the data from the `req` tag:**
105+
106+
You can also use the `req` tag to keep the result of the enclosed SQL query and save it
107+
to a variable named `repositories`. Then you can use `.value()` to get the data result and
108+
pass it to `huggingface_table_question_answering` with the pipe operator `|`.
109+
110+
```sql
111+
{% req repositories %}
112+
SELECT * FROM read_csv_auto('Top200StaredRepositories.csv')
113+
{% endreq %}
114+
115+
{% set question = context.params.question %}
116+
117+
SELECT {{ repositories.value() | huggingface_table_question_answering(query=question, wait_for_model=true) }} as result
118+
```
119+
120+
Note that we also pass `true` to the `wait_for_model` field,
121+
which tells the filter to wait until Hugging Face has loaded the pre-trained table question answering model; otherwise the request may fail because the model is not fully loaded.
122+
For more information, please see the [VulcanSQL's Hugging Face Table Question Answering Filter Extension Documentation](https://vulcansql.com/docs/extensions/huggingface/huggingface-table-question-answering).
123+
124+
Now we can call the API with different questions through the `question` parameter and get different results!
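As a rough illustration, a client could build such requests as follows. The host `http://localhost:3000` and the path `/api/repositories` are hypothetical placeholders, since the actual endpoint depends on your own API schema; only the `question` parameter name comes from the template above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: replace with the URL path defined for your SQL template.
BASE_URL = "http://localhost:3000/api/repositories"


def build_question_url(question, base_url=BASE_URL):
    """Return the request URL with the question percent-encoded as a query parameter."""
    return base_url + "?" + urlencode({"question": question})


url = build_question_url("Find the repository has the most stars?")
print(url)

# Decoding the query string gives back the original question.
print(parse_qs(urlsplit(url).query))
```

The same template then serves every question; only the query string changes between requests.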
125+
126+
**Scenario 1** - We asked `Find the repository has the most stars?`, and the Hugging Face model told us that freeCodeCamp has the most stars.
127+
128+
![img0](static/querying-your-data-easily-and-smartly-through-huggingface/img0.png)
129+
130+
**Scenario 2** - We asked `How many repositories use Python language? Give repository name`,
131+
and the Hugging Face model told us awesome-python, httpie and thefuck are using Python language.
132+
133+
![img1](static/querying-your-data-easily-and-smartly-through-huggingface/img1.png)
134+
135+
As you can see, the **HuggingFace Table Question Answering Filter** brings these benefits:
136+
137+
- **Don't need** to change the SQL template file.
138+
- **Don't need** to re-build the SQL template file.
139+
- **Don't need** to create another SQL template file to satisfy two query scenarios.
140+
141+
:::info
142+
The quality of the response depends on the model used in the HuggingFace Table Question Answering Filter and on how the questions are phrased.
143+
:::
144+
145+
However, VulcanSQL not only provides the Table Question Answering Filter feature
146+
but also publishes a popular Text Generation Filter for the Hugging Face Filters Extension.
147+
148+
This means you can use popular models such as [Meta Llama2](https://ai.meta.com/llama/)! 🥳
149+
150+
### Text Generation Task Filter
151+
152+
"[Text Generation](https://huggingface.co/tasks/text-generation)" is another NLP
153+
(Natural Language Processing) task provided by Hugging Face. Text generation means producing
154+
new text. These models can, for example, fill in incomplete text or paraphrase,
155+
or even answer your question based on your input context.
156+
157+
VulcanSQL also integrates the text generation feature by creating the filter named
158+
`huggingface_text_generation` and allows you to apply functions to variables using
159+
the pipe operator (`|`).
160+
161+
Besides, Hugging Face provides the popular [Meta Llama2](https://huggingface.co/meta-llama) models,
162+
a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to
163+
70 billion parameters.
164+
165+
![img2](static/querying-your-data-easily-and-smartly-through-huggingface/img2.png)
166+
167+
In the next sample, we are going to demonstrate how VulcanSQL uses the `huggingface_text_generation` filter
168+
with the Llama2 model `meta-llama/Llama-2-13b-chat-hf` to answer your question.
169+
170+
:::info
171+
If you would like to use the Meta Llama2 model, you have at least two options to choose from:
172+
173+
1. Subscribe to the [Hugging Face Pro Account](https://huggingface.co/pricing#pro).
174+
2. Use [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints).
175+
176+
For more information, please see [VulcanSQL's Text Generation Filter Extension Documentation](https://vulcansql.com/docs/extensions/huggingface/huggingface-text-generation).
177+
:::
178+
179+
**Sample - send the data with the `req` tag:**
180+
181+
The sample uses the HuggingFace access token of the [Pro Account](https://huggingface.co/pricing#pro) to get
182+
the result by using the `meta-llama/Llama-2-13b-chat-hf` model.
183+
184+
```sql
185+
-- With the `meta-llama/Llama-2-13b-chat-hf` model, the input data must be fewer than 4096 tokens, so we limit the rows and columns selected for universities.
186+
{% req universities %}
187+
SELECT rank, institution, "location code", "location" FROM read_csv_auto('2023-QS-World-University-Rankings.csv') LIMIT 100
188+
{% endreq %}
189+
190+
{% set question = context.params.question %}
191+
192+
SELECT {{ universities.value() | huggingface_text_generation(query=question, model="meta-llama/Llama-2-13b-chat-hf", wait_for_model=true) }} as result
193+
```
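The token limit mentioned in the SQL comment can also be checked before the data is sent. The Python sketch below uses a crude character-based estimate (about four characters per token, which is an assumption and not the real Llama 2 tokenizer) and drops trailing rows until the estimate falls below the limit. The function names are hypothetical.

```python
import json

MAX_TOKENS = 4096      # limit quoted in the SQL comment above
CHARS_PER_TOKEN = 4    # assumption: rough average, not the actual tokenizer


def estimate_tokens(rows):
    """Roughly estimate the token count of rows serialized as JSON."""
    return len(json.dumps(rows)) // CHARS_PER_TOKEN + 1


def trim_rows(rows, budget=MAX_TOKENS):
    """Drop rows from the end until the estimate is below the budget."""
    kept = list(rows)
    while kept and estimate_tokens(kept) >= budget:
        kept.pop()
    return kept


# Made-up rows for demonstration only.
rows = [{"rank": i, "institution": "University %d" % i} for i in range(1, 1001)]
trimmed = trim_rows(rows)
print(len(trimmed), estimate_tokens(trimmed))
```

For a precise count you would use the model's own tokenizer; the heuristic only gives a quick sanity check comparable to the `LIMIT 100` in the query.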
194+
195+
**Scenario 1** - We asked `Which university is the top-ranked university?`, and the model gave us the top-ranked university.
196+
197+
![img3](static/querying-your-data-easily-and-smartly-through-huggingface/img3.png)
198+
199+
**Scenario 2** - We asked `Which university located in the UK is ranked at the top of the list?`,
200+
and the model gave us the top-ranked university that is located in the UK.
201+
202+
![img4](static/querying-your-data-easily-and-smartly-through-huggingface/img4.png)
203+
204+
Wow! It's really amazing that the **HuggingFace Text Generation Filter** can answer your question
205+
based on the given dataset!
206+
207+
## Conclusion
208+
209+
With more and more great machine learning models coming out, it's great that we can utilize their power to make our daily work easier!
210+
We hope this blog post gives you a glimpse of how VulcanSQL can be involved in this revolutionary event in human history!
211+
212+
Imagine a world where you deliver APIs to which users only need to pose their questions, the model handles the SQL logic for you,
213+
and VulcanSQL takes care of the [data privacy](../docs/data-privacy/overview) and [API things](../docs/api-plugin/overview). It sounds exciting, doesn't it?
214+
215+
In the near future, we'll publish detailed step-by-step guides to help you write your own AI-enabled filter extensions! Stay tuned!
216+
217+
Binary file not shown.

packages/doc/docs/extensions/api.mdx

+60
@@ -0,0 +1,60 @@
1+
# The API Extension
2+
3+
This extension allows you to call RESTful APIs on other sources, so you can request, update, or even delete resources there!
4+
5+
## Installation
6+
7+
We need to install an additional package in order to call RESTful APIs:
8+
9+
1. Install the package (if you use the binary version of VulcanSQL, you can skip this step)
10+
11+
```bash
12+
npm i @vulcan-sql/extension-api-caller
13+
```
14+
2. Set up `vulcan.yaml`
15+
16+
```yaml
17+
extensions:
18+
api: '@vulcan-sql/extension-api-caller' # add this line
19+
```
20+
21+
## Using the API extension
22+
23+
A minimal example
24+
25+
```sql
26+
SELECT {{ {} | rest_api(url='https://dummyjson.com/products/1') }}
27+
```
28+
29+
To pass the path parameters
30+
31+
```sql
32+
{% set a_variable_you_can_define = { "path": { "id": 1 } } %}
33+
SELECT {{ a_variable_you_can_define | rest_api(url='https://dummyjson.com/products/:id') }}
34+
```
35+
36+
To pass the query parameters
37+
38+
```sql
39+
{% set a_variable_you_can_define = { "query": { "q": "phone" } } %}
40+
SELECT {{ a_variable_you_can_define | rest_api(url='https://dummyjson.com/products/search') }}
41+
```
42+
43+
To issue a POST request
44+
45+
```sql
46+
{% set a_variable_you_can_define = { "body": { "title": "BMW Pencil" } } %}
47+
SELECT {{ a_variable_you_can_define | rest_api(url='https://dummyjson.com/products/add', method='POST') }}
48+
```
49+
50+
To pass the headers and multiple fields
51+
52+
```sql
53+
{% set a_variable_you_can_define = { "headers": { "Content-Type": "application/json" }, "body": { "title": "BMW Pencil" } } %}
54+
SELECT {{ a_variable_you_can_define | rest_api(url='https://dummyjson.com/products/add', method='POST') }}
55+
```
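To make the `path` and `query` fields in the examples above more concrete, here is a conceptual Python sketch of how such fields could be combined with a URL template. It is not the extension's actual implementation; the function name `build_url` and the simple string replacement are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_url(url, options):
    """Fill `:name` placeholders from options["path"] and append options["query"]."""
    for name, value in options.get("path", {}).items():
        # Simplification: plain replacement, so `:id` would also match inside `:identifier`.
        url = url.replace(":" + name, str(value))
    query = options.get("query")
    if query:
        url += "?" + urlencode(query)
    return url


print(build_url("https://dummyjson.com/products/:id", {"path": {"id": 1}}))
print(build_url("https://dummyjson.com/products/search", {"query": {"q": "phone"}}))
```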
56+
57+
## Examples
58+
59+
You can check out this [restapi-caller](https://github.com/Canner/vulcan-sql-examples/tree/main/restapi-caller) example for further details!
60+

packages/doc/sidebars.js

+5
@@ -182,6 +182,11 @@ const sidebars = {
182182
},
183183
]
184184
},
185+
{
186+
type: 'doc',
187+
label: 'API',
188+
id: 'extensions/api',
189+
},
185190
// {
186191
// type: 'category',
187192
// label: 'Overview',
