|
| 1 | +--- |
| 2 | +authors: |
| 3 | + - name: Eason Kuo |
| 4 | + title: contributor of VulcanSQL |
| 5 | + url: https://github.com/kokokuo |
| 6 | + image_url: https://avatars.githubusercontent.com/u/5389253?v=4 |
| 7 | + email: eason.kuo@cannerdata.com |
| 8 | + - name: Jimmy Yeh |
| 9 | + title: contributor of VulcanSQL |
| 10 | + url: https://github.com/cyyeh |
| 11 | + image_url: https://avatars.githubusercontent.com/u/11023068?v=4 |
| 12 | + email: jimmy.yeh@cannerdata.com |
| 13 | +--- |
| 14 | + |
| 15 | +# Querying Your Data Easily and Smartly through Hugging Face |
| 16 | + |
| 17 | +*TLDR: VulcanSQL, a free and open-source data API framework built specifically for data applications, |
| 18 | +empowers data professionals to generate and distribute data APIs quickly and effortlessly. |
| 19 | +It takes your SQL templates and transforms them into data APIs, with no backend expertise necessary.* |
| 20 | + |
| 21 | +## Preface |
| 22 | + |
| 23 | +Normally, to retrieve the data we need from a data source, |
| 24 | +we have to write SQL statements. However, this process can be **time-consuming**, |
| 25 | +especially when data consumers' requirements change within a short time. |
| 26 | +To make this process easier and more flexible for data consumers, |
| 27 | +VulcanSQL has **integrated Hugging Face inference capabilities**. This lets us **reduce |
| 28 | +the need to change SQL templates** by simply allowing data consumers to ask questions |
| 29 | +and get the results they need. |
| 30 | + |
| 31 | +<!--truncate--> |
| 32 | + |
| 33 | +## VulcanSQL HuggingFace Filters |
| 34 | + |
| 35 | +VulcanSQL leverages the Hugging Face Inference feature through the VulcanSQL |
| 36 | +[Filters](https://vulcansql.com/docs/develop/advanced#filters) statement. |
| 37 | + |
| 38 | +### What is Hugging Face |
| 39 | + |
| 40 | +Hugging Face is an AI community that builds tools to enable users to build, train, |
| 41 | +and deploy machine learning models. Hugging Face makes it easy to share tools, models, |
| 42 | +model weights, and datasets with other practitioners through its toolkit. |
| 43 | + |
| 44 | + |
| 45 | + |
| 46 | +Hugging Face provides the [Inference API](https://huggingface.co/inference-api) feature |
| 47 | +that allows users to run pre-trained AI models for various natural language processing (NLP) |
| 48 | +tasks, making it easier to integrate powerful language models into applications and services. |
| 49 | + |
| 50 | + |
| 51 | + |
| 52 | +### Table Question Answering Task Filter |
| 53 | + |
| 54 | +"[Table Question Answering](https://huggingface.co/tasks/table-question-answering)" is one of |
| 55 | +the NLP (Natural Language Processing) tasks provided by Hugging Face. Table Question Answering |
| 56 | +involves answering a question about the information in a given table. Given a table as input, |
| 57 | +the model can simulate SQL-style execution over it. |
| 58 | + |
| 59 | +VulcanSQL currently integrates the table question answering feature through a filter named |
| 60 | +`huggingface_table_question_answering`. Filters are functions that you apply to variables using the |
| 61 | +pipe operator (`|`). |
| 62 | + |
| 63 | +**Sample 1 - send the data from a variable defined with the [`set` tag](https://www.notion.so/VulcanSQL-edb87d04de074125ab19275e6f63d844?pvs=21):** |
| 64 | + |
| 65 | +You can define the dataset with the **[`set` tag](https://www.notion.so/VulcanSQL-edb87d04de074125ab19275e6f63d844?pvs=21)** |
| 66 | +and pass the question in the `query` field: |
| 67 | + |
| 68 | +```sql |
| 69 | +{% set data = [ |
| 70 | + { |
| 71 | + "repository": "vulcan-sql", |
| 72 | + "topic": ["analytics", "data-lake", "data-warehouse", "api-builder"], |
| 73 | + "description":"Create and share Data APIs fast! Data API framework for DuckDB, ClickHouse, Snowflake, BigQuery, PostgreSQL" |
| 74 | + }, |
| 75 | + { |
| 76 | + "repository": "accio", |
| 77 | + "topic": ["data-analytics", "data-lake", "data-warehouse", "bussiness-intelligence"], |
| 78 | + "description": "Query Your Data Warehouse Like Exploring One Big View." |
| 79 | + }, |
| 80 | + { |
| 81 | + "repository": "hello-world", |
| 82 | + "topic": [], |
| 83 | + "description": "Sample repository for testing" |
| 84 | + } |
| 85 | +] %} |
| 86 | + |
| 87 | +-- The source data for "huggingface_table_question_answering" needs to be an array of objects. |
| 88 | +SELECT {{ data | huggingface_table_question_answering(query="How many repositories related to data-lake topic?") }} as result |
| 89 | +``` |
| 90 | + |
| 91 | +Here is a response returned by `huggingface_table_question_answering`: |
| 92 | + |
| 93 | +```json |
| 94 | +[ |
| 95 | + { |
| 96 | + "result": "{\"answer\":\"COUNT > vulcan-sql, accio\",\"coordinates\":[[0,0],[1,0]],\"cells\":[\"vulcan-sql\",\"accio\"],\"aggregator\":\"COUNT\"}" |
| 97 | + } |
| 98 | +] |
| 99 | +``` |
| 100 | + |
| 101 | +The `huggingface_table_question_answering` filter returns its result as a JSON string. |
| 102 | +You can parse that JSON string and use the result yourself. |
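
As a minimal sketch of handling that JSON string on the client side, the Python snippet below decodes the sample response shown above. Python and the helper name `parse_answer` are illustrative choices, not part of VulcanSQL, and treating the number of matched cells as the count assumes the `COUNT` aggregator applies to the returned `cells`.

```python
import json

# Response body copied from the sample above; in practice it would come from your HTTP client.
response = [
    {
        "result": "{\"answer\":\"COUNT > vulcan-sql, accio\",\"coordinates\":[[0,0],[1,0]],\"cells\":[\"vulcan-sql\",\"accio\"],\"aggregator\":\"COUNT\"}"
    }
]


def parse_answer(row):
    """Decode the JSON string stored in the `result` column of one row (hypothetical helper)."""
    return json.loads(row["result"])


answer = parse_answer(response[0])
matched_repositories = answer["cells"]

# Assumption: with the COUNT aggregator, the count is the number of matched cells.
repository_count = len(matched_repositories) if answer["aggregator"] == "COUNT" else None
```

Keeping the decoding step in one small helper makes it easy to add error handling later, for example when the model returns no matching cells.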
| 103 | + |
| 104 | +**Sample 2 - send the data from the `req` tag:** |
| 105 | + |
| 106 | +You can also use the `req` tag to run a SQL query and save its result |
| 107 | +to a variable named `repositories`. Then you can use `.value()` to get the data result and |
| 108 | +pass it to `huggingface_table_question_answering` with the pipe operator `|`. |
| 109 | + |
| 110 | +```sql |
| 111 | +{% req repositories %} |
| 112 | + SELECT * FROM read_csv_auto('Top200StaredRepositories.csv') |
| 113 | +{% endreq %} |
| 114 | + |
| 115 | +{% set question = context.params.question %} |
| 116 | + |
| 117 | +SELECT {{ repositories.value() | huggingface_table_question_answering(query=question, wait_for_model=true) }} as result |
| 118 | +``` |
| 119 | + |
| 120 | +Notice that we also set the `wait_for_model` field to `true`, |
| 121 | +which tells the filter to wait until Hugging Face has loaded the pre-trained model; otherwise the request may fail because the model is not fully loaded. |
| 122 | +For more information, please see [VulcanSQL's Hugging Face Table Question Answering Filter Extension Documentation](https://vulcansql.com/docs/extensions/huggingface/huggingface-table-question-answering). |
| 123 | + |
| 124 | +Now we can call the API with different questions through the `question` parameter and get different results! |
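
A rough sketch of how a client might build such requests is shown here. The base URL `http://localhost:3000/api/repositories` and the helper `build_request_url` are hypothetical; substitute the host and path that your own VulcanSQL project serves.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace it with the URL your VulcanSQL project actually serves.
BASE_URL = "http://localhost:3000/api/repositories"


def build_request_url(question: str) -> str:
    """Attach the question as a URL-encoded `question` query parameter."""
    return f"{BASE_URL}?{urlencode({'question': question})}"


url = build_request_url("Find the repository has the most stars?")
```

Encoding the question matters because natural-language questions usually contain spaces and punctuation such as `?`, which would otherwise be misread as part of the URL syntax.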
| 125 | + |
| 126 | +**Scenario 1** - We asked `Find the repository has the most stars?`, and the Hugging Face model told us that freeCodeCamp has the most stars. |
| 127 | + |
| 128 | + |
| 129 | + |
| 130 | +**Scenario 2** - We asked `How many repositories use Python language? Give repository name`, |
| 131 | +and the Hugging Face model told us that awesome-python, httpie, and thefuck use Python. |
| 132 | + |
| 133 | + |
| 134 | + |
| 135 | +As you can see, the **Hugging Face Table Question Answering Filter** brings these benefits: |
| 136 | + |
| 137 | +- You **don't need** to change the SQL template file. |
| 138 | +- You **don't need** to rebuild the SQL template file. |
| 139 | +- You **don't need** to create another SQL template file to satisfy two query scenarios. |
| 140 | + |
| 141 | +:::info |
| 142 | +The quality of the response depends on the model used by the Hugging Face Table Question Answering Filter and on how the question is phrased. |
| 143 | +::: |
| 144 | + |
| 145 | +VulcanSQL provides more than the Table Question Answering Filter: |
| 146 | +the Hugging Face Filters Extension also includes a Text Generation Filter. |
| 147 | + |
| 148 | +This means you can use popular models such as [Meta Llama2](https://ai.meta.com/llama/)! 🥳 |
| 149 | + |
| 150 | +### Text Generation Task Filter |
| 151 | + |
| 152 | +"[Text Generation](https://huggingface.co/tasks/text-generation)" is another NLP |
| 153 | +(Natural Language Processing) task provided by Hugging Face. Text generation means producing |
| 154 | +new text. These models can, for example, fill in incomplete text, paraphrase it, |
| 155 | +or even answer your question based on your input context. |
| 156 | + |
| 157 | +VulcanSQL also integrates the text generation feature through a filter named |
| 158 | +`huggingface_text_generation`, which you apply to variables using |
| 159 | +the pipe operator (`|`). |
| 160 | + |
| 161 | +In addition, Hugging Face hosts the popular [Meta Llama2](https://huggingface.co/meta-llama) models, |
| 162 | +a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to |
| 163 | +70 billion parameters. |
| 164 | + |
| 165 | + |
| 166 | + |
| 167 | +In the next sample, we are going to demonstrate how VulcanSQL uses the `huggingface_text_generation` filter |
| 168 | +with the Llama2 model `meta-llama/Llama-2-13b-chat-hf` to answer your question. |
| 169 | + |
| 170 | +:::info |
| 171 | +If you would like to use the Meta Llama2 model, you have at least two options to choose from: |
| 172 | + |
| 173 | +1. Subscribe to the [Hugging Face Pro Account](https://huggingface.co/pricing#pro). |
| 174 | +2. Use [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints). |
| 175 | + |
| 176 | +For more information, please see [VulcanSQL's Text Generation Filter Extension Documentation](https://vulcansql.com/docs/extensions/huggingface/huggingface-text-generation). |
| 177 | +::: |
| 178 | + |
| 179 | +**Sample - send the data with the `req` tag:** |
| 180 | + |
| 181 | +This sample uses the Hugging Face access token of a [Pro Account](https://huggingface.co/pricing#pro) to get |
| 182 | +the result from the `meta-llama/Llama-2-13b-chat-hf` model. |
| 183 | + |
| 184 | +```sql |
| 185 | +-- With the `meta-llama/Llama-2-13b-chat-hf` model, the data must contain fewer than 4096 tokens, so we limit the rows and columns selected for universities. |
| 186 | +{% req universities %} |
| 187 | + SELECT rank, institution, "location code", "location" FROM read_csv_auto('2023-QS-World-University-Rankings.csv') LIMIT 100 |
| 188 | +{% endreq %} |
| 189 | + |
| 190 | +{% set question = context.params.question %} |
| 191 | + |
| 192 | +SELECT {{ universities.value() | huggingface_text_generation(query=question, model="meta-llama/Llama-2-13b-chat-hf", wait_for_model=true) }} as result |
| 193 | +``` |
| 194 | + |
| 195 | +**Scenario 1** - We asked `Which university is the top-ranked university?`, and the model gave us the top-ranked university. |
| 196 | + |
| 197 | + |
| 198 | + |
| 199 | +**Scenario 2** - We asked `Which university located in the UK is ranked at the top of the list?`, |
| 200 | +and the model gave us the top-ranked university that is located in the UK. |
| 201 | + |
| 202 | + |
| 203 | + |
| 204 | +Wow! It's really amazing that the **HuggingFace Text Generation Filter** can answer your question |
| 205 | +based on the given dataset! |
| 206 | + |
| 207 | +## Conclusion |
| 208 | + |
| 209 | +With more and more great machine learning models coming out, it's great that we can utilize their power to make our daily work easier! |
| 210 | +We hope this blog post gives you a glimpse of how VulcanSQL can take part in this revolutionary event in human history! |
| 211 | + |
| 212 | +Imagine a world where you can deliver APIs to which users only need to send their questions, the model handles the SQL logic for you, |
| 213 | +and VulcanSQL takes care of the [data privacy](../docs/data-privacy/overview) and [API things](../docs/api-plugin/overview). It sounds exciting, doesn't it? |
| 214 | + |
| 215 | +In the near future, we'll publish detailed step-by-step guides to help you write your own AI-enabled filter extensions! Stay tuned! |
| 216 | + |
| 217 | + |