@cpegeric commented Nov 4, 2024

issue: #2

Ollama Client in WebAssembly

We have implemented three exported functions: 'chunk', 'embed' and 'generate'.

  1. 'chunk': splits the input text into multiple chunks.
Configuration parameters:
1. chunk_size: size of each chunk, default 1024
2. chunk_overlap: number of bytes shared by consecutive chunks, default 20

Input:
text as a string

Output:
JSON array of chunks
  2. 'embed': splits the input text into chunks and computes an embedding for each one, returning (chunk text, embedding) pairs.
Configuration parameters:
1. model: model name, e.g. llama3.2
2. chunk_size: size of each chunk, default 1024
3. chunk_overlap: number of bytes shared by consecutive chunks, default 20
4. address: address of the Ollama host, default http://localhost:11434

Input:
text as a string

Output:
JSON array of objects, each holding a chunk's text and its embedding, e.g. [{"chunk":"chunk text 1", "embedding":[1,2,...]},...]
  3. 'generate': generates a response from a prompt.
Configuration parameters:
1. model: model name, e.g. llama3.2
2. address: address of the Ollama host, default http://localhost:11434

Input:
prompt as a string

Output:
JSON array of generated text

Integration with the MatrixOne Database

mysql> create table chunk(t varchar, e vecf32(3072));

mysql> insert into chunk select json_unquote(json_extract(result, "$.chunk")), json_unquote(json_extract(result, "$.embedding")) from 
moplugin_table('https://github.com/matrixone/mojo/raw/main/plugin/ollama/ollama.wasm', 'embed', '{"model":"llama3.2"}', 'where is great wall?') as f;

mysql> select * from moplugin_table('https://github.com/matrixone/mojo/raw/main/plugin/ollama/ollama.wasm', 'generate', '{"model":"llama3.2"}', 'where is great wall?') as f;
| "The Great Wall of China is located in China, specifically along the northern border of the country. It stretches across several provinces and municipalities, including:\\n\\n1. Beijing Municipality (where the most famous and well-preserved sections are)\\n2. Tianjin Municipality\\n3. Hebei Province\\n4. Shanxi Province\\n5. Inner Mongolia Autonomous Region\\n\\nThe wall follows the mountain ranges of the northern China, winding its way through valleys and plains, and covers a total length of approximately 13,171 miles (21,196 km). It was built over several centuries to protect the Chinese Empire from invasions by nomadic tribes.\\n\\nSome popular locations to visit the Great Wall include:\\n\\n* Badaling Great Wall (near Beijing)\\n* Mutianyu Great Wall (also near Beijing)\\n* Jinshanling Great Wall (in Hebei Province)\\n* Simatai Great Wall (also in Hebei Province)\\n\\nIt's worth noting that while the wall is a UNESCO World Heritage Site and one of China's most famous landmarks, many sections have been destroyed or damaged over time due to natural erosion, human activities, or wars." |
1 row in set (3.85 sec)

Wikidump in WebAssembly

Two functions are implemented for multistream wikidump files.

  1. Extract the offset and size of each bzip2 data chunk.
Function name: 'get_index'
Input: the content of the bzip2-compressed index file, as []byte
Return: JSON array of objects {"offset": offset, "size": size}
  2. Get all pages in a bzip2 data chunk.
Function name: 'get_pages'
Input: a single bzip2 data chunk of the multistream wikidump file, as []byte
Return: JSON array of the pages found in the chunk, excluding redirect pages

To integrate with the MatrixOne database, run the following SQL:

mysql> create stage mystage URL='file:///tmp/';

mysql> select * from wasm_run_table('https://github.com/matrixone/mojo/raw/main/plugin/wikidump/wikidump.wasm', 'get_index', null, cast('stage://mystage/wiki/index.txt.bz2' as datalink)) as f limit 5;
+--------------------------------------+
| result                               |
+--------------------------------------+
| {"offset": 557, "size": 707018}      |
| {"offset": 707575, "size": 1555786}  |
| {"offset": 2263361, "size": 1508609} |
| {"offset": 3771970, "size": 1011715} |
| {"offset": 4783685, "size": 1232327} |
+--------------------------------------+
5 rows in set (1.18 sec)


mysql> select json_extract(result, "$.revision.text") from wasm_run_table('https://github.com/matrixone/mojo/raw/main/plugin/wikidump/wikidump.wasm', 
'get_pages', null, cast('stage://mystage/wiki/wiki.bz2?offset=557&size=707018' as datalink)) as f;

...


mysql> drop stage mystage;
