Commit
feat: [Ai] Add basic RAG capabilities, optimize Codegeex
Log: as title
LiHua000 authored and deepin-mozart committed Sep 26, 2024
1 parent 25ef0d1 commit a498471
Showing 28 changed files with 62,249 additions and 45 deletions.
3 changes: 2 additions & 1 deletion assets/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# install templates.
install(DIRECTORY templates/
DESTINATION "${SOURCES_INSTALL_RPEFIX}/templates")
install(DIRECTORY models/
DESTINATION "${SOURCES_INSTALL_RPEFIX}/models")

find_package(Qt5 COMPONENTS LinguistTools)

Expand Down Expand Up @@ -45,4 +47,3 @@ file(GLOB ICON "${CMAKE_CURRENT_SOURCE_DIR}/configures/*.svg")
install(FILES ${SUPPORTFILES} DESTINATION "${SOURCES_INSTALL_RPEFIX}/configures")
install(FILES ${DESKTOPFILES} DESTINATION "/usr/share/applications")
install(FILES ${ICON} DESTINATION "${SOURCES_INSTALL_RPEFIX}/configures/icons")

1 change: 1 addition & 0 deletions assets/models/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
all-MiniLM-L6-v2 is the sentence-transformers model used with transformers.js to locally generate codebase embeddings.
44 changes: 44 additions & 0 deletions assets/models/all-MiniLM-L6-v2/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
---
library_name: "transformers.js"
---

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 with ONNX weights to be compatible with Transformers.js.

## Usage (Transformers.js)

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@xenova/transformers) using:
```bash
npm i @xenova/transformers
```

You can then use the model to compute embeddings like this:

```js
import { pipeline } from '@xenova/transformers';

// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Compute sentence embeddings
const sentences = ['This is an example sentence', 'Each sentence is converted'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output);
// Tensor {
// dims: [ 2, 384 ],
// type: 'float32',
// data: Float32Array(768) [ 0.04592696577310562, 0.07328180968761444, ... ],
// size: 768
// }
```
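The `{ pooling: 'mean', normalize: true }` options average the per-token embeddings into one sentence vector and scale it to unit length. As a rough illustration (not the library's internals), the two steps look like this, assuming `tokenEmbeddings` is a plain `[seqLen][hiddenSize]` array:

```js
// Mean pooling: average the token vectors into a single sentence vector.
function meanPool(tokenEmbeddings) {
  const dim = tokenEmbeddings[0].length;
  const pooled = new Array(dim).fill(0);
  for (const tok of tokenEmbeddings) {
    for (let i = 0; i < dim; i++) pooled[i] += tok[i];
  }
  for (let i = 0; i < dim; i++) pooled[i] /= tokenEmbeddings.length;
  return pooled;
}

// L2 normalization: divide by the Euclidean norm so the vector has length 1.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return vec.map(x => x / norm);
}
```

Normalized embeddings are convenient because the dot product of two of them is directly their cosine similarity.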

You can convert this Tensor to a nested JavaScript array using `.tolist()`:
```js
console.log(output.tolist());
// [
// [ 0.04592696577310562, 0.07328180968761444, 0.05400655046105385, ... ],
// [ 0.08188057690858841, 0.10760223120450974, -0.013241755776107311, ... ]
// ]
```
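For the RAG use case this commit targets, embeddings like the ones above are typically compared by cosine similarity to retrieve the most relevant snippets. A minimal, hypothetical retrieval helper (the function names are illustrative, not part of Transformers.js) could look like this; since the embeddings were normalized, the dot product equals the cosine similarity:

```js
// Dot product of two equal-length vectors; for unit-length vectors this
// is exactly their cosine similarity.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Rank document embeddings against a query embedding and keep the top k.
function topK(queryEmb, docEmbs, k = 3) {
  return docEmbs
    .map((emb, index) => ({ index, score: dot(queryEmb, emb) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In practice the query and each codebase chunk would be embedded with the same pipeline shown above, and the highest-scoring chunks would be inserted into the model prompt.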


Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
23 changes: 23 additions & 0 deletions assets/models/all-MiniLM-L6-v2/config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"_name_or_path": "all-MiniLM-L6-v2",
"architectures": ["BertModel"],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 384,
"initializer_range": 0.02,
"intermediate_size": 1536,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 6,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.29.2",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
Binary file not shown.
7 changes: 7 additions & 0 deletions assets/models/all-MiniLM-L6-v2/special_tokens_map.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{
"cls_token": "[CLS]",
"mask_token": "[MASK]",
"pad_token": "[PAD]",
"sep_token": "[SEP]",
"unk_token": "[UNK]"
}
