
Commit 3a3698b: Introduced a simple demo of mixed index and text2gremlin demonstration. (#54)

* 1. Reworked the configuration approach; the overall configuration is now simple and easy to use.
2. Added two index modes to Graph RAG: a vector index, and a hybrid index with reranking.
3. Added a natural-language-to-Gremlin task and demo.

* refact: add connection check after applying configs (WIP)

TODO: some checks are still missing

* fix: use log to print some info

* feat(llm): support the qianfan platform for embedding & replace wenxin references

* 1. Fixed bugs in importing and searching.

* fix upload document

* fix the non-disambiguated uploading

* chore: exclude binary files & update params order

* fix: openai params not match

* log: record error in py-client with orange color

for debugging RESTful API problems

* 1. added triple and edge ID settings
2. added a template input textbox
3. changed the ernie LLM to use the qianfan SDK
4. added a function for filtering IDs by length
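The ID-length filtering in item 4 could look like the minimal sketch below; the `MAX_ID_LEN` limit and the `filter_by_id_length` helper are illustrative assumptions, not the project's actual code:

```python
# Hypothetical sketch: drop vertices whose IDs exceed a backend length limit.
MAX_ID_LEN = 128  # assumed limit, not taken from the project

def filter_by_id_length(vertices: list[dict], max_len: int = MAX_ID_LEN) -> list[dict]:
    """Keep only vertices whose stringified ID fits within max_len characters."""
    return [v for v in vertices if len(str(v["id"])) <= max_len]

print(filter_by_id_length([{"id": "a" * 10}, {"id": "b" * 200}]))
```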

* refact: update the prompt template

Also renamed the params so that the inner & outer ones match

TODO: Non-Schema mode will throw an Error now (we need to fix it)

* update: add a config file argument and build semantic vertex-ID querying

- Update BuildSemanticIndex class to match new file naming convention.
- Modify code_format_and_analysis.sh to use a line length of 120.
- Change logging format to use %s instead of {} for string formatting.
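The `%s`-style logging in the last item defers string interpolation until a record is actually emitted, which is the idiom pylint's `logging-format-interpolation` check recommends; a small sketch:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

# Lazy %s formatting: the message is only interpolated if this record is emitted,
# unlike f-strings or str.format(), which always build the string up front.
log.info("loaded %s vertices from %s", 10, "hugegraph")
```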

* feat: optimize index loading and add clean-up button

1. Introduce a clean-up function to remove index and content files, helping keep the file system clean after operations.
2. Introduce a default demo for building a KG.
3. By default, the KG is no longer cleaned before building it.
4. Clean up the stopwords files.
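A minimal sketch of such a clean-up helper, assuming the generated index/content files match the `*.faiss` / `*.pkl` patterns added to `.gitignore` in this commit (the function name and patterns are illustrative, not the project's actual code):

```python
import glob
import os

def clean_up(data_dir: str) -> list[str]:
    """Remove generated index/content files from data_dir; return what was deleted."""
    removed = []
    for pattern in ("*.faiss", "*.pkl"):
        for path in glob.glob(os.path.join(data_dir, pattern)):
            os.remove(path)
            removed.append(path)
    return removed
```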

* Add instructions for generating the config file to README.md

* fix: use lower case to compare str & enhance the extract prompt

* fix: add triples from extracted vertices, and fix the format of triples.

* doc: update readme

* feat: use custom_handler log color

* fix: avoid FSRF security warning

* doc: update readme

* feat: change config store method

* chore: use a flexible version for dependencies

* feat(hugegraph_llm): support property graph extraction and primary-key IDs. (#2)

* fix: token type should be int (WIP)

init graphspace support

* fix: read int parameter from .env

* feat: use .env as default config file

* fix: Add a 'copy' button to the output box

* Support local openai env (#1)

* support local openai env

* fix env

* refact: set gpt-4o-mini as the openai default type

* feat: Add a RAG option for four output types: "llm-raw", "graph-only", "vector-only", and "graph-vector"
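One way to read the four output types is as a dispatch over which retrievers feed the prompt; a toy sketch with stubbed-out retrievers (none of these function names come from the project):

```python
def graph_search(q: str) -> str:
    return f"[graph facts for {q!r}]"    # stub for a HugeGraph lookup

def vector_search(q: str) -> str:
    return f"[vector chunks for {q!r}]"  # stub for a vector-index lookup

def build_prompt(question: str, mode: str) -> str:
    """Assemble the LLM prompt depending on the requested output type."""
    context = []
    if mode in ("graph-only", "graph-vector"):
        context.append(graph_search(question))
    if mode in ("vector-only", "graph-vector"):
        context.append(vector_search(question))
    if mode == "llm-raw" or not context:
        return question                  # raw question, no retrieved context
    return "\n".join(context) + "\n" + question

print(build_prompt("who created HugeGraph?", "graph-vector"))
```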

* feat: Add vector result output in the CLI.

* feat: Add error catch and display it on the front-end web interface, fix code style (#3)

* fix: fix the triple extraction

* Adjust the formatting, and the property graph mode

* Fix code style, and add error catching with display on the front-end web interface

* revert: graphspace init for CI

---------

Co-authored-by: imbajin <jin@apache.org>
Co-authored-by: chenzihong <522023320011@smail.nju.edu.cn>
Co-authored-by: Liu Jiajun <85552719+jasinliu@users.noreply.github.com>
4 people authored Jul 23, 2024
1 parent a805bb2 commit 3a3698b
Showing 81 changed files with 4,148 additions and 1,030 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/pylint.yml
@@ -22,8 +22,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install pylint pytest
-pip install -r ./hugegraph-llm/requirements.txt
-pip install -r ./hugegraph-llm/llm_api/requirements.txt
+pip install -r ./hugegraph-llm/requirements.txt
pip install -r ./hugegraph-python-client/requirements.txt
- name: Analysing the code with pylint
run: |
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,3 +1,9 @@
# User-specific files
**/logs/*.log*
*.faiss
*.pkl
out/production/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
1 change: 1 addition & 0 deletions .licenserc.yaml
@@ -80,6 +80,7 @@ header: # `header` section is configurations for source codes license header.
- '**/*.ipr'
- '**/META-INF/MANIFEST.MF'
- '.repository/**'
- '**/resources/**'

comment: on-failure
# on what condition license-eye will comment on the pull request, `on-failure`, `always`, `never`.
2 changes: 1 addition & 1 deletion hugegraph-llm/MANIFEST.in
@@ -15,4 +15,4 @@
# specific language governing permissions and limitations
# under the License.

-recursive-include src/hugegraph_llm/config *
+recursive-include src/hugegraph_llm/resources *
56 changes: 32 additions & 24 deletions hugegraph-llm/README.md
@@ -17,46 +17,54 @@ graph systems and large language models.

## Environment Requirements

-- python 3.8+
+- python 3.9+
- hugegraph 1.0.0+

## Preparation

-- Start the HugeGraph database, you can do it via Docker. Refer to [docker-link](https://hub.docker.com/r/hugegraph/hugegraph) & [deploy-doc](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev) for guidance
-- Start the gradio interactive demo, you can start with the following command, and open http://127.0.0.1:8001 after starting
+- Start the HugeGraph database, you can do it via Docker/[Binary packages](https://hugegraph.apache.org/docs/download/download/).
+  Refer to [docker-link](https://hub.docker.com/r/hugegraph/hugegraph) & [deploy-doc](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev) for guidance
- Clone this project
```bash
# 0. clone the hugegraph-ai project & enter the root dir
# 1. configure the environment path
PROJECT_ROOT_DIR = "/path/to/hugegraph-ai" # root directory of hugegraph-ai
export PYTHONPATH=${PROJECT_ROOT_DIR}/hugegraph-llm/src:${PROJECT_ROOT_DIR}/hugegraph-python-client/src

# 2. install the required packages/deps (better to use virtualenv(venv) to manage the environment)
cd hugegraph-llm
pip install -r requirements.txt # ensure the python/pip version is satisfied
# 2.1 set basic configs in the hugegraph-llm/config/config.ini (Optional, you can also set it in gradio)

# 3. start the gradio server, wait for some time to initialize
python3 ./src/hugegraph_llm/utils/gradio_demo.py
```
- Configure HugeGraph database connection information & LLM information in the gradio interface,
click on `Initialize configs`, the complete and initialized configuration file will be overwritten.
- offline download NLTK stopwords
git clone https://github.com/apache/incubator-hugegraph-ai.git
```
- Install [hugegraph-python-client](../hugegraph-python-client) and [hugegraph_llm](src/hugegraph_llm)
```bash
cd ./incubator-hugegraph-ai # better to use virtualenv (source venv/bin/activate)
pip install ./hugegraph-python-client
pip install -r ./hugegraph-llm/requirements.txt
```
- Enter the project directory
```bash
cd ./hugegraph-llm/src
```
- Generate the config file
```bash
-python3 ./src/hugegraph_llm/operators/common_op/nltk_helper.py
+python3 -m hugegraph_llm.config.generate
```
- Start the gradio interactive demo of **Graph RAG**, you can start with the following command, and open http://127.0.0.1:8001 after starting
```bash
python3 -m hugegraph_llm.demo.rag_web_demo
```

- Or start the gradio interactive demo of **Text2Gremlin**, you can start with the following command, and open http://127.0.0.1:8002 after starting
```bash
python3 -m hugegraph_llm.demo.gremlin_generate_web_demo
```

## Examples

### 1. Build a knowledge graph in HugeGraph through LLM

-Run example like `python3 ./hugegraph-llm/examples/build_kg_test.py`
+Run example like `python3 ./hugegraph_llm/examples/build_kg_test.py`

The `KgBuilder` class is used to construct a knowledge graph. Here is a brief usage guide:

-1. **Initialization**: The `KgBuilder` class is initialized with an instance of a language model. This can be obtained from the `LLMs` class.
+1. **Initialization**: The `KgBuilder` class is initialized with an instance of a language model.
+   This can be obtained from the `LLMs` class.

```python
-from hugegraph_llm.llms.init_llm import LLMs
+from hugegraph_llm.models.llms.init_llm import LLMs
from hugegraph_llm.operators.kg_construction_task import KgBuilder
TEXT = ""
@@ -111,7 +119,7 @@ The methods of the `KgBuilder` class can be chained together to perform a sequence
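The chaining mentioned in that context line is the usual fluent-builder pattern, where each operator returns `self`; a generic sketch (the class and method names are illustrative, not the actual `KgBuilder` API):

```python
class Builder:
    """Toy fluent builder: each step records itself and returns self."""
    def __init__(self) -> None:
        self.steps: list[str] = []

    def extract_triples(self, text: str) -> "Builder":
        self.steps.append("extract")  # a real step would call the LLM here
        return self

    def commit_to_hugegraph(self) -> "Builder":
        self.steps.append("commit")   # a real step would write to the graph
        return self

    def run(self) -> list[str]:
        return self.steps

print(Builder().extract_triples("...").commit_to_hugegraph().run())  # → ['extract', 'commit']
```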

### 2. Retrieval augmented generation (RAG) based on HugeGraph

-Run example like `python3 ./hugegraph-llm/examples/graph_rag_test.py`
+Run example like `python3 ./hugegraph_llm/examples/graph_rag_test.py`

The `GraphRAG` class is used to integrate HugeGraph with large language models to provide retrieval-augmented generation capabilities.
Here is a brief usage guide:
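Retrieval-augmented generation here boils down to: retrieve graph context for the question, splice it into the prompt, and ask the LLM. A minimal sketch with a stubbed retriever and a pluggable LLM callable (not the actual `GraphRAG` API):

```python
def retrieve_graph_context(question: str) -> list[str]:
    # Stub: a real implementation would run Gremlin/index queries against HugeGraph.
    return ["HugeGraph is a graph database.", "It supports Gremlin queries."]

def answer(question: str, llm=lambda prompt: f"LLM({len(prompt)} chars)") -> str:
    """Compose a context-grounded prompt and hand it to the LLM callable."""
    context = "\n".join(retrieve_graph_context(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)

print(answer("What is HugeGraph?"))
```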
19 changes: 0 additions & 19 deletions hugegraph-llm/llm_api/README.md

This file was deleted.

163 changes: 0 additions & 163 deletions hugegraph-llm/llm_api/main.py

This file was deleted.

6 changes: 0 additions & 6 deletions hugegraph-llm/llm_api/requirements.txt

This file was deleted.

18 changes: 13 additions & 5 deletions hugegraph-llm/requirements.txt
@@ -1,5 +1,13 @@
-openai==0.28.1
-retry==0.9.2
-tiktoken==0.7.0
-nltk==3.8.1
-gradio==4.37.2
+openai~=0.28.1
+ollama~=0.2.1
+qianfan~=0.3.18
+retry~=0.9.2
+tiktoken>=0.7.0
+nltk~=3.8.1
+gradio>=4.37.2
+jieba>=0.42.1
+numpy~=1.24.4
+python-docx~=1.1.2
+langchain-text-splitters~=0.2.2
+faiss-cpu~=1.8.0
+python-dotenv>=1.0.1
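The move from `==` pins to `~=`/`>=` specifiers relaxes the constraints: `~=0.28.1` is PEP 440's compatible-release operator, allowing patch upgrades up to but excluding `0.29.0`, while `>=` only sets a floor. A simplified sketch of the `~=` rule (ignoring pre-releases and the full PEP 440 grammar):

```python
def compatible_release(spec: str, version: str) -> bool:
    """Simplified PEP 440 '~=' check: ~=X.Y.Z means >= X.Y.Z and < X.(Y+1).0."""
    base = [int(p) for p in spec.split(".")]
    ver = [int(p) for p in version.split(".")]
    ceiling = base[:-2] + [base[-2] + 1, 0]  # bump the second-to-last component
    return base <= ver < ceiling             # lists compare lexicographically

print(compatible_release("0.28.1", "0.28.5"))  # → True
print(compatible_release("0.28.1", "0.29.0"))  # → False
```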
16 changes: 16 additions & 0 deletions hugegraph-llm/src/hugegraph_llm/config/__init__.py
@@ -14,3 +14,19 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.


__all__ = [
"settings",
"resource_path"
]

import os
from .config import Config


settings = Config()
settings.from_env()

package_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
resource_path = os.path.join(package_path, "resources")
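The `Config` class itself is not shown in this hunk; below is a minimal sketch of the env-backed pattern the `__init__.py` implies, with hypothetical field names (the commit notes elsewhere that int parameters must be read from `.env` explicitly, since environment values are strings):

```python
import os

class Config:
    """Hypothetical sketch: defaults in code, overridable from environment variables."""
    def __init__(self) -> None:
        self.graph_ip = "127.0.0.1"
        self.graph_port = 8080

    def from_env(self) -> None:
        self.graph_ip = os.environ.get("GRAPH_IP", self.graph_ip)
        # ints must be parsed explicitly -- os.environ only holds strings
        self.graph_port = int(os.environ.get("GRAPH_PORT", self.graph_port))

settings = Config()
settings.from_env()
```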
