external knowledge api #8913

Merged
merged 65 commits into from
Sep 30, 2024
Changes from 63 commits
517cdb2
add external knowledge
JohnJyong Aug 20, 2024
4fd5792
Merge branch 'main' into feat/external-knowledge
JohnJyong Aug 20, 2024
f6c8390
external knowledge
JohnJyong Aug 20, 2024
e7762b7
external knowledge
JohnJyong Aug 20, 2024
067b956
merge migration
JohnJyong Aug 21, 2024
cb70e12
fix rerank mode is none
JohnJyong Aug 22, 2024
0724640
fix rerank mode is none
JohnJyong Aug 22, 2024
a63e150
update nltk version
JohnJyong Aug 23, 2024
e7c77d9
Merge branch 'main' into feat/external-knowledge
JohnJyong Sep 9, 2024
9ca0e56
external dataset binding
JohnJyong Sep 11, 2024
89e8187
merge error
JohnJyong Sep 13, 2024
9f894bb
external knowledge api
JohnJyong Sep 18, 2024
dcb033d
Merge branch 'main' into feat/external-knowledge
JohnJyong Sep 18, 2024
37f7d57
external knowledge api
JohnJyong Sep 18, 2024
19c5261
external knowledge api
JohnJyong Sep 19, 2024
fbedd08
feat: add external api
YIXIAO0 Sep 23, 2024
ed92c90
External knowledge api
JohnJyong Sep 24, 2024
089da06
External knowledge api
JohnJyong Sep 24, 2024
573b61b
External knowledge api
JohnJyong Sep 24, 2024
30dc137
Merge branch 'main' into feat/external-knowledge-api
JohnJyong Sep 24, 2024
2655dd2
Merge branch 'feat/external-knowledge-api' of github.com:langgenius/d…
YIXIAO0 Sep 24, 2024
6452c34
external knowledge api
JohnJyong Sep 24, 2024
b9b8ec1
Merge branch 'feat/external-knowledge-api' of github.com:langgenius/d…
YIXIAO0 Sep 24, 2024
680c1bd
remove description
JohnJyong Sep 24, 2024
a53b4fb
remove description
JohnJyong Sep 24, 2024
a258f8d
remove description
JohnJyong Sep 24, 2024
02b06c4
add external_retrieval_model
JohnJyong Sep 24, 2024
a69dcb8
add external_retrieval_model
JohnJyong Sep 25, 2024
c927c97
update to external knowledge api
JohnJyong Sep 25, 2024
d6c604a
Merge branch 'feat/external-knowledge-api' of github.com:langgenius/d…
YIXIAO0 Sep 25, 2024
5fa8607
update to external knowledge api
JohnJyong Sep 25, 2024
cfa4825
feat: external knowledge api crud frontend & connect external knowled…
YIXIAO0 Sep 25, 2024
85deb9d
Merge branch 'feat/external-knowledge-api' of github.com:langgenius/d…
YIXIAO0 Sep 25, 2024
ff0260e
fix: minor issues
YIXIAO0 Sep 26, 2024
611f0fb
update to external knowledge api
JohnJyong Sep 26, 2024
1c7cb3f
feat: external knowledge base
YIXIAO0 Sep 26, 2024
1597f34
Merge branch 'feat/external-knowledge-api' of github.com:langgenius/d…
YIXIAO0 Sep 27, 2024
5554cf7
feat: connect knowledge base to app
YIXIAO0 Sep 27, 2024
8e73844
update to external knowledge api
JohnJyong Sep 27, 2024
2a1cba9
Merge remote-tracking branch 'origin/feat/external-knowledge-api' int…
JohnJyong Sep 27, 2024
9c9352b
update to external knowledge api
JohnJyong Sep 27, 2024
c9e3a9e
feat: add external api from the create external knowledge page
YIXIAO0 Sep 27, 2024
020766a
Merge branch 'main' into feat/external-knowledge-api
JohnJyong Sep 27, 2024
644ab2d
feat: add new external knowledge api from the knowledge create page
YIXIAO0 Sep 27, 2024
b92fced
Merge branch 'main' into feat/external-knowledge-api
YIXIAO0 Sep 27, 2024
69c0f3f
fix: default selection issue & trigger retrieval setting unintentionally
YIXIAO0 Sep 28, 2024
e5d8c07
add helper text
YIXIAO0 Sep 29, 2024
4ee3743
add tidb on qdrant whitelist and batch job
JohnJyong Sep 29, 2024
1955de2
add tidb on qdrant whitelist and batch job
JohnJyong Sep 29, 2024
6508e7e
fix: retrieval config for rerank cases
YIXIAO0 Sep 29, 2024
8929018
add score threshold enabled
JohnJyong Sep 29, 2024
bc81d2d
fix: styling issues and create knowledge api from the knowledge base …
YIXIAO0 Sep 29, 2024
918df23
Merge branch 'feat/external-knowledge-api' of github.com:langgenius/d…
YIXIAO0 Sep 29, 2024
383a60a
fix: rerank open logics added to chatgpt, modified the hit detail mod…
YIXIAO0 Sep 29, 2024
fd4d7e9
fix: edit dataset card from datasets page, naming
YIXIAO0 Sep 30, 2024
f6074b6
fix: chatbot rerank popup logics
YIXIAO0 Sep 30, 2024
6f9d6cd
fix: edit external knowledge api warning message
YIXIAO0 Sep 30, 2024
77bfb9e
Merge branch 'main' into feat/external-knowledge-api
JohnJyong Sep 30, 2024
1644f1c
update tidb batch create
JohnJyong Sep 30, 2024
00617e5
add unstructured profiles
JohnJyong Sep 30, 2024
6f6869e
delete test external
JohnJyong Sep 30, 2024
1a03740
delete test external
JohnJyong Sep 30, 2024
653c208
update poetry.lock
JohnJyong Sep 30, 2024
89abc73
update poetry.lock
JohnJyong Sep 30, 2024
baffe2a
update poetry.lock
JohnJyong Sep 30, 2024
1 change: 1 addition & 0 deletions api/configs/middleware/__init__.py
@@ -222,5 +222,6 @@ class MiddlewareConfig(
     TiDBVectorConfig,
     WeaviateConfig,
     ElasticsearchConfig,
+    BedrockConfig,
 ):
     pass
11 changes: 10 additions & 1 deletion api/controllers/console/__init__.py
@@ -37,7 +37,16 @@
 from .billing import billing

 # Import datasets controllers
-from .datasets import data_source, datasets, datasets_document, datasets_segments, file, hit_testing, website
+from .datasets import (
+    data_source,
+    datasets,
+    datasets_document,
+    datasets_segments,
+    external,
+    file,
+    hit_testing,
+    website,
+)

 # Import explore controllers
 from .explore import (
54 changes: 52 additions & 2 deletions api/controllers/console/datasets/datasets.py
@@ -49,15 +49,15 @@ def get(self):
         page = request.args.get("page", default=1, type=int)
         limit = request.args.get("limit", default=20, type=int)
         ids = request.args.getlist("ids")
-        provider = request.args.get("provider", default="vendor")
+        # provider = request.args.get("provider", default="vendor")
         search = request.args.get("keyword", default=None, type=str)
         tag_ids = request.args.getlist("tag_ids")

         if ids:
             datasets, total = DatasetService.get_datasets_by_ids(ids, current_user.current_tenant_id)
         else:
             datasets, total = DatasetService.get_datasets(
-                page, limit, provider, current_user.current_tenant_id, current_user, search, tag_ids
+                page, limit, current_user.current_tenant_id, current_user, search, tag_ids
             )

         # check embedding setting
@@ -110,6 +110,26 @@ def post(self):
             nullable=True,
             help="Invalid indexing technique.",
         )
+        parser.add_argument(
+            "external_knowledge_api_id",
+            type=str,
+            nullable=True,
+            required=False,
+        )
+        parser.add_argument(
+            "provider",
+            type=str,
+            nullable=True,
+            choices=Dataset.PROVIDER_LIST,
+            required=False,
+            default="vendor",
+        )
+        parser.add_argument(
+            "external_knowledge_id",
+            type=str,
+            nullable=True,
+            required=False,
+        )
         args = parser.parse_args()

         # The role of the current user in the ta table must be admin, owner, or editor, or dataset_operator
@@ -123,6 +143,9 @@ def post(self):
                 indexing_technique=args["indexing_technique"],
                 account=current_user,
                 permission=DatasetPermissionEnum.ONLY_ME,
+                provider=args["provider"],
+                external_knowledge_api_id=args["external_knowledge_api_id"],
+                external_knowledge_id=args["external_knowledge_id"],
             )
         except services.errors.dataset.DatasetNameDuplicateError:
             raise DatasetNameDuplicateError()
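To make the effect of the three new create arguments concrete for callers, here is a minimal sketch of a request-body builder. It is not code from this PR: `build_create_payload` and `ASSUMED_PROVIDERS` are hypothetical names, the tuple only stands in for `Dataset.PROVIDER_LIST` (whose contents the diff does not show), and the rule that an external dataset needs both ids is an assumption rather than something the controller enforces here.

```python
# Stand-in for Dataset.PROVIDER_LIST; the real values are not shown in this diff.
ASSUMED_PROVIDERS = ("vendor", "external")


def build_create_payload(name, provider="vendor", external_knowledge_api_id=None, external_knowledge_id=None):
    """Build a JSON-serializable body for the dataset create endpoint (hypothetical helper)."""
    if provider not in ASSUMED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    payload = {"name": name, "provider": provider}
    if provider == "external":
        # Assumption: an external dataset is only useful when both ids are present.
        if not external_knowledge_api_id or not external_knowledge_id:
            raise ValueError("external datasets need external_knowledge_api_id and external_knowledge_id")
        payload["external_knowledge_api_id"] = external_knowledge_api_id
        payload["external_knowledge_id"] = external_knowledge_id
    return payload
```

Calling `build_create_payload("docs")` keeps the `"vendor"` default that the parser also applies when `provider` is omitted, so existing clients would not need to change.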
@@ -211,6 +234,33 @@ def patch(self, dataset_id):
         )
         parser.add_argument("retrieval_model", type=dict, location="json", help="Invalid retrieval model.")
         parser.add_argument("partial_member_list", type=list, location="json", help="Invalid parent user list.")
+
+        parser.add_argument(
+            "external_retrieval_model",
+            type=dict,
+            required=False,
+            nullable=True,
+            location="json",
+            help="Invalid external retrieval model.",
+        )
+
+        parser.add_argument(
+            "external_knowledge_id",
+            type=str,
+            required=False,
+            nullable=True,
+            location="json",
+            help="Invalid external knowledge id.",
+        )
+
+        parser.add_argument(
+            "external_knowledge_api_id",
+            type=str,
+            required=False,
+            nullable=True,
+            location="json",
+            help="Invalid external knowledge api id.",
+        )
         args = parser.parse_args()
         data = request.get_json()

240 changes: 240 additions & 0 deletions api/controllers/console/datasets/external.py
@@ -0,0 +1,240 @@
from flask import request
from flask_login import current_user
from flask_restful import Resource, marshal, reqparse
from werkzeug.exceptions import Forbidden, InternalServerError, NotFound

import services
from controllers.console import api
from controllers.console.app.error import ProviderNotInitializeError
from controllers.console.datasets.error import DatasetNameDuplicateError
from controllers.console.setup import setup_required
from controllers.console.wraps import account_initialization_required
from fields.dataset_fields import dataset_detail_fields
from libs.login import login_required
from services.dataset_service import DatasetService
from services.external_knowledge_service import ExternalDatasetService
from services.hit_testing_service import HitTestingService


def _validate_name(name):
    if not name or len(name) < 1 or len(name) > 100:
        raise ValueError("Name must be between 1 to 100 characters.")
    return name


def _validate_description_length(description):
    if description and len(description) > 400:
        raise ValueError("Description cannot exceed 400 characters.")
    return description
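
The validators above are passed to the request parser as `type=` callables, so a `ValueError` is what signals a rejected value. The following sketch copies `_validate_name` and probes its boundaries in isolation; the `check` helper is hypothetical and only exists to collect results without a Flask request.

```python
def _validate_name(name):
    # Copied from the diff; `not name` already covers None and the empty string.
    if not name or len(name) < 1 or len(name) > 100:
        raise ValueError("Name must be between 1 to 100 characters.")
    return name


def check(value):
    """Hypothetical helper: report whether the validator accepts a value."""
    try:
        _validate_name(value)
        return "ok"
    except ValueError:
        return "rejected"


results = {
    "empty": check(""),
    "none": check(None),
    "one_char": check("a"),
    "hundred_chars": check("a" * 100),
    "hundred_one_chars": check("a" * 101),
}
```

Both ends of the range are inclusive: a 1-character and a 100-character name pass, while an empty, missing, or 101-character name is rejected.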


class ExternalApiTemplateListApi(Resource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self):
        page = request.args.get("page", default=1, type=int)
        limit = request.args.get("limit", default=20, type=int)
        search = request.args.get("keyword", default=None, type=str)

        external_knowledge_apis, total = ExternalDatasetService.get_external_knowledge_apis(
            page, limit, current_user.current_tenant_id, search
        )
        response = {
            "data": [item.to_dict() for item in external_knowledge_apis],
            "has_more": len(external_knowledge_apis) == limit,
            "limit": limit,
            "total": total,
            "page": page,
        }
        return response, 200

    @setup_required
    @login_required
    @account_initialization_required
    def post(self):
        parser = reqparse.RequestParser()
        parser.add_argument(
            "name",
            nullable=False,
            required=True,
            help="Name is required. Name must be between 1 to 100 characters.",
            type=_validate_name,
        )
        parser.add_argument(
            "settings",
            type=dict,
            location="json",
            nullable=False,
            required=True,
        )
        args = parser.parse_args()

        ExternalDatasetService.validate_api_list(args["settings"])

        # The role of the current user in the ta table must be admin, owner, or editor, or dataset_operator
        if not current_user.is_dataset_editor:
            raise Forbidden()

        try:
            external_knowledge_api = ExternalDatasetService.create_external_knowledge_api(
                tenant_id=current_user.current_tenant_id, user_id=current_user.id, args=args
            )
        except services.errors.dataset.DatasetNameDuplicateError:
            raise DatasetNameDuplicateError()

        return external_knowledge_api.to_dict(), 201
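
The list response computes `has_more` as `len(external_knowledge_apis) == limit`. A small sketch of that heuristic, using plain lists instead of database rows, shows one edge case a client may want to know about: a last page that happens to be full still reports `has_more` as true. `page_envelope` and `has_more_from_total` are hypothetical helpers, and the second one is an alternative for comparison, not something this PR does.

```python
def page_envelope(items, page, limit, total):
    """Same shape as the list response; has_more uses the len(items) == limit heuristic."""
    return {
        "data": items,
        "has_more": len(items) == limit,
        "limit": limit,
        "total": total,
        "page": page,
    }


def has_more_from_total(page, limit, total):
    """Alternative for comparison (not in the PR): derive has_more from the total count."""
    return page * limit < total


# 40 records at 20 per page: page 2 is the last page, but it is completely full.
last_page = page_envelope(list(range(20, 40)), page=2, limit=20, total=40)
```

With these numbers `last_page["has_more"]` is true although page 3 would be empty, whereas `has_more_from_total(2, 20, 40)` is false. Since `total` is already in the response, a client can make that second comparison itself.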


class ExternalApiTemplateApi(Resource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, external_knowledge_api_id):
        external_knowledge_api_id = str(external_knowledge_api_id)
        external_knowledge_api = ExternalDatasetService.get_external_knowledge_api(external_knowledge_api_id)
        if external_knowledge_api is None:
            raise NotFound("API template not found.")

        return external_knowledge_api.to_dict(), 200

    @setup_required
    @login_required
    @account_initialization_required
    def patch(self, external_knowledge_api_id):
        external_knowledge_api_id = str(external_knowledge_api_id)

        parser = reqparse.RequestParser()
        parser.add_argument(
            "name",
            nullable=False,
            required=True,
            help="type is required. Name must be between 1 to 100 characters.",
            type=_validate_name,
        )
        parser.add_argument(
            "settings",
            type=dict,
            location="json",
            nullable=False,
            required=True,
        )
        args = parser.parse_args()
        ExternalDatasetService.validate_api_list(args["settings"])

        external_knowledge_api = ExternalDatasetService.update_external_knowledge_api(
            tenant_id=current_user.current_tenant_id,
            user_id=current_user.id,
            external_knowledge_api_id=external_knowledge_api_id,
            args=args,
        )

        return external_knowledge_api.to_dict(), 200

    @setup_required
    @login_required
    @account_initialization_required
    def delete(self, external_knowledge_api_id):
        external_knowledge_api_id = str(external_knowledge_api_id)

        # The role of the current user in the ta table must be admin, owner, or editor
        if not current_user.is_editor or current_user.is_dataset_operator:
            raise Forbidden()

        ExternalDatasetService.delete_external_knowledge_api(current_user.current_tenant_id, external_knowledge_api_id)
        return {"result": "success"}, 200
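
The guard in `delete` depends on Python operator precedence: `not` binds tighter than `or`, so the condition is read as `(not is_editor) or is_dataset_operator`. The sketch below enumerates all four flag combinations with plain booleans; `delete_forbidden` is a hypothetical stand-in for the two `current_user` properties, and whether both flags can be true for the same account is not something the diff shows.

```python
from itertools import product


def delete_forbidden(is_editor, is_dataset_operator):
    """Mirror of the guard in delete(): `not a or b` parses as `(not a) or b`."""
    return not is_editor or is_dataset_operator


# Evaluate the guard for every combination of the two flags.
truth_table = {
    (is_editor, is_operator): delete_forbidden(is_editor, is_operator)
    for is_editor, is_operator in product([True, False], repeat=2)
}

for (is_editor, is_operator), forbidden in truth_table.items():
    print(f"is_editor={is_editor!s:5} is_dataset_operator={is_operator!s:5} -> forbidden={forbidden}")
```

Only the combination "editor and not dataset operator" gets through, which matches the comment above the check.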


class ExternalApiUseCheckApi(Resource):
    @setup_required
    @login_required
    @account_initialization_required
    def get(self, external_knowledge_api_id):
        external_knowledge_api_id = str(external_knowledge_api_id)

        external_knowledge_api_is_using, count = ExternalDatasetService.external_knowledge_api_use_check(
            external_knowledge_api_id
        )
        return {"is_using": external_knowledge_api_is_using, "count": count}, 200


class ExternalDatasetCreateApi(Resource):
    @setup_required
    @login_required
    @account_initialization_required
    def post(self):
        # The role of the current user in the ta table must be admin, owner, or editor
        if not current_user.is_editor:
            raise Forbidden()

        parser = reqparse.RequestParser()
        parser.add_argument("external_knowledge_api_id", type=str, required=True, nullable=False, location="json")
        parser.add_argument("external_knowledge_id", type=str, required=True, nullable=False, location="json")
        parser.add_argument(
            "name",
            nullable=False,
            required=True,
            help="name is required. Name must be between 1 to 100 characters.",
            type=_validate_name,
        )
        parser.add_argument("description", type=str, required=False, nullable=True, location="json")
        parser.add_argument("external_retrieval_model", type=dict, required=False, location="json")

        args = parser.parse_args()

        # The role of the current user in the ta table must be admin, owner, or editor, or dataset_operator
        if not current_user.is_dataset_editor:
            raise Forbidden()

        try:
            dataset = ExternalDatasetService.create_external_dataset(
                tenant_id=current_user.current_tenant_id,
                user_id=current_user.id,
                args=args,
            )
        except services.errors.dataset.DatasetNameDuplicateError:
            raise DatasetNameDuplicateError()

        return marshal(dataset, dataset_detail_fields), 201
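
`ExternalDatasetCreateApi.post` runs two permission checks in sequence. Assuming `is_editor` and `is_dataset_editor` correspond to the role sets named in the two code comments (the diff does not show their definitions), the sketch below suggests the second, broader check can never be the one that rejects a request. `failing_check` and the extra `"normal"` role are hypothetical.

```python
# Role sets taken from the two comments in post(); "normal" is a hypothetical extra role.
EDITOR_ROLES = {"admin", "owner", "editor"}
DATASET_EDITOR_ROLES = EDITOR_ROLES | {"dataset_operator"}


def failing_check(role):
    """Return which guard rejects the role, or None if the request proceeds."""
    if role not in EDITOR_ROLES:  # stands in for `not current_user.is_editor`
        return "first"
    if role not in DATASET_EDITOR_ROLES:  # stands in for `not current_user.is_dataset_editor`
        return "second"
    return None


outcomes = {role: failing_check(role) for role in sorted(DATASET_EDITOR_ROLES | {"normal"})}
```

Under these assumptions a `dataset_operator` is stopped by the first check, so the effective policy for creating an external dataset is the narrower "admin, owner, or editor" rule.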


class ExternalKnowledgeHitTestingApi(Resource):
    @setup_required
    @login_required
    @account_initialization_required
    def post(self, dataset_id):
        dataset_id_str = str(dataset_id)
        dataset = DatasetService.get_dataset(dataset_id_str)
        if dataset is None:
            raise NotFound("Dataset not found.")

        try:
            DatasetService.check_dataset_permission(dataset, current_user)
        except services.errors.account.NoPermissionError as e:
            raise Forbidden(str(e))

        parser = reqparse.RequestParser()
        parser.add_argument("query", type=str, location="json")
        parser.add_argument("external_retrieval_model", type=dict, required=False, location="json")
        args = parser.parse_args()

        HitTestingService.hit_testing_args_check(args)

        try:
            response = HitTestingService.external_retrieve(
                dataset=dataset,
                query=args["query"],
                account=current_user,
                external_retrieval_model=args["external_retrieval_model"],
            )

            return response
        except Exception as e:
            raise InternalServerError(str(e))


api.add_resource(ExternalKnowledgeHitTestingApi, "/datasets/<uuid:dataset_id>/external-hit-testing")
api.add_resource(ExternalDatasetCreateApi, "/datasets/external")
api.add_resource(ExternalApiTemplateListApi, "/datasets/external-knowledge-api")
api.add_resource(ExternalApiTemplateApi, "/datasets/external-knowledge-api/<uuid:external_knowledge_api_id>")
api.add_resource(ExternalApiUseCheckApi, "/datasets/external-knowledge-api/<uuid:external_knowledge_api_id>/use-check")
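
For anyone wiring a client against the five routes registered above, here is a minimal URL-building sketch. The path templates are copied from the `add_resource` calls; `BASE`, `ROUTES`, and `build_url` are hypothetical, and the `/console/api` prefix in particular is an assumption about where the console blueprint is mounted. Because the routes use `<uuid:...>` converters, the sketch normalizes ids through `uuid.UUID` before formatting.

```python
import uuid

# Hypothetical deployment URL; the mount prefix is an assumption, not shown in the diff.
BASE = "https://dify.example.com/console/api"

# Path templates copied from the api.add_resource(...) calls.
ROUTES = {
    "hit_testing": "/datasets/{dataset_id}/external-hit-testing",
    "create_dataset": "/datasets/external",
    "api_list": "/datasets/external-knowledge-api",
    "api_detail": "/datasets/external-knowledge-api/{external_knowledge_api_id}",
    "use_check": "/datasets/external-knowledge-api/{external_knowledge_api_id}/use-check",
}


def build_url(route, **ids):
    """Format a route, raising ValueError for ids that are not valid UUIDs."""
    # uuid.UUID accepts several spellings; str() yields the canonical hyphenated form.
    params = {key: str(uuid.UUID(str(value))) for key, value in ids.items()}
    return BASE.rstrip("/") + ROUTES[route].format(**params)
```

For example, `build_url("use_check", external_knowledge_api_id="12345678123456781234567812345678")` produces a path with the hyphenated UUID followed by `/use-check`, and a malformed id fails locally with `ValueError` before any request is sent.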
2 changes: 2 additions & 0 deletions api/controllers/console/datasets/hit_testing.py
@@ -47,6 +47,7 @@ def post(self, dataset_id):
         parser = reqparse.RequestParser()
         parser.add_argument("query", type=str, location="json")
         parser.add_argument("retrieval_model", type=dict, required=False, location="json")
+        parser.add_argument("external_retrieval_model", type=dict, required=False, location="json")
         args = parser.parse_args()

         HitTestingService.hit_testing_args_check(args)
@@ -57,6 +58,7 @@ def post(self, dataset_id):
                 query=args["query"],
                 account=current_user,
                 retrieval_model=args["retrieval_model"],
+                external_retrieval_model=args["external_retrieval_model"],
                 limit=10,
             )

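The new `external_retrieval_model` argument is optional, and the handler reads `args["external_retrieval_model"]` unconditionally. That is safe as long as the parser stores missing arguments as `None`, which is, to my understanding, the default reqparse behavior. The sketch below imitates that with a plain dict; `parse_hit_testing_args` is a hypothetical stand-in for the parser and the payload contents are illustrative only.

```python
def parse_hit_testing_args(payload):
    """Plain-dict stand-in for the parser: missing optional keys become None."""
    names = ("query", "retrieval_model", "external_retrieval_model")
    return {name: payload.get(name) for name in names}


# A client written before this PR sends no external_retrieval_model at all.
old_client_args = parse_hit_testing_args({"query": "refund policy", "retrieval_model": {}})

# A newer client may send it; the dict contents here are purely illustrative.
new_client_args = parse_hit_testing_args({"query": "refund policy", "external_retrieval_model": {}})
```

In both cases every key exists in the parsed result, so the keyword arguments passed to the service are either a dict or `None`, and the service side has to handle `None`.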