
feat: add llm wenxinyiyan & config util & spo_triple_extract #27

Merged: 16 commits merged on Jan 24, 2024

simon824 (Member) commented on Jan 18, 2024:

  1. add LLM wenxinyiyan
  2. add spo_triple_extract & CommitSPOToKg
  3. add config util
  4. format code style
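The config util in item 3 can be pictured as a thin wrapper over Python's standard `configparser`. This is a minimal sketch, not the PR's actual `config.py`. It assumes an INI file with an `[llm]` section. The section and key names (`type`, `api_key`) and the `load_llm_config` helper are hypothetical.

```python
import configparser
from typing import Dict


def load_llm_config(text: str, section: str = "llm") -> Dict[str, str]:
    """Parse INI text and return one section as a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"missing [{section}] section")
    # SectionProxy behaves like a mapping; copy it into an ordinary dict.
    return dict(parser[section])


# Hypothetical sample; the real config.ini layout may differ.
SAMPLE_INI = """
[llm]
type = ernie
api_key = <your-api-key>
"""

if __name__ == "__main__":
    print(load_llm_config(SAMPLE_INI))
```

For a file on disk, `parser.read(path)` can replace `read_string`.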

SPO triple extraction

  • input
"Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010. James, in his professional life, works as a journalist. Additionally, Sarah is the proud owner of the website www.sarahsplace.com, while James manages his own webpage, though the specific URL is not mentioned here. These two individuals, Sarah and James, have not only forged a strong personal bond as roommates but have also carved out their distinctive digital presence through their respective webpages, showcasing their varied interests and experiences."
  • output
[image of the extraction output, not preserved]
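The output of this example survives only as an image, so its exact format is unknown. The sketch below shows one way to turn SPO triples into vertices and edges before a commit step such as CommitSPOToKg writes them to the graph. It is an assumption-based illustration, not the PR's implementation. It assumes the model returns one triple per line in `(subject, predicate, object)` form. `parse_triples` and `to_graph` are hypothetical names.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed line format "(subject, predicate, object)"; hypothetical, not the PR's format.
# The object may contain commas; subject and predicate may not.
TRIPLE_RE = re.compile(r"^\(\s*([^,]+?)\s*,\s*([^,]+?)\s*,\s*([^)]+?)\s*\)$")


def parse_triples(llm_output: str) -> List[Triple]:
    """Collect every line that matches the assumed triple format; skip the rest."""
    triples: List[Triple] = []
    for line in llm_output.splitlines():
        match = TRIPLE_RE.match(line.strip())
        if match:
            subject, predicate, obj = match.groups()
            triples.append((subject, predicate, obj))
    return triples


def to_graph(triples: List[Triple]) -> Tuple[List[str], List[Triple]]:
    """Return unique vertices (subjects and objects) and de-duplicated edges, in first-seen order."""
    vertices = list(dict.fromkeys(v for s, _, o in triples for v in (s, o)))
    edges = list(dict.fromkeys(triples))
    return vertices, edges
```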

dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels on Jan 18, 2024
simon824 marked this pull request as draft on January 18, 2024 10:40
simon824 marked this pull request as ready for review on January 19, 2024 01:58
dosubot (bot) added size:XL (This PR changes 500-999 lines, ignoring generated files.) and removed size:L (This PR changes 100-499 lines, ignoring generated files.) on Jan 19, 2024
dosubot (bot) added size:XXL (This PR changes 1000+ lines, ignoring generated files.) and removed size:XL (This PR changes 500-999 lines, ignoring generated files.) on Jan 22, 2024
simon824 changed the title from "feat: add llm wenxinyiyan & config util" to "feat: add llm wenxinyiyan & config util & spo_triple_extract" on Jan 22, 2024
simon824 (Member Author):

@imbajin @lzyxx77 @liuxiaocs7 PTAL, thanks!

liuxiaocs7 (Member) left a review:

Overall looks good to me; I left some minor comments.

  1. Could wenxinyiyan be renamed to ERNIE or ERNIE-Bot?
    Reason: see https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Nlks5zkzu; the name is also more recognizable internationally (CC: @imbajin on this point).
  2. This PR contains many automatic formatting changes. Could we document the corresponding formatting commands in the README or elsewhere to keep the code style consistent?
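If the class is renamed as suggested in point 1, an LLM factory can keep accepting the old name as an alias so existing configs do not break. This is a minimal sketch of that idea, not the PR's `init_llm.py`. `BaseLLM`, `ErnieBot`, the registry, and the alias table are all hypothetical. The placeholder client makes no network call.

```python
from typing import Callable, Dict


class BaseLLM:
    """Hypothetical common interface for LLM clients."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError


class ErnieBot(BaseLLM):
    """Placeholder client; a real one would call Baidu's ERNIE API."""

    def generate(self, prompt: str) -> str:
        return f"[ernie] {prompt}"


# Canonical names map to classes; legacy or alternate names map to canonical ones.
_LLM_REGISTRY: Dict[str, Callable[[], BaseLLM]] = {"ernie": ErnieBot}
_ALIASES: Dict[str, str] = {"wenxinyiyan": "ernie", "ernie-bot": "ernie"}


def init_llm(llm_type: str) -> BaseLLM:
    """Resolve aliases case-insensitively, then instantiate the registered client."""
    key = llm_type.strip().lower()
    key = _ALIASES.get(key, key)
    try:
        return _LLM_REGISTRY[key]()
    except KeyError:
        raise ValueError(f"unsupported llm type: {llm_type!r}") from None
```

The alias table keeps the rename to a single line per legacy name instead of duplicating classes.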

hugegraph-llm/examples/build_kg_test.py (outdated, resolved)
hugegraph-llm/examples/graph_rag_test.py (outdated, resolved)
hugegraph-llm/examples/graph_rag_test.py (resolved)
hugegraph-llm/src/hugegraph_llm/llms/init_llm.py (outdated, resolved)
"we need to manually fix all the warnings mentioned below before commit! "
export PYTHONPATH=${ROOT_DIR}/hugegraph-llm/src:${ROOT_DIR}/hugegraph-python-client/src
pylint --rcfile=${ROOT_DIR}/style/pylint.conf ${ROOT_DIR}/hugegraph-llm
#pylint --rcfile=${ROOT_DIR}/style/pylint.conf ${ROOT_DIR}/hugegraph-python-client
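The two active commands above set `PYTHONPATH` to both `src` directories and run pylint on `hugegraph-llm` with the shared rcfile. The same step can be driven from Python. This is a minimal sketch, not part of the PR. It assumes pylint is installed for the current interpreter, so it can be run as `python -m pylint`. `build_pylint_command` and `run_pylint` are hypothetical names. Like the script, it leaves out `hugegraph-python-client`.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple


def build_pylint_command(root_dir: str) -> Tuple[List[str], Dict[str, str]]:
    """Mirror the shell lines: set PYTHONPATH to both src dirs, lint hugegraph-llm."""
    root = Path(root_dir)
    env = dict(os.environ)
    # Like `export PYTHONPATH=...`, this replaces any existing value.
    env["PYTHONPATH"] = os.pathsep.join([
        str(root / "hugegraph-llm" / "src"),
        str(root / "hugegraph-python-client" / "src"),
    ])
    argv = [
        sys.executable, "-m", "pylint",
        f"--rcfile={root / 'style' / 'pylint.conf'}",
        str(root / "hugegraph-llm"),
    ]
    return argv, env


def run_pylint(root_dir: str) -> int:
    """Run pylint and return its exit code (non-zero when messages were emitted)."""
    argv, env = build_pylint_command(root_dir)
    return subprocess.run(argv, env=env, check=False).returncode
```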
simon824 (Member Author):

Will fix the hugegraph-python-client code style in the next PR.

README.md (outdated, resolved)
hugegraph-llm/src/config/config.ini (outdated, resolved)
hugegraph-llm/src/hugegraph_llm/llms/ernie_bot.py (outdated, resolved)
hugegraph-llm/src/hugegraph_llm/utils/config.py (outdated, resolved)
hugegraph-llm/examples/graph_rag_test.py (resolved)
liuxiaocs7 (Member) left a review:

+1

Comment on lines +31 to +52
return """You are a data scientist working for a company that is building a graph database.
Your task is to extract information from data and convert it into a graph database. Provide a
set of Nodes in the form [ENTITY_ID, TYPE, PROPERTIES], a set of relationships in the form
[ENTITY_ID_1, RELATIONSHIP, ENTITY_ID_2, PROPERTIES], a set of NodesSchemas in the form
[ENTITY_TYPE, PRIMARY_KEY, PROPERTIES] and a set of RelationshipsSchemas in the form
[ENTITY_TYPE_1, RELATIONSHIP, ENTITY_TYPE_2, PROPERTIES]. It is important that ENTITY_ID_1
and ENTITY_ID_2 exist as nodes with a matching ENTITY_ID. If you can't pair a relationship
with a pair of nodes, don't add it. When you find a node or relationship you want to add, try
to create a generic TYPE for it that describes the entity; you can also think of it as a label.

Here is an example. The input you will be given: Data: Alice is a lawyer and is 25 years old and
Bob is her roommate since 2001. Bob works as a journalist. Alice owns the webpage www.alice.com
and Bob owns the webpage www.bob.com. The output you need to provide: Nodes: ["Alice", "Person",
{"age": 25, "occupation": "lawyer", "name": "Alice"}], ["Bob", "Person", {"occupation":
"journalist", "name": "Bob"}], ["alice.com", "Webpage", {"name": "alice.com",
"url": "www.alice.com"}], ["bob.com", "Webpage", {"name": "bob.com", "url": "www.bob.com"}]
Relationships: [{"Person": "Alice"}, "roommate", {"Person": "Bob"}, {"start": 2001}],
[{"Person": "Alice"}, "owns", {"Webpage": "alice.com"}, {}], [{"Person": "Bob"}, "owns",
{"Webpage": "bob.com"}, {}] NodesSchemas: ["Person", "name", {"age": "int",
"name": "text", "occupation": "text"}], ["Webpage", "name", {"name": "text", "url": "text"}]
RelationshipsSchemas: ["Person", "roommate", "Person", {"start": "int"}], ["Person", "owns",
"Webpage", {}]"""
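The output the prompt asks for has four labelled sections. Each section is a comma-separated run of JSON-like lists. One way to parse it is to wrap each section body in brackets and pass it to `json.loads`. This is a minimal sketch, not the PR's parser. It assumes the model copies the example exactly, with double-quoted strings and no trailing commas. `parse_extraction_output` is a hypothetical name.

```python
import json
import re
from typing import Any, Dict, List

# Longer labels come first so "NodesSchemas" is not consumed as "Nodes".
SECTION_RE = re.compile(r"\b(NodesSchemas|RelationshipsSchemas|Nodes|Relationships)\s*:")


def parse_extraction_output(text: str) -> Dict[str, List[Any]]:
    """Split the model reply on section labels and JSON-decode each section body."""
    result: Dict[str, List[Any]] = {
        "Nodes": [], "Relationships": [], "NodesSchemas": [], "RelationshipsSchemas": [],
    }
    # With a capturing group, re.split yields [preamble, label1, body1, label2, body2, ...].
    parts = SECTION_RE.split(text)
    for label, body in zip(parts[1::2], parts[2::2]):
        body = body.strip()
        if body:
            # '["a", ...], ["b", ...]' becomes one JSON array of arrays.
            result[label] = json.loads("[" + body + "]")
    return result
```

A malformed reply raises `json.JSONDecodeError`, which a caller could catch to retry the request.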
Member comment:

Is there a better way to maintain the default prompt, so that later modifications and adjustments are easier?
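One possible answer is to keep the default prompt as a module-level template and let an optional text file override it, so the prompt can be tuned without code changes. This is a minimal sketch of that approach, not what the PR adopted. `DEFAULT_EXTRACT_PROMPT`, `load_prompt`, `render_prompt`, and the override path are hypothetical. `string.Template` is used instead of `str.format` because the prompt contains many literal `{...}` JSON braces. Those braces would need escaping with `str.format`, while `Template` only requires `$` to be written as `$$`.

```python
from pathlib import Path
from string import Template
from typing import Optional

# Hypothetical short default; in practice this would hold the full extraction prompt.
DEFAULT_EXTRACT_PROMPT = Template(
    'Provide Nodes such as ["Alice", "Person", {"age": 25}].\n\nData: $text'
)


def load_prompt(override_path: Optional[str] = None) -> Template:
    """Use the override file if it exists, otherwise fall back to the built-in default."""
    if override_path and Path(override_path).is_file():
        return Template(Path(override_path).read_text(encoding="utf-8"))
    return DEFAULT_EXTRACT_PROMPT


def render_prompt(text: str, override_path: Optional[str] = None) -> str:
    """Fill the $text placeholder; literal JSON braces pass through unchanged."""
    return load_prompt(override_path).substitute(text=text)
```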

README.md (resolved)
imbajin previously approved these changes on Jan 24, 2024
imbajin (Member) left a review:

Almost LGTM

Zony7 (Contributor) left a review:

LGTM

imbajin merged commit 143e29f into apache:main on Jan 24, 2024 (10 checks passed)
Labels: enhancement (New feature or request), size:XXL (This PR changes 1000+ lines, ignoring generated files.)

5 participants