Use compression when saving knowledge graphs #1916

lryan599 · 2025-02-11T11:21:53Z

Describe the Feature
I found that ragas dumps nodes and relationships as-is when saving the knowledge graph, and each relationship holds the full contents of its source and target nodes. This creates a very large amount of redundancy, especially in the summary_embedding field. When saving relationships, wouldn't it be better to save only the IDs of the related nodes instead?

Also, there are ways to store embeddings more compactly when saving. For instance, some RAG pipelines save embeddings as base64-encoded strings.
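To make the base64 idea concrete, here is a minimal sketch using only the Python standard library. It assumes the embedding is a plain list of floats and that float32 precision is acceptable (packing to float32 is lossy for values that need float64). The helper names encode_embedding and decode_embedding are hypothetical and not part of ragas.

```python
import base64
import struct


def encode_embedding(vec):
    """Pack floats as little-endian float32 and return a base64 string."""
    raw = struct.pack(f"<{len(vec)}f", *vec)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding(text):
    """Inverse of encode_embedding: base64 string back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Values chosen to be exactly representable in float32 (assumption for the demo)
embedding = [0.125, -0.5, 0.75, 1.0]
encoded = encode_embedding(embedding)
restored = decode_embedding(encoded)
```

Each float32 takes 4 bytes, and base64 expands the bytes by roughly a third, so the string stays shorter than a typical decimal text representation of the same values. Whether the precision loss matters depends on how the embeddings are used afterwards.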

Why is the feature important for you?
Generating test sets takes a lot of time, so it is necessary to save the intermediate knowledge graphs. If something goes wrong, they can then be loaded directly instead of starting all over again.


lryan599 · 2025-02-14T11:28:15Z

ragas/src/ragas/testset/graph.py

Line 88 in cb63a82

class Relationship(BaseModel):

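Adding this code to class Relationship would be enough to save only the node IDs:

    # requires: from pydantic import field_serializer
    @field_serializer("source", "target")
    def serialize_node(self, node: Node):
        # Save only the node's ID instead of the full node contents
        return node.id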
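The serializer above covers only the saving side. When loading, the IDs would have to be resolved back to node objects. Below is a rough, standard-library sketch of that round trip. The Node and Relationship dataclasses and the dump_graph and load_graph helpers are hypothetical stand-ins for illustration, not the actual ragas classes or API.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for the ragas Node, not the real class
    properties: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Relationship:
    # Hypothetical stand-in for the ragas Relationship
    source: Node
    target: Node
    type: str


def dump_graph(nodes, relationships):
    """Serialize each node once; relationships keep only source/target IDs."""
    return json.dumps({
        "nodes": [{"id": n.id, "properties": n.properties} for n in nodes],
        "relationships": [
            {"source": r.source.id, "target": r.target.id, "type": r.type}
            for r in relationships
        ],
    })


def load_graph(text):
    """Rebuild nodes first, then resolve relationship IDs back to Node objects."""
    data = json.loads(text)
    nodes = [Node(properties=d["properties"], id=d["id"]) for d in data["nodes"]]
    by_id = {n.id: n for n in nodes}
    relationships = [
        Relationship(by_id[r["source"]], by_id[r["target"]], r["type"])
        for r in data["relationships"]
    ]
    return nodes, relationships


a = Node({"summary_embedding": [0.1, 0.2, 0.3]})
b = Node({"summary_embedding": [0.4, 0.5, 0.6]})
saved = dump_graph([a, b], [Relationship(a, b, "similar")])
nodes, rels = load_graph(saved)
```

In this sketch each summary_embedding is written once, under its node, no matter how many relationships reference that node.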

lryan599 added the enhancement New feature or request label Feb 11, 2025

sahusiddharth added the module-testsetgen Module testset generation label Feb 11, 2025
