[R-299] Add cost estimation to testset generator #1527

Closed
ahgraber opened this issue Oct 17, 2024 · 4 comments · Fixed by #1560
Labels
enhancement New feature or request module-testsetgen Module testset generation

@ahgraber
Contributor

ahgraber commented Oct 17, 2024

Describe the Feature
RAGAS supports cost estimation during evaluation; it would also be useful to know how much it costs to construct the synthetic dataset.

I would like to have the ability to use the CostCallbackHandler in both KnowledgeGraph creation and in TestsetGenerator.generate() calls. Specifically, it would be useful to track them separately and jointly:

  • if I create a KnowledgeGraph with apply_transforms(), I would like to know the token count / cost
  • if I call TestsetGenerator.generate() with that previously created KnowledgeGraph, I would like to know just the token count / cost of scenario generation and question synthesis
  • if I call TestsetGenerator.generate_with_langchain_docs(), I would like to know the token count / cost of both creating the KnowledgeGraph and scenario + question generation
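To make the "separately and jointly" request concrete, here is a minimal sketch of per-stage token accounting. It is an illustration only: `Usage`, `StageTracker`, and the stage names are hypothetical and are not part of the ragas API, and the token counts are made-up inputs rather than measurements.

```python
from dataclasses import dataclass, field


@dataclass
class Usage:
    """Token counts for one stage (hypothetical type, not a ragas class)."""

    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "Usage") -> "Usage":
        # Adding two usages sums their token counts field by field.
        return Usage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )

    def cost(self, per_input_token: float, per_output_token: float) -> float:
        # Prices are supplied by the caller; none are assumed here.
        return (
            self.input_tokens * per_input_token
            + self.output_tokens * per_output_token
        )


@dataclass
class StageTracker:
    """Accumulates usage per named stage so stages can be reported alone or summed."""

    stages: dict[str, Usage] = field(default_factory=dict)

    def record(self, stage: str, input_tokens: int, output_tokens: int) -> None:
        # Merge the new counts into whatever was already recorded for this stage.
        previous = self.stages.get(stage, Usage())
        self.stages[stage] = previous + Usage(input_tokens, output_tokens)

    def total(self) -> Usage:
        # Joint usage across every stage recorded so far.
        return sum(self.stages.values(), Usage())


# Illustrative numbers only.
tracker = StageTracker()
tracker.record("knowledge_graph", input_tokens=1200, output_tokens=300)
tracker.record("generation", input_tokens=400, output_tokens=100)
tracker.record("knowledge_graph", input_tokens=800, output_tokens=200)

kg_usage = tracker.stages["knowledge_graph"]  # KnowledgeGraph creation only
gen_usage = tracker.stages["generation"]      # scenario + question generation only
joint_usage = tracker.total()                 # both stages together
```

With something along these lines, the first two bullets would read a single stage's entry, and the third would read the total.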

Why is the feature important for you?
Cost management is critical especially if source material is dynamic and requires frequent dataset synthesis.

R-299

@ahgraber ahgraber added the enhancement New feature or request label Oct 17, 2024
@dosubot dosubot bot added the module-testsetgen Module testset generation label Oct 17, 2024
@Kefan-pauline

+1

@jjmachan
Member

that is a very good suggestion @ahgraber - will get this sorted for you, hopefully by next week 🙂

@jjmachan jjmachan added this to the v.25 milestone Oct 18, 2024
@jjmachan jjmachan changed the title Add cost estimation to testset generator [R-299] Add cost estimation to testset generator Oct 18, 2024
@jjmachan jjmachan modified the milestones: v.25, v.26 Oct 22, 2024
@jjmachan jjmachan self-assigned this Oct 22, 2024
@Kefan-pauline

Kefan-pauline commented Nov 12, 2024

Thanks for adding this feature. However, it does not yet take into account the cost of generating the KG used for testset generation, which is probably the most costly part.

@jjmachan
Member

hey @Kefan-pauline that is true - just created #1671 to track this. Will get to this soon 🙂
