Allow batching of items when sent to LLM #56

Closed
woodthom2 opened this issue Oct 24, 2024 · 6 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@woodthom2
Contributor

Description

Can we modify convert_texts_to_vector in https://github.com/harmonydata/harmony/blob/main/src/harmony/matching/default_matcher.py so that items are sent to the LLM in batches?

The batch size should be configurable.

Rationale

If a user wants to harmonise 10,000 items, they will not fit in memory in a single pass, even on a high-performance machine. Small laptops can probably only batch 20 items at a time. Batching will slow things down, so the batch size should be configurable, perhaps as a parameter.
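A minimal sketch of what this batching could look like. The names `iter_batches`, `convert_texts_to_vector_batched` and `embed_fn` are hypothetical and used only for illustration: `embed_fn` stands in for whatever model call convert_texts_to_vector currently makes, which is not shown in this issue. The default of 20 is taken from the laptop estimate above.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def convert_texts_to_vector_batched(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],  # hypothetical stand-in for the model call
    batch_size: int = 20,
) -> List[Vector]:
    """Embed `texts` one batch at a time, so only one batch is passed to the model per call."""
    vectors: List[Vector] = []
    for batch in iter_batches(texts, batch_size):
        # Results are appended in input order, so row i still corresponds to texts[i].
        vectors.extend(embed_fn(batch))
    return vectors


# Toy usage: a fake embedding that maps each text to a one-element vector of its length.
def toy_embed(batch: Sequence[str]) -> List[Vector]:
    return [[float(len(text))] for text in batch]


vectors = convert_texts_to_vector_batched(["a", "bb", "ccc"], toy_embed, batch_size=2)
```

Passing the batch size as an argument keeps the default conservative for small machines while letting a larger server raise it to reduce the number of model calls.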

People have reported that the website cannot cope with large harmonisations. For example, see the comment below from Discord (23 Oct 2024):

[Image: screenshot of the Discord comment]

woodthom2 added the enhancement and good first issue labels on Oct 24, 2024
@makrianast
Contributor

@woodthom2 Hello. If this issue is still open, I would love to work on it and contribute to your project.

@woodthom2
Contributor Author

Hi @makrianast, please feel free to take this on! Thanks so much! Do you want to have a quick chat with me on Discord/Google Meet about it?

@woodthom2
Contributor Author

Just FYI, the server running the Harmony web tool is a 16 GB machine. I have not tested at what request size the server crashes, but I am pretty certain that the critical number is between 50 and 2000 questionnaire items! Of course, we have to allow for different user machine specs.
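One possible way to allow for different machine specs is to read the batch size from the environment, so a server and a laptop can use different values without code changes. This is only a sketch: the function `get_batch_size` and the variable name `HARMONY_BATCH_SIZE` are hypothetical and do not exist in Harmony as far as this issue shows.

```python
import os


def get_batch_size(default: int = 20, env_var: str = "HARMONY_BATCH_SIZE") -> int:
    """Return the batch size from the environment, falling back to `default`.

    `HARMONY_BATCH_SIZE` is a hypothetical variable name used for illustration.
    Missing, empty, non-integer, or non-positive values all fall back to `default`.
    """
    raw = os.environ.get(env_var)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 1 else default
```

Falling back silently keeps a misconfigured machine working at the conservative default; raising an error instead would be a reasonable alternative if misconfiguration should be visible.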

@makrianast
Contributor

Hello @woodthom2. Yes, of course. My Discord is: anastasiamakrii. Feel free to contact me there if you'd like!

@woodthom2
Contributor Author

woodthom2 commented Nov 1, 2024 via email

@woodthom2
Contributor Author

Also related to #63
