-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: embedder errors in embed length #9584
Conversation
Signed-off-by: Matt Krick <matt.krick@gmail.com>
Signed-off-by: Matt Krick <matt.krick@gmail.com>
Signed-off-by: Matt Krick <matt.krick@gmail.com>
Signed-off-by: Matt Krick <matt.krick@gmail.com>
Signed-off-by: Matt Krick <matt.krick@gmail.com>
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR will be delayed and might be rejected due to its size. |
Signed-off-by: Matt Krick <matt.krick@gmail.com>
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR will be delayed and might be rejected due to its size. |
Signed-off-by: Matt Krick <matt.krick@gmail.com>
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR will be delayed and might be rejected due to its size. |
Description
embedder is still failing on large chunks of text because splitting is done using a heuristic.
now, after we split, we verify the chunk length & if it's still too big, we split again, but using smaller chunks.
embedder should be able to run even if env.AI_GENERATION_MODELS is undefined. Also cleaned up the validation of env vars so we get better error messages