fix: embedder errors in embed length #9584

mattkrick · 2024-04-01T18:28:39Z

Description

The embedder still fails on large chunks of text because splitting relies on a heuristic.
Now, after splitting, we verify each chunk's length and, if a chunk is still too big, split it again using a smaller chunk size.
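
A minimal sketch of the split, verify, and re-split idea described above. This is not the code from this PR: `countTokens`, `splitByChars`, and `splitToFit` are hypothetical names, and counting whitespace-separated words stands in for whatever tokenizer the embedding model actually uses.

```typescript
// Hypothetical token counter: approximates tokens as whitespace-separated words.
// A real embedder would ask the model's tokenizer instead.
const countTokens = (text: string): number => {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

// Heuristic first pass: pack words into chunks of at most `maxChars` characters.
// A single word longer than `maxChars` is kept whole as its own chunk.
const splitByChars = (text: string, maxChars: number): string[] => {
  const words = text.trim().split(/\s+/).filter(Boolean)
  const chunks: string[] = []
  let current = ''
  for (const word of words) {
    if (current !== '' && current.length + 1 + word.length > maxChars) {
      chunks.push(current)
      current = word
    } else {
      current = current === '' ? word : `${current} ${word}`
    }
  }
  if (current !== '') chunks.push(current)
  return chunks
}

// Split, verify every chunk against the real limit, and re-split any
// chunk that is still too big using half the character budget.
const splitToFit = (text: string, maxTokens: number, maxChars: number): string[] => {
  const result: string[] = []
  for (const chunk of splitByChars(text, maxChars)) {
    if (countTokens(chunk) <= maxTokens) {
      result.push(chunk)
    } else if (maxChars <= 1) {
      // Nothing smaller to try: surface the problem instead of looping forever.
      throw new Error('Cannot split chunk any further')
    } else {
      result.push(...splitToFit(chunk, maxTokens, Math.floor(maxChars / 2)))
    }
  }
  return result
}
```

The point of the second pass is that the character budget is only an estimate, so the length check against the real limit is what guarantees every chunk handed to the model fits.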

The embedder should be able to run even if env.AI_GENERATION_MODELS is undefined. Also cleaned up the validation of env vars so we get better error messages.
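
One way to treat an env var as optional while still failing loudly on bad values is sketched below. The helper name `parseOptionalJsonArray` is hypothetical, and the assumption that the variable holds a JSON array is made only for illustration; the PR description does not state the format.

```typescript
type Env = Record<string, string | undefined>

// Parse an optional env var that is ASSUMED (for this sketch) to hold a JSON array.
// Returns [] when the variable is unset or blank, so a service that does not
// need the value can still start.
const parseOptionalJsonArray = (env: Env, name: string): unknown[] => {
  const raw = env[name]
  if (raw === undefined || raw.trim() === '') return []
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    // Name the variable and echo the value so the error is actionable.
    throw new Error(`Invalid env var ${name}: expected JSON, got ${JSON.stringify(raw)}`)
  }
  if (!Array.isArray(parsed)) {
    throw new Error(`Invalid env var ${name}: expected a JSON array`)
  }
  return parsed
}
```

Called as `parseOptionalJsonArray(process.env, 'AI_GENERATION_MODELS')`, this would yield an empty list when the variable is missing, rather than throwing at startup.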

Signed-off-by: Matt Krick <matt.krick@gmail.com>

github-actions · 2024-04-01T20:02:43Z

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR will be delayed and might be rejected due to its size.

Signed-off-by: Matt Krick <matt.krick@gmail.com>

github-actions · 2024-04-01T20:08:15Z

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR will be delayed and might be rejected due to its size.

Signed-off-by: Matt Krick <matt.krick@gmail.com>

github-actions · 2024-04-01T20:31:33Z

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR will be delayed and might be rejected due to its size.

mattkrick added 3 commits March 28, 2024 17:53

fix: embedder doesn't dive deep into schema

c7e4b8e

Signed-off-by: Matt Krick <matt.krick@gmail.com>

fix: global noEmit

db63109

Signed-off-by: Matt Krick <matt.krick@gmail.com>

fix: verify chunk length before attempting to embed

9847db2

Signed-off-by: Matt Krick <matt.krick@gmail.com>

github-actions bot requested a review from tianrunhe April 1, 2024 18:28

github-actions bot added the size/s label Apr 1, 2024

feat: add linter to embedder

19c089c

Signed-off-by: Matt Krick <matt.krick@gmail.com>

github-actions bot added size/l and removed size/s labels Apr 1, 2024

mattkrick added 2 commits April 1, 2024 11:45

Merge branch 'master' into feat/embedder-errors

fe4efc1

fix: support embedder without generation env vars

2e16d93

Signed-off-by: Matt Krick <matt.krick@gmail.com>

github-actions bot added size/xl and removed size/l labels Apr 1, 2024

bump lockfile

c57d7ad

Signed-off-by: Matt Krick <matt.krick@gmail.com>

bump eslint

62e68a3

Signed-off-by: Matt Krick <matt.krick@gmail.com>

mattkrick removed the request for review from tianrunhe April 1, 2024 21:46

mattkrick merged commit 341b4b7 into master Apr 1, 2024
5 checks passed

mattkrick deleted the feat/embedder-errors branch April 1, 2024 21:46

parabol-release-bot bot mentioned this pull request Apr 1, 2024

chore(release): release v7.24.1 #9585

Merged

github-actions bot mentioned this pull request Apr 2, 2024

chore(release): Test v7.24.1 #9587

Merged

24 tasks
