fix: Fix load bug #795
Conversation
Those test errors are not meaningful to me. I have not tried to run those tests locally but can do so this evening. LMK if you understand them and have any suggestions as to what I need to do to fix them. Or are these known failures still not resolved from the big breaking changes?
@cpacker do you want to resolve merge conflicts and repush?
@cpacker I found something that fixes the failing CI test. But when I run …

And when I run …

Do you understand what might be causing this? I am totally new to …
I'm not getting the --extensions error:

…

Maybe it's an environment issue (the env is pointing at a stale version of the file)?
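One quick sanity check for that (hypothetical, not part of this PR) is to print which copy of the package the active environment actually imports:

```python
# Confirm which installed copy of memgpt the current environment resolves.
import memgpt

print(memgpt.__file__)
```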
LGTM
Seems to me that I am still facing this with version 0.3, where this fix should be in (I checked my memgpt source code in both files), as far as I can understand.

UPDATE: this error only occurred for me with your embedding server embeddings.memgpt.ai.

Description of the problem that I am facing (hopefully anyone can reproduce it):

5. Another try was converting the PDF to a text file (with pdf2text.py).
6. Even splitting the text file into smaller ones with 20 pages per file didn't help. Even a 10-page file didn't work for me, which is very weird to me! (You can find all those smaller files in my attachment.)
This same error is showing its ugly head again in 0.3.4.
Thanks for the report @quantumalchemy. @sarahwooders and I are looking into it.
Please describe the purpose of this pull request.
This is a partial fix for #755 for `local` embeddings. We might want to hold this until we can extend the fix to other embedding endpoint types, but given that users are experiencing pain over this, it is perhaps worth making it available now and fixing the others later. The change checks the length of the token sequence and, whenever the sequence exceeds the model's limit, truncates the text to exactly the model limit.
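For illustration, here is a minimal sketch of that truncation approach. This is not the code from this PR; it assumes a HuggingFace tokenizer for a local embedding model, and the model name is just an example:

```python
# Sketch only: check token-sequence length and truncate to the model limit,
# as described above. Not the actual PR implementation.
from transformers import AutoTokenizer

MODEL_NAME = "BAAI/bge-small-en-v1.5"  # example local embedding model (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
MODEL_MAX_TOKENS = tokenizer.model_max_length  # e.g. 512 for this model

def truncate_to_model_limit(text: str) -> str:
    """Return `text`, truncated so its token sequence fits the model's limit."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    if len(token_ids) <= MODEL_MAX_TOKENS:
        return text
    # Keep exactly the first MODEL_MAX_TOKENS tokens and decode back to text.
    return tokenizer.decode(token_ids[:MODEL_MAX_TOKENS])
```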
How to test
It may be worthwhile to first test without this change so that you can locate instances that exhibit the #755 bug.
Try:

```sh
memgpt load directory --recursive --name <some-name> --input-dir <your-dir>
```
It's unclear to me which types of files are most problematic. I found it easiest to reproduce the problem with directories containing a lot of C++ code, but users have reported the problem with .pdf and, I think, .epub files.
Once you have a directory that reproduces the bug, check out this branch and try again. However, you may also need the new `--extensions <extension-list>` option. The default extensions currently configured in this branch are `.txt,.md,.pdf`; if you reproduced the bug with one of these file types, you don't need the option. I ran the test with `--extensions .md,.hpp,.cpp`.
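For reference, a full invocation might look like the following; the dataset name and input path here are hypothetical, while the flags are the ones described above:

```sh
memgpt load directory --recursive --name cpp-bug-repro --input-dir ~/src/my-cpp-project --extensions .md,.hpp,.cpp
```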
Have you tested this PR?
Yes. With the fix, the output should be similar to the output produced in the past when no files failed to load.
Related issues or PRs
#755
Is your PR over 500 lines of code?
No