High token usage due to embeddings file failing to save (Empty embeddings-2.json)
#32
Comments
How many notes are in your vault? Depending on your vault and usage, the first run should account for 90-99% of the cost. That first run could, theoretically, take a few days, depending on the size of your vault and rate limits. I spend a lot of energy doing everything possible to ensure that tokens aren't wasted, so once your vault is fully embedded, your cost should drop to something like pennies per day. Of course, an edge case could always exist based on how you operate your vault. If your vault seems too small for the cost, I'd be happy to help you explore what might be happening, particularly if another plugin constantly alters the contents of notes and thus triggers re-embedding. But if the number of notes in your vault is >10K, or you have a significant number of really large notes with lots of headings, then the usage cost should drop sharply in the coming days. Please let me know what you think about that and whether it applies to you. And thanks for the feedback @vratclarkson Edit: Here's a comment referencing expected typical cost #31 (comment) |
$10 is a bit excessive. Of course I don't have an encyclopedia stored in my vault. |
For reference, it cost me ~$0.75 to re-embed 1,500 notes. |
Though, it may be worth implementing an in-house rate limiter. The rate limiters on OpenAI aren't exactly intuitive. |
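For anyone curious what an in-house rate limiter might look like, here is a minimal token-bucket sketch. It is not code from Smart Connections or from the OpenAI SDK: the class name `TokenBucket`, its parameters, and the numbers in the usage note are all hypothetical, and the real limits depend on your OpenAI account.

```typescript
// Hypothetical client-side limiter (an assumption, not the plugin's code).
// The bucket holds up to `capacity` API tokens and refills at a steady rate.
class TokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;
  private available: number;
  private lastRefill: number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = () => Date.now(), // injectable clock, handy for tests
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.available = capacity;
    this.lastRefill = now();
  }

  // Returns 0 if the request may be sent now (and spends the tokens),
  // otherwise the number of milliseconds the caller should wait.
  tryConsume(tokens: number): number {
    if (tokens > this.capacity) {
      throw new Error("request is larger than the bucket capacity");
    }
    this.refill();
    if (tokens <= this.available) {
      this.available -= tokens;
      return 0;
    }
    const deficit = tokens - this.available;
    return Math.ceil((deficit / this.refillPerSecond) * 1000);
  }

  // Credit the tokens earned since the last call, capped at capacity.
  private refill(): void {
    const t = this.now();
    const elapsedSeconds = (t - this.lastRefill) / 1000;
    this.available = Math.min(
      this.capacity,
      this.available + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = t;
  }
}
```

A caller would check `tryConsume(estimatedTokens)` before each embedding request and sleep for the returned number of milliseconds when it is non-zero, rather than waiting for the API to reject the request.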
I have about 1.5k notes. The problem could be that I reindexed the notes twice. I have disabled a few plugins based on your suggestion. I will let you know if I get charged more. Thank you for your time. Really appreciate it. |
I think I may have discovered a possible cause. The embeddings file doesn't sync with Obsidian Sync. Not sure what the cause is, but I just noticed that on my dev Chromebook, Obsidian isn't syncing my embeddings file every time I cycle it through a powerwash. In all fairness, Obsidian Sync has been throwing away random files when syncing for god knows what reasons. |
@smartguy1196 good catch, I haven't done much thinking re:syncing because I only have one desktop device, and the Smart Connections plugin is desktop-only. Keep me posted if you have any thoughts on addressing this. Additionally, for anyone who doesn't think the syncing is the problem, there are settings to log more information to the console log, including which files were embedded each passthrough. This might help narrow down the cause in the case of continuous embedding. Thanks for the update! |
The version that isn't pulling from sync that should be is the linux one on the chromebook. I tend to not use plug-ins on the android version on the Chromebook |
@smartguy1196 this issue #20 (comment) also referred to file problems with Linux. I've considered using IndexedDB instead of a file for cold storage of the embeddings. If this continues to be a problem, I might have to add some priority. However, I don't know if IndexedDB data is synced between instances. So I will have to investigate further. And thanks for keeping me updated with that additional information! |
Yes, Brian, I am having the same problem! It is very costly to run this plug-in. I am disabling it for the time being. It works great, but the cost... phew! |
Hey @nigelthomp Thanks for following up on this issue. I know it's frustrating when the software isn't working as expected. I appreciate you going out of your way to provide feedback that might help solve the issue. And clearly, we're encountering some bugs here because, besides the initial embedding of your entire vault, which is a factor of vault size, the recurring cost should be almost negligible. When you check your Do you have any plugins that may continually update large amounts of notes? Unfortunately, this may also be triggering unnecessary re-embeddings. And depending on the use of those files, the remedy may be as easy as excluding them via the Smart Connections settings. Additionally, in the settings, you can toggle on additional console logs, including details on the number of tokens being used and which notes are being processed. Keeping an eye on this may hint at which notes are eating up the tokens. Thanks for your help in solving this! |
Hi Brian, Thanks for getting back to me with your suggestions. I will look at this and report back. I have a lot of plugins, so I might start by turning off the ones I don't use a lot to see if things improve. Unfortunately, my OpenAI bill this month is high, so I will have to wait for the time being. It would be great to get feedback from other users in the meantime. BTW, I think your plugin is very exciting, and I can see lots of potential in this area of AI and note-taking. Thank you so much for all your work!! |
FTR, I still suspect problems with sync. @brianpetro what is the thought-process behind not storing the embeddings files inside of the plugins folder? |
You know what I think it could be? Perhaps the load order from sync. If smart-connections is somehow loading before the embeddings file gets synced over, the plugin might think there isn't one and might start automatically creating a new embeddings file despite a pending sync. A fix for this might be to check if there's a pending sync for the embeddings file and wait for it to download prior to starting the plugin. |
That actually makes a lot of sense, because if the embeddings file is larger than the plugin, it would take longer to download it than the plugin. |
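The load-order idea above could be sketched roughly like this. Nothing in this thread confirms an API for asking Obsidian Sync whether a download is pending, so this sketch swaps in a startup grace period instead. The names `decideStartup`, `EmbeddingsProbe`, and both thresholds are hypothetical.

```typescript
// Hypothetical startup check (an assumption, not the plugin's actual logic).
// Instead of querying sync status, give the embeddings file a grace period
// to arrive before deciding to re-embed the whole vault.
interface EmbeddingsProbe {
  exists: boolean;
  sizeBytes: number;
}

type StartupAction = "load" | "wait" | "embed";

function decideStartup(
  probe: EmbeddingsProbe,
  msSinceStartup: number,
  gracePeriodMs: number, // how long to wait for sync before giving up
  minValidBytes: number, // files below this size are treated as empty
): StartupAction {
  if (probe.exists && probe.sizeBytes >= minValidBytes) {
    return "load"; // a real embeddings file is present: use it
  }
  if (msSinceStartup < gracePeriodMs) {
    return "wait"; // sync may still be downloading: check again later
  }
  return "embed"; // nothing arrived in time: fall back to embedding
}
```

The plugin would call this on a timer until it returns something other than `"wait"`. The trade-off is a slower first start on a genuinely new vault in exchange for not paying to re-embed while a large file is still downloading.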
@smartguy1196 thanks for the thought about sync order. I mentioned in #36 (comment) which focuses on the syncing issue. Folder choice was pretty arbitrary since this was my first Obsidian plugin and I'm still unsure whether it makes sense to store hundreds of megabytes to gigabytes worth of Embeddings in the plugins folder. Other than that, I did want people to know they had access to their Embeddings. Good question though. Do you know of any plugins with similar storage requirements that use the plugins folder to store it? I am exploring storage options right now so it would be interesting to investigate. |
TBH, no. I'm working on getting my first plugin working as well (obsidian-selenium). You may have to unpack obsidian's asar and look at the source code for the internal sync plugin to find out how it works. |
This is an amazing extension. I ran it on a fairly large vault that cost $25, and it worked brilliantly. However, I ran into the same issue some of the other folks have. The Embeddings and -2 files are 1KB. I'm syncing Obsidian between a Windows 10 and MacOS machine using Obsidian Sync, if that helps. |
@harpreetchima in order to minimize your OpenAI API costs, it might be best to disable syncing and maintain two separate Embeddings files. Let me know if that enables your If the embeddings aren't being saved to that file, then your usage costs will quickly add up. Thanks for your feedback and help solving this issue! |
Hmmm... I wonder if you could solve the sync issues with git? Perhaps set up the smart-connections folder as a git repository and anytime the embeddings file gets inexplicably deleted have the plugin run a git-revert? Perhaps integrate git-reversion into the same part of the script that detects no embeddings file? |
The only issue I see is if obsidian-sync inexplicably deletes the git files. |
On a side note, I may have found the issue: I think that there may be a bug in how Obsidian Sync handles deleted/non-existent files. Specifically, I think there might be a bug that occurs when a vault connects to the sync after opening. I've noticed that most of the files that get deleted by Obsidian Sync are deleted when I open the vault on a machine without certain files. For some reason, Obsidian Sync thinks that the just-opened vault is the most up-to-date one, and deletes the embeddings file and anything else that is absent from the local vault from the sync's vault.
I would open a bug report, but I haven't recorded it happening. One of the 2 of us may have to write a unit test to expose the bug. I'm thinking maybe write a unit test that does this:
My hands are full with the Obsidian-Selenium plugin and School (EDIT: and Miraclecast and Chromebrew and Work - I have too much to do...) at the moment. ANOTHER EDIT: feel free to use any of my source code :) |
@smartguy1196 GitHub push starts having issues when the file reaches 100 MB, so I'm not sure if that could be used for syncing, unless you're suggesting to only use a local git instance and still rely on Obsidian Sync. Has anyone with this bug confirmed whether the file gets written at all? Modified time might not be reliable if it is a sync issue. But maybe someone could observe the file being present, with a size greater than 1 KB, then at some point reverting back to 1 KB. If we can confirm the file is being written but at some point is overwritten, then we can follow up with Obsidian Sync to see if this is a bug they may be able to fix on their end. |
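One way to confirm the "written, then reverted" pattern is to record the file size periodically and scan the log afterwards. The sketch below covers only the scanning step. The `SizeSample` shape and `findTruncations` are hypothetical, and how the samples are collected (a script, a cron job, manual checks) is left open.

```typescript
// Hypothetical helper (an assumption, not part of the plugin).
// Given periodic size readings of the embeddings file, report every point
// where a populated file was replaced by an (almost) empty one.
interface SizeSample {
  timestampMs: number;
  sizeBytes: number;
}

interface Truncation {
  before: SizeSample;
  after: SizeSample;
}

function findTruncations(
  samples: SizeSample[],
  emptyThresholdBytes: number, // sizes at or below this count as "empty"
): Truncation[] {
  const truncations: Truncation[] = [];
  for (let i = 1; i < samples.length; i++) {
    const before = samples[i - 1];
    const after = samples[i];
    if (
      before.sizeBytes > emptyThresholdBytes &&
      after.sizeBytes <= emptyThresholdBytes
    ) {
      truncations.push({ before, after });
    }
  }
  return truncations;
}
```

A non-empty result would show that the file was written and later overwritten, and the two timestamps narrow down when that happened. An empty result on a file that never grows suggests the save itself is failing.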
I was thinking of using a local(-ish) instance of git that gets stored in the vault (therefore on Obsidian-Sync) as a checksum with the ability to restore the missing file. If Obsidian successfully syncs the git data, but omits the embeddings file, git can detect this.
If I have free time between all of my projects, I can help write a test. |
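The "git as a checksum" idea can be illustrated without git. The sketch below stores a small manifest (length plus hash) next to the embeddings file and compares it on startup. It deliberately swaps git's content hashing for a simple FNV-1a hash over UTF-16 code units to stay self-contained. `Manifest`, `buildManifest`, and `verifyEmbeddings` are hypothetical names, and the approach assumes the manifest itself survives syncing.

```typescript
// Hypothetical integrity check (an assumption; git is NOT used here).
// FNV-1a, 32-bit, over UTF-16 code units: cheap and dependency-free,
// but not cryptographic. It only detects accidental loss or truncation.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

interface Manifest {
  length: number;
  hash: string;
}

type VerifyResult = "ok" | "missing" | "mismatch";

// Call after every successful save of the embeddings file.
function buildManifest(content: string): Manifest {
  return { length: content.length, hash: fnv1a32(content) };
}

// Call on startup; `content` is null when the file does not exist.
function verifyEmbeddings(
  content: string | null,
  manifest: Manifest,
): VerifyResult {
  if (content === null) return "missing";
  if (content.length !== manifest.length) return "mismatch";
  return fnv1a32(content) === manifest.hash ? "ok" : "mismatch";
}
```

A `"missing"` or `"mismatch"` result tells the plugin the file was lost or replaced after the last save. It does not restore anything, which is the part a local git history would add.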
Same problem here:
|
Hey @RobinLandy I'm still trying to narrow down the cause of this. Which OS are you using? And thanks for your report! |
Hey @brianpetro Thanks for working on this. I'm using MacOS 13.2. Let me know if I can provide any other details that might help. |
If it helps, I'm also in the same boat.
|
@vratclarkson thanks for the info! Another question, were you syncing between multiple desktop devices? |
No. One desktop device only. |
@RobinLandy, thanks for the update. cc: @vratclarkson Please check out the latest version Thanks for all your help! |
Hi @brianpetro And thank you for this plugin. |
In 1.2.1 the "Making smart connections" counter went over 500, but the embeddings-2.json file is still 2 bytes and the date modified is still 27 Feb. |
@vratclarkson thanks! @RobinLandy interesting! If you rename the test file to And thanks for letting me know about the counter. It's not indicative of anything specific at this time. |
@RobinLandy, as long as the embeddings aren't being saved, the plugin will re-embed your entire vault every time the plugin or Obsidian is restarted. |
Makes sense. I've disabled the plugin, and will await the next update. |
@RobinLandy, I added a "Manual Save" button in the settings in version This will try to write to the embeddings-2 file and should return an error if there is any issue. |
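As an illustration of a save that reports failure instead of silently leaving an empty file, here is a minimal write-then-read-back sketch. It is not the plugin's "Manual Save" implementation. The `TextStorage` interface is a hypothetical stand-in for whatever file API the plugin uses, and `saveAndVerify` is a made-up name.

```typescript
// Hypothetical storage abstraction (an assumption, not Obsidian's API).
interface TextStorage {
  write(path: string, data: string): void;
  read(path: string): string | null; // null when the file does not exist
}

// Serialize the embeddings, write them, then read the file back and
// throw if what is on disk is not what was written.
function saveAndVerify(
  storage: TextStorage,
  path: string,
  embeddings: Record<string, number[]>,
): number {
  const serialized = JSON.stringify(embeddings);
  storage.write(path, serialized);

  const readBack = storage.read(path);
  if (readBack === null) {
    throw new Error(`save failed: ${path} is missing after write`);
  }
  if (readBack !== serialized) {
    throw new Error(
      `save failed: ${path} holds ${readBack.length} chars, ` +
        `expected ${serialized.length}`,
    );
  }
  return serialized.length; // number of characters written
}
```

Reading back a very large file after every save has a cost, so a real implementation might compare only the size or run the full check only when the user triggers a manual save.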
@smartguy1196 @harpreetchima @nigelthomp @vratclarkson I believe this is now fixed as of version There was a logical error, so, unfortunately, it probably should have been fixed sooner 🤦♂️ Thank you to everyone who helped me get this figured out! |
Thanks for this plugin, but it is economically not feasible to use if it happens to use so many tokens.
Any suggestions?