
KV Limits #121

Open
0x4007 opened this issue Sep 30, 2024 · 15 comments

Comments

@0x4007
Member

0x4007 commented Sep 30, 2024

  • We hit 90% of the daily quota today. Given that we don't even have any partners using the system yet, this is looking grim.
  • I hope that we can find a way to optimize our KV usage (or find a cheaper alternative, but I am skeptical).
  • [screenshots: Cloudflare dashboard showing KV usage against the daily quota]

Projected Costs

  • The good news is that the paid plan raises our daily limit from 1,000 to 33,333 operations, so we get essentially 33x capacity for $5/month.
  • I estimate that once we get most of the planned plugins up and running, we'll be closer to 5k a day.
  • That works out to roughly 6,667 operations per $1 of cost.
  • I think each large partner will cost us approximately $1 a month on KV
    • Smaller ones will probably be closer to the 1k we are using now.
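A quick sketch of the arithmetic above (the daily limits and the $5/month figure are the ones quoted in this issue; the 5k writes/day per large partner is the estimate from this comment):

```typescript
// Cloudflare KV daily write limits as quoted above.
const freeDailyWrites = 1_000;
const paidDailyWrites = 33_333;
const monthlyCost = 5; // USD

// Roughly 33x the free capacity for $5/month.
const capacityMultiple = paidDailyWrites / freeDailyWrites;

// Operations per $1 of monthly cost (~6,667).
const writesPerDollar = paidDailyWrites / monthlyCost;

// Estimated: a large partner generating ~5k writes/day costs
// about $0.75/month of the budget, in line with the ~$1 estimate.
const largePartnerMonthlyCost = 5_000 / writesPerDollar;

console.log({ capacityMultiple, writesPerDollar, largePartnerMonthlyCost });
```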

Next Steps

Let's discuss how we can optimize the KV usage of the kernel.

@0x4007
Member Author

0x4007 commented Sep 30, 2024

@gentlementlegen @whilefoo rfc

@whilefoo
Contributor

First we need to figure out what's using KV so much. Can you see in the dashboard whether it's the kernel or some plugin?

@0x4007
Member Author

0x4007 commented Sep 30, 2024

Unfortunately, I can't find any useful information from the analytics. I included in the screenshots everything relevant I could find.

@gentlementlegen
Member

Lately we added https://github.com/ubiquity-os-marketplace/generate-vector-embeddings, which reacts to 7 different events. Each plugin run equals one KV put, so the more plugins and events we listen to, the more the usage will increase.

@gentlementlegen
Member

I think the root of the problem is that we need to persist data between runs, since the worker gets destroyed after its job is done; that's why we used KV in the first place. We should consider alternative stores for that data.

@0x4007
Member Author

0x4007 commented Oct 1, 2024

I got the 90% warning again today, and the day just started. I have a feeling it must be the vector embeddings plugin. Perhaps we need to optimize it.

@sshivaditya2019 as a heads up, let us know if you have ideas for optimizing your plugin. Cloudflare KV is used to manage state across "plugin chain" runs: when our config defines multiple plugins to be invoked by a specific webhook (such as issue_comment.created), the kernel uses KV to keep track of which plugin comes next for each job. It seems that this plugin is using more than all of our other plugins combined, by 2-3x.

Is it realistic to check if it's the last event in the plugin chain and stop keeping track at that point? That way we can put the heavy ones at the end, like vector embeddings.

For example, an issue comment is created and we have three plugins. The kernel executes the first and second normally, but it knows the last one is next, so it executes it without reading or writing KV. Once the last plugin has been executed, I don't see why the kernel needs to keep track anymore.
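The last-plugin optimization described above can be sketched as a dispatch step that only persists chain state while another plugin remains. The `PluginChainState` shape and function names are hypothetical, not the kernel's actual API, and a `Map` stands in for the async `KVNamespace` binding so the sketch is runnable:

```typescript
interface PluginChainState {
  eventId: string;      // e.g. the webhook delivery ID
  currentIndex: number; // position in the configured plugin chain
  pluginIds: string[];  // plugins configured for this event
}

// Hypothetical kernel step: invoke the current plugin, then persist the
// chain state only if another plugin still has to run. The real KV binding
// would use `await kv.put(...)` / `await kv.delete(...)`.
function dispatchNext(kv: Map<string, string>, state: PluginChainState): void {
  const isLast = state.currentIndex === state.pluginIds.length - 1;
  // ... invoke state.pluginIds[state.currentIndex] here ...
  if (isLast) {
    // Nothing comes after the last plugin, so skip the KV write and
    // drop any stale state for this event.
    kv.delete(state.eventId);
  } else {
    kv.set(state.eventId, JSON.stringify({ ...state, currentIndex: state.currentIndex + 1 }));
  }
}
```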

@sshivaditya2019

sshivaditya2019 commented Oct 1, 2024

> I got the 90% warning again today and today just started. Yes I have a feeling it must be the vector embeddings plugin. Perhaps we need to optimize it.
>
> @sshivaditya2019 as a heads up, let us know if you have ideas for optimizing your plugin. Cloudflare KV is used to manage state across "plugin chain" runs. Basically in our config when we define multiple plugins to be invoked by a specific webhook (such as issue_comment.created) the kernel keeps track of what comes next per job using KV. It seems that this plugin is using more than all of our plugins combined, and by 2-3x.
>
> Is it realistic to check if it's the last event in the plugin chain and not keep track of it anymore? That way we can put the heavy ones at the end, like vector embeddings?
>
> For example, an issue comment is created and we have three plugins. The kernel executes the first and second normally, but it knows the last one is next, so it executes it and does not read/write KV. If it executes it, I don't see why it needs to keep track anymore.

A straightforward way to optimize would be to divide this functionality into several plugins. For instance, "Issue Matching" could function as one action plugin, while "Issue Deduplication" could be another. We could further enhance efficiency by implementing batch processing for comments, rather than triggering actions every time a comment is edited or deleted.

Alternatively, we could maintain the current setup but use a Postgres connection URI instead of the Supabase key and URI. We could also implement the embedding generation as a Postgres function. We save close to 14 KV operations per invocation.

Another alternative would be to limit access to the anonymous key (Clear Text) as much as possible (RLS with Policy) and instead pass JWT tokens from the kernel. These tokens would then be used by the worker to make calls to the Supabase REST API.
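One way to read the Postgres-function suggestion above: the worker makes a single RPC round trip and the database computes and stores the embedding, instead of the plugin issuing many separate REST calls. A minimal sketch, assuming a hypothetical `generate_embedding` SQL function; the `RpcClient` interface is a stand-in for the matching slice of supabase-js:

```typescript
// Minimal shape of the client this sketch needs; supabase-js's `rpc` matches it.
interface RpcClient {
  rpc(fn: string, args: Record<string, unknown>): Promise<{ error: unknown }>;
}

// Hypothetical: `generate_embedding` is a Postgres function that computes and
// stores the embedding server-side in one round trip.
async function storeCommentEmbedding(db: RpcClient, commentId: string, body: string): Promise<void> {
  const { error } = await db.rpc("generate_embedding", {
    comment_id: commentId,
    comment_body: body,
  });
  if (error) throw new Error(String(error));
}
```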

@0x4007
Member Author

0x4007 commented Oct 1, 2024

> A straightforward way to optimize would be to divide this functionality into several plugins. For instance, "Issue Matching" could function as one action plugin, while "Issue Deduplication" could be another. We could further enhance efficiency by implementing batch processing for comments, rather than triggering actions every time a comment is edited or deleted.

Anything on a timer is a no-go. Can you make batch processing event-based?

How does breaking it apart into separate plugins help with this? Also, won't it cause a lot of code duplication? I always prefer breaking apart plugins wherever possible for enhanced modularity.

> Alternatively, we could maintain the current setup but use a Postgres connection URI instead of the Supabase key and URI. We could also implement the embedding generation as a Postgres function. We save close to 14 KV operations per invocation.

Saving 14 KV operations per invocation is massive. Let's do this immediately!

> Another alternative would be to limit access to the anonymous key (clear text) as much as possible (RLS with a policy) and instead pass JWT tokens from the kernel. These tokens would then be used by the worker to make calls to the Supabase REST API.

I don't understand how this helps.

@whilefoo
Contributor

whilefoo commented Oct 1, 2024

> Is it realistic to check if it's the last event in the plugin chain and not keep track of it anymore? That way we can put the heavy ones at the end, like vector embeddings?

Technically we don't need to keep track if it's the last plugin (or the only plugin) in the chain, but that also means we don't get to use the response from the plugin. We currently don't use it, but we might in the future, for example if a plugin returns rewards to the kernel or returns comment HTML for the kernel to post.

@0x4007
Member Author

0x4007 commented Oct 1, 2024

Seems janky to have a switch in the config to enable this feature (dropStateOfLastPlugin: boolean), but it might be useful in a pinch.

@gentlementlegen
Member

gentlementlegen commented Oct 2, 2024

There are plugins that run quite a lot, like https://github.com/ubiquity-os-marketplace/automated-merging and https://github.com/ubiquity-os-marketplace/disqualifier, when these would only need to run once a day (this would save hundreds of KV calls). I know you're against CRONs, but finding something that behaves similarly would be very helpful.

@0x4007
Member Author

0x4007 commented Oct 2, 2024

Let's focus on the most prominent problem (vector embeddings plugin) and then work our way down to optimize others as needed.

I have some half-baked ideas for handling these "cron suitable" events. I think there's potential for a solution using the dropStateOfLastPlugin: boolean feature mentioned above. I don't love it because it doesn't seem elegant, but it's simple to use and useful.

@gentlementlegen
Member

I think automated-merging and disqualifier consume more than the vector-embeddings plugin, without good reason. Vector embedding could maybe also run as one big batch a day.

@gentlementlegen
Member

Coming back to this: after the daemon-disqualifier ran yesterday, we used nearly 100% of the quota in one day. I believe adding some CRON capability to the kernel would be very useful for recurring tasks like these, because they could run only once a day and accomplish the same job (and also avoid the comment bombing that happened yesterday). We also now have 4 orgs running, which potentially multiplies the usage by 4.

My idea would be to add a cron item in the configuration, available to every plugin, where we could give a CRON-like value. The kernel would keep track of these and call each plugin only when necessary. That would also de-clutter the action runs, which are now sitting at 5k and would be nearly impossible to debug. With one run a day we could easily output a summary of which tasks were updated, like daemon-merging already does.
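The configuration idea could look like an optional cron field per plugin entry: the kernel would skip cron plugins on webhook events and invoke them from a scheduler instead. Field names and shapes here are hypothetical, not the kernel's actual config schema:

```typescript
interface PluginEntry {
  plugin: string; // e.g. "ubiquity-os-marketplace/daemon-disqualifier"
  cron?: string;  // e.g. "0 0 * * *" to run once a day instead of per event
}

// Hypothetical kernel-side check: plugins with a cron value are not invoked
// per webhook event, only by the scheduler when their interval elapses.
function shouldRunOnWebhook(entry: PluginEntry): boolean {
  return entry.cron === undefined;
}
```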

@0x4007
Member Author

0x4007 commented Oct 16, 2024

disqualifier

Is disqualifier ignoring bot comments? If not, then it's recursively invoking itself. It's poorly implemented and should be redone.

"daemons"

For any "daemon" class plugin, if we want the clock to be frequent but also want to be smart about our KV usage, here is a solution that builds on my previous proposal:

We have a "queue job plugin" at the end of our comment-event plugin chain.

All it does is act as a queue/buffer. It collects a queue of jobs to run, each with a job nonce, and we can set the number of recurring runs per time interval (like four times a day) in its configuration.

nonce

The job nonce essentially deduplicates what would be redundant jobs; for example, following up on a particular issue only needs to happen once per interval. It could also be called a job ID, describing the type of action (a plugin-developer-defined action class name) and where it occurs (perhaps the node ID of an issue or pull request):

actionClassName-nodeID
followUpIssue-1234

The benefit of this approach is that if nothing is in the queue, it should not attempt to run.

As a final optimization (although I realize now it might not be necessary), because this plugin sits at the end of the chain, we can stop tracking KV state for any subsequent "daemon" events dispatched from the buffer/queue.
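The queue/buffer with nonce deduplication sketched above could look like this (names are illustrative; a real plugin would persist the queue and drain it on the configured interval):

```typescript
// A job nonce is `actionClassName-nodeID`, e.g. "followUpIssue-1234".
interface QueuedJob {
  action: string; // plugin-developer-defined action class name
  nodeId: string; // where it occurs, e.g. an issue or pull request node ID
}

class JobQueue {
  private jobs = new Map<string, QueuedJob>();

  // Enqueueing the same nonce twice is a no-op: following up on a given
  // issue only needs to happen once per interval.
  enqueue(job: QueuedJob): void {
    this.jobs.set(`${job.action}-${job.nodeId}`, job);
  }

  // Called by the scheduled run (e.g. four times a day). An empty queue
  // means the run has nothing to do and can exit immediately.
  drain(): QueuedJob[] {
    const batch = [...this.jobs.values()];
    this.jobs.clear();
    return batch;
  }
}
```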

4 participants