I'm testing some FLAME runners that do KMeans clustering. I'm trying to figure out how many embeddings I can handle per fly.io machine size, so I'm testing a bunch of different values until failure.
When I use a value that's too high, the process crashes, the machine cleans itself up, and subsequent invocations of the runner fail with:
** (ArgumentError) errors were found at the given arguments:
* 1st argument: the table identifier does not refer to an existing ETS table
(stdlib 5.2.3) :ets.lookup_element(:kmeans_md, :meta, 2)
(flame 0.5.1) lib/flame/pool.ex:381: FLAME.Pool.lookup_meta/1
(flame 0.5.1) lib/flame/pool.ex:315: FLAME.Pool.caller_checkout!/5
iex:1: (file)
I'm using these runners with single_use: true, so I'm not super concerned about the OOM error itself. If this happened in production, though, having all subsequent invocations fail would be a pretty big issue.
I'm not sure how to resolve it either: sometimes a restart fixes it, sometimes it takes waiting 10+ minutes. Not a blocker for me (the point of the stress testing is to avoid this in prod), but it seemed report-worthy!
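For reference, the stress test is essentially a loop like the sketch below. The pool name, embedding dimension, and the use of Scholar for the clustering are illustrative assumptions rather than the actual code; the point is only that each run is a plain FLAME.call/2 with a progressively larger embedding count.

```elixir
# Illustrative stress loop, not the actual code: the pool name, dimensions,
# and the use of Scholar for KMeans are all assumptions.
key = Nx.Random.key(42)

for n <- [10_000, 50_000, 100_000, 250_000] do
  FLAME.call(MyApp.KMeansRunner, fn ->
    # Generate n fake 384-dim embeddings on the runner, then cluster them.
    # A value of n too large for the machine size OOMs the runner.
    {embeddings, _key} = Nx.Random.uniform(key, shape: {n, 384})
    Scholar.Cluster.KMeans.fit(embeddings, num_clusters: 8)
  end)
end
```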
---
PS: Flame configuration:
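(The configuration from the original report wasn't captured in this copy. As a stand-in, a pool along the lines described above would look roughly like the sketch below; the pool name, sizes, and backend options are placeholders, not the reporter's actual values.)

```elixir
# Illustrative FLAME pool, placed in the application's supervision tree.
# All values here are placeholders, not the configuration from the report.
{FLAME.Pool,
 name: MyApp.KMeansRunner,
 min: 0,
 max: 2,
 max_concurrency: 1,
 single_use: true,                               # as mentioned above
 idle_shutdown_after: 30_000,
 backend: {FLAME.FlyBackend, memory_mb: 8192}}   # the machine size under test
```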