Handling large tables #336
Unanswered
February24-Lee asked this question in Q&A
Replies: 1 comment · 3 replies
-
I think one trick is to merge all the text columns and produce only one embedding from the merged text. These are just my personal thoughts; any comments are welcome.
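For example, something along these lines with pandas; the column names here are only placeholders:

```python
import pandas as pd

# Placeholder column names -- substitute the real text columns.
text_cols = ["title", "description", "summary", "notes"]

df = pd.read_csv("data.csv")

# Join the text columns into a single string per row, so only one
# embedding (instead of four) has to be computed and stored per row.
df["merged_text"] = df[text_cols].fillna("").agg(" [SEP] ".join, axis=1)
```

That cuts the stored embedding memory for the text columns by roughly a factor of four, at the cost of losing the per-column separation.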
-
Hello, is there an efficient method or future development idea for handling large tables?
In my case, the table size is 1,442,792 x 12, with 2 categorical, 4 numerical, and 4 embedded-text columns. I cached all of the text with the OpenAI embedding API, so each text embedding has a dimension of almost 1,500. The problem is that this consumes too much memory and kills the process early (in fact, I found this was my own mistake). It was inconvenient when running repeated short experiments.
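A rough back-of-envelope estimate of why it blows up, assuming 1536-dimensional float32 embeddings (the exact dimension depends on the OpenAI model):

```python
# Dense float32 embeddings for all four text columns, fully materialized.
rows, n_text_cols, dim, bytes_per_float = 1_442_792, 4, 1536, 4
total_bytes = rows * n_text_cols * dim * bytes_per_float
print(f"{total_bytes / 1e9:.1f} GB")  # ~35.5 GB, before any temporary copies
```

Intermediate copies made during conversion can push the peak well above that.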
So I modified the dataset and loader to execute convert_to_tensor_frame only when they are called. My code is here: https://github.com/February24-Lee/pytorch-frame/pull/1/files
I thought this approach was the simplest and required the fewest modifications, though it is not fancy.
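Roughly the idea, sketched with a plain torch.utils.data.Dataset; LazyFrameDataset and to_tensor_frame are placeholder names, and the actual change in the PR hooks into pytorch-frame's own Dataset and DataLoader instead:

```python
from torch.utils.data import DataLoader, Dataset


class LazyFrameDataset(Dataset):
    """Keeps the raw pandas DataFrame and converts only the requested rows
    to a TensorFrame when a batch is fetched, instead of materializing all
    ~1.4M rows up front."""

    def __init__(self, df, to_tensor_frame):
        # `to_tensor_frame` is assumed to be a callable that turns a
        # DataFrame slice into a TensorFrame (the conversion that would
        # otherwise be applied to the whole table at once).
        self.df = df
        self.to_tensor_frame = to_tensor_frame

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        return idx  # defer all work to the collate function

    def collate(self, indices):
        # Convert just this batch's rows on the fly.
        return self.to_tensor_frame(self.df.iloc[list(indices)])


# Usage sketch (df and to_tensor_frame are placeholders):
# dataset = LazyFrameDataset(df, to_tensor_frame)
# loader = DataLoader(dataset, batch_size=512, shuffle=True,
#                     collate_fn=dataset.collate)
```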
Anyway, I'm curious about your thoughts or future plans regarding handling large tables like mine.
Thank you.