Daft Equivalent Syntax for Pandas Dataframe.apply(axis=1) #1041
For a straightforward example: how do I do it in Daft?
Thanks for the question @dendihandian!

As an aside (unrelated to your question about the API itself): it might be a lot easier to read the entire SQLite table as a Daft DataFrame. Then you would not have to rely on a custom Python function to query SQLite, and you probably want something more like a .join between your SQLite table and your dataframe. But I digress!

To answer your question: there is no API for running over an entire row at a time. This is because Daft is very deliberate about exactly which columns you need, and uses this information to optimize your work. Here are some ways of adapting your use case to canonical Daft code:

Applying a Python function on a single column

As an example, here is a function applied to the id column:

import sqlite3

import daft

def processing_row(id: int) -> str:
    sqlite_conn = sqlite3.connect("my.db")
    cur = sqlite_conn.cursor()
    # "table" is a placeholder for your real table name.
    # The id is bound as a query parameter instead of being formatted into the SQL string.
    data = cur.execute("SELECT name FROM table WHERE id = ?", (id,)).fetchone()
    return data[0]

df = df.with_column(
    "sqlite_data",
    df["id"].apply(processing_row, return_dtype=daft.DataType.string())
)

Further extensions

I would note a couple of problems with the previous approach, such as opening a new SQLite connection on every call.
Here is an example UDF that performs the above optimizations, and takes as inputs two columns (id and last_name):

import sqlite3

import daft

@daft.udf(return_dtype=daft.DataType.string())
class GetFullName:
    def __init__(self):
        # Initialize this once, shared across multiple invocations
        self.sqlite_conn = sqlite3.connect("my.db")

    def __call__(self, ids: daft.Series, last_names: daft.Series) -> list[str]:
        full_names = []
        for id, last_name in zip(ids.to_pylist(), last_names.to_pylist()):
            cur = self.sqlite_conn.cursor()
            # "table" is a placeholder for your real table name; the id is bound as a parameter.
            name = cur.execute("SELECT name FROM table WHERE id = ?", (id,)).fetchone()[0]
            full_names.append(f"{name} {last_name}")
        return full_names

df = df.with_column("full_name", GetFullName(df["id"], df["last_name"]))

Note that this is really nice because the SQLite connection is initialized once and shared across invocations, and the UDF receives exactly the two columns it needs.
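If it helps, the lookup logic inside the UDF can be tried out on its own before wiring it into Daft. The following is only a sketch in plain Python with no Daft involved: the class name `FullNameLookup`, the table name `people`, the helper `make_demo_db`, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for `my.db`.

```python
import sqlite3


class FullNameLookup:
    """Plain-Python stand-in for the UDF body: one connection, reused for every batch."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def __call__(self, ids: list[int], last_names: list[str]) -> list[str]:
        full_names = []
        for id_, last_name in zip(ids, last_names):
            # The id is bound as a query parameter, not formatted into the SQL string.
            row = self.conn.execute(
                "SELECT name FROM people WHERE id = ?", (id_,)
            ).fetchone()
            full_names.append(f"{row[0]} {last_name}")
        return full_names


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical sample data in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
    return conn


lookup = FullNameLookup(make_demo_db())
print(lookup([1, 2], ["Lovelace", "Turing"]))  # ['Ada Lovelace', 'Alan Turing']
```

The two lists play the role of the two column batches that the UDF receives, so the same loop body should carry over once the inputs are converted with `to_pylist()`.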
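The join suggestion from the aside in this reply can be illustrated without Daft at all. This is a rough sketch of the idea only, not Daft's API: the table is read once, and rows are then matched by id in memory, which is the work a dataframe join would do. The table name `people`, the helpers `load_names` and `join_on_id`, and the sample rows are hypothetical.

```python
import sqlite3


def load_names(conn: sqlite3.Connection) -> dict[int, str]:
    # One query for the whole table instead of one query per row.
    return dict(conn.execute("SELECT id, name FROM people"))


def join_on_id(rows: list[dict], names: dict[int, str]) -> list[dict]:
    # Inner-join semantics: rows whose id is not in the table are dropped.
    return [
        {**row, "name": names[row["id"]]}
        for row in rows
        if row["id"] in names
    ]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])

rows = [
    {"id": 1, "last_name": "Lovelace"},
    {"id": 3, "last_name": "Nobody"},  # no matching id in the table
]
joined = join_on_id(rows, load_names(conn))
print(joined)  # [{'id': 1, 'last_name': 'Lovelace', 'name': 'Ada'}]
```

With both sides loaded as dataframes, the dictionary lookup would be replaced by the join itself, and no per-row Python function would be needed.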