Datafusion fails to read from LanceDataset #3281

@dreverri

I'm getting the following error when trying to read a LanceDB table with Datafusion:

[2024-12-20T19:10:11Z WARN  lance::dataset::write::insert] No existing dataset at /lance-dataset/data/sample-lancedb/my_table.lance, it will be created
Traceback (most recent call last):
  File "/lance-dataset/hello.py", line 23, in <module>
    main()
    ~~~~^^
  File "/lance-dataset/hello.py", line 19, in main
    df.show()
    ~~~~~~~^^
  File "/lance-dataset/.venv/lib/python3.13/site-packages/datafusion/dataframe.py", line 360, in show
    self.df.show(num)
    ~~~~~~~~~~~~^^^^^
Exception: External error: TypeError: LanceFragment.scanner() takes 1 positional argument but 2 positional arguments (and 3 keyword-only arguments) were given

I'm not sure if this is an issue with LanceDataset or Datafusion or if I am just doing something wrong.

Here is the code:

from datafusion import SessionContext
import lancedb


def main():
    uri = "data/sample-lancedb"
    db = lancedb.connect(uri)

    data = [
        {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
        {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    ]

    tbl = db.create_table("my_table", data=data, mode="overwrite")

    ctx = SessionContext()
    ctx.register_dataset("my_table", tbl.to_lance())
    df = ctx.table("my_table")
    df.show()


if __name__ == "__main__":
    main()
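
The wording of the TypeError suggests a calling-convention mismatch: the caller passes one positional argument (besides `self`) plus three keyword arguments, while `LanceFragment.scanner()` appears to accept keyword arguments only. The sketch below reproduces that kind of failure with stand-in classes and no Lance or DataFusion imports. The class names, the `scan` helper, and both signatures are hypothetical, inferred from the error message rather than copied from either library.

```python
# Stand-in for a fragment whose scanner() accepts `schema` positionally.
# Hypothetical signature, used only for contrast.
class PositionalFragment:
    def scanner(self, schema=None, columns=None, filter=None, batch_size=None):
        return ("scanner", schema, columns, filter, batch_size)


# Stand-in for a fragment whose scanner() is keyword-only.
# Hypothetical signature, inferred from the traceback, not taken from Lance.
class KeywordOnlyFragment:
    def scanner(self, *, columns=None, filter=None, batch_size=None):
        return ("scanner", columns, filter, batch_size)


def scan(fragment, schema):
    # Hypothetical caller: one positional argument plus three keyword
    # arguments, matching the counts reported in the error message.
    return fragment.scanner(schema, columns=["item"], filter=None, batch_size=1024)


def try_scan(fragment, schema="schema"):
    # Return "ok" on success, or the TypeError text on failure.
    try:
        scan(fragment, schema)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_scan(PositionalFragment()))
    print(try_scan(KeywordOnlyFragment()))
```

If this reading is right, the mismatch sits between the two libraries' expectations of `scanner()` rather than in the script above, though that is an assumption based only on the traceback.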
