
Type of "to_sql" is partially unknown #353

Closed
gertcuykens opened this issue Oct 4, 2022 · 17 comments · Fixed by #360

Comments

@gertcuykens
Contributor

gertcuykens commented Oct 4, 2022

macOS, VS Code, strict mode
pip3.10 install -U 'psycopg[c]'
pip3.10 install -U --pre SQLAlchemy
pip3.10 install -U pandas
pip3.10 install -U pandas-stubs

    engine = create_engine(
        "postgresql+psycopg://gert:p@localhost/gert", echo=False, future=True
    )

    with engine.begin() as conn:
        stmt = select(A)
        df = pd.read_sql(stmt, conn)
        with pd.option_context('display.max_rows', None, 'display.max_columns', None):
            print(df)
        df.to_sql('test', conn, if_exists='replace', index=False)

Type of "to_sql" is partially unknown

(screenshots of the error in VS Code)

@bashtage
Contributor

bashtage commented Oct 4, 2022

PRs welcome to improve missing types.

@gertcuykens
Contributor Author

gertcuykens commented Oct 4, 2022

https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/io/sql.pyi#L103-L118

I assume it has to do with this part, right?

Maybe it's just these two lines that need changing for it to work?

index_label: str = ...,
chunksize: int = ...,

@Dr-Irv
Collaborator

Dr-Irv commented Oct 4, 2022

I think this is more to do with sqlalchemy types. If I use strict mode in VS Code on your sample, I get this:
(screenshot of the strict-mode diagnostics)

Doing a reveal_type(conn) shows conn as being Unknown, so then there is no match for to_sql(), because conn has an unknown type.

After the first with statement, if I insert:

        conn = cast(Connectable, conn)

then the to_sql() statement type checks fine.
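
For reference, a minimal sketch of that workaround might look like this (assuming SQLAlchemy 1.4, where Connectable can be imported from sqlalchemy.engine; engine and df as in the snippet above):

from typing import cast

from sqlalchemy.engine import Connectable

with engine.begin() as conn:
    conn = cast(Connectable, conn)  # tell the checker what conn actually is
    df.to_sql('test', conn, if_exists='replace', index=False)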

If I use basic type checking in VS Code, there is no problem.

Note that on the create_engine() statement, I am getting the following message in VS Code:

Type of "create_engine" is partially unknown
  Type of "create_engine" is "Overload[(url: URL | str, *, strategy: Literal['mock'], **kwargs: Unknown) -> MockConnection, (url: URL | str, *, module: Any | None = ..., enable_from_linting: bool = ..., future: Literal[True], **kwargs: Unknown) -> Engine, (url: URL | str, *, module: Any | None = ..., enable_from_linting: bool = ..., future: Literal[False] = ..., **kwargs: Unknown) -> Engine]"

@gertcuykens
Contributor Author

gertcuykens commented Oct 4, 2022

Use this version: pip3.10 install -U --pre SQLAlchemy. For me everything is fine in strict mode except to_sql.

@Dr-Irv
Collaborator

Dr-Irv commented Oct 4, 2022

Use this version: pip3.10 install -U --pre SQLAlchemy. For me everything is fine in strict mode except to_sql.

I did do that. I'm on Windows, so not sure if that makes a difference. What got installed was 1.4.41.

Can you do a reveal_type(conn) and see what VS Code reports?
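
For reference, a minimal way to do that (assuming pyright/Pylance, with engine and stmt as in the script) is to drop a reveal_type call right after the connection is created; the checker reports the inferred type as a diagnostic. No import is needed for static analysis, though at runtime you would want typing.reveal_type on Python 3.11+ or typing_extensions.reveal_type:

with engine.begin() as conn:
    reveal_type(conn)  # pyright reports the inferred type of conn here
    df = pd.read_sql(stmt, conn)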

@gertcuykens
Contributor Author

gertcuykens commented Oct 4, 2022

(screenshots of the VS Code output)

#!/usr/bin/env python3.10
from rich import print
import pandas as pd

from sqlalchemy import func, Column, DateTime, ForeignKey, Integer, String, create_engine
from sqlalchemy.future import select
from sqlalchemy.orm import declarative_base, relationship, selectinload
from sqlalchemy.dialects.postgresql import JSONB

engine = create_engine(
    "postgresql+psycopg://gert:p@localhost/gert", echo=False, future=True
)
Base = declarative_base()

class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    data = Column(String, doc="A data", comment="A data")
    create_date = Column(DateTime, server_default=func.now())
    bs = relationship("B")
    __mapper_args__ = {"eager_defaults": True}

class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey("a.id"))
    data = Column(JSONB, nullable=True, doc="P data", comment="P data")

with engine.begin() as conn:
    stmt = select(A).options(selectinload(A.bs))
    df = pd.read_sql(stmt, conn)
    with pd.option_context('display.max_rows', None, 'display.max_columns', None):
        print(df)
    df.to_sql('test', conn, if_exists='replace', index=False)

@Dr-Irv
Collaborator

Dr-Irv commented Oct 4, 2022

OK, I now know what is happening here. The message about being "partially unknown" is because to_sql() returns int | None. The result is based on the underlying connector and database and whether it returns the number of rows inserted. We can't do static typing to determine that.

So you have to do a cast here, or turn down the type-checking setting from strict to basic.
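
If dropping the whole workspace from strict to basic is too blunt, pyright/Pylance also honor per-file comment overrides near the top of the module; a sketch (my assumption is that the diagnostic firing here is reportUnknownMemberType, so you could also disable just that rule instead):

# At the top of the file; relaxes checking for this module only.
# pyright: basic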

@Dr-Irv
Collaborator

Dr-Irv commented Oct 4, 2022

Never mind. It has something to do with this declaration in the stubs:

        method: Literal["multi"]
        | Callable[[SQLTable, Any, list[str], Iterable], int | None]
        | None = ...,

If Iterable is changed to Iterable[Any], then the message goes away.

PR welcome to change:

| Callable[[SQLTable, Any, list[str], Iterable], int | None]

to use Iterable[Any] here. Would accept a PR without tests.

@twoertwein
Member

If Iterable is changed to Iterable[Any], then the message goes away.

We could slowly work towards enabling pyright's reportUnknownParameterType/reportUnknownArgumentType. I think they would cover that.

@gandhis1
Contributor

gandhis1 commented Oct 4, 2022

Is it supposed to be Any? Isn't it an Iterable[Tuple[Any, ...]] or something like that? Could even use ScalarT here maybe, as in Iterable[Tuple[ScalarT, ...]]

@Dr-Irv
Collaborator

Dr-Irv commented Oct 4, 2022

Is it supposed to be Any? Isn't it an Iterable[Tuple[Any, ...]] or something like that? Could even use ScalarT here maybe, as in Iterable[Tuple[ScalarT, ...]]

Not entirely clear from the documentation. Maybe Iterable[list[Scalar] | tuple[Scalar]] is the right thing. Can't use Sequence, because str matches Sequence[str].
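
A quick illustration of the str problem: str is itself a Sequence[str] in typeshed, so a bare string would type check anywhere a row typed as Sequence is expected.

from typing import Sequence

def takes_row(row: Sequence[str]) -> None: ...

takes_row("oops")  # accepted by type checkers: str is a Sequence[str]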

@gandhis1
Contributor

gandhis1 commented Oct 4, 2022

Here's what appears to be the code: https://github.com/pandas-dev/pandas/blob/e0608cb42e98f165fd99b595b2c9d0bfeb208b8f/pandas/io/sql.py#L948

data_list is a List[NdArray] and gets passed to zip, so I believe each element of data_iter is indeed a tuple.
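
A quick sanity check that zip always yields tuples, whatever the input columns are:

cols = [[1, 2, 3], ["a", "b", "c"]]
rows = list(zip(*cols))
print(rows)           # [(1, 'a'), (2, 'b'), (3, 'c')]
print(type(rows[0]))  # <class 'tuple'>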

@Dr-Irv
Collaborator

Dr-Irv commented Oct 4, 2022

@gandhis1
Contributor

gandhis1 commented Oct 6, 2022

I believe that code accepts data_iter, the Iterator[Tuple[Any, ...]], as an input. There is a docstring there, but I believe that docstring is wrong:

        data_iter : generator of list
           Each item contains a list of values to be inserted

In any case, I would need to put together an example to prove this; in the interim it certainly doesn't hurt much to have a slightly broader type.

@Dr-Irv
Collaborator

Dr-Irv commented Oct 6, 2022

Based on https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method, and that it uses csv.writerows(), which accepts any Iterable, I think list is OK too.
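
As a rough sketch of why any iterable of row iterables works on the consuming side (this is an illustrative callable, not the documented recipe itself; the actual bulk-load step is elided):

import csv
import io

def insert_via_csv(table, conn, keys, data_iter):
    # csv.writer(...).writerows accepts any iterable of row iterables,
    # so tuples or lists both work for the rows in data_iter.
    buf = io.StringIO()
    csv.writer(buf).writerows(data_iter)
    # hand buf off to the database's bulk-load path here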

@gandhis1
Contributor

gandhis1 commented Oct 6, 2022

Here is a code example:

import pandas as pd
import sqlite3


def insert_method(pd_table, conn, keys, data_iter):
    print("pd_table:", type(pd_table))
    print("conn:", type(conn))
    print("keys:", type(keys))
    print("data_iter:", type(data_iter))
    data = list(data_iter)
    print("data_iter:", type(data[0]))

with sqlite3.connect(":memory:") as con:
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
    df.to_sql("table", con=con, method=insert_method)

Output:

pd_table: <class 'pandas.io.sql.SQLiteTable'>
conn: <class 'sqlite3.Cursor'>
keys: <class 'list'>
data_iter: <class 'zip'>
data_iter: <class 'tuple'>

This confirms my thinking above that ultimately a zip gets passed, so data_iter is always going to have a tuple inner element. The harm in accepting list is very minimal. The only thing disallowing list would do is catch cases where someone mutates the row with append() or similar. Like, for instance, this:

def insert_method(pd_table, conn, keys, data_iter):
    for row in data_iter:
        row.remove("something")
        # do something with the row

Probably not something an insertion method function should do (mutating values), but nevertheless...

@Dr-Irv
Collaborator

Dr-Irv commented Oct 6, 2022

This confirms my thinking above that ultimately a zip gets passed, so data_iter is always going to have a tuple inner element.

Thanks for the test. I misread the code. We should just use Iterable[tuple[Any, ...]].
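
For reference, the declaration in the stubs would then presumably end up roughly as:

        method: Literal["multi"]
        | Callable[[SQLTable, Any, list[str], Iterable[tuple[Any, ...]]], int | None]
        | None = ...,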
