Adding overwrite or ignore behavior for row conflicts. #46232
…t 88a5a481b git-subtree-dir: vendor/github.com/V0RT3X4/python_utils git-subtree-split: 88a5a481b5dbec610e762df862fd69918c1b77d4
git-vendor-name: python_utils git-vendor-dir: vendor/github.com/V0RT3X4/python_utils git-vendor-repository: git@github.com:V0RT3X4/python_utils.git git-vendor-ref: master
…tch function written down for deleting pkeys
…hed upsert ignore method
I started reviewing, but this again is adding an enormous amount of complexity. Can you get SQLAlchemy to do this type of operation directly?
pandas/core/generic.py
@@ -2799,13 +2800,22 @@ def to_sql(
schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
if_exists : {'fail', 'replace', 'append'},\
default 'fail'
Why are you using a line continuation here? Just leave it as it was.
We could consider deprecating this and renaming it to if_table_exists
to make it more obvious (but that is orthogonal to this PR).
pandas/core/generic.py
How to behave if the table already exists.

* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.

on_row_conflict : {'fail', 'overwrite', 'ignore'},\
default 'fail'
needs a versionadded 1.5.0
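For comparison, the proposed 'ignore' behavior can already be approximated with the existing method parameter of to_sql, which accepts a callable with signature (pd_table, conn, keys, data_iter). A minimal sketch, assuming a sqlite3 connection (pandas then passes a cursor as conn) and a target table that already exists with a primary key; insert_or_ignore is a hypothetical helper, not part of this PR.

```python
import sqlite3

import pandas as pd


def insert_or_ignore(pd_table, conn, keys, data_iter):
    # Hypothetical to_sql ``method`` callable: SQLite's INSERT OR IGNORE
    # silently skips rows that would violate the table's primary key.
    columns = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    stmt = (
        f'INSERT OR IGNORE INTO "{pd_table.name}" ({columns}) '
        f"VALUES ({placeholders})"
    )
    # With a sqlite3 connection, pandas hands the callable a cursor as ``conn``.
    conn.executemany(stmt, list(data_iter))
    return conn.rowcount


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "pkey_frame" (id INTEGER PRIMARY KEY, value TEXT)')
con.execute("INSERT INTO pkey_frame VALUES (1, 'old')")
con.commit()

new_rows = pd.DataFrame({"id": [1, 2], "value": ["new", "new"]})
new_rows.to_sql(
    "pkey_frame", con, if_exists="append", index=False, method=insert_or_ignore
)
rows = con.execute("SELECT id, value FROM pkey_frame ORDER BY id").fetchall()
print(rows)  # [(1, 'old'), (2, 'new')]
```

The existing row with id 1 keeps its old value and only the non-conflicting row is added, which is what on_row_conflict='ignore' is meant to do without the caller writing SQL.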
if if_exists not in ("fail", "replace", "append"):
    raise ValueError(f"'{if_exists}' is not valid for if_exists")

if on_row_conflict not in ("fail", "overwrite", "ignore"):
    raise ValueError(f"'{on_row_conflict}' is not valid for on_row_conflict")
is there a test for this?
yes:
def test_to_sql_invalid_on_row_conflict(self, pkey_frame):
    msg = "'notvalidvalue' is not valid for on_row_conflict"
    with pytest.raises(ValueError, match=msg):
        sql.to_sql(
            pkey_frame,
            "pkey_frame1",
            self.conn,
            if_exists="append",
            on_row_conflict="notvalidvalue",
        )
pandas/io/sql.py
if if_exists not in ("fail", "replace", "append"):
    raise ValueError(f"'{if_exists}' is not valid for if_exists")

if on_row_conflict not in ("fail", "overwrite", "ignore"):
    raise ValueError(f"'{on_row_conflict}' is not valid for on_row_conflict")
# on_row_conflict only used with append
Add a blank line before this; the comment can go inside the clause.
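One possible layout following this suggestion (blank line before the new check, comment inside the clause). validate_to_sql_args is a hypothetical stand-alone helper, and raising when on_row_conflict is combined with an if_exists other than 'append' is an assumption about what the commented clause does; the PR may instead simply ignore the argument in that case.

```python
def validate_to_sql_args(if_exists: str, on_row_conflict: str) -> None:
    """Hypothetical stand-alone version of the argument checks in to_sql."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")

    if on_row_conflict not in ("fail", "overwrite", "ignore"):
        raise ValueError(f"'{on_row_conflict}' is not valid for on_row_conflict")

    if on_row_conflict != "fail" and if_exists != "append":
        # on_row_conflict is only used with append; treating any other
        # combination as an error is an assumption of this sketch.
        raise ValueError("on_row_conflict requires if_exists='append'")


validate_to_sql_args("append", "ignore")  # valid combination, returns None
```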
pandas/io/sql.py
@@ -827,6 +850,182 @@ def create(self):
else:
    self._execute_create()

def _load_existing_pkeys(self, primary_keys, primary_key_values):
Please add type annotations for the inputs and outputs.
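A sketch of what a typed version of this helper could look like, written as a stand-alone function against sqlite3 rather than as a SQLTable method. The name load_existing_pkeys, the conn and table parameters, and the OR-of-equalities query are assumptions for illustration; the function returns the subset of the given key tuples that already exist in the table.

```python
import sqlite3
from collections.abc import Sequence


def load_existing_pkeys(
    conn: sqlite3.Connection,
    table: str,
    primary_keys: Sequence[str],
    primary_key_values: Sequence[tuple],
) -> list[tuple]:
    """Return the key tuples from ``primary_key_values`` already in ``table``."""
    if not primary_key_values:
        return []
    # One "(k1 = ? AND k2 = ?)" group per candidate key, joined with OR.
    match_one = " AND ".join(f'"{k}" = ?' for k in primary_keys)
    where = " OR ".join(f"({match_one})" for _ in primary_key_values)
    columns = ", ".join(f'"{k}"' for k in primary_keys)
    # Flatten the key tuples so they line up with the "?" placeholders.
    params = [value for key in primary_key_values for value in key]
    stmt = f'SELECT {columns} FROM "{table}" WHERE {where}'
    return conn.execute(stmt, params).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT, v REAL, PRIMARY KEY (a, b))")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, "x", 0.5), (2, "y", 1.5)])
print(load_existing_pkeys(con, "t", ["a", "b"], [(1, "x"), (3, "z")]))  # [(1, 'x')]
```

For large frames, the key list would need to be chunked to stay under the database's limit on bound parameters per statement.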
SQLAlchemy handles row conflicts differently depending on the database dialect. For example, for PostgreSQL and SQLite the implementation is similar to what was proposed in the original PR:
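For SQLite 3.24+ and PostgreSQL, the 'overwrite' case maps to INSERT ... ON CONFLICT (...) DO UPDATE, which SQLAlchemy exposes through on_conflict_do_update() on the dialect-specific insert() constructs in sqlalchemy.dialects.sqlite and sqlalchemy.dialects.postgresql. The sketch below writes the SQLite statement by hand inside a to_sql method callable so that it runs with only pandas and sqlite3; make_upsert_method is hypothetical, and the caller is assumed to know the table's primary-key columns.

```python
import sqlite3

import pandas as pd


def make_upsert_method(primary_keys):
    """Build a hypothetical to_sql ``method`` that overwrites conflicting rows."""

    def upsert(pd_table, conn, keys, data_iter):
        columns = ", ".join(f'"{k}"' for k in keys)
        placeholders = ", ".join("?" for _ in keys)
        conflict = ", ".join(f'"{k}"' for k in primary_keys)
        # Non-key columns take the incoming values ("excluded" is the new row).
        updates = ", ".join(
            f'"{k}" = excluded."{k}"' for k in keys if k not in primary_keys
        )
        # If every column is part of the key there is nothing to update.
        action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
        stmt = (
            f'INSERT INTO "{pd_table.name}" ({columns}) VALUES ({placeholders}) '
            f"ON CONFLICT ({conflict}) {action}"
        )
        conn.executemany(stmt, list(data_iter))
        return conn.rowcount

    return upsert


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "pkey_frame" (id INTEGER PRIMARY KEY, value TEXT)')
con.execute("INSERT INTO pkey_frame VALUES (1, 'old')")
con.commit()

pd.DataFrame({"id": [1, 2], "value": ["new", "new"]}).to_sql(
    "pkey_frame",
    con,
    if_exists="append",
    index=False,
    method=make_upsert_method(["id"]),
)
rows = con.execute("SELECT id, value FROM pkey_frame ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'new')]
```

Unlike INSERT OR REPLACE, ON CONFLICT DO UPDATE modifies the existing row in place instead of deleting and re-inserting it.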
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
@LSturtew it seems little is missing to finalize this. Thank you for your effort and keep up the great work; lots of people are (and will be) grateful for this =)
@LSturtew a few months have passed; do you still have the capacity to complete this PR?
Can we run this again? The logs for the tests are gone... What seems to be missing? I guess it is only a problem with the docstrings and types, right?
I have opened a PR with some small docstring changes against @LSturtew's repo, because if I create a new PR against the pandas main branch, all the collaboration would be lost, I guess?
The box
Okay, I can't get it to work... maybe I'm just stuck on it.
The same happens when I try to push to pandas-dev/pandas.git; I get:
What am I doing wrong? Do I need to create a separate PR? I do not want everyone's work on this to be lost.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
@cvonsteg started the effort to implement this. Since they are unable to continue and this is a highly desired feature, I made an attempt to clean up the code and fix merge conflicts. I would like to share the credit for this PR (I do not know how this works).
A new parameter, on_row_conflict, is added to .to_sql.
#14553 (Adding (Insert or update if key exists) option to .to_sql), continuation of #29636
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.