Skip to content

Adding overwrite or ignore behavior for row conflicts. #46232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 136 commits into from

Conversation

LSturtew
Copy link

@LSturtew LSturtew commented Mar 4, 2022

@cvonsteg started the effort to implement this. Since they are unable to continue and this is a highly desired feature, I made an attempt to clean up the code and fix merge conflicts. I would like to share the credit for this PR (I do not know how this works).

A new parameter is added to to_sql: on_row_conflct

on_row_conflict : {'fail', 'overwrite', 'append'},
    default: 'fail'
    Determine insertion behavior in case of a primary key clash.
    - 'fail': Do nothing to handle primary key clashes, will raise an Error.
    - 'overwrite': Update existing rows in database with primary key clashes,
      and append the remaining rows with non-conflicting primary keys
    - 'append': Ignore incoming rows with primary key clashes, and
      insert only the incoming rows with non-conflicting primary keys

…t 88a5a481b

git-subtree-dir: vendor/github.com/V0RT3X4/python_utils
git-subtree-split: 88a5a481b5dbec610e762df862fd69918c1b77d4
git-vendor-name: python_utils
git-vendor-dir: vendor/github.com/V0RT3X4/python_utils
git-vendor-repository: git@github.com:V0RT3X4/python_utils.git
git-vendor-ref: master
git-vendor-name: python_utils
git-vendor-dir: vendor/github.com/V0RT3X4/python_utils
git-vendor-repository: git@github.com:V0RT3X4/python_utils.git
git-vendor-ref: master
…tch function written down for deleting pkeys
…tch function written down for deleting pkeys
…t 88a5a481b

git-subtree-dir: vendor/github.com/V0RT3X4/python_utils
git-subtree-split: 88a5a481b5dbec610e762df862fd69918c1b77d4
git-vendor-name: python_utils
git-vendor-dir: vendor/github.com/V0RT3X4/python_utils
git-vendor-repository: git@github.com:V0RT3X4/python_utils.git
git-vendor-ref: master
git-vendor-name: python_utils
git-vendor-dir: vendor/github.com/V0RT3X4/python_utils
git-vendor-repository: git@github.com:V0RT3X4/python_utils.git
git-vendor-ref: master
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i started reviewing, but this again is adding an enormous amount of complexity. can you get sqlalchemy to do this type of operation directly?

@@ -2799,13 +2800,22 @@ def to_sql(
schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
if_exists : {'fail', 'replace', 'append'},\
default 'fail'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you u using row-continuation here? just leave it

can consider deprecating this and change the name to if_table_exists to be more obvious (but orthogonal)

How to behave if the table already exists.

* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.

on_row_conflict : {'fail', 'overwrite', 'ignore'},\
default 'fail'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a versionadded 1.5.0

if if_exists not in ("fail", "replace", "append"):
raise ValueError(f"'{if_exists}' is not valid for if_exists")

if on_row_conflict not in ("fail", "overwrite", "ignore"):
raise ValueError(f"'{on_row_conflict}' is not valid for on_row_conflict'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes:

    def test_to_sql_invalid_on_row_conflict(self, pkey_frame):
        msg = "'notvalidvalue' is not valid for on_row_conflict"
        with pytest.raises(ValueError, match=msg):
            sql.to_sql(
                pkey_frame,
                "pkey_frame1",
                self.conn,
                if_exists="append",
                on_row_conflict="notvalidvalue",
            )

pandas/io/sql.py Outdated
if if_exists not in ("fail", "replace", "append"):
raise ValueError(f"'{if_exists}' is not valid for if_exists")

if on_row_conflict not in ("fail", "overwrite", "ignore"):
raise ValueError(f"'{on_row_conflict}' is not valid for on_row_conflict'")
# on_row_conflict only used with append
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line before, comments can be inside the clause

pandas/io/sql.py Outdated
@@ -827,6 +850,182 @@ def create(self):
else:
self._execute_create()

def _load_existing_pkeys(self, primary_keys, primary_key_values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type inputs & outputs

@jreback jreback added the IO SQL to_sql, read_sql, read_sql_query label Mar 5, 2022
@LSturtew
Copy link
Author

LSturtew commented Mar 6, 2022

Sqlalchemy handles row conflicts differently depending on the database dialect. E.g: for postgress and sqlite the implementation is similar to what was originally proposed in the original PR: on_conflict_do_nothing or on_conflict_do_update, for mysql on the other hand this is handled with on_duplicate_key_update.

Luka Sturtewagen added 2 commits March 6, 2022 09:59
@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 6, 2022
@maveec
Copy link

maveec commented Apr 27, 2022

@LSturtew seems little is missing to finalize this. Thank you for your effort and keep up the great work, lots of people are - and will be - grateful for this =)

@hichameyessou
Copy link

@LSturtew few months passed, do you still have the capacity to complete this PR?

@JustinGuese
Copy link

can we run this again? the logs for the tests are gone... Or what seems to be missing? I guess it is only a problem with the Docstrings and types, right?

@JustinGuese
Copy link

I have added a pr for some small docstring changes to the repo of @LSturtew, because if I create a new PR to the pandas main branch all the colab would be lost I guess?
Otherwise, if you (LSturtew) give editing rights to maintainers I can do it directly (https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)

@LSturtew
Copy link
Author

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

The box Allow edits and access to secrets by maintainers is checked

@JustinGuese
Copy link

JustinGuese commented Aug 30, 2022

okay, i can't get it to work... maybe I'm just stuck in my thoughts

ERROR: Permission to LSturtew/pandas.git denied to JustinGuese.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Same when trying to do the same push to pandas-dev/pandas.git

I'm:

  1. gh pr checkout -R pandas-dev/pandas 46232
  2. adding my changes, git add & git commit
  3. git push git@github.com:LSturtew/pandas.git sql-upsert:sql-upsert

what am I doing wrong? Do I need to create a separate PR? But I do not want the main work by everyone to be lost

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding (Insert or update if key exists) option to .to_sql
7 participants