Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't sync the schema_migrations_ran table when adding it #383

Merged

Conversation

carbonin
Copy link
Member

The purpose of the table is to track when migrations are run on
remote regions. If we sync the whole table outside of the scope
of the transaction based replication then we're violating the
assumption that the migration version number only appears in the
global region when the data from that migration was replicated
up to the global region.

This was causing a timing issue where the sync could take place
just before the global region runs its migrations. Then the global
would blaze past migrations without successfully replicating up
the data needed for a particular schema version.

This could lead to an irrecoverable situation in replication where
the subscription would have to be removed and re-added.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1721596

@carbonin carbonin requested review from Fryguy and bdunne June 18, 2019 23:06
@carbonin
Copy link
Member Author

I was able to reproduce the conflicts we saw in the schema_migrations_ran table, but not the actual failure of replication. I suspect that I would need some more data in the tables in question to hit the timing necessary to make it happen.

I was able to verify our suspicion that the entire contents of the remote region's schema_migrations_ran table were replicated to the global after just migrating to the version that adds the table on the global. That is really all we need to know to prove, in theory, that we could get into the situation described in the BZ.

After applying this patch there were no more conflicts in the PG logs and the migration also completed without issue.

@carbonin carbonin added the bug label Jun 18, 2019
@carbonin
Copy link
Member Author

This really does need to be backported, but I have no idea what the implications of that would be ...

@bdunne
Copy link
Member

bdunne commented Jun 18, 2019

@carbonin Do we need to test again with this change that the only records in the table after running this migration on the global are the old records?

@carbonin
Copy link
Member Author

@bdunne I think that's not exactly how it would work.

It sounds like you think that rake db:migrate VERSION=20171031010000 on the global after a full migration on the remote would lead to the global schema_migrations_ran table only containing remote region records with version < the latest version migrated to on the global.

What actually happens is that all of the changes which can be applied to the current schema from the remote to the global are replicated. Put another way, replication runs until it blows up.

What we should expect is that the global schema_migrations_ran table will contain only records for versions which are before the first migration which generates data which references schema that is not yet present in the global database.

So the short answer is ☝️ is what I reproduced, but with the visible column which was added to zones in 20180618083035. Left to it's own this would resolve itself and not actually cause the irrecoverable situation we see with product_name and switches because the column is not subsequently removed.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
The purpose of the table is to track when migrations are run on
remote regions. If we sync the whole table outside of the scope
of the transaction based replication then we're violating the
assumption that the migration version number only appears in the
global region when the data from that migration was replicated
up to the global region.

This was causing a timing issue where the sync could take place
just before the global region runs its migrations. Then the global
would blaze past migrations without successfully replicating up
the data needed for a particular schema version.

This could lead to an irrecoverable situation in replication where
the subscription would have to be removed and re-added.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1721596
@carbonin carbonin force-pushed the dont_sync_the_schema_migrations_ran branch from f2866b6 to 4f106e5 Compare June 19, 2019 14:03
@miq-bot
Copy link
Member

miq-bot commented Jun 19, 2019

Checked commit carbonin@4f106e5 with ruby 2.3.3, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 👍

@bdunne bdunne merged commit 88dfe84 into ManageIQ:master Jun 19, 2019
@bdunne bdunne self-assigned this Jun 19, 2019
@bdunne bdunne added this to the Sprint 114 Ending Jun 24, 2019 milestone Jun 19, 2019
simaishi pushed a commit that referenced this pull request Jun 20, 2019
…_ran

Don't sync the schema_migrations_ran table when adding it

(cherry picked from commit 88dfe84)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1722540
@simaishi
Copy link
Contributor

Hammer backport details:

$ git log -1
commit ef8bc8889f913f577342acffffcc0b4916de7f06
Author: Brandon Dunne <brandondunne@hotmail.com>
Date:   Wed Jun 19 10:12:35 2019 -0400

    Merge pull request #383 from carbonin/dont_sync_the_schema_migrations_ran
    
    Don't sync the schema_migrations_ran table when adding it
    
    (cherry picked from commit 88dfe84442cf4ff6c5fb82f9e706478ffd54aa59)
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1722540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants