-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use new replication gem #18686
Use new replication gem #18686
Conversation
Awesome... love all the 🔴 red 🔴 |
f12ded7
to
8666f5d
Compare
f7a0471
to
060f036
Compare
Most notably, the concept of a node goes away and we can set the tables in a publication directly rather than adding and removing them one by one. This also eliminates the need for a lock around the publication because the operation is atomic.
No one is using this and the concept of a "node" is gone now
Previously it was a separate suite, but now that we're using the built-in replication technology we can put it next to the other lib specs
This is needed to allow the subscription to pick up on tables that were newly added to the publication in the remote region.
@Fryguy I think this is ready to go, the gems are released and I tested out all the edge cases I could think of. I don't want to un-wip just yet though because I think merging this will break the UI specs if we don't also merge ManageIQ/manageiq-ui-classic#5546 rather quickly afterwards and that hasn't been reviewed yet. |
Consider the UI bit approved 👍 :) |
def safe_delete | ||
pglogical.drop_subscription(id, true) | ||
rescue PG::InternalError => e | ||
raise unless e.message =~ /could not connect to publisher/ || e.message =~ /replication slot .* does not exist/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone has a non-English postgres installation, will this still work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not? I can try to change the appliance locale and see if things break, but the alternative would be to just remove this conditional and allow the next command to raise if it was some other error so I feel like it makes sense to just leave it in given that we don't have a more specific error to rescue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I didn't know if it was a problem if the code tries to move forward or not. I know in the past that message text checking has been brittle in particular with non-English databases, but I assume this was last resort for you because there wasn't something more specific.
# Sets the tables that should be used to create the replication set using refresh_excludes | ||
def configured_excludes=(new_excludes) | ||
@configured_excludes = new_excludes | ALWAYS_EXCLUDED_TABLES | ||
@pg_connection = ApplicationRecord.connection.raw_connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this, and could this create issues with ActiveRecord connection pooling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this were to checkout a new connection I guess it could, but otherwise it's using the same connection that the worker process would be using anyway, right? I guess the exception would be if we're using any features provided specifically by AR like locking or something, but if you think it could be an issue I can grab the conn info out of this object and open up a new connection. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm just concerned that we are holding a handle to the raw connection, which may also be given out again when someone else calls ApplicationRecord.connection. I'm not sure if 2 threads using the same raw_connection is a problem or not. When we were using the ActiveRecord connection instead of the raw connection I was less concerned because that was the object being pooled. (TBH, I don't understand how the pooling works when you call ApplicationRecord.connection, since I don't know what puts it back into the pool)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely a valid concern. @jrafanie any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pooling is thread based. My concern would be that we consume the connection for the life of this client and could cause us to exhaust the pool if these clients could ever run at the same time. Additionally, I don't think connection
will explicitly checkin the connection so I'm not sure when this would be checked in, otherwise, it would wait until when the pool is full and the reaper starts checking for dead threads holding connections.
If you use the connection configuration to establish the connection with the PG gem directly, it feels like you'll get the parts of of activerecord you want without the risk of peeing in the pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's thread based, and the lifetime of this MiqPglogical object is on-demand, then we are probably ok and can merge. On the other hand, if we keep a cached instance of the MiqPglogical instance, then we would leak the underlying connection, and that would be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I was saying before about checking out a new connection. But I think it's just using the one that's already assigned to the thread.
And as far as I can tell the only place that's even saving an instance past one invocation is a rake task:
app/models/miq_region.rb:146: MiqPglogical.new.provider?
app/models/miq_region.rb:150: MiqPglogical.new.subscriber?
app/models/miq_region.rb:167: MiqPglogical.new.destroy_provider if current_type == :remote
app/models/miq_region.rb:169: MiqPglogical.new.configure_provider if desired_type == :remote
lib/tasks/evm_dbsync.rake:24: pgl = MiqPglogical.new
My only other general question is how much of this could be pushed down into the gem? There are a handful of things here that seems like they could be made into a method in the gem. Even so, I'd be ok with doing those later as a follow up. |
This is the situation where the remote server is unreachable so we need to set the replication slot to "NONE" in order to drop the subscription without error. But we don't want to end up in an in-between case where the subscription is disabled, but not deleted so wrap the three commands in a transaction.
…ibutes We have a .subscription_status method that actually returns the status of a subscription based on the number of workers and the enabled attribute. The instance method is actually returning a new copy of the current subscription's attirubtes for fetching things like the current wal location or the password so we can more accurately name it subscription_attributes
@Fryguy I'm not sure there's a whole lot I would want to push down into the gem, but like you said, we can catch those as we move forward from here. |
Checked commits carbonin/manageiq@8cec170~...eed3a50 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 |
As of ManageIQ/manageiq#18686 we moved from pglogical to the pg-logical_replication using postgresql's native logical replication. We switched as of ivanchuk. It should be safe to drop this now. Funny enough, this actually helps with getting ruby 3.0 support because the AR extension signature for drop_table needs to change to work with ruby 2.x and 3.x.
We switched to the logical replication handler in 2019, orphaning this handler. See: ManageIQ/manageiq#18686 and specifically this commit: ManageIQ/manageiq@81ddc80
This PR moves us from the pg-pglogical gem to the pg-logical_replication one.
This means that we will be utilizing the built-in logical replication feature new in PostgreSQL 10.
A few things have changed between the two, but the general concepts are compatible. There will still be subscriptions configured in the global region and a replication slot in the remote region.
Major changes:
Terminology changes: