
Shorten replication chain #1115

Closed

mlissner opened this issue Jan 3, 2020 · 21 comments

mlissner commented Jan 3, 2020

As discussed in #1109, the chain of replication currently looks like this:

master --> old-master --> AWS server --> Clients

The old master should be axed from this chain. I have a post on the PostgreSQL mailing list about this; hopefully there will be some responses soon.

mlissner commented Jan 8, 2020

Here's my message:

Hi, I'm trying to figure out how to shorten a chain of logically
replicating servers. Right now we have three servers replicating like
so:

A --> B --> C

And I'd like to remove B from the chain of replication so that I only have:

A --> C

Of course, doing this without losing data is the goal. If the
replication to C breaks temporarily, that's fine, so long as all the
changes on A make it to C eventually.

I'm not sure how to proceed with this. My best theory is:

1. In a transaction, DISABLE the replication from A to B and start a
new PUBLICATION on A that C will subscribe to in step ③ below. The
hope is that this will simultaneously stop sending changes to B while
starting a log of new changes that can later be sent to C.

2. Let any changes queued on B flush to C. (How to know when they're
all flushed?)

3. Subscribe C to the new PUBLICATION created in step ①. Create the
subscription with copy_data=False. This should send all changes to C
that hadn't been sent to B, without sending the complete tables.

4. DROP all replication to/from B (this is just cleanup; the incoming
changes to B were disabled in step ①, and outgoing changes from B were
flushed in step ②).

Does this sound even close to the right approach? Logical replication
can be a bit finicky, so I'd love to have some validation of the
general approach before I go down this road.

Thanks everybody and happy new year,

Mike

mlissner commented Jan 8, 2020

The one response I got:

I don't think that will work.

Any changes on A that take place between step 1 and step 3 wouldn't be
replicated to C.

You'd have to suspend all data modification on A in that interval.

I asked "How?" and they replied:

The simplest solution would be to stop the applications that use PostgreSQL.

You could block client connections using a "pg_hba.conf" entry
(and kill the established connections).

Another option can be to set "default_transaction_read_only = on",
but that will only work if the clients don't override it explicitly.
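For reference, a minimal sketch of that last suggestion (my addition, not from the thread; assumes superuser access on A):

-- On A, as a superuser. Note: clients that explicitly SET
-- transaction_read_only / default_transaction_read_only will still bypass this.
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- And to restore writes afterwards:
ALTER SYSTEM RESET default_transaction_read_only;
SELECT pg_reload_conf();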

mlissner commented Jan 8, 2020

I don't know how I feel about this at this point. It seems like there's no easy or good way to do this, which is pretty terrible. If there were a way to start enqueuing changes at the same time as stopping replication, we'd be in business, but I haven't figured that out yet. I'm going to try asking the mailing list about that. We'll see...

mlissner commented Jan 8, 2020

OK, we'll see if this gets any responses:

Hi all, this is a follow up from an earlier question I asked about
shortening a chain of logically replicating servers. Right now we have
three servers replicating like this:

A --> B --> C

And we want to remove B so that we have:

A --> C

Is it possible to DROP the subscription on B to A and then to
SUBSCRIBE C to the previously used publication on A without losing
data?

E.g., assuming the following:

 - "A" has a PUBLICATION named "A-to-B-Pub" that "B" subscribes to.
 - "C" subscribes to "B" with a subscription named "B-to-C-Sub".

Would this work?

1. On B, DROP the subscription to "A-to-B-Pub".

2. Let any cached changes on B flush to C. Give it an hour to be sure.

3. On C, ALTER "B-to-C-Sub" to subscribe to the now-unused "A-to-B-Pub" on A.

Seems like this would either work perfectly or totally fail. Any ideas?

mlissner commented Jan 9, 2020

OK, some better conversation on this one. Looks like this will be doable. The trick is to drop the subscription while keeping the replication slot. You can do that with something like:

Disassociate the subscription from the replication slot by executing ALTER SUBSCRIPTION ... SET (slot_name = NONE). After that, DROP SUBSCRIPTION will no longer attempt any actions on a remote host.

That'll keep the slot on "A" open so that changes pile up there.

Then you wait for the remaining changes to flush from B to C, subscribe C directly to A, and you should be off to the races!
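Concretely, that would look something like this on B (a sketch with a hypothetical subscription name, not commands I've run yet):

ALTER SUBSCRIPTION b_sub DISABLE;                 -- stop the apply worker
ALTER SUBSCRIPTION b_sub SET (slot_name = NONE);  -- detach from the slot on A
DROP SUBSCRIPTION b_sub;                          -- local cleanup only; the slot on A stays put and keeps queuing changes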

@mlissner

OK, so the plan is to do a test setup not too different from the one I did in #977. I'll set up three docker containers on ports 5432, 5433, and 5434, and set up replication from lowest port to highest. I'll create a basic table, and I'll write a little script that adds content on the low port so we can watch it flow to the high port. Then, with ongoing writes to the low port, I'll attempt the approach above. We'll see if it all works.

mlissner commented Jan 18, 2020

Setting up the test environment

I started by getting three containers going with:

 docker run -p 5432 postgres -p 5432 -c wal_level=logical
 docker run -p 5433 postgres -p 5433 -c wal_level=logical
 docker run -p 5434 postgres -p 5434 -c wal_level=logical

I wasn't able to figure out how to psql into the containers from my localhost; my docker is rusty. But I was able to get a shell inside each container and run psql from there. For each of them, I did:

docker container ls | grep postgre   # find the container IDs
docker exec -it $some-id bash        # get a shell in the container
# I'm now inside the container
psql -U postgres --port XXX          # XXX is 5432, 5433, or 5434

Once inside each of the three containers, I created a database with a single one-column table. I did this on each of them, since logical replication doesn't replicate schemas.

create database test;
\c test;
create table test_table (test_col varchar(10));

Then, on the master publisher, I was able to insert values with:

insert into test_table (test_col) values ('It works!');

The next step is to set up replication from 5432 --> 5433 --> 5434. We start by making users with passwords on each of the publishers:

CREATE ROLE user1 WITH REPLICATION LOGIN;
\password user1
grant select on test_table to user1;

Then we create two publications:

On 5432 I ran:

create publication test_pub for all tables;

On 5433 I ran:

create publication test_pub2 for all tables;

I didn't create a publication on 5434 (it's the end of the chain).

Finally, we create two subscriptions. On 5433 (subscribing to 5432's test_pub) I ran:

CREATE SUBSCRIPTION test_sub CONNECTION 'host=xxx port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub;

To get the correct host IP, I used docker network inspect bridge together with docker container ls | grep postgres.
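The second subscription, on 5434 pointing at 5433's test_pub2, looked much the same (a sketch; the subscription name and host IP here are placeholders):

CREATE SUBSCRIPTION test_sub_mid CONNECTION 'host=<5433-container-ip> port=5433 user=user1 password=asdf dbname=test' PUBLICATION test_pub2;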

With the above done, I could see data replicating. Whoo!

Onwards to the test.

mlissner commented Jan 18, 2020

Running the test

Ok, this delightful mess will insert a value into our 5432 DB every quarter second (or whatever I tweak it to):

watch -n 0.25 'psql -U postgres -d test -c "insert into test_table (test_col) values ('\''It works'\'');"'

(That's being run inside our 5432 container.)

So I'm going to set that up to run pretty frequently and we'll see if anything gets dropped when I shorten the chain.
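(An aside, not what I used here: psql can repeat a statement on its own with \watch, so something like this would also work from inside psql on 5432:)

-- run the INSERT once, then have psql re-run it every second
insert into test_table (test_col) values ('It works');
\watch 1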

Step one, disconnect 5433 from 5432, but keep queuing changes on 5432

On the middle server I run:

test=# BEGIN;
BEGIN
test=# alter subscription test_sub disable;
ALTER SUBSCRIPTION
test=# alter subscription test_sub set (slot_name = NONE);
ALTER SUBSCRIPTION
test=# COMMIT;

I don't know if the transaction helps. I hope so, but we're in the woods here. The disable step had to be done first, because without it you get this error:

test=# alter subscription test_sub set (slot_name = NONE);
ERROR:  cannot set slot_name = NONE for enabled subscription

After 5433 was unsubscribed from 5432, I checked that it and the terminal server (5434) had the same number of rows as had been INSERT'ed by the still-running watch command. They did.

Step two, subscribe 5434 to 5432

Next, I subscribe 5434 to the publication that was previously sending data to 5433. My first attempt at this went...poorly. I ran this at first:

test=# CREATE SUBSCRIPTION test_sub CONNECTION 'host=172.17.0.5 port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub3;

Which returned:

ERROR:  could not create replication slot "test_sub": ERROR:  replication slot "test_sub" already exists

Fair enough. So I pulled up the CREATE SUBSCRIPTION docs, and then ran:

CREATE SUBSCRIPTION test_sub CONNECTION 'host=172.17.0.5 port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub3 with (create_slot=false, slot_name=test_sub);
CREATE SUBSCRIPTION

That worked, but I was getting error messages on 5434 (the server I ran it on) that said:

2020-01-18 07:02:59.983 UTC [136] STATEMENT:  CREATE SUBSCRIPTION test_sub CONNECTION 'host=172.17.0.5 port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub3;
2020-01-18 07:06:57.031 UTC [205] LOG:  logical replication apply worker for subscription "test_sub" has started
2020-01-18 07:06:57.039 UTC [205] ERROR:  could not receive data from WAL stream: ERROR:  publication "test_pub3" does not exist
	CONTEXT:  slot "test_sub", output plugin "pgoutput", in the change callback, associated LSN 0/1688848
2020-01-18 07:06:57.041 UTC [1] LOG:  background worker "logical replication worker" (PID 205) exited with exit code 1

So.....I did something dumb. I dropped the subscription with:

drop subscription test_sub;
NOTICE:  dropped replication slot "test_sub" on publisher
DROP SUBSCRIPTION

Careful readers will see that that dropped the remote slot I needed to keep around so my queued changes wouldn't go away. If this had happened live I'd be very sad, but that's why we run tests.

So...now I have to start over a bit. Luckily I have lots of notes.

@mlissner

Step two, try two, wherein I hope not to mess it up again

After re-establishing replication, clearing the data, and getting the watch command flowing smoothly again, I took another stab at this. As above, I disabled and altered the subscription on 5433 to turn it off while maintaining the slot.

Then I tried again to subscribe to the newly available slot using this command:

CREATE SUBSCRIPTION test_sub CONNECTION 'host=172.17.0.5 port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub with (create_slot=false, slot_name=test_sub);
CREATE SUBSCRIPTION

That almost worked, but there was one lingering problem: it did a full COPY of the data again. When I stop INSERTing rows into 5432 via the watch command and look at the number of rows in our test table, 5432 has 435 rows, our middle server (5433) has 168, and our terminal server (5434) has their sum, 603.

Bah. Let's try one more time for the folks in the back.

@mlissner

Step two, try three, wherein I feel pretty good about it?

The one missing bit that I didn't do when I created the subscription last time was to include:

copy_data = false

Arg. Let's try it one more time. I'll clear things out, set up the replication chain, set up watch again, and then try the following command:

CREATE SUBSCRIPTION test_sub CONNECTION 'host=172.17.0.5 port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub with (create_slot=false, slot_name=test_sub, copy_data=false);

That almost worked! But for some reason the row counts don't match: after all is said and done, 5432 has one fewer row than 5434. That means that somehow a row got applied twice. I don't know what to make of that. I'll have to think about it more carefully tomorrow.

@mlissner

I need to repeat this test with a primary key on the table.

mlissner commented Jun 29, 2020

Another todo:

  • Move DB backups to other server

@mlissner

Using #1115 (comment), I was able to recreate a chain of three PostgreSQL servers.

The only tweaks needed were:

  1. Starting the container now requires a password:

     sudo docker run -p 5432 -e POSTGRES_PASSWORD=asdf postgres -p 5432 -c wal_level=logical
    
  2. Creating the table was done with a primary key:

     create table test_table(id serial primary key, test_col varchar(10));
    

Onwards to the test.

mlissner commented Jun 30, 2020

Ugh, so the newer postgresql image that I'm running lacks the watch command and can't install it. Rather than deal with that delight (dang it, docker!), this command, run from the host, works for adding data at a high frequency:

watch -n 0.25 docker exec -it b54  'psql -U postgres -d test -c "insert into test_table (test_col) values ('\''It works'\'');"'

Once that was running, I went into the middle server and detached its subscription from the replication slot using:

BEGIN;
alter subscription test_sub disable;
alter subscription test_sub set (slot_name = NONE);
COMMIT;
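(A sanity check here, which I'm sketching after the fact, would be to confirm that lag on the retained slot starts growing on 5432 while the watch command keeps inserting:)

-- on 5432
SELECT slot_name, pg_current_wal_lsn() - confirmed_flush_lsn AS lsn_distance
FROM pg_replication_slots;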

Nothing is flowing at this point. My next step was to go to the terminal server (5434) and subscribe it to the initial server (5432) with:

CREATE SUBSCRIPTION test_sub2 CONNECTION 'host=172.17.0.7 port=5432 user=user1 password=asdf dbname=test' PUBLICATION test_pub with (create_slot=false, slot_name=test_sub, copy_data=false);

With that done, the missing changes synced and I confirmed with a count:

select count(*) from test_table;

Finally, I nuked the old subscription on the terminal server (this was just cleanup):

select * from pg_subscription;
alter subscription test_sub disable;
alter subscription test_sub set (slot_name = NONE);
drop subscription test_sub;

And finally, I double-checked that everything was still working properly. It was. This was a successful test.

mlissner commented Jul 1, 2020

The only missing piece here is to think about when to switch the ports over. I think the answer is to do it after nuking the subscription from the middle to terminal servers.

The live servers can be called root, middle, and terminal. These are the commands we'll run to do this live:

Before starting

On the root publisher, get the view count of a docket:

select view_count from search_docket where id=4214664;

Then check on the terminal server with the same command.

Reload the page here:

https://www.courtlistener.com/docket/4214664/national-veterans-legal-services-program-v-united-states/

Check each again.

On the middle server:

  1. Login with:

     sudo -u postgres psql -U postgres --dbname courtlistener
    
  2. Kill the subscription to the root server, leaving the slot behind:

    BEGIN;
    alter subscription replica disable;
    alter subscription replica set (slot_name = NONE);
    COMMIT;
  3. Check that the changes cached on the middle server have flushed to the terminal server (a subscriber-side check is sketched just after this query):

    SELECT 
    slot_name,
    confirmed_flush_lsn, 
    pg_current_wal_lsn(), 
    (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
    FROM pg_replication_slots;
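
    The subscriber-side counterpart, run on the terminal server, would be something like this (a sketch; pg_stat_subscription is available on PostgreSQL 10+):

    SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
    FROM pg_stat_subscription;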
    

On our firewall

Swap the port over: port 5432 (and 5433 if applicable) should map to the new server instead of the old one.

On the terminal server:

  1. Subscribe to the root server using the middle server's old slot (substitute password before copy/pasting):

    CREATE SUBSCRIPTION replicasubscription_new CONNECTION 'host=courtlistener.com port=5433 user=replica password=asdf dbname=courtlistener' PUBLICATION test_pub with (create_slot=false, slot_name=replica, copy_data=false);
    
  2. Check that changes are coming through, as above, by hitting the URL and watching the view_count field.

  3. If good, remove old subscription (cleanup):

    select * from pg_subscription;
    alter subscription test_sub disable; -- This might not work
    alter subscription test_sub set (slot_name = NONE);
    
  4. Check changes again.

Done.

mlissner commented Jul 1, 2020

We've got the router part set up with a mapping of port 5433 to 5432, so that's in a good place. Brian also suggested that I do a second chain to a second terminal AWS server and test this there, so using the instructions in #932, I set that up. The first copy is going now. Once it completes, I'll be able to do a trial run of this that's even more realistic than the docker-based one.

mlissner commented Jul 2, 2020

And, my plan to do a trial run isn't a good one because the topology isn't the same. It's like this:

root --> middle --> AWS1
                `--> AWS2

Trying to use that as a test case would mean severing the tie between root and middle, which we can't do.

Ultimately, I've read a lot of documentation and I've done a few trials, but I just don't have the confidence I need to do this properly. I need deeper expertise than I can get from documentation, so I'm going to go see if we can get support from 2nd Quadrant.

mlissner commented Jul 8, 2020

On the advice of somebody at pgexperts.com,* the safer approach here is to:

  • Create new publication on root
  • Stop writes to root:
    • disable and kill all cron jobs
    • kill celery
    • stop apache
  • Wait for changes to flush to aws
  • Disable our subscription from middle to root
  • Create a new subscription from AWS to root's new publication, without making a copy (copy_data=false)
  • Start writes on root
  • Drop all subscriptions on middle (they're already disabled)
  • Disable and drop old subscription on AWS to middle
  • Kill old publication on root
  • Kill old publication on middle
  • Check that no slots are left lying around

* 2nd Quadrant was really helpful, but wasn't able to really engage without a larger contract, and they put us in touch with pgexperts.

mlissner commented Jul 9, 2020

Steps taken:

  1. On root, create a new publication:

     create publication replica2 for all tables;
    

    Verify that it's aligned with our old publication:

     select * from pg_publication;
    
  2. Ensure that remote access to root is possible from AWS by copying relevant parts of pg_hba.conf on middle to root, and run SELECT pg_reload_conf().

  3. Stop writes to root:

    • disable and kill all cron jobs
    • kill celery
    • stop supervisor stuff
    • stop apache
    • stop restic backups
  4. Wait for changes to flush. On root, wait for lsn_distance to reach zero, then do the same on middle:

    SELECT 
      slot_name,
      confirmed_flush_lsn, 
      pg_current_wal_lsn(), 
      (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
    FROM pg_replication_slots;
    

    lsn_distance shows how much data (in bytes of WAL) still needs to flush from root, and then from middle.

  5. Disable our subscription from middle to root. On middle:

     alter subscription replica disable;
    
  6. Subscribe the aws server directly to root. On aws:

     CREATE SUBSCRIPTION replicasubscription2 CONNECTION 'host=courtlistener.com port=5433 user=replica password=xxx dbname=courtlistener' PUBLICATION replica2 with (copy_data=false);
    
  7. Do a sample write on root:

     \c courtlistener
     select view_count from search_docket where id = 4214664; 
     update search_docket set view_count = view_count + 1 where id = 4214664;
     \c postgres
    

    And check that the write makes it to aws, by checking the value there:

     select view_count from search_docket where id = 4214664; 
    

If good...

  1. Resume writes on root!

    • Restart apache
    • Restart celery
    • Restart cronjobs
    • Restart supervisord
  2. Proceed to clean up

If bad...

  1. Disable new subscription on aws:

     alter subscription replicasubscription2 disable;
    
  2. Re-enable subscription from middle to root. On middle:

     alter subscription replica enable;
    
  3. Regroup. Remember that the slot on root remains for the disabled subscription from aws to root.
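
(If we had to abandon the new subscription entirely, cleaning that up would look roughly like this; the slot name assumes the default of matching the subscription name:)

    -- on aws: detach and drop the new subscription (it's already disabled per step 1)
    ALTER SUBSCRIPTION replicasubscription2 SET (slot_name = NONE);
    DROP SUBSCRIPTION replicasubscription2;

    -- on root: drop the slot it left behind
    SELECT pg_drop_replication_slot('replicasubscription2');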

Cleanup

  1. Done: Drop all subscriptions on middle (they're already disabled):

     drop subscription replica;
    
  2. Done: Disable and drop old subscription on AWS to middle. On aws:

     drop subscription replicasubscription;
    
  3. Done: Disable and drop old subscription on our test aws instance. On test-aws:

     drop subscription replica_test;
     drop subscription replicasubscription_new;
    
  4. Done: Kill old publication on root:

     drop publication replica;
     drop publication replica_test;
    
  5. Done: Kill old publications on middle:

     drop publication replica_test;
     drop publication replicapublication;
     drop publication new_server;
    
  6. Done: Check status of slots on root:

    SELECT 
      slot_name,
      confirmed_flush_lsn, 
      pg_current_wal_lsn(), 
      (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
    FROM pg_replication_slots;
    
  7. Done: Nuke RDS test instance (follow guidance in Tear down replication for client #1072)

Later

  1. Transferred to Set up postgresql backup on new server #1335: Move backup routine to root.
  2. Done: Turn off postgresql on middle.
  3. Done: Disable unused scraper init script???

Lessons learned:

  1. When I first subscribed aws to root, I was getting errors like this:

    2020-07-08 22:39:03.574 PDT [18497] replica@courtlistener LOG:  logical decoding found consistent point at CC5/1C5F128
    2020-07-08 22:39:03.574 PDT [18497] replica@courtlistener DETAIL:  There are no running transactions.
    2020-07-08 22:39:03.575 PDT [18497] replica@courtlistener ERROR:  publication "replica2" does not exist
    2020-07-08 22:39:03.575 PDT [18497] replica@courtlistener CONTEXT:  slot "replicasubscription2", output plugin "pgoutput", in the change callback, associated LSN CC5/1C5F200
    

    I'm not entirely sure what caused this, but I think it was because I created the publication in the postgres DB instead of the courtlistener DB. I was able to fix it by dropping the replica2 publication on root and recreating it after \c courtlistener (a rough sketch of that fix follows this list).

  2. Watching slot lag is really the best thing. It finds errors faster than watching the logs, which can be ambiguous.

  3. To wit, somehow a cache key wound up on aws that we were also trying to push from root. Since duplicate rows can't be created in that table, replication crashed out. The fix was to run this on aws:

     courtlistener=> select * from django_cache where cache_key = ':1:citing:2915131';
     courtlistener=> delete from django_cache where cache_key = ':1:citing:2915131';
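
The fix for item 1, roughly (a sketch of what's described above; publications live per-database):

     -- on root: drop the misplaced publication, then recreate it in the right database
     \c postgres
     drop publication replica2;
     \c courtlistener
     create publication replica2 for all tables;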
    

mlissner commented Jul 9, 2020

This is finally done, after about six months and a lot of emails, stress, and work. I'm excited to do more routine work going forward.

@mlissner mlissner closed this as completed Jul 9, 2020
@mlissner mlissner self-assigned this Jul 9, 2020
@mlissner

And I dropped the DB from our old server, finally fixing the last piece of this problem:

04:13:17::mlissner@new-courtlistener::~ 
↪ df
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg1-lv1       886G  745G   97G  89% /
/dev/mapper/vg0-lv0       941M  120M  757M  14% /boot
/dev/mapper/vg7-lvsata    7.3T  6.0T  958G  87% /sata
/dev/mapper/vg8-lvbackup  7.3T  5.8T  1.2T  84% /home/rsnapshot/.snapshots
04:13:19::mlissner@new-courtlistener::~ 
↪ df
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vg1-lv1       886G   40G  802G   5% /
/dev/mapper/vg0-lv0       941M  120M  757M  14% /boot
/dev/mapper/vg7-lvsata    7.3T  6.0T  958G  87% /sata
/dev/mapper/vg8-lvbackup  7.3T  5.8T  1.2T  84% /home/rsnapshot/.snapshots

Woo, it's good to see a drive go from 89% to 5% full.
