Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db settings: reload on connection recovery, only get them from global/current-database scope and change dash to underscore #1760

Merged
merged 4 commits into from
Mar 6, 2021

Conversation

steve-chavez
Copy link
Member

@steve-chavez steve-chavez commented Feb 18, 2021

Corrections of db settings discussed on: #1729 (comment)

  • Change pgrst.xxx-yyy variables to use _ instead of -.
  • Ignore settings from other databases (ALTER ROLE ... IN DATABASE B SET ... should not affect postgrest in database A
  • Merge database specific and global settings consistently (ALTER ROLE ... IN DATABASE ... SET should override ALTER ROLE ... SET)
  • Fix config reloading after reconnect

src/PostgREST/Statements.hs Outdated Show resolved Hide resolved
test/fixtures/roles.sql Outdated Show resolved Hide resolved
test/fixtures/roles.sql Outdated Show resolved Hide resolved
test/fixtures/roles.sql Outdated Show resolved Hide resolved
@wolfgangwalther
Copy link
Member

It would be really nice, if we could add a test for reload recovery. This would go a long way in terms of test coverage. In general, I think we're not testing any of the "recover from an error" situations, yet.

However, I can't come up with a way to do this with our current io test setup. with_tmp_db makes sure that the database is ready to go, before the tests are run - and I don't know how we could force a failed connection at first, that will succeed later, without sending any reload signal to postgrest (sending a signal doesn't make sense, because it wouldn't test self recovery). @monacoremo any idea?

@steve-chavez
Copy link
Member Author

Fix config reloading after reconnect

I think this is more or less fixed and I'm in the process of adding python tests. However these tests will fail with the latest commit:

@pytest.mark.parametrize(
"secretpath",
[path for path in (BASEDIR / "secrets").iterdir() if path.suffix != ".jwt"],
ids=attrgetter("name"),
)
def test_read_secret_from_file(secretpath, defaultenv):
"Authorization should succeed when the secret is read from a file."

This is because the config file for the tests uses /dev/stdin:

# Read secret from a file: /dev/stdin (alias for standard input)
jwt-secret = "@/dev/stdin"

And since the config file reload is now done in a subthread, stdin will not reach it. Because AFAICT, stdin only reaches the parent process: https://stackoverflow.com/questions/55602283/how-to-write-data-to-stdin-to-be-consumed-by-a-separate-thread-waiting-on-input

Seems the workaround is to create a named pipe somehow for stdin to work, but it looks too complicated to fix this in postgrest(also not sure if possible).

How about if I just change the tests for not using stdin?

@wolfgangwalther Thoughts?

@wolfgangwalther
Copy link
Member

How about if I just change the tests for not using stdin?

@wolfgangwalther Thoughts?

Was "reading the secret from stdin" just a hack to test "reading the secret from file"? So we don't actually support/advertise reading from stdin here? In that case I agree: Just use a proper temp file created with python (have a look at some of the other tests, I think there is a nice pytest-way of doing it).

@steve-chavez
Copy link
Member Author

Just use a proper temp file created with python

Cool. I'll do that.

However, I can't come up with a way to do this with our current io test setup.

I've managed to create a couple of utilitarian functions that seem to be working.

def pg_stop():
    "Stop postgresql"
    subprocess.run(["pg_ctl", "stop", "-m", "i"])
    time.sleep(1)

def pg_start():
    "Start postgresql"
    PGHOST = os.getenv("PGHOST")
    subprocess.run(["pg_ctl", "start", "-o", f'-F -c listen_addresses="" -k {PGHOST}'])

I'm not sure if they'll hold up when writing the actual tests though. I'm all ears for a better way to do this.

@wolfgangwalther
Copy link
Member

However, I can't come up with a way to do this with our current io test setup.

I've managed to create a couple of utilitarian functions that seem to be working.

def pg_stop():
    "Stop postgresql"
    subprocess.run(["pg_ctl", "stop", "-m", "i"])
    time.sleep(1)

def pg_start():
    "Start postgresql"
    PGHOST = os.getenv("PGHOST")
    subprocess.run(["pg_ctl", "start", "-o", f'-F -c listen_addresses="" -k {PGHOST}'])

I'm not sure if they'll hold up when writing the actual tests though. I'm all ears for a better way to do this.

I think the problem with that approach is, that it will break running the io tests in parallel. We're using the same database instance for all tests, so those other tests will have a stopped postgres, too.

Maybe we can add some kind of a postgres proxy - pgbouncer or something? I am not familiar with that, but that could be a base to test support for that in general, too.

If we keep postgres running, but only start the proxy/pgbouncer after we started postgrest.. that could work. WDYT?

@steve-chavez
Copy link
Member Author

So we don't actually support/advertise reading from stdin here?

Not that I'm aware of. Seems this was inherited from previous bash tests:

<<< "Y29ubmVjdGlvbl9zdHJpbmc=" # /dev/stdin is read by some config files, one of them expects Base64

Though it does seem to be a legitimate use case: kubernetes/kubernetes#54200 (comment)

(Personally I've never done it)

@steve-chavez
Copy link
Member Author

If we keep postgres running, but only start the proxy/pgbouncer after we started postgrest.. that could work. WDYT?

It does seem interesting. But if the problem is the parallelism of the io tests, perhaps I'd be easier to separate the recovery tests(in a recovery-tests.py) and force them to run sequentially. Can that be done with pytest?

(I'm definitely up for including pgbouncer later though)

@monacoremo
Copy link
Member

monacoremo commented Feb 23, 2021

It should be possible to nest with_tmp_db instances with no issues. So what we could do is:

  • Add an option like withtmpdb=False to run
  • If withtmpdb is true, we use ["test/with_tmp_db", "postgrest"] as the command

Then it should be possible to stop just that db without affecting the parent db.

Edit: simpler solution

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Feb 24, 2021

It should be possible to nest with_tmp_db instances with no issues. So what we could do is:

* Add an option like `withtmpdb=False` to `run`

* If `withtmpdb` is true, we use `["test/with_tmp_db", "postgrest"]` as the command

Then it should be possible to stop just that db without affecting the parent db.

Edit: simpler solution

I don't think this will work like that. Now we have 2 postgres instances running - which instance will pg_ctl start and stop? Probably depends on environment variables and since the function is called from the io tests, it will still start and stop the global instance - not the with_tmp_db that runs together with postgrest.

@wolfgangwalther
Copy link
Member

Maybe instead of running a full pgbouncer as a proxy, we can just set up a port forwarding via netcat (https://stackoverflow.com/a/2150342).

So in the test, we'd:

  • take note of the unix socket that with_tmp_db provides
  • start postgrest with some localhost:xxx db-uri setting instead of the socket
  • after the first connection failed, we open the port forwarding from localhost:xxx to the unix socket from with_tmp_db via netcat
  • the connection will now succeed

No need to fiddle with restarting postgres etc. either.

@steve-chavez
Copy link
Member Author

Was "reading the secret from stdin" just a hack to test "reading the secret from file"? So we don't actually support/advertise reading from stdin here?

I didn't want to break any use cases(perhaps UNIX-savy users use stdin), so I've just fixed the issue. No test was changed.

The main problem was that the files(db-uri, jwt-secret) were read several times and reading /dev/stdin makes the thread wait for input, so our tests would have needed repeating the same input several times. It was also weird to see this behavior when running postrest manually.

What I've changed is that external config files are now only read one time. Also untangled the logic a bit, I think the code is also clearer this way.

If withtmpdb is true, we use ["test/with_tmp_db", "postgrest"] as the command
Maybe instead of running a full pgbouncer as a proxy, we can just set up a port forwarding via netcat

Hm, could you guys help me with the recovery tests on another PR? @monacoremo @wolfgangwalther. I can put a list of the manual tests I make for ensuring the recovery is working.

Still have to do some changes here(like verify SIGUSR2 reloads the external config files) and it would take me a while to figure out how to change the setup for the recovery tests.

@wolfgangwalther
Copy link
Member

Was "reading the secret from stdin" just a hack to test "reading the secret from file"? So we don't actually support/advertise reading from stdin here?

I didn't want to break any use cases(perhaps UNIX-savy users use stdin), so I've just fixed the issue. No test was changed.

The main problem was that the files(db-uri, jwt-secret) were read several times and reading /dev/stdin makes the thread wait for input, so our tests would have needed repeating the same input several times. It was also weird to see this behavior when running postrest manually.

What I've changed is that external config files are now only read one time. Also untangled the logic a bit, I think the code is also clearer this way.

I didn't go through the latest commit, yet, but is my understanding correct as follows?

  • config file (and other files) are read once on startup and whenever a config re-read is triggered manually (signal / notify)
  • env+file config (without database) is cached
  • database config is re-read whenever a connection is (re-)established - same as the schema cache (?)
  • database config is applied onto the cached env+file-config
  • the files are read in the main thread, but the database config in each worker (?)

If so, that's a lot better, I agree!

Hm, could you guys help me with the recovery tests on another PR? @monacoremo @wolfgangwalther. I can put a list of the manual tests I make for ensuring the recovery is working.

Absolutely. A list of all your manual tests to reproduce would be great!

@steve-chavez
Copy link
Member Author

config file (and other files) are read once on startup and whenever a config re-read is triggered manually (signal / notify)
database config is re-read whenever a connection is (re-)established - same as the schema cache (?)

Yes, the external files will be read only when manually requesting so, with SIGUSR2 or NOTIFY. They'll not be re-read when recovering, here only the db config will be re-read when the db-load-guc-config is true.

Testing with @/dev/stdin led me to this decision, because when re-reading on recovery, the thread got locked waiting for stdin input to fill the config and the schema cache was not reloaded until input was given(note that re-reading config has to happen before schema cache loading, because of db-schemas).

Only reading external files when requested avoids that issue. Also seemed more in line with the connection worker behavior.

the files are read in the main thread, but the database config in each worker (?)

No, the db config is also read on the main thread. This is needed because of the dump-config tests - reading the db config has to happen before the connection worker or it pollutes the output. So the db config is in fact read twice on startup: main thread + connection worker. Ideally it would be read only once too, but I couldn't figure out how to make that work with CmdDumpConfig cleanly.

env+file config (without database) is cached
database config is applied onto the cached env+file-config

Yes, correct.

Absolutely. A list of all your manual tests to reproduce would be great!

Cool :)!. I'll put the list here when I'm done with some changes.

@steve-chavez steve-chavez force-pushed the fix-guc-settings branch 2 times, most recently from a36f949 to 6c5f31b Compare February 27, 2021 03:05
@steve-chavez steve-chavez changed the title Correct db settings: only get them from global/current-database scope and change dash to underscore db settings: reload on connection recovery, only get them from global/current-database scope and change dash to underscore Feb 27, 2021
@@ -358,12 +358,12 @@ reReadConfig startingUp pool gucConfigEnabled env path refConf dbUriFile secretF
Left err ->
if startingUp
then panic err -- die on invalid config if the program is starting up
else hPutStrLn stderr $ "Failed config load. " <> err
else hPutStrLn stderr $ "Failed loading in-database config. " <> err
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term in-database config is much clearer.
Perhaps the db-load-guc-config option should be changed to config-in-db?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I see your point about db-load-guc-config - it's clumsy. I'd like to keep the db- prefix, though.

What about db-has-config = true|false ? Or just db-config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The has looks a bit weird inside a config option. I'll go with db-config, it's already much better than db-load-guc-config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wolfgangwalther What do you think of having db-config to be false by default

The main argument for it being true was to have the least amount of config options: #1729 (comment). But that isn't possible atm, because db-anon-role and db-schemas are required.

However I'm thinking that only defining db-uri could be done and still maintain the required fields restriction. This by having an DbUriAppConfig that is used previous to AppConfig. With that even non-reloadable config options(like db-pool) could be set with the db-config.

But anyway the above is still not done, so db-config is not an advantage now regarding few config options.

It's just that I feel db-config=false is right because most users will not use it(or won't be able, as mentioned on #1729 (comment)). It's a special use case. Having it by default gives potential for errors(because of the query done, twice) on all instances.

WDYT?

Copy link
Member Author

@steve-chavez steve-chavez Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db-anon-role and db-schemas are required.
This by having an DbUriAppConfig that is used previous to AppConfig

Actually, a better solution for this would be to:

  • make db-anon-role optional. Make db-anon-role optional #1689
  • make db-schemas default to public. This seems like a nice default that will always work. We can put a message on stdout to make it clear when the default is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db-anon-role and db-schemas are required.
This by having an DbUriAppConfig that is used previous to AppConfig

Actually, a better solution for this would be to:

* make `db-anon-role` optional. #1689

* make `db-schemas` default to `public`. This seems like a nice default that will always work. We can put a message on stdout to make it clear when the default is used.

Much, much better, yes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that I feel db-config=false is right because most users will not use it(or won't be able, as mentioned on #1729 (comment)). It's a special use case. Having it by default gives potential for errors(because of the query done, twice) on all instances.

WDYT?

I think the risk involved is very small. There is no overhead either - even though it's a separate statement, it's basically just part of reloading the schema cache. And in the big picture, loading config options from the database is just a tiny piece of reading the whole schema.

tbh: I actually wonder whether we really need the config option at all. It could just be enabled all the time. It's probably most useful for writing tests - but that's about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in the big picture, loading config options from the database is just a tiny piece of reading the whole schema.

Ah, you're totally right here! I remember trying to do this at first, but I had some issues with the code.

I think it could be done now. We'd have to enable reloading only a part of the schema cache(the guc settings) for SIGUSR2/NOTIFY. Also make the --dump-config wait for the connectionWoker to finish(maybe an MVar) and only print the config contents to stdout(no Attempt to connect to db.. or tests fail).

tbh: I actually wonder whether we really need the config option at all. It could just be enabled all the time.

I think the same way now. If the guc settings are part of the schema cache, we can remove the config option.

@steve-chavez steve-chavez mentioned this pull request Mar 1, 2021
5 tasks
@steve-chavez
Copy link
Member Author

Deleted the previous comment. I've now opened another issue for the recovery tests #1766.

Global settings are overriden by database specific settings if they
share common ones.
* Separate reading files from whole config re-read
* Only reload external file on SIGUSR2/NOTIFY
Make clear that in-db config is being read
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants