Auth passthrough (auth_query) #266

magec · 2022-12-23T11:22:37Z

Auth passthrough (auth_query) feature

EDITED

Add auth passthough (auth_query)

Adds a feature that allows setting auth passthrough for md5 auth.

Configuration:

It adds 3 new (general and pool) config parameters:

auth_query: An string containing a query that will be executed on boot
to obtain the hash of a given user. This query have to use a placeholder $1,
so pgcat can replace it with the user its trying to fetch the hash from.
auth_query_user: The user to use for connecting to the server and executing the
auth_query.
auth_query_password: The password to use for connecting to the server and executing the
auth_query.

The configuration can be done either on the general config (so pools share them) or in a per-pool basis.

Behavior

The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.

When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.

Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).

The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.

Testing

This is currently tested using ruby tests:

Considerations

A new crate is added to deal with postgres protocol parsing, I first did it myself and didn't like the result so thought, why reinventing the wheel?
~~Currently hash refresh is not done when reloading and no pool have changed~~.
No documentation is yet written, I want to first discuss about the change and will write later if we agree on the merge.
This fixes (Allow an 'auth_query' so passwords for pools can be loaded from database servers #253)

levkk · 2022-12-23T16:35:30Z

Cargo.toml

 hyper = { version = "0.14", features = ["full"] }
 phf = { version = "0.11.1", features = ["macros"] }
 exitcode = "1.1.2"
+postgres-protocol = "0.6.4"


This is a great find!

@magec I created this library a while ago with intention to create a PR to use this in PgCat. Wondering if you'd take a look, https://github.com/zain-kabani/postgres-proto-rs

I took this one because was the one used in postgres crate. Also, I found it pretty easy to use and is widely imused. Havent seen yours yet. Will take a look.

cjroth · 2022-12-24T05:10:33Z

Hey, I'm trying this on RDS (Aurora Serverless v2) and it is timing out while connecting whereas it works via the dockerized Postgres and RDS works fine on the main branch, so seems to be a regression from main.

magec · 2022-12-24T12:01:55Z

Uhmm, weird, could you provide soke logs in debug mode?

levkk · 2022-12-24T18:17:34Z

After a bit of thought, I'm surprised this works on RDS at all because of pgbouncer/pgbouncer#265

RDS blocks pg_shadow from being read, especially anything to do with hashed passwords, because rdsadmin, the RDS manager superuser, is also stored there.

cjroth · 2022-12-24T20:06:49Z

After a bit of thought, I'm surprised this works on RDS at all because of pgbouncer/pgbouncer#265

RDS blocks pg_shadow from being read, especially anything to do with hashed passwords, because rdsadmin, the RDS manager superuser, is also stored there.

I set the auth query to select 'username', 'hash'; as an experiment but I think you're right - it probably won't work with RDS for that reason.

The important thing for this PR is that I was having the connection problem even with the auth_user / auth_query turned off.

Logs from debug mode:

pgcat_1  | [2022-12-24T20:01:39.717879Z INFO  pgcat::pool] [pool: postgres][user: postgres] creating new pool
pgcat_1  | [2022-12-24T20:01:39.718132Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:40.322310Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:41.324413Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:43.099432Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:46.476293Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:53.061504Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:02:06.043084Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:02:29.720906Z ERROR pgcat::pool] Shard 0 down or misconfigured: TimedOut
pgcat_1  | [2022-12-24T20:02:29.721178Z ERROR pgcat::pool] Could not validate connection pool: AllServersDown
pgcat_1  | [2022-12-24T20:02:29.722107Z ERROR pgcat] Pool error: AllServersDown

pgcat.toml:

[general]
host = "0.0.0.0"
port = 6432
enable_prometheus_exporter = true
prometheus_exporter_port = 9930
connect_timeout = 50000
healthcheck_timeout = 100000
healthcheck_delay = 30000
shutdown_timeout = 60000
ban_time = 60
log_client_connections = false
log_client_disconnections = false
autoreload = false
admin_username = "postgres"
admin_password = "postgres"

[pools.postgres]
pool_mode = "transaction"
default_role = "any"
query_parser_enabled = true
primary_reads_enabled = true
sharding_function = "pg_bigint_hash"
pool_size = 9
statement_timeout = 10000

[pools.postgres.users.0]
username = "postgres"
password = "********"
pool_size = 9
statement_timeout = 0

[pools.postgres.shards.0]
servers = [
    [ "*********", 5432, "primary" ]
]
database = "postgres"

magec · 2023-01-09T10:44:32Z

Hello!, sorry the late response. I could not reproduce the thing, using an instance in localhost with user/pass postgres/postgres and with the config you provided, it just works.

I also checked the code that decides whether to use auth_query or not and it seems to be ok, it ignores the feature altogether when nothing is specified, more specifically (https://github.com/OneSignal/pgcat/blob/auth-query/src/auth_passthrough.rs#L42), if this condition is not met.

        if config.general.auth_query_password.is_some()
            && config.general.auth_query_user.is_some()
            && config.general.auth_query_password.is_some()
            && config.general.auth_query_database.is_some()

My log goes like:

[2023-01-09T10:41:47.420578Z INFO  pgcat::config] [pool: postgres][user: postgres] Statement timeout: 0
[2023-01-09T10:41:47.420904Z INFO  pgcat::pool] [pool: postgres][user: postgres] creating new pool
[2023-01-09T10:41:47.421772Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "localhost", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
[2023-01-09T10:41:47.427795Z DEBUG pgcat::server] Server connection marked for clean up
[2023-01-09T10:41:47.428161Z INFO  pgcat] Config autoreloader: false
[2023-01-09T10:41:47.428256Z INFO  pgcat::stats] Events reporter started
[2023-01-09T10:41:47.428319Z INFO  pgcat] Waiting for clients

Any ideas what could be different? I used a pgcat compiled using this branch.

magec · 2023-01-10T15:25:51Z

I have added some commits to implement config reload.

I also detected an issue regarding reload which I think was a bug. The commit is this one.

Before, when configuration was reloaded, we checked whether a given pool
configuration was changed by adding it to a HashSet and checking whether it was
inserted or ignored. This behavior has a undesired effect when changing existing
pools, as the new one will be added, even though the old will still exist
in the HashSet.
With this change, the HashSet is always recreated with current configuration, leaving
intact the rest of the logic.

/cc @levkk

levkk · 2023-01-10T17:27:49Z

src/client.rs

+                }
+            }
+            if password_hash.is_none() {
+                warn!("Clien auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);


Suggested change

warn!("Clien auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);

warn!("Client auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);

levkk · 2023-01-10T17:28:37Z

src/config.rs

    }
+
+    pub fn overwrite_passwords(&mut self) {
+        if let Ok(val) = env::var("PGCAT_ADMIN_PASSWORD") {


We don't allow env-based configuration at the moment, only file based.

I added it because it goes aside with the auth_query thing. One of the things that we get from the auth_query is being able to stop using cleartext passwords on the config. To really get that done, we need a way of setting these two passwords so I added a simple mechanism for overwriting them using ENV vars.

That said, the actual reason (I have) is being able to deploy to kubernetes without losing the auto reload feature, which I find neat.

Normally, secrets are set as ENV vars in k8s. If I have no way of setting that password through and ENV var, I should create an script that generates it at boot time. The script should, provided a config file with a placeholder for those passwords, substitute those using the env vars, and write to a file that will be fed to pgcat.

The problem is that then (apart from this nighmare), I lose the auto reload feature, well not really loose it, but it would make me deploy the thing using a sidecar that does the password rewrite every time the templated config changes, so later pgcat sees it and does the actual reload.

I find this a bit cumbersome because it complicates containerized deploys for pgcat (the same would apply to different orchestrators) so I was hoping that these two variables could be an exception.

src/config.rs

src/pool.rs

levkk · 2023-01-10T17:31:50Z

src/pool.rs

                            replica_number += 1;
                        }

+                        // We assume every server in the pool share user/passwords


They don't :)

What is assumed here, is that every server in the connection pool (a connection pool being 1 user, 1 database n servers) shares the same user/password, and this is an assumption that is already being made in current implementation (with cleartext), also this assumption is logical, because you are in the end connecting to the pool using just one "user/database/password". The only thing that I think we can do is to tell that in logs so the administrator can spot the issue in the server and update its config, which is already done here

levkk · 2023-01-10T17:32:34Z

Left a few comments. We're on the right track.

cjroth · 2023-01-10T21:27:23Z

Hello!, sorry the late response. I could not reproduce the thing, using an instance in localhost with user/pass postgres/postgres and with the config you provided, it just works.

Hey so this is only happening on RDS actually - it wasn't a problem for me on localhost either - but I think it's worth investigating specifically on RDS since its such a common database provider and this PR definitely seems to break support for it where it works in the main branch.

magec · 2023-01-11T11:24:35Z

I have made the changes so now auth_query is configured in a 'per-pool' basis.

magec · 2023-03-06T10:12:30Z

Hello!, sorry the late response. I could not reproduce the thing, using an instance in localhost with user/pass postgres/postgres and with the config you provided, it just works.

Hey so this is only happening on RDS actually - it wasn't a problem for me on localhost either - but I think it's worth investigating specifically on RDS since its such a common database provider and this PR definitely seems to break support for it where it works in the main branch.

There was an issue with scram auth (it was broken), I understand that RDS uses it by default and that's why it failed. This is already solved.

magec · 2023-03-06T10:16:44Z

Ok, so @levkk I did some changes to this PR in the hope to make it to main.

No ENV vars are used anymore, we need to set the auth_query user password in cleartext in the config file (will come to this later).
Tests are now done in ruby (way clearer).
Fixed an issue that broke scram, now, given that tests also use scram, this is tested.
Now, instead of writing some complex reload config logic, we retry auth when there are issues that could be fixed, see (client.rs) for more info.

No documentation is written yet, waiting for validation.

levkk · 2023-03-09T17:34:25Z

@magec Added you as collaborator on the repo, welcome! I'll be able to take a deeper look at the PR next week, a little busy currently. Please feel free to get it reviewed by @drdrsh or just go ahead and merge as long as you're sure it works correctly. Thanks!

magec · 2023-03-09T17:56:13Z

Thanks!, I think I prefer a review, will come to this again next week.

magec · 2023-03-13T11:05:30Z

Ok, I added some documentation on the newly created CONFIG.md file.

src/config.rs

This adds a new `exec_simple_query` method so we can make 'out of band' queries to servers that don't interfere with pools at all. In order to reuse startup code for making these simple queries, we need to set the stats (`Reporter`) optional, so using these simple queries wont interfere with stats.

Adds a feature that allows setting auth passthrough for md5 auth. It adds 3 new (general and pool) config parameters: - `auth_query`: An string containing a query that will be executed on boot to obtain the hash of a given user. This query have to use a placeholder `$1`, so pgcat can replace it with the user its trying to fetch the hash from. - `auth_query_user`: The user to use for connecting to the server and executing the auth_query. - `auth_query_password`: The password to use for connecting to the server and executing the auth_query. The configuration can be done either on the general config (so pools share them) or in a per-pool basis. The behavior is, at boot time, when validating server connections, a hash is fetched per server and stored in the pool. When new server connections are created, and no cleartext password is specified, the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries it. When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no hash (we could not obtain one when validating the connection), a new fetch is tried. Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure we refetch the hash and retry auth (so password changes can be done). The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be obtained during connection validation, or the password has change, we can still connect later.

README.md

magec · 2023-03-30T19:53:23Z

Is this ready to merge? I have been using the feature in prod for some time now without issues at all, also, in the end is an opt-in disabled by default.

levkk · 2023-03-30T20:25:12Z

Sounds good to me! Can we sync real quick regarding #389. I'm introducing multiple ways to authenticate, and maybe we can rebase your PR on that so we can have clean separation between various authentication mechanisms? I can rebase it myself, if that's ok with you.

levkk

Actually it's pretty straight forward to rebase my changes on yours once yours is in. Good to go!

magec force-pushed the auth-query branch 2 times, most recently from 8c2544b to d0e3081 Compare December 23, 2022 11:34

levkk reviewed Dec 23, 2022

View reviewed changes

magec force-pushed the auth-query branch from c4d6895 to 8585330 Compare January 10, 2023 15:37

levkk reviewed Jan 10, 2023

View reviewed changes

src/config.rs Outdated Show resolved Hide resolved

levkk reviewed Jan 10, 2023

View reviewed changes

src/pool.rs Outdated Show resolved Hide resolved

levkk reviewed Jan 10, 2023

View reviewed changes

magec force-pushed the auth-query branch 2 times, most recently from 3d6dddf to 99dfaa3 Compare January 11, 2023 11:14

magec force-pushed the auth-query branch 3 times, most recently from 854ff93 to 870b34d Compare January 18, 2023 12:35

magec force-pushed the auth-query branch from 870b34d to 81bd01e Compare January 23, 2023 13:11

magec force-pushed the auth-query branch from 81bd01e to 74ef9b6 Compare February 2, 2023 10:26

magec force-pushed the auth-query branch 2 times, most recently from 92255ad to e78b618 Compare February 23, 2023 10:08

magec force-pushed the auth-query branch 3 times, most recently from 6381136 to 7a55af7 Compare February 28, 2023 10:39

magec force-pushed the auth-query branch 7 times, most recently from af2d2e9 to 8c768f1 Compare March 6, 2023 10:05

magec force-pushed the auth-query branch from 8c768f1 to a40a81b Compare March 6, 2023 10:16

magec changed the title ~~Auth query~~ Auth passthrough (auth_query) Mar 6, 2023

magec force-pushed the auth-query branch from a40a81b to 8471343 Compare March 6, 2023 10:44

magec requested a review from levkk March 7, 2023 16:56

magec force-pushed the auth-query branch 2 times, most recently from 50ae46b to 1d9fdc9 Compare March 13, 2023 09:01

levkk reviewed Mar 15, 2023

View reviewed changes

src/config.rs Show resolved Hide resolved

magec force-pushed the auth-query branch from 1d9fdc9 to b9f887b Compare March 20, 2023 13:41

magec added 2 commits March 28, 2023 17:23

magec force-pushed the auth-query branch from b9f887b to 00013fc Compare March 28, 2023 15:37

levkk reviewed Mar 28, 2023

View reviewed changes

README.md Outdated Show resolved Hide resolved

Add documentation for Auth passthrough

be88dc1

magec force-pushed the auth-query branch from 00013fc to be88dc1 Compare March 29, 2023 08:05

levkk approved these changes Mar 30, 2023

View reviewed changes

levkk merged commit 6f768a8 into postgresml:main Mar 30, 2023

	warn!("Clien auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);
	warn!("Client auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);

Auth passthrough (auth_query) #266

Auth passthrough (auth_query) #266

Uh oh!

Conversation

magec commented Dec 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Auth passthrough (auth_query) feature

Configuration:

Behavior

Testing

Considerations

Uh oh!

levkk Dec 23, 2022

Choose a reason for hiding this comment

Uh oh!

zainkabani Mar 29, 2023

Choose a reason for hiding this comment

Uh oh!

magec Mar 30, 2023

Choose a reason for hiding this comment

Uh oh!

cjroth commented Dec 24, 2022

Uh oh!

magec commented Dec 24, 2022

Uh oh!

levkk commented Dec 24, 2022

Uh oh!

cjroth commented Dec 24, 2022

Uh oh!

magec commented Jan 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

magec commented Jan 10, 2023

Uh oh!

levkk Jan 10, 2023

Choose a reason for hiding this comment

Uh oh!

levkk Jan 10, 2023

Choose a reason for hiding this comment

Uh oh!

magec Jan 10, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

levkk Jan 10, 2023

Choose a reason for hiding this comment

Uh oh!

magec Jan 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

levkk commented Jan 10, 2023

Uh oh!

cjroth commented Jan 10, 2023

Uh oh!

magec commented Jan 11, 2023

Uh oh!

magec commented Mar 6, 2023

Uh oh!

magec commented Mar 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

levkk commented Mar 9, 2023

Uh oh!

magec commented Mar 9, 2023

Uh oh!

magec commented Mar 13, 2023

Uh oh!

Uh oh!

Uh oh!

magec commented Mar 30, 2023

Uh oh!

levkk commented Mar 30, 2023

Uh oh!

levkk left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

magec commented Dec 23, 2022 •

edited

Loading

magec commented Jan 9, 2023 •

edited

Loading

magec Jan 10, 2023 •

edited

Loading

magec commented Mar 6, 2023 •

edited

Loading