Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auth passthrough (auth_query) #266

Merged
merged 3 commits into from
Mar 30, 2023
Merged

Auth passthrough (auth_query) #266

merged 3 commits into from
Mar 30, 2023

Conversation

magec
Copy link
Collaborator

@magec magec commented Dec 23, 2022

Auth passthrough (auth_query) feature

EDITED

Add auth passthough (auth_query)

Adds a feature that allows setting auth passthrough for md5 auth.

Configuration:

It adds 3 new (general and pool) config parameters:

  • auth_query: An string containing a query that will be executed on boot
    to obtain the hash of a given user. This query have to use a placeholder $1,
    so pgcat can replace it with the user its trying to fetch the hash from.
  • auth_query_user: The user to use for connecting to the server and executing the
    auth_query.
  • auth_query_password: The password to use for connecting to the server and executing the
    auth_query.

The configuration can be done either on the general config (so pools share them) or in a per-pool basis.

Behavior

The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.

When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.

Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).

The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.

Testing

This is currently tested using ruby tests:

image

Considerations

  • A new crate is added to deal with postgres protocol parsing, I first did it myself and didn't like the result so thought, why reinventing the wheel?
  • Currently hash refresh is not done when reloading and no pool have changed.
  • No documentation is yet written, I want to first discuss about the change and will write later if we agree on the merge.
  • This fixes (Allow an 'auth_query' so passwords for pools can be loaded from database servers #253)

@magec magec force-pushed the auth-query branch 2 times, most recently from 8c2544b to d0e3081 Compare December 23, 2022 11:34
@@ -34,6 +34,8 @@ rustls-pemfile = "1"
hyper = { version = "0.14", features = ["full"] }
phf = { version = "0.11.1", features = ["macros"] }
exitcode = "1.1.2"
postgres-protocol = "0.6.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great find!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magec I created this library a while ago with intention to create a PR to use this in PgCat. Wondering if you'd take a look, https://github.com/zain-kabani/postgres-proto-rs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this one because was the one used in postgres crate. Also, I found it pretty easy to use and is widely imused. Havent seen yours yet. Will take a look.

@cjroth
Copy link

cjroth commented Dec 24, 2022

Hey, I'm trying this on RDS (Aurora Serverless v2) and it is timing out while connecting whereas it works via the dockerized Postgres and RDS works fine on the main branch, so seems to be a regression from main.

@magec
Copy link
Collaborator Author

magec commented Dec 24, 2022

Uhmm, weird, could you provide soke logs in debug mode?

@levkk
Copy link
Contributor

levkk commented Dec 24, 2022

After a bit of thought, I'm surprised this works on RDS at all because of pgbouncer/pgbouncer#265

RDS blocks pg_shadow from being read, especially anything to do with hashed passwords, because rdsadmin, the RDS manager superuser, is also stored there.

@cjroth
Copy link

cjroth commented Dec 24, 2022

After a bit of thought, I'm surprised this works on RDS at all because of pgbouncer/pgbouncer#265

RDS blocks pg_shadow from being read, especially anything to do with hashed passwords, because rdsadmin, the RDS manager superuser, is also stored there.

I set the auth query to select 'username', 'hash'; as an experiment but I think you're right - it probably won't work with RDS for that reason.

The important thing for this PR is that I was having the connection problem even with the auth_user / auth_query turned off.

Logs from debug mode:

pgcat_1  | [2022-12-24T20:01:39.717879Z INFO  pgcat::pool] [pool: postgres][user: postgres] creating new pool
pgcat_1  | [2022-12-24T20:01:39.718132Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:40.322310Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:41.324413Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:43.099432Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:46.476293Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:01:53.061504Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:02:06.043084Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "***********", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
pgcat_1  | [2022-12-24T20:02:29.720906Z ERROR pgcat::pool] Shard 0 down or misconfigured: TimedOut
pgcat_1  | [2022-12-24T20:02:29.721178Z ERROR pgcat::pool] Could not validate connection pool: AllServersDown
pgcat_1  | [2022-12-24T20:02:29.722107Z ERROR pgcat] Pool error: AllServersDown

pgcat.toml:

[general]
host = "0.0.0.0"
port = 6432
enable_prometheus_exporter = true
prometheus_exporter_port = 9930
connect_timeout = 50000
healthcheck_timeout = 100000
healthcheck_delay = 30000
shutdown_timeout = 60000
ban_time = 60
log_client_connections = false
log_client_disconnections = false
autoreload = false
admin_username = "postgres"
admin_password = "postgres"

[pools.postgres]
pool_mode = "transaction"
default_role = "any"
query_parser_enabled = true
primary_reads_enabled = true
sharding_function = "pg_bigint_hash"
pool_size = 9
statement_timeout = 10000

[pools.postgres.users.0]
username = "postgres"
password = "********"
pool_size = 9
statement_timeout = 0

[pools.postgres.shards.0]
servers = [
    [ "*********", 5432, "primary" ]
]
database = "postgres"

@magec
Copy link
Collaborator Author

magec commented Jan 9, 2023

Hello!, sorry the late response. I could not reproduce the thing, using an instance in localhost with user/pass postgres/postgres and with the config you provided, it just works.

I also checked the code that decides whether to use auth_query or not and it seems to be ok, it ignores the feature altogether when nothing is specified, more specifically (https://github.com/OneSignal/pgcat/blob/auth-query/src/auth_passthrough.rs#L42), if this condition is not met.

        if config.general.auth_query_password.is_some()
            && config.general.auth_query_user.is_some()
            && config.general.auth_query_password.is_some()
            && config.general.auth_query_database.is_some()

My log goes like:

[2023-01-09T10:41:47.420578Z INFO  pgcat::config] [pool: postgres][user: postgres] Statement timeout: 0
[2023-01-09T10:41:47.420904Z INFO  pgcat::pool] [pool: postgres][user: postgres] creating new pool
[2023-01-09T10:41:47.421772Z INFO  pgcat::pool] Creating a new server connection Address { id: 0, host: "localhost", port: 5432, shard: 0, database: "postgres", role: Primary, replica_number: 0, address_index: 0, username: "postgres", pool_name: "postgres" }
[2023-01-09T10:41:47.427795Z DEBUG pgcat::server] Server connection marked for clean up
[2023-01-09T10:41:47.428161Z INFO  pgcat] Config autoreloader: false
[2023-01-09T10:41:47.428256Z INFO  pgcat::stats] Events reporter started
[2023-01-09T10:41:47.428319Z INFO  pgcat] Waiting for clients

Any ideas what could be different? I used a pgcat compiled using this branch.

@magec
Copy link
Collaborator Author

magec commented Jan 10, 2023

I have added some commits to implement config reload.

I also detected an issue regarding reload which I think was a bug. The commit is this one.

Before, when configuration was reloaded, we checked whether a given pool
configuration was changed by adding it to a HashSet and checking whether it was
inserted or ignored. This behavior has a undesired effect when changing existing
pools, as the new one will be added, even though the old will still exist
in the HashSet.
With this change, the HashSet is always recreated with current configuration, leaving
intact the rest of the logic.

/cc @levkk

src/client.rs Outdated
}
}
if password_hash.is_none() {
warn!("Clien auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
warn!("Clien auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);
warn!("Client auth is not possible, you either have not set a valid auth_query or a password for {{ username: {:?}, pool_name: {:?}, application_name: {:?} }}", username, pool_name, application_name);

src/config.rs Outdated
@@ -448,6 +458,16 @@ impl Config {
pub fn default_path() -> String {
String::from("pgcat.toml")
}

pub fn overwrite_passwords(&mut self) {
if let Ok(val) = env::var("PGCAT_ADMIN_PASSWORD") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't allow env-based configuration at the moment, only file based.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it because it goes aside with the auth_query thing. One of the things that we get from the auth_query is being able to stop using cleartext passwords on the config. To really get that done, we need a way of setting these two passwords so I added a simple mechanism for overwriting them using ENV vars.

That said, the actual reason (I have) is being able to deploy to kubernetes without losing the auto reload feature, which I find neat.

Normally, secrets are set as ENV vars in k8s. If I have no way of setting that password through and ENV var, I should create an script that generates it at boot time. The script should, provided a config file with a placeholder for those passwords, substitute those using the env vars, and write to a file that will be fed to pgcat.

The problem is that then (apart from this nighmare), I lose the auto reload feature, well not really loose it, but it would make me deploy the thing using a sidecar that does the password rewrite every time the templated config changes, so later pgcat sees it and does the actual reload.

I find this a bit cumbersome because it complicates containerized deploys for pgcat (the same would apply to different orchestrators) so I was hoping that these two variables could be an exception.

src/config.rs Outdated Show resolved Hide resolved
src/pool.rs Outdated Show resolved Hide resolved
@@ -211,12 +242,31 @@ impl ConnectionPool {
replica_number += 1;
}

// We assume every server in the pool share user/passwords
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't :)

Copy link
Collaborator Author

@magec magec Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is assumed here, is that every server in the connection pool (a connection pool being 1 user, 1 database n servers) shares the same user/password, and this is an assumption that is already being made in current implementation (with cleartext), also this assumption is logical, because you are in the end connecting to the pool using just one "user/database/password". The only thing that I think we can do is to tell that in logs so the administrator can spot the issue in the server and update its config, which is already done here

@levkk
Copy link
Contributor

levkk commented Jan 10, 2023

Left a few comments. We're on the right track.

@cjroth
Copy link

cjroth commented Jan 10, 2023

Hello!, sorry the late response. I could not reproduce the thing, using an instance in localhost with user/pass postgres/postgres and with the config you provided, it just works.

Hey so this is only happening on RDS actually - it wasn't a problem for me on localhost either - but I think it's worth investigating specifically on RDS since its such a common database provider and this PR definitely seems to break support for it where it works in the main branch.

@magec magec force-pushed the auth-query branch 2 times, most recently from 3d6dddf to 99dfaa3 Compare January 11, 2023 11:14
@magec
Copy link
Collaborator Author

magec commented Jan 11, 2023

I have made the changes so now auth_query is configured in a 'per-pool' basis.

@magec magec force-pushed the auth-query branch 7 times, most recently from af2d2e9 to 8c768f1 Compare March 6, 2023 10:05
@magec
Copy link
Collaborator Author

magec commented Mar 6, 2023

Hello!, sorry the late response. I could not reproduce the thing, using an instance in localhost with user/pass postgres/postgres and with the config you provided, it just works.

Hey so this is only happening on RDS actually - it wasn't a problem for me on localhost either - but I think it's worth investigating specifically on RDS since its such a common database provider and this PR definitely seems to break support for it where it works in the main branch.

There was an issue with scram auth (it was broken), I understand that RDS uses it by default and that's why it failed. This is already solved.

@magec
Copy link
Collaborator Author

magec commented Mar 6, 2023

Ok, so @levkk I did some changes to this PR in the hope to make it to main.

  • No ENV vars are used anymore, we need to set the auth_query user password in cleartext in the config file (will come to this later).
  • Tests are now done in ruby (way clearer).
  • Fixed an issue that broke scram, now, given that tests also use scram, this is tested.
  • Now, instead of writing some complex reload config logic, we retry auth when there are issues that could be fixed, see (client.rs) for more info.

No documentation is written yet, waiting for validation.

@magec magec changed the title Auth query Auth passthrough (auth_query) Mar 6, 2023
@magec magec requested a review from levkk March 7, 2023 16:56
@levkk
Copy link
Contributor

levkk commented Mar 9, 2023

@magec Added you as collaborator on the repo, welcome! I'll be able to take a deeper look at the PR next week, a little busy currently. Please feel free to get it reviewed by @drdrsh or just go ahead and merge as long as you're sure it works correctly. Thanks!

@magec
Copy link
Collaborator Author

magec commented Mar 9, 2023

Thanks!, I think I prefer a review, will come to this again next week.

@magec magec force-pushed the auth-query branch 2 times, most recently from 50ae46b to 1d9fdc9 Compare March 13, 2023 09:01
@magec
Copy link
Collaborator Author

magec commented Mar 13, 2023

Ok, I added some documentation on the newly created CONFIG.md file.

src/config.rs Show resolved Hide resolved
This adds a new `exec_simple_query` method so we can make 'out of band'
queries to servers that don't interfere with pools at all.
In order to reuse startup code for making these simple queries,
we need to set the stats (`Reporter`) optional, so using these
simple queries wont interfere with stats.
Adds a feature that allows setting auth passthrough for md5 auth.

It adds 3 new (general and pool) config parameters:

- `auth_query`: An string containing a query that will be executed on boot
to obtain the hash of a given user. This query have to use a placeholder `$1`,
so pgcat can replace it with the user its trying to fetch the hash from.
- `auth_query_user`: The user to use for connecting to the server and executing the
auth_query.
- `auth_query_password`: The password to use for connecting to the server and executing the
auth_query.

The configuration can be done either on the general config (so pools share them) or in a per-pool basis.

The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.

When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.

Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).

The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.
README.md Outdated Show resolved Hide resolved
@magec
Copy link
Collaborator Author

magec commented Mar 30, 2023

Is this ready to merge? I have been using the feature in prod for some time now without issues at all, also, in the end is an opt-in disabled by default.

@levkk
Copy link
Contributor

levkk commented Mar 30, 2023

Sounds good to me! Can we sync real quick regarding #389. I'm introducing multiple ways to authenticate, and maybe we can rebase your PR on that so we can have clean separation between various authentication mechanisms? I can rebase it myself, if that's ok with you.

Copy link
Contributor

@levkk levkk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's pretty straight forward to rebase my changes on yours once yours is in. Good to go!

@levkk levkk merged commit 6f768a8 into postgresml:main Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants