Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres-extension #18

Open
jeff-davis opened this issue Jan 29, 2019 · 7 comments
Open

postgres-extension #18

jeff-davis opened this issue Jan 29, 2019 · 7 comments

Comments

@jeff-davis
Copy link

It seems that we began similar projects at similar times. I started on https://github.com/jeff-davis/postgres-extension.rs just recently and did not notice your project.

A few comments:

  • It might make sense to have the PostgreSQL License as one of the license options, not that it's much different from MIT
  • I'm a little confused about how the panic handler works or is supposed to work. It seems to turn a rust panic into a FATAL (when I tried it).
    • Is FATAL the right thing to do? Surely a misguided .unwrap() could just be an ERROR in many cases. If mucking with shared memory, perhaps a postgres PANIC is in order. Why did you choose FATAL?
    • How does it actually turn into a FATAL? The code looks like it's just doing a resume_panic, how is that supposed to land at the right hander inside of postgres? Don't you need to longjmp (directly or indirectly via pg_re_throw() or elog())?
  • I am trying to do a few things differently in my project which you might find interesting... for instance I am also trying to catch postgres errors and turn them into rust panics (important for SPI, etc.)
  • Great work on the macros! I see that you're using an attribute-like proc macro. That's great because it can inspect the parameter list and return type. Unfortunately, it also obfuscates error messages by always referencing the attribute that created the code with the error. Is there any way to avoid this so that the user can see informative error messages from their own code? I made mine a function-like proc macro, but it only works for the V1 calling convention.
@bluejekyll
Copy link
Owner

Cool that we thought of doing the same thing! I had a couple goals for this project, they done necessarily line up with yours, but maybe they do. It sounds like you know the innards of Postgres better than I do, thank you for the feedback. For context, I wrote this over winter break from work, so I can believe there are some issues.

It might be best to open separate issues for each of the topics in your post, but I’ll try and respond here and we’ll see how the discussion goes.

It might make sense to have the PostgreSQL License as one of the license options, not that it's much different from MIT

I don’t know the Postgres license, but am familiar with MIT and Apache, which is why I pick them. As I don’t plan to directly use any Postgres code in this project (besides linking) I honestly didn’t consider adopting it. I don’t think it will make this incompatible in any way, but if it’s a big concern we can see about getting the other contributors to allow us to adopt it as an additional option.

I'm a little confused about how the panic handler works or is supposed to work. It seems to turn a rust panic into a FATAL (when I tried it).

Is FATAL the right thing to do? Surely a misguided .unwrap() could just be an ERROR in many cases. If mucking with shared memory, perhaps a postgres PANIC is in order. Why did you choose FATAL?
How does it actually turn into a FATAL? The code looks like it's just doing a resume_panic, how is that supposed to land at the right hander inside of postgres? Don't you need to longjmp (directly or indirectly via pg_re_throw() or elog())?

I opted for fatal as I consider panics to be unknowable in terms of recovery, so it coverts to the elog fatal, which I believe causes the Postgres daemon to hard restart. I’m not sure if this is correct, but it felt right to me.

If you want to only error, and kill the current connection, then the proc_macro will do this based on if the wrapped function returns a Result, and if so, then inspects for an error. If it is an error, then error will be sent to elog, which according to the Postgres docs, will cause the connection to rollback and close.

After reviewing the resume_panic code, I realize this is wrong. Thanks for pointing it out. I’m not sure why I convinced myself that resume was needed. Also, as far as I can I don’t (yet) need to longjump, but I can see why we’d at least need to catch that when SPI code calls into Postgres.

I am trying to do a few things differently in my project which you might find interesting... for instance I am also trying to catch postgres errors and turn them into rust panics (important for SPI, etc.)

Yes! I haven’t gotten to, but want to, work on SPI wrappers. I just haven’t had time to get into that yet.

Great work on the macros! I see that you're using an attribute-like proc macro. That's great because it can inspect the parameter list and return type. Unfortunately, it also obfuscates error messages by always referencing the attribute that created the code with the error. Is there any way to avoid this so that the user can see informative error messages from their own code? I made mine a function-like proc macro, but it only works for the V1 calling convention.

Thanks! Part of my motivation for this project was to learn proc_macros, as I hadn’t worked with them before. In terms of allowing users to inspect the error result, one option could be to add an error_handler which they could replace and then do their own error reporting before the elog facility is called by the library.

Anyway, this was fun to work on, I wrote a post about it here that might explain it some more: https://bluejekyll.github.io/blog/rust/2018/12/27/announcing-pg-extend.html

@jeff-davis
Copy link
Author

Don't worry about the license, I just thought I would mention it but it is not a real problem.

For postgres, the error levels basically mean:

  • ERROR - something bad happened and the transaction should be aborted, but the connection itself (and the process managing the connection) are fine and uncorrupted, and didn't lose track of any important resources. For example:
    • division by zero
    • constraint violation
  • FATAL - The connection or the process managing the connection is bad, and it needs to disconnect the client and terminate the process managing it. For example:
    • authentication failure
    • allocation failure during LLVM JIT compilation of an expression which may leave library state corrupted
  • PANIC -- something really bad happened, and/or something left shared memory corrupted, and the whole postmaster needs to restart

I have the feeling most rust panics would just be ERRORs, because unwinding should clean up any bad state, but I could be missing some potential problems.

In terms of allowing users to inspect the error result, one option could be to add an error_handler which they could replace and then do their own error reporting before the elog facility is called by the library.

My question was how to allow the users of pg-extend to get good rust error messages when compiling their function. If they put a #[pg_extern] above it, all error messages will point at the #[pg_extern] line instead of the function code where the problem is.

@bluejekyll
Copy link
Owner

I have the feeling most rust panics would just be ERRORs, because unwinding should clean up any bad state, but I could be missing some potential problems.

I read through those quickly when I was working on this. It might be safe to drop Rust panics to ERROR, we can experiment with that.

My question was how to allow the users of pg-extend to get good rust error messages when compiling their function. If they put a #[pg_extern] above it, all error messages will point at the #[pg_extern] line instead of the function code where the problem is.

Ah, yes. As this is one of my first proc_macros, I'm not sure what options we have here. I've been trying to keep the wrapper code distinct from the original function. I thought errors in the original function would get raised properly. If the returned type isn't supported by the pg_datum conversions, then, yes, we're going to end up saying the error is in the #[pg_extern] declaration. There might be a better option here.

@intgr
Copy link
Contributor

intgr commented May 25, 2019

I read through those quickly when I was working on this. It might be safe to drop Rust panics to ERROR, we can experiment with that.

I don't mind FATAL myself, but I wanted to add that most of PostgreSQL itself treats broken internal preconditions/invariants as ERROR as well, for example, cache lookup failures from relcache (tables, indexes), unexpected tags for union types, corrupted data on disk, etc.

On the other hand: In ordinary Rust code, panics would of course be safely recoverable, but there's an issue when using libraries that include unsafe Rust code or call other C libraries, which are likely to panic just as well and there's no way to tell the difference.

If the extension calls back into PostgreSQL code that touches shared memory, then it's probably even worthy of a PostgreSQL PANIC.

Perhaps it should be up to the extension author to determine their impact, with ERROR being good default behavior assuming safe Rust code.

@jeff-davis
Copy link
Author

...but there's an issue when using libraries that include unsafe Rust code or call other C libraries... If the extension calls back into PostgreSQL code that touches shared memory, then it's probably even worthy of a PostgreSQL PANIC.

Can you describe in more detail the situations you're concerned about?

@intgr
Copy link
Contributor

intgr commented May 29, 2019

Can you describe in more detail the situations you're concerned about?

Clearly any such situation is a bug and bugs by their very nature are often surprising. :)

An example of shared memory corruption would be calling Postgress code that takes locks in shared memory, such as buffer pins, and then panic!()ing without freeing the locks. In such case FATAL would kill the current backend and other processes may hang forever for a lock that won't be freed.

Another example would be misusing some data structures when calling into PostgreSQL C code, leading to e.g. out of bounds writes to shared memory. It's conceivable that the Rust code that follows might detect that something is off and call panic!(). If that doesn't result in a PostgreSQL PANIC, the corruption would persist. Worse, if this corruption affects a shared buffer page, such corruption could end up being persisted to disk by the background writer.

@bluejekyll
Copy link
Owner

Following up on this, @jeff-davis we've implemented setjmp/longjmp handling in the library. It requires wrapping calls out to PG in a function we called guard_pg, which through testing appears to function as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants