Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: go type mapping per query argument #73

Closed
wants to merge 1 commit into from

Conversation

0xjac
Copy link
Contributor

@0xjac 0xjac commented Aug 18, 2022

Extends the query annotation to specify custom go types using the same
qualified go type format as for the general custom type mapping.

Closes #46

Extends the query annotation to specify custom go types using the same
qualified go type format as for the general custom type mapping.

Closes jschaf#46
@0xjac
Copy link
Contributor Author

0xjac commented Aug 18, 2022

@jschaf This is something I need for one of my use case (see #46 for details) so I'm working on an implementation. It's missing the mapping of return columns, tests, and docs, but you can have a look at the mapping of query arguments (inputs) already if you have the time.

If you think this is something you are definitely not interested in merging, I'd appreciate if you would let me know so I don't waste too much time on it 😉

Copy link
Owner

@jschaf jschaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for diving into the codebase. I'm in favor of the change so happy to keep working with you on this PR.

The main thing I'm focused on is how to present the options for the user to keep the interface as simple as possible. I've had a number of requests which essentially boil down to more configurable type mappings (#65). I think the answer is to support an intermediate unmarshal type that's responsible for converting the Postgres type into the destination Go type, so timestamptz to time.Time in the below example.

--go-type timestamptz=github.com/jschaf.TimeUnmarshaller=time.Time

I'd like the syntax to be pretty similar between the query files and command line so I'm interested in what the SQL file would look like (ignoring the intermediate idea since that's out of scope for this work).

-- name: FooBar go-type: timestamptz=time.Time :many

paramTypeOverrides := make(map[string]string, 4)
annotationPos := len(doc.List) - 1

for ; annotationPos > 0; annotationPos-- {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit messier than I'm comfortable with. Might be time to do a proper parse, something like:

  • split the line on whitespace
  • parse each token

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was taking insipiration from how the name: :kind annotation was parsed but I agree it is a bit messy and can be improved. (N.B.)I've moved the parsing of go types into it's own function to clean things up already. I just need to push it.

Can you elaborate on what you mean by proper parser? Do you want a custom built DSL and parser or do you have something in mind to be reused?
One idea would be to find the name: line, then parse everything that's afterwards in a known language like TOML (and we can enforce the structure of it). This would have the advantage to reusue a known configuration language and libs to parse it. However while it's close to the CLI it is a bit different, notably spaces around the equal are allowed and the values (i.e. go types) have to be quoted.

Which way would you like to go?

Copy link
Contributor Author

@0xjac 0xjac Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give an example of what I mean with TOML, it would work like this:

-- name: GetDailyFlightsFromAircraft :many
-- [arg]
-- day = "github.com/0xjac/custom-project/types.Day"
-- [return]
-- departure = "time.Time"
-- eta = "*time.Time"
SELECT f.flight_number, f.departure, f.arrival, f.eta
FROM flights f
WHERE pggen.arg("day") <= f.departure AND f.departure < pggen.arg("day") + 1
ORDER BY f.departure DESC;

We could also add the name and kind as top-level keys but it should still support then current name:  notation for compatibility.

SourceSQL string // the complete sql query as it appeared in the source file
PreparedSQL string // the sql query with args replaced by $1, $2, etc.
ParamNames []string // the name of each param in the PreparedSQL, the nth entry is the $n+1 param
ParamTypeOverrides map[string]string // map of Go type to override the Pg type of a param
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this can be ParamGoTypes with a doc of something like:

map of user-specified Go types to use for the arg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, and for the return value I would then name it accordingly:

ResultGoTypes map[string]string // map of user-specified Go type for the result columns.

@0xjac
Copy link
Contributor Author

0xjac commented Aug 19, 2022

Thank you for diving into the codebase. I'm in favor of the change so happy to keep working with you on this PR.

The main thing I'm focused on is how to present the options for the user to keep the interface as simple as possible. I've had a number of requests which essentially boil down to more configurable type mappings (#65). I think the answer is to support an intermediate unmarshal type that's responsible for converting the Postgres type into the destination Go type, so timestamptz to time.Time in the below example.

--go-type timestamptz=github.com/jschaf.TimeUnmarshaller=time.Time

I'd like the syntax to be pretty similar between the query files and command line so I'm interested in what the SQL file would look like (ignoring the intermediate idea since that's out of scope for this work).

-- name: FooBar go-type: timestamptz=time.Time :many

Interesting. I can definitely adapt the parsing to be more like the command line arguments. The current format matches what is specified in #46, which in turn is inspired by the current (name: ...) annotation. Taking into account your feedback, I would propose something like:

-- name: GetDailyFlightsFromAircraft :many
-- arg: day=github.com/0xjac/custom-project/types.Day
-- return: departure=time.Time
-- return: eta=*time.Time
SELECT f.flight_number, f.departure, f.arrival, f.eta
FROM flights f
WHERE pggen.arg("day") <= f.departure AND f.departure < pggen.arg("day") + 1
ORDER BY f.departure DESC;

Few things to notice:

  1. The left side of the equal refers to an arg name (for arg:) or a column name (for return:) not a pg type.
  2. The name:, arg:, return: annotations should be in this order. I.e. no arg: after a return:. However the order amongst multiple arg: and return: is free.
  3. Each type mapping should be on a separate line (for readability)
  4. The arg/return type mapping takes precedence, but is not mandatory. If omitted the global type mapping (--go-type via CLI) is considered and then the type inference (as is already the case). Here the f.arrival column will have the go type from which ever globally mapped or inferred type there is for a TIMESTAMP.
  5. The user can specify a go type which is incompatible with the underlying pg type. The type conversion will fail at runtime. (Similar to a bad type map provided globally).

The intermediate unmarshal type idea is interesting but indeed out of scope. Here is a few thoughts that might help tho:
Using a combination of query specific type map on the return return: (once I've implemented it) and ROW() to return a "single column" and map it to a single custom made struct which implements something like the BinaryDecoder and TextDecoder interface and contains all the unmarshalling logic, it should be possible to unmarshall into pretty much any thing with little to no work on the pggen side. And it makes the config easier: No need to provide an intermediary type. The drawback is that the user here might have to write some structs for the result manually and do the whole decoding logic by hand.

@jschaf
Copy link
Owner

jschaf commented Aug 23, 2022

Nice, detailed writeup!

Agreed that we probably need multi-line with this change. Couple tweaks:

  • For ease of parsing, I'm in favor of using a pggen prefix to identify metadata lines. Otherwise, we'd need to do something like find the last instance of --name and require all lines after it begin with pggen-approved prefix.
  • I think columnType reads better than return. Then for consistency, arg becomes argType.
-- pggen: name: GetDailyFlightsFromAircraft :many
-- pggen: argType: day=github.com/0xjac/custom-project/types.Day
-- pggen: columnType: departure=time.Time

I think you nailed the precedence. Enforcing order of argType and columnType is a good thing to enforce early.

I approve of the design with the naming changes (arg -> argType and return -> columnType) and pggen prefix on comment lines. Happy to debate if you feel strongly otherwise.

Sounds like you have a reasonable handle on how to approach the implementation but let me know if a high-level breakdown would help. From my end, I'll be looking closely at:

  • not breaking backwards compatibility
  • testing coverage

@0xjac
Copy link
Contributor Author

0xjac commented Aug 23, 2022

@jschaf Overall it all seems good. There are just a few things to note:

  1. For ease of parsing, I'm in favor of using a pggen prefix to identify metadata lines. Otherwise, we'd need to do something like find the last instance of --name and require all lines after it begin with pggen-approved prefix.

    No problem with using a prefix, then the algorithm would be going through each line with a pggen prefix and parse them from first to last. The annotations should still be one by line and in the order: name, argType, columnType.
    Just one question about it: Do we allow "comment" lines without the pggen: prefix in-between or after the lines with the prefix?

  2. In your example you wrote:

    -- pggen: name: GetDailyFlightsFromAircraft :many

    But this breaks compatibility because of the new prefix. Therefore, I propose we support a name: line both with or without the pggen: prefix.
    This would work as follow:

    1. Find the first line with the pggen: prefix.
    2. If that line is a name: line, then parse it and move on. (Any name: line above without prefix is ignored, as is currently the case.)
    3. If that line is not a name: prefix, parse the previous line expecting a name : without the pggen: prefix. Then move on to parsing the current line as an argType: or columnType:
  3. Unlike the current annotation where name: lines can be repeated and the last one is taken into account, for lines with the pggen: name: prefix, it can only appear once as the first line with the pggen: prefix.

Regarding tests, I will obviously provide tests with this PR. I'm still in the playground/specs phase now but once that is done, I'll start adding the required tests.

@jschaf
Copy link
Owner

jschaf commented Aug 23, 2022

Do we allow "comment" lines without the pggen: prefix in-between or after the lines with the prefix?

I think not. Easier to start strict and allow flexibility later.

But this breaks compatibility because of the new prefix. Therefore, I propose we support a name: line both with or without the pggen: prefix.

Agreed, we're aligned there.

Regarding tests, I will obviously provide tests with this PR. I'm still in the playground/specs phase now but once that is done, I'll start adding the required tests.

Sweet, sounds like a plan.

@jschaf jschaf closed this Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Custom go type mapping per query argument
2 participants