Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Protocol Buffers #174

Closed
1 of 2 tasks
apstndb opened this issue Jul 9, 2024 · 7 comments · Fixed by #182
Closed
1 of 2 tasks

Support Protocol Buffers #174

apstndb opened this issue Jul 9, 2024 · 7 comments · Fixed by #182

Comments

@apstndb
Copy link
Collaborator

apstndb commented Jul 9, 2024

Protocol Buffers support is already GA.
We can discuss initial support of Protocol Buffers in spanner-cli.

  • DDL support with CREATE PROTO BUNDLE
    • Equivalent of gcloud spanner databases ddl update DATABASE_NAME --instance=INSTANCE_NAME --ddl='CREATE PROTO BUNDLE (examples.shipping.Order);' --proto-descriptors-file=order_descriptors.pb
    • It is hard to define intuitive behavior other than executing a single command with --execute.
  • Support decoding of proto-related values like ENUMs.
@yfuruyama
Copy link
Collaborator

CREATE PROTO BUNDLE statement is tricky because it requires a companion file (proto file) to be sent to the Spanner API altogether.

Then, there might be two possible ways:

  1. For interactive mode: Extend CREATE PROTO BUNDLE statement in some ways. For example with a comment line, like CREATE PROTO BUNDLE ... ; # order_descriptors.pb
  2. For batch (command) mode: Create an option similar to --proto-descriptors-file of gcloud.

I think the first option is not intuitive. For second option, I don't think users will use that option if the same functionality is available in gcloud.

Overall, my impression is that we can ask users to use gcloud if they want to run CREATE PROTO BUNDLE statement.

@apstndb
Copy link
Collaborator Author

apstndb commented Jul 9, 2024

I agree CREATE PROTO BUNDLE support is not intuitive and not useful in current proposal.

What about PROTO and ENUM types?

Current behaviour of PROTO and ENUM typed values.

proto

Use https://github.com/googleapis/google-cloud-go/blob/spanner/v1.62.0/spanner/testdata/protos/singer.proto.

Query

WITH t AS (SELECT CAST('genre: POP, nationality: "Japan"' AS spanner.examples.music.SingerInfo) AS singer)
SELECT t.singer, t.singer.genre, CAST(t.singer AS STRING) AS singer_string, CAST(t.singer.genre AS STRING) AS genre_string
FROM t;
spanner> WITH t AS (SELECT CAST('genre: POP, nationality: "Japan"' AS spanner.examples.music.SingerInfo) AS singer)
      -> SELECT t.singer, t.singer.genre, CAST(t.singer AS STRING) AS singer_string, CAST(t.singer.genre AS STRING) AS genre_string
      -> FROM t;
+-----------------------------+------------------+---------------------------------+--------------+
| singer                      | genre            | singer_string                   | genre_string |
+-----------------------------+------------------+---------------------------------+--------------+
| string_value:"GgVKYXBhbiAA" | string_value:"0" | nationality: "Japan" genre: POP | POP          |
+-----------------------------+------------------+---------------------------------+--------------+
1 rows in set (1.07 msecs)

I think the raw string_value: ... is far from ideal output.

Proposed behavior

There are no message descriptor in ResultSet, so I think it is impossible to print as _string.
Most usable information is ProtoTypeFqn. spanner-cli can print values as typed values.

spanner> WITH t AS (SELECT CAST('genre: POP, nationality: "Japan"' AS spanner.examples.music.SingerInfo) AS singer)
      -> SELECT t.singer, t.singer.genre, CAST(t.singer AS STRING) AS singer_string, CAST(t.singer.genre AS STRING) AS genre_string
      -> FROM t;
+-----------------------------------------------------+-----------------------------------+---------------------------------+--------------+
| singer                                              | genre                             | singer_string                   | genre_string |
+-----------------------------------------------------+-----------------------------------+---------------------------------+--------------+
| "GgVKYXBhbiAA" AS spanner.examples.music.SingerInfo | 0 AS spanner.examples.music.Genre | nationality: "Japan" genre: POP | POP          |
+-----------------------------------------------------+-----------------------------------+---------------------------------+--------------+
1 rows in set (1.38 msecs)

@apstndb
Copy link
Collaborator Author

apstndb commented Jul 9, 2024

Is this output a bit long?

@yfuruyama
Copy link
Collaborator

What about PROTO and ENUM types?

Yes, I agree that we should decode PROTO and ENUM typed values properly, but showing the proto field name looks a bit too much.

If users want to check an actual value, they might use CAST(<column> AS STRING) as your example shows, so I think just showing the raw value (base64-encoded string for PROTO and string for ENUM) might be adequate in spanner-cli. Spanner Studio (Cloud Console) also shows both types in that style.

@apstndb
Copy link
Collaborator Author

apstndb commented Jul 9, 2024

I agree FQN type name is too long and the same printing as Spanner Studio is a good choice.
Another option is printing only the last component of the FQN: like "GgVKYXBhbiAA" AS SingerInfo, 0 AS Genre.

@yfuruyama
Copy link
Collaborator

Another option is printing only the last component of the FQN: like "GgVKYXBhbiAA" AS SingerInfo, 0 AS Genre.

This is a very edge case, but it could be confusing if a STRING value contains "AS" in the value.

spanner> SELECT '"GgVKYXBhbiAA" AS SingerInfo';
+------------------------------+
|                              |
+------------------------------+
| "GgVKYXBhbiAA" AS SingerInfo |
+------------------------------+
1 rows in set (2.13 msecs)

So I'm not sure if it's a good idea to annotate the value with a proto type.

@apstndb
Copy link
Collaborator Author

apstndb commented Jul 25, 2024

I often read databases whose schemas I didn't know, so I was worried about outputting raw values ​​that couldn't be read without knowledge of the schemas.
spanner-cli does not display type information for all values, so let's align it with Spanner Studio for consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants