
vStream events do not populate field "Flags" on FIELD events #5914

Closed
aquarapid opened this issue Mar 11, 2020 · 6 comments · Fixed by #6525

@aquarapid (Contributor)

I am writing a vStream client, here is some approximate code:

import (
    "fmt"

    binlogdatapb "vitess.io/vitess/go/vt/proto/binlogdata"
)

func doSomethingWithEvents(vevents []*binlogdatapb.VEvent) {
    for _, vevent := range vevents {
        switch vevent.Type {
        case binlogdatapb.VEventType_FIELD:
            for _, field := range vevent.FieldEvent.Fields {
                fmt.Printf("field %s flags %d\n", field.Name, field.Flags)
            }
        // ... handle other event types ...
        }
    }
}

This works just fine, but the Flags values for the fields are always returned as 0. The protobuf definition seems to imply that we should be setting the MySQL field-definition flags for each column (e.g. 2 == primary key), but we don't seem to do this.

In my application, I am interested in mainly the PK flag, but obviously there are a host of others.

@keweishang (Contributor)

We've also encountered this problem. In addition, we also need to know:

  • if a field is nullable
  • in case the field is decimal, its scale and precision
  • in case the field is enum or set, its permitted values

@rohit-nayak-ps (Contributor)

> We've also encountered this problem. In addition, we also need to know:
>
>   • if a field is nullable
>   • in case the field is decimal, its scale and precision
>   • in case the field is enum or set, its permitted values

We are going to send all the fields from the com_query response packet in the FIELD event: https://dev.mysql.com/doc/internals/en/com-query-response.html#column-definition. These include "decimals" for DECIMAL types and "column_length" (which covers scale+precision). Nullability can be found in the flags (as can the primary key, @aquarapid):
https://dev.mysql.com/doc/dev/mysql-server/8.0.12/group__group__cs__column__definition__flags.html#ga0b89f761611b41b80de1f7d50f9b6020
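As an illustration of how a client could use these bits once Flags is populated, here is a minimal sketch. The two constant values come from the MySQL column-definition flags page linked above (NOT_NULL_FLAG == 1, PRI_KEY_FLAG == 2); the helper function names are hypothetical, not part of the Vitess API:

```go
// Minimal sketch: decode a field's flags bitmask. Constant values follow
// the MySQL column-definition flags documentation linked above.
const (
	notNullFlag uint32 = 1 << 0 // NOT_NULL_FLAG
	priKeyFlag  uint32 = 1 << 1 // PRI_KEY_FLAG
)

// isNullable reports whether the NOT NULL bit is absent from the flags.
func isNullable(flags uint32) bool { return flags&notNullFlag == 0 }

// isPrimaryKey reports whether the primary-key bit is set in the flags.
func isPrimaryKey(flags uint32) bool { return flags&priKeyFlag != 0 }
```

For instance, in the sample FIELD event shown later in this thread, the id column has flags:49667, which has both the NOT_NULL and PRI_KEY bits set.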

For enums we only get the type; the ROW event contains the ordinal of the enum value.

@keweishang, Sugu mentioned that you are using Debezium as the event stream sink. debezium/debezium@d8a5d2b suggests that the Debezium adaptor is converting ordinals to the enum string values: if I understand correctly, it is doing this by parsing the table ddls.

Is this something you can use for your integration for getting the enum/set members?

Here is an example output from VStream Copy with enum/decimal columns and the associated FIELD/ROW events:


CREATE TABLE tickets (
    id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    priority ENUM('Low', 'Medium', 'High') NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

insert into tickets (title, priority, price) values ('ll', 'Low', 12.34);

type:FIELD timestamp:1596385064 field_event:<table_name:"commerce.tickets" fields:<name:"id" type:INT32 table:"tickets" org_table:"tickets" database:"vt_commerce" org_name:"id" column_length:11 charset:63 flags:49667 > fields:<name:"title" type:VARCHAR table:"tickets" org_table:"tickets" database:"vt_commerce" org_name:"title" column_length:765 charset:33 flags:4097 > fields:<name:"priority" type:ENUM table:"tickets" org_table:"tickets" database:"vt_commerce" org_name:"priority" column_length:18 charset:33 flags:4353 > fields:<name:"price" type:DECIMAL table:"tickets" org_table:"tickets" database:"vt_commerce" org_name:"price" column_length:12 charset:63 decimals:2 flags:36865 > >

type:ROW timestamp:1596385064 row_event:<table_name:"commerce.tickets" row_changes:<after:<lengths:1 lengths:2 lengths:1 lengths:5 values:"1ll112.34" > > > current_time:1596385064296163300 
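To show how the packed values string in the ROW event above lines up with the per-column lengths, here is a hypothetical helper (splitRowValues is not a Vitess API, just an illustration; it assumes no NULL columns, which real decoding would also have to handle):

```go
// splitRowValues is a hypothetical helper that slices the packed values
// bytes of a row change into one string per column, using the lengths
// reported alongside them in the ROW event.
func splitRowValues(lengths []int64, values []byte) []string {
	cols := make([]string, 0, len(lengths))
	pos := int64(0)
	for _, l := range lengths {
		cols = append(cols, string(values[pos:pos+l]))
		pos += l
	}
	return cols
}
```

Applied to the event above, lengths 1, 2, 1, 5 over "1ll112.34" yield "1", "ll", "1", "12.34" — the id, the title, the ordinal of the enum value 'Low', and the price.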

@keweishang (Contributor) commented Aug 2, 2020

@rohit-nayak-ps thanks for the update. Last time I checked, only column names were available in the FIELD event, so populating the other fields would be great. So far we've been marking all columns nullable because we were missing this information, and all events sent to Kafka have no keys due to the missing PK information…

We're implementing our own Vitess connector based on Debezium and hope to open-source it in the next couple of weeks. So far we haven't had to parse DDLs, since we rely on the FIELD event to build in-memory schemas. If we do have to parse DDL for the enum and set cases, we'll look into it; for now, the enum and set cases are not showstoppers.

For the enum case, when we receive ordinals of the enum value, we can issue a SHOW CREATE TABLE (or DESCRIBE) statement to the VTGate MySQL port via JDBC to get the enum/set members. However, in rare cases, wouldn't we run into a race condition if the enum definition is changed at the same time?

A bit more context on the DDL-parsing approach. The heavier-weight Debezium solution would be to take a snapshot of all tables' DDLs (briefly locking MySQL) when initializing the connector, save them to a Kafka topic, and parse them to build in-memory schemas. Later, when a table's schema changes, the new DDL is sent to the binlog reader and parsed to update that table's in-memory schema. We didn't go with this approach because it's more complicated and the FIELD event serves the same purpose, but we can implement it if we have to.

Thanks.

@sougou (Contributor) commented Aug 2, 2020

Can you take a look at the vtctlclient GetSchema command? It may be slightly easier to parse. If that's still a problem, let's open a feature request to extend GetSchema to return more metadata.

@rohit-nayak-ps (Contributor)

We do have the DDL event also as part of the event stream.

type:DDL timestamp:1596385003 ddl:"CREATE TABLE tickets (id INT PRIMARY KEY AUTO_INCREMENT,    title VARCHAR(255) NOT NULL,    priority ENUM('Low', 'Medium', 'High') NOT NULL,    price DECIMAL(10, 2) NOT NULL    )" current_time:1596385003922592130

That will happen before the FIELD/ROW events, so if there is an ALTER TABLE you will see it. This should resolve the race you mentioned.

However, you do have a problem when you start streaming if you don't start from the first GTID (the VStream Copy API): you won't see the earlier DDLs, and will have to query the DB as you mentioned. Another way to get the enum/set values by querying the DB is:

SELECT column_type FROM information_schema.COLUMNS WHERE TABLE_NAME = 'tickets' AND COLUMN_NAME = 'priority';
+-----------------------------+
| column_type                 |
+-----------------------------+
| enum('Low','Medium','High') |
+-----------------------------+
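A client could then extract the permitted values from that column_type string. The helper below is a naive, hypothetical sketch (not part of Vitess): it does not handle member values that themselves contain quotes or commas.

```go
import "strings"

// parseEnumMembers extracts the permitted values from a column_type string
// such as "enum('Low','Medium','High')" or "set('a','b')". Naive sketch:
// values containing embedded quotes or commas are not handled.
func parseEnumMembers(columnType string) []string {
	start := strings.Index(columnType, "(")
	end := strings.LastIndex(columnType, ")")
	if start < 0 || end <= start {
		return nil
	}
	var members []string
	for _, m := range strings.Split(columnType[start+1:end], ",") {
		members = append(members, strings.Trim(strings.TrimSpace(m), "'"))
	}
	return members
}
```

For the tickets example, parseEnumMembers("enum('Low','Medium','High')") returns the three members in declaration order, which is also the order the ROW-event ordinals index into.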

@keweishang (Contributor) commented Aug 3, 2020

Thanks @sougou we'll have a look at GetSchema.

Thanks @rohit-nayak-ps. Currently, when we initialize a connector, we only start reading from the most recent GTID (call it GTID-A), not the first GTID. We store the last processed GTID in a Kafka topic so that when we restart the connector, it continues from where it left off. So we will indeed miss DDLs issued before GTID-A. We'll try GetSchema and the solution you suggested for enum/set values for now:

SELECT column_type FROM information_schema.COLUMNS WHERE TABLE_NAME = 'tickets' AND COLUMN_NAME = 'priority';
+-----------------------------+
| column_type                 |
+-----------------------------+
| enum('Low','Medium','High') |
+-----------------------------+

Thanks.

@askdba askdba added this to the v8.0 milestone Oct 6, 2020