Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for MySQL JSON type #119

Merged
merged 1 commit into from
Oct 6, 2016
Merged

Conversation

rhauch
Copy link
Contributor

@rhauch rhauch commented Oct 3, 2016

Initial code to handle MySQL JSON values. Only a few changes were required to support the additional ColumnType and deserializing values into the byte[] holding the binary representation used within the MySQL engine. Note, this binary form is not a UTF-8 encoded representation of a JSON string, and it must be parsed to be usable.

A JsonBinaryValue utility class was added to parse this internal binary representations and to translate the structure into method calls on a JsonFormatter interface. A JsonStringFormatter implementation was provided to return the JSON string representation, though this class was designed to be subclassed if the behavior needs to be customized. Other implementations can be created by users to construct other in-memory structural representations of the JSON values.

Note that MySQL JSON column can hold a JSON object, a JSON array, or a variety of scalar values. A scalar value can include: an encoded null value; a boolean value; a numeric value of various sizes; any MySQL DATE, TIME, DATETIME, and TIMESTAMP value; and any other MySQL value encoded in its opaque binary representation.

The string representation produced by the JsonStringFormatter results in: the conventional forms for null, boolean, numeric, and string values; the string representation of the MySQL DATE, TIME, and DATETIME values; the number of microseconds past epoch in UTC for MySQL TIMESTAMP values, and base 64 encoded representations of any opaque value.

At this time, the code is not yet tested and is therefore not yet ready to merge. Over the next day or two I'll add unit tests (including those that parse various JSON binary values) and update this pull request. In the meantime, feedback on the approach is appreciated as soon as possible.

fixes #115

@shyiko
Copy link
Owner

shyiko commented Oct 4, 2016

@rhauch, Thank you! I'll try to provide some feedback first thing in the morning.

@shyiko
Copy link
Owner

shyiko commented Oct 4, 2016

As usual, PR looks good :)

A few minor things to consider:

  • src/main/java/com/github/shyiko/mysql/binlog/json/Base64.java should probably be removed. JsonStringFormatter can use javax.xml.bind.DatatypeConverter.printBase64Binary(...) instead.
  • How about employing ByteArrayInputStream in JsonBinaryValue? At first glance it seems like it could eliminate JsonReader & Offset (unless I'm missing something).

What do you think?

@rhauch
Copy link
Contributor Author

rhauch commented Oct 4, 2016

@shyiko thanks for the review. I agree about removing Base64 in favor of javax.xml.bind.DatatypeConverter.printBase64Binary(...). I've been on Java 8 for a long time, and forget all the tricks of 6 and 7.

I can think about whether JsonReader and Offset can be replaced by ByteArrayInputStream. The challenge is that the MySQL binary format doesn't lend itself well to stream-based processing. I can rewrite the logic to see if I can figure out how to do that.

BTW, neither of the vagrant files are for MySQL 5.7.x, which of course I need to test the JSON functionality. I'm trying to build another vagrant file for 5.7.15, but I'm still trying to get it to build, and then I'm not sure what I need to do to make use of it. Any tips would be helpful.

@shyiko
Copy link
Owner

shyiko commented Oct 4, 2016

@rhauch give me a few minutes. I'll try to build a 5.7.x box for you.

@shyiko
Copy link
Owner

shyiko commented Oct 4, 2016

@rhauch done.

  • If you plan to run tests continuously from IDE, use
    cd supplement/vagrant/mysql-5.7.15-sandbox-prepackaged && vagrant up.
  • If you want to run tests from command line:
    mvn verify -Dvagrant.integration.box=supplement/vagrant/mysql-5.7.15-sandbox-prepackaged (note: this will automatically vagrant up & vagrant destroy)

FYI: master can be accessed from the terminal with mysql --protocol=tcp -P 33061 -h localhost -umsandbox -pmsandbox, slave is listening on port 33062 (as specified in src/test/resources/jdbc.properties).

@rhauch
Copy link
Contributor Author

rhauch commented Oct 5, 2016

@shyiko Thanks for the 5.7.x vagrant image. I've got individual integration tests working when I run the vagrant image manually, but I'm not having any luck updating the Maven build to use the 5.7.x.

When I run the mvn verify -Dvagrant.integration.box=supplement/vagrant/mysql-5.7.15-sandbox-prepackaged command or if I try to update the vagrant.integration.box property in the POM file to point to the correct directory, I get this error:

[INFO] Bringing machine 'default' up with 'virtualbox' provider...
[INFO] ==> default: Box 'shyiko/mysql-sandbox-prepackaged' could not be found. Attempting to find and install...
[INFO]     default: Box Provider: virtualbox
[INFO]     default: Box Version: 5.7.15
[INFO] ==> default: Loading metadata for box 'shyiko/mysql-sandbox-prepackaged'
[INFO]     default: URL: https://atlas.hashicorp.com/shyiko/mysql-sandbox-prepackaged
[INFO] The box you're attempting to add has no available version that
[INFO] matches the constraints you requested. Please double-check your
[INFO] settings. Also verify that if you specified version constraints,
[INFO] that the provider you wish to use is available for these constraints.
[INFO] 
[INFO] Box: shyiko/mysql-sandbox-prepackaged
[INFO] Address: https://atlas.hashicorp.com/shyiko/mysql-sandbox-prepackaged
[INFO] Constraints: 5.7.15
[INFO] Available versions: 5.5.27, 5.6.12

Any ideas? I'll soon push my latest changes (including the vagrant.integration.box change), and I suspect they will fail the Travis-CI build. However, all of the code should be complete and lots of new integration tests verify the behavior. I've got a few questions, but I'll add comments to the code where those are or where I want to point something out.

@@ -376,7 +383,7 @@ private static int extractBits(long value, int bitOffset, int numberOfBits, int
/**
* see mysql/strings/decimal.c
*/
private static BigDecimal asBigDecimal(int precision, int scale, byte[] value) {
public static BigDecimal asBigDecimal(int precision, int scale, byte[] value) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same logic is needed in the JsonBinary parsing logic. Perhaps it should be extracted to a separate utility? I'll leave that as future work.

Copy link
Owner

@shyiko shyiko Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it could probably be moved to ByteArrayInputStream but don't worry about it. I can do that (or something entirely different) as part of #107.

@shyiko
Copy link
Owner

shyiko commented Oct 5, 2016

@rhauch I think I forgot to hit release button in Atlas (facepalm). Try now.

@rhauch rhauch force-pushed the issue-115 branch 2 times, most recently from 69d88ef to e1335d0 Compare October 5, 2016 21:08
Copy link
Contributor Author

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shyiko, please review the latest code. I believe that this PR now has everything necessary to support JSON values. There are a number of things that could be cleaned up in the future, but that requires your knowledge and expertise (and may be affected by your current or future plans).

int month = yearMonth % 13;
int day = (int) (value >> 17) % (1 << 5); // 5 bits starting at 17th
formatter.valueDate(year, month, day);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this and the other methods to parse temporal values are indeed slightly different from similar methods in AbstractRowsEventDataDeserializer. In particular, these are in little-endian form, which is contrary to the documentation and AbstractRowsEventDataDeserializer methods. However, this was verified through testing and via the MySQL source code.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

int b1 = reader.read() & 0xFF;
int b2 = reader.read();
return (short) (b2 << 8 | b1);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method and similar read methods in this class are also similar to those in ByteArrayInputStream, except that those other methods do not work in this case as verified by the integration tests in this PR. Although both sets of methods are intended to read little-endian values, the methods in this class properly treat negative values (e.g., like casting to short on line 880). Changing ByteArrayInputStream methods was deemed to risky, so they are placed here instead. Also, because they are protected, clients could subclass this JsonBinary class and override any methods as deemed necessary, should there be a problem.

*
* @author <a href="mailto:rhauch@gmail.com">Randall Hauch</a>
*/
public class JsonStringFormatter implements JsonFormatter {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, all of the methods in this class are protected so that they can be overridden and accessed if clients create their own customized subclass of JsonStringFormatter.

/**
* @author <a href="mailto:rhauch@gmail.com">Randall Hauch</a>
*/
public class JsonBinaryValueIntegrationTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the integration test methods in this class require MySQL 5.7. I'm not sure whether or how they can be filtered out so that the integration test suite can still be run against earlier versions of MySQL.

Copy link
Owner

@shyiko shyiko Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

try {
    master.execute("create table t1 (i INT, j JSON)");
} catch(SQLSyntaxErrorException e) {
    throw new org.testng.SkipException("JSON data type is not supported by current version of MySQL");
}

should do it. We could check the MySQL version but this is just simpler.

@rhauch
Copy link
Contributor Author

rhauch commented Oct 5, 2016

@shyiko looks like it's now able to run with 5.7.12. I will update the PR to use this by default, and then hopefully the Travis-CI build will pass.

@rhauch
Copy link
Contributor Author

rhauch commented Oct 5, 2016

@shyiko, I've restructured the integration test so that it works in parallel, and I've fixed a few more issues. The Travis-CI build passed successfully, although when I run mvn clean install locally I get the following error in code that this PR does not affect (and what I think is a timezone issue):

Tests run: 69, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.96 sec <<< FAILURE! - in TestSuite
testDeserializationOfDATE(com.github.shyiko.mysql.binlog.BinaryLogClientIntegrationTest)  Time elapsed: 0.092 sec  <<< FAILURE!
com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect date value: '1989-00-21' for column 'column_' at row 1
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4072)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4006)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2713)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2663)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:888)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:730)
    at com.github.shyiko.mysql.binlog.BinaryLogClientIntegrationTest$6.execute(BinaryLogClientIntegrationTest.java:368)
    at com.github.shyiko.mysql.binlog.BinaryLogClientIntegrationTest$6.execute(BinaryLogClientIntegrationTest.java:356)
    at com.github.shyiko.mysql.binlog.BinaryLogClientIntegrationTest$MySQLConnection.execute(BinaryLogClientIntegrationTest.java:900)
    at com.github.shyiko.mysql.binlog.BinaryLogClientIntegrationTest.writeAndCaptureRow(BinaryLogClientIntegrationTest.java:356)
    at com.github.shyiko.mysql.binlog.BinaryLogClientIntegrationTest.testDeserializationOfDATE(BinaryLogClientIntegrationTest.java:258)


Results :

Failed tests: 
  BinaryLogClientIntegrationTest.testDeserializationOfDATE:258->writeAndCaptureRow:356 » MysqlDataTruncation

@shyiko
Copy link
Owner

shyiko commented Oct 6, 2016

@rhauch testDeserializationOfDATE failed because sql_mode changed from default '' to NO_ENGINE_SUBSTITUTION in 5.6.6 and ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION in 5.7. More on that here. I'll add a check (select @@sql_mode;) for that after this PR is merged.

/**
* Parse an opaque type. Specific types such as {@link #parseDate(Offset, formatter) DATE},
* {@link #parseTime(Offset, formatter) TIME},{@link #parseDatetime(Offset, formatter) DATETIME}, and
* {@link #parseTimestamp(Offset, formatter) TIMESTAMP} values are stored as opaque types, though they are
Copy link
Owner

@shyiko shyiko Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhauch JavaDoc mentions ColumnType.TIMESTAMP but it seems like it's not handled (JsonFormatter.valueTimestamp is also not used). Is this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though it appears that TIMESTAMP types are actually stored as DATETIME. I'll update the JavaDoc and comments within the method.

Added support for the JSON type added to MySQL 5.7. This required adding a new `ColumnType` and a new method to deserialize the JSON value, although the deserialization simply returns the `byte[]` containing the internal binary representation of the JSON value.

In order to be useful, this binary representation needs to be parsed into something more useful. A new `JsonBinary` class was created that can perform this parsing and delegate to a `JsonFormatter` implementation when JSON objects or arrays are started/ended, when various name-value pairs are encountered in an object, when values are encountered in an array, and when scalar values are encountered (outside of a JSON object or array, which is not really standard JSON).

Since many clients will simply want to obtain a JSON string representation of the JSON value, a `JsonStringFormatter` class was added, and a `JsonBinary.parseAsString` utility method added to conveniently obtain the JSON string representation. Of course, if this is not sufficient, clients can either subclass `JsonStringFormatter` to customize the behavior or implement their own `JsonFormatter` with completely custom behavior.

Quite a few integration test methods were added to insert various JSON values into a table, read the corresponding `WriteRowsEventData` event for the INSERT, extract the binary JSON value from the event, and then compare the string representation of the JSON value to expected forms.

Note that MySQL does allow for some values within JSON objects and arrays to be of types other than boolean, numeric, and strings (per the JSON spec). For example, MySQL `BLOB`, `DATE`, `TIME`, `DATETIME`, and `TIMESTAMP` values can be used and are encoded as "opaque types", although the `JsonBinary` parser does know how to interpret this opaque forms and call appropriate methods on the `JsonFormatter`. The `JsonStringFormatter` simply encodes the temporal types as formatted dates, times, or timestamps, while all other opaque values are represented as Base64-encoded strings.

Also, MySQL does allow scalar values to be used in `JSON` columns outside of JSON objects and arrays. This does not adhere to the JSON spec, and as such the `JsonStringFormatter` simply returns the string representations of these scalar values. (Any other logic can be customized.)
@rhauch
Copy link
Contributor Author

rhauch commented Oct 6, 2016

@shyiko I've updated the JavaDoc and comments to clarify how TIMESTAMP is handled, and also added another test case. And, as you suggested, I added a check for the MySQL 5.7 specific CREATE TABLE statement that throws org.testng.SkipException.

Let me know if you want any other changes made. Otherwise, this should be ready to merge when you see fit. I'd like to also add JSON support to the Debezium MySQL connector, so please let me know when a release might be available. As always, thanks!

@shyiko
Copy link
Owner

shyiko commented Oct 6, 2016

@rhauch with the amount of work you've done you can expect release no later than today (in couple of hours) :) I owe you that much.

@shyiko shyiko merged commit a2afe53 into shyiko:master Oct 6, 2016
@shyiko
Copy link
Owner

shyiko commented Oct 6, 2016

@rhauch once again, thank you for the contribution 🍻. I'll let you know as soon as 0.5.0 is deployed to Maven Central.

@shyiko
Copy link
Owner

shyiko commented Oct 6, 2016

@rhauch 0.5.0 was published to Maven Central 🙇 .

@rhauch
Copy link
Contributor Author

rhauch commented Oct 6, 2016

Woot! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support JSON type added in MySQL 5.7
2 participants