-
Notifications
You must be signed in to change notification settings - Fork 817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for MySQL JSON type #119
Conversation
@rhauch, Thank you! I'll try to provide some feedback first thing in the morning. |
As usual, PR looks good :) A few minor things to consider:
What do you think? |
@shyiko thanks for the review. I agree about removing I can think about whether BTW, neither of the vagrant files are for MySQL 5.7.x, which of course I need to test the JSON functionality. I'm trying to build another vagrant file for 5.7.15, but I'm still trying to get it to build, and then I'm not sure what I need to do to make use of it. Any tips would be helpful. |
@rhauch give me a few minutes. I'll try to build a 5.7.x box for you. |
@rhauch done.
FYI: master can be accessed from the terminal with |
@shyiko Thanks for the 5.7.x vagrant image. I've got individual integration tests working when I run the vagrant image manually, but I'm not having any luck updating the Maven build to use the 5.7.x. When I run the
Any ideas? I'll soon push my latest changes (including the |
@@ -376,7 +383,7 @@ private static int extractBits(long value, int bitOffset, int numberOfBits, int | |||
/** | |||
* see mysql/strings/decimal.c | |||
*/ | |||
private static BigDecimal asBigDecimal(int precision, int scale, byte[] value) { | |||
public static BigDecimal asBigDecimal(int precision, int scale, byte[] value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This same logic is needed in the JsonBinary
parsing logic. Perhaps it should be extracted to a separate utility? I'll leave that as future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it could probably be moved to ByteArrayInputStream but don't worry about it. I can do that (or something entirely different) as part of #107.
@rhauch I think I forgot to hit release button in Atlas (facepalm). Try now. |
69d88ef
to
e1335d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shyiko, please review the latest code. I believe that this PR now has everything necessary to support JSON values. There are a number of things that could be cleaned up in the future, but that requires your knowledge and expertise (and may be affected by your current or future plans).
int month = yearMonth % 13; | ||
int day = (int) (value >> 17) % (1 << 5); // 5 bits starting at 17th | ||
formatter.valueDate(year, month, day); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this and the other methods to parse temporal values are indeed slightly different from similar methods in AbstractRowsEventDataDeserializer
. In particular, these are in little-endian form, which is contrary to the documentation and AbstractRowsEventDataDeserializer
methods. However, this was verified through testing and via the MySQL source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
int b1 = reader.read() & 0xFF; | ||
int b2 = reader.read(); | ||
return (short) (b2 << 8 | b1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method and similar read
methods in this class are also similar to those in ByteArrayInputStream
, except that those other methods do not work in this case as verified by the integration tests in this PR. Although both sets of methods are intended to read little-endian values, the methods in this class properly treat negative values (e.g., like casting to short
on line 880). Changing ByteArrayInputStream
methods was deemed to risky, so they are placed here instead. Also, because they are protected
, clients could subclass this JsonBinary
class and override any methods as deemed necessary, should there be a problem.
* | ||
* @author <a href="mailto:rhauch@gmail.com">Randall Hauch</a> | ||
*/ | ||
public class JsonStringFormatter implements JsonFormatter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, all of the methods in this class are protected
so that they can be overridden and accessed if clients create their own customized subclass of JsonStringFormatter
.
/** | ||
* @author <a href="mailto:rhauch@gmail.com">Randall Hauch</a> | ||
*/ | ||
public class JsonBinaryValueIntegrationTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the integration test methods in this class require MySQL 5.7. I'm not sure whether or how they can be filtered out so that the integration test suite can still be run against earlier versions of MySQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
try {
master.execute("create table t1 (i INT, j JSON)");
} catch(SQLSyntaxErrorException e) {
throw new org.testng.SkipException("JSON data type is not supported by current version of MySQL");
}
should do it. We could check the MySQL version but this is just simpler.
@shyiko looks like it's now able to run with 5.7.12. I will update the PR to use this by default, and then hopefully the Travis-CI build will pass. |
@shyiko, I've restructured the integration test so that it works in parallel, and I've fixed a few more issues. The Travis-CI build passed successfully, although when I run
|
@rhauch testDeserializationOfDATE failed because sql_mode changed from default '' to NO_ENGINE_SUBSTITUTION in 5.6.6 and ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION in 5.7. More on that here. I'll add a check ( |
/** | ||
* Parse an opaque type. Specific types such as {@link #parseDate(Offset, formatter) DATE}, | ||
* {@link #parseTime(Offset, formatter) TIME},{@link #parseDatetime(Offset, formatter) DATETIME}, and | ||
* {@link #parseTimestamp(Offset, formatter) TIMESTAMP} values are stored as opaque types, though they are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhauch JavaDoc mentions ColumnType.TIMESTAMP but it seems like it's not handled (JsonFormatter.valueTimestamp is also not used). Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though it appears that TIMESTAMP
types are actually stored as DATETIME
. I'll update the JavaDoc and comments within the method.
Added support for the JSON type added to MySQL 5.7. This required adding a new `ColumnType` and a new method to deserialize the JSON value, although the deserialization simply returns the `byte[]` containing the internal binary representation of the JSON value. In order to be useful, this binary representation needs to be parsed into something more useful. A new `JsonBinary` class was created that can perform this parsing and delegate to a `JsonFormatter` implementation when JSON objects or arrays are started/ended, when various name-value pairs are encountered in an object, when values are encountered in an array, and when scalar values are encountered (outside of a JSON object or array, which is not really standard JSON). Since many clients will simply want to obtain a JSON string representation of the JSON value, a `JsonStringFormatter` class was added, and a `JsonBinary.parseAsString` utility method added to conveniently obtain the JSON string representation. Of course, if this is not sufficient, clients can either subclass `JsonStringFormatter` to customize the behavior or implement their own `JsonFormatter` with completely custom behavior. Quite a few integration test methods were added to insert various JSON values into a table, read the corresponding `WriteRowsEventData` event for the INSERT, extract the binary JSON value from the event, and then compare the string representation of the JSON value to expected forms. Note that MySQL does allow for some values within JSON objects and arrays to be of types other than boolean, numeric, and strings (per the JSON spec). For example, MySQL `BLOB`, `DATE`, `TIME`, `DATETIME`, and `TIMESTAMP` values can be used and are encoded as "opaque types", although the `JsonBinary` parser does know how to interpret this opaque forms and call appropriate methods on the `JsonFormatter`. The `JsonStringFormatter` simply encodes the temporal types as formatted dates, times, or timestamps, while all other opaque values are represented as Base64-encoded strings. Also, MySQL does allow scalar values to be used in `JSON` columns outside of JSON objects and arrays. This does not adhere to the JSON spec, and as such the `JsonStringFormatter` simply returns the string representations of these scalar values. (Any other logic can be customized.)
@shyiko I've updated the JavaDoc and comments to clarify how Let me know if you want any other changes made. Otherwise, this should be ready to merge when you see fit. I'd like to also add JSON support to the Debezium MySQL connector, so please let me know when a release might be available. As always, thanks! |
@rhauch with the amount of work you've done you can expect release no later than today (in couple of hours) :) I owe you that much. |
@rhauch once again, thank you for the contribution 🍻. I'll let you know as soon as 0.5.0 is deployed to Maven Central. |
Woot! Thank you! |
Initial code to handle MySQL
JSON
values. Only a few changes were required to support the additionalColumnType
and deserializing values into thebyte[]
holding the binary representation used within the MySQL engine. Note, this binary form is not a UTF-8 encoded representation of a JSON string, and it must be parsed to be usable.A
JsonBinaryValue
utility class was added to parse this internal binary representations and to translate the structure into method calls on aJsonFormatter
interface. AJsonStringFormatter
implementation was provided to return the JSON string representation, though this class was designed to be subclassed if the behavior needs to be customized. Other implementations can be created by users to construct other in-memory structural representations of the JSON values.Note that MySQL
JSON
column can hold a JSON object, a JSON array, or a variety of scalar values. A scalar value can include: an encoded null value; a boolean value; a numeric value of various sizes; any MySQLDATE
,TIME
,DATETIME
, andTIMESTAMP
value; and any other MySQL value encoded in its opaque binary representation.The string representation produced by the
JsonStringFormatter
results in: the conventional forms for null, boolean, numeric, and string values; the string representation of the MySQLDATE
,TIME
, andDATETIME
values; the number of microseconds past epoch in UTC for MySQLTIMESTAMP
values, and base 64 encoded representations of any opaque value.At this time, the code is not yet tested and is therefore not yet ready to merge. Over the next day or two I'll add unit tests (including those that parse various
JSON
binary values) and update this pull request. In the meantime, feedback on the approach is appreciated as soon as possible.fixes #115