Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic but working implementation of MySQL working copy #399

Merged
merged 6 commits into from
Apr 30, 2021
Merged

Conversation

olsen232
Copy link
Collaborator

@olsen232 olsen232 commented Apr 7, 2021

Description

Adds MySQL working copy support to Sno. Checking out, editing, diffing and committing of features themselves and of schema.json is supported. Also added tests - although, they won't run in our CI queue until we set up a MySQL database for them.

Still TODO:

Include the pymysql library and any of its dependencies in our various builds.
Create (/drop) spatial reference systems using the MySQL CREATE SPATIAL REFERENCE SYSTEM command.
Make meta diffs other than schema.json work (eg crs/something.wkt, title, maybe description)
Find a way to make the spatial index work
Add documentation, update changelog

(I accidentally merged this to master once, but ignore that)

@olsen232 olsen232 requested review from rcoup and craigds April 7, 2021 22:08
Copy link
Member

@rcoup rcoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! How do you want to structure the todo list?

Comment on lines 133 to 137
# url_query = _append_to_query(
# url.query, {"Application Name": "sno"}
# )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


class WorkingCopy_MySql(DatabaseServer_WorkingCopy):
"""
MySQL working copy implementation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting mysql maps "database" to what we refer to as "schema"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# TODO - MYSQL-PART-2 - We can only create a spatial index if the geometry column is declared
# not-null, but a datasets V2 schema doesn't distinguish between NULL and NOT NULL columns.
# So we don't know if the user would rather have an index, or be able to store NULL values.
return # Find a fix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these become working copy options?

Copy link
Collaborator Author

@olsen232 olsen232 Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so! Not completely clear where we put working copy options yet
I suppose as separate git config variables?
The other option is to add them as query params in the working copy location URI - which is also stored in git config in the end.

class GeometryType(UserDefinedType):
"""UserDefinedType so that V2 geometry is adapted to MySQL binary format."""

# TODO: is "axis-order=long-lat" always the correct behaviour? It makes all the tests pass.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://mysqlserverteam.com/axis-order-in-spatial-reference-systems/
Seems like that's the native storage ordering?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently it is, but I don't think that's relevant. MySQL wants to be told what to do on import from WKB / export to WKB - either lat-long, long-lat, or whatever the CRS standard officially defines (probably lat-long, at least for EPSG).

I think the order in which MySQL actually stores them is mostly hidden from us - MySQL knows which number is lat and which is long, and can import and export according to the rule we provide. That's what matters to us.

What I don't really know yet is why all our test data is apparently in long-lat ordering, which is apparently the opposite of the EPSG standard. I think there is a de facto standard that is the opposite of the actual standard, but I don't know if that is something we can rely on in general.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this some more. Discussion of the basic issue here - you are doubtless more familiar with it than I am:

http://docs.geotools.org/latest/userguide/library/referencing/order.html

Internally, we store geometries using the GPKG format, which officially uses what it calls the "de facto" standard - an axis ordering of long-lat / East-North / XY. (It is XY in that it is across then up - but, all standards are XY in that we always call the first variable X). The GPKG format contains an embedded WKB. The WKB standard seems to leave open to interpretation what order the axes are in - either, they are in the "de facto" standard ordering (long-lat, East-North), or, they are in the order defined by the CRS (which would generally be the EPSG standard - lat-long, North-East). This second ordering seems to be mandated by ISO 19128:2005, which seems not to have been widely adopted since it is the opposite of the de facto standard and nobody seems to have thought of an actual way to gradually migrate every system so that it conforms to the new standard without breaking the de facto standard.

Seems like all of the libraries we are using so far simply assume that "de facto" standard - although perhaps I should double check this - but MySQL is the first library which tries by default to conform to new standard, it assumes that the ordering from the CRS standard is used, which, since our WKB is from a GPKG, it definitely is not - instead it is definitely in GPKG / de-facto / long-lat / East-North ordering. That is why we need to explicitly tell MySQL which order it should actually expect in the WKB that we use to get data into and out of MySQL.

So, I think I can delete this comment, or at least, add some background instead of a TODO. Maybe I should add something to the DATASETS_v2 doc to note that, since Kart stores everything using GPKG geometries, it always uses GPKG / "de-facto" / long-lat / East-North ordering. I think this is true of all the JSON / GEOJSON that Kart outputs too.

CREATE TRIGGER {self._quoted_trigger_name(dataset, 'ins')}
AFTER INSERT ON {table_identifier}
FOR EACH ROW
REPLACE INTO {self.SNO_TRACK} (table_name, pk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like these need casts until we have a wider fix for #398? Unless MySQL auto-casts somehow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be auto-casting - nothing would work otherwise since our big-3 test data (points, polygons, table) all have integer primary keys, but tracking table is strings.

for key in mysql_adapter.APPROXIMATED_TYPES_EXTRA_TYPE_INFO:
new_col_dict[key] = old_col_dict.get(key)

# Geometry types don't have to be approximated, except for the Z/M specifiers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because mysql doesn't support Z/M?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Z and M values are allowed, since MySQL basically stores WKB internally - but AFAICT, there's no way to specify a column as containing only 2, 3, or 4 dimensional geometries. Since Kart's schema does contain this metadata, this schema information is "approximated" - ie a column restricted to n-dimensional geometries where n is ONE OF (2, 3, or 4) in Kart is approximated as a column for n-dimensional geometries, where n is ANY/ALL of (2,3,4) in MySQL.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Z and M values are not allowed. Should've checked that. New plan - don't support datasets with Z or M values for now. Discussion of other longer-term solutions can be found in the slack channel.



V2_TYPE_TO_MYSQL_TYPE = {
"boolean": "bit(1)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if this should be TINYINT(1) for compatibility? Would need a lot of boolean columns for it to make much difference storage-wise...

Can add a CHECK constraint presumably to limit it to 0/1 & NOT NULL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a good 95% indifferent to the storage implications of these tiny columns - I chose BIT because 1 bit is the same as 1 boolean, at least in terms of its state space. Of course, we can use a different type if needed: can you spell out the compatibility issue you are trying to avoid here?

Comment on lines 31 to 57
MYSQL_TYPE_TO_V2_TYPE = {
"bit": "boolean",
"tinyint": ("integer", 8),
"smallint": ("integer", 16),
"int": ("integer", 32),
"bigint": ("integer", 64),
"float": ("float", 32),
"double": ("float", 64),
"double precision": ("float", 64),
"binary": "blob",
"blob": "blob",
"char": "text",
"date": "date",
"datetime": "timestamp",
"decimal": "numeric",
"geometry": "geometry",
"numeric": "numeric",
"text": "text",
"time": "time",
"timestamp": "timestamp",
"varchar": "text",
"varbinary": "blob",
}

for prefix in ["tiny", "medium", "long"]:
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}blob"] = "blob"
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}text"] = "text"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MYSQL_TYPE_TO_V2_TYPE = {
"bit": "boolean",
"tinyint": ("integer", 8),
"smallint": ("integer", 16),
"int": ("integer", 32),
"bigint": ("integer", 64),
"float": ("float", 32),
"double": ("float", 64),
"double precision": ("float", 64),
"binary": "blob",
"blob": "blob",
"char": "text",
"date": "date",
"datetime": "timestamp",
"decimal": "numeric",
"geometry": "geometry",
"numeric": "numeric",
"text": "text",
"time": "time",
"timestamp": "timestamp",
"varchar": "text",
"varbinary": "blob",
}
for prefix in ["tiny", "medium", "long"]:
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}blob"] = "blob"
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}text"] = "text"
MYSQL_TYPE_TO_V2_TYPE = {
"bit": "boolean",
"tinyint": ("integer", 8),
"smallint": ("integer", 16),
"int": ("integer", 32),
"bigint": ("integer", 64),
"float": ("float", 32),
"double": ("float", 64),
"double precision": ("float", 64),
"binary": "blob",
"blob": "blob",
"char": "text",
"date": "date",
"datetime": "timestamp",
"decimal": "numeric",
"geometry": "geometry",
"longblob": "blob",
"longtext": "text",
"mediumblob": "blob",
"mediumtext": "text",
"numeric": "numeric",
"text": "text",
"time": "time",
"timestamp": "timestamp",
"tinyblob": "blob",
"tinytext": "text",
"varchar": "text",
"varbinary": "blob",
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't do - rearranged slightly so as not to redefine this constant, but I still didn't write these out in full. I still think it's easier to read when it's shorter / has less redundant information

@olsen232 olsen232 force-pushed the mysql branch 3 times, most recently from 0565783 to d3ad75b Compare April 14, 2021 03:07
@olsen232 olsen232 force-pushed the mysql branch 4 times, most recently from 30c8699 to 23bccc0 Compare April 27, 2021 00:27
@olsen232 olsen232 force-pushed the mysql branch 2 times, most recently from 05f5bd5 to 55c9d13 Compare April 28, 2021 00:17
@olsen232
Copy link
Collaborator Author

Looks good! How do you want to structure the todo list?

Not sure I quite understand: what are the options? Mostly what I do is:

  • write some notes about TODOs on the PR itself
  • leave some TODOs in the code (in this case I've even added a tag MYSQL-PART-2 so I can find them later)
  • merge to master once the code is through review unless I think it's totally useless on its own or will interfere with an upcoming release (this code seems fine to me to merge to master as long as I also add the pymysql dep, since it more-or-less adds MySQL support - there's some missing bits, but it's not useless)

@olsen232 olsen232 requested a review from rcoup April 28, 2021 05:11
@olsen232 olsen232 force-pushed the mysql branch 2 times, most recently from a31602b to 6a9c92c Compare April 29, 2021 04:52
@olsen232 olsen232 mentioned this pull request Apr 29, 2021
@olsen232 olsen232 merged commit 4c2727d into master Apr 30, 2021
@olsen232 olsen232 deleted the mysql branch April 30, 2021 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants