Basic but working implementation of MySQL working copy #399

olsen232 · 2021-04-07T22:08:15Z

Description

Adds MySQL working copy support to Sno. Checking out, editing, diffing and committing are supported, both for features themselves and for schema.json. Tests are also added, although they won't run in our CI queue until we set up a MySQL database for them.

Still TODO:

Include the pymysql library and any of its dependencies in our various builds.
Create (/drop) spatial reference systems using the MySQL CREATE SPATIAL REFERENCE SYSTEM command.
Make meta diffs other than schema.json work (eg crs/something.wkt, title, maybe description)
Find a way to make the spatial index work
Add documentation, update changelog

(I accidentally merged this to master once, but ignore that)

rcoup

Looks good! How do you want to structure the todo list?

rcoup · 2021-04-08T12:32:28Z

sno/sqlalchemy/create_engine.py

+    # url_query = _append_to_query(
+    #    url.query, {"Application Name": "sno"}
+    # )


program_name?

rcoup · 2021-04-08T12:34:48Z

sno/working_copy/mysql.py

+
+class WorkingCopy_MySql(DatabaseServer_WorkingCopy):
+    """
+    MySQL working copy implementation.


worth noting mysql maps "database" to what we refer to as "schema"?

rcoup · 2021-04-08T12:42:56Z

sno/working_copy/mysql.py

+        # TODO - MYSQL-PART-2 - We can only create a spatial index if the geometry column is declared
+        # not-null, but a datasets V2 schema doesn't distinguish between NULL and NOT NULL columns.
+        # So we don't know if the user would rather have an index, or be able to store NULL values.
+        return  # Find a fix.


I guess these become working copy options?

I guess so! It's not completely clear yet where we put working copy options.
I suppose as separate git config variables?
The other option is to add them as query params in the working copy location URI, which is also stored in git config in the end.
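
A minimal sketch of the query-param option discussed above, assuming a hypothetical `spatial_index` option name and a hypothetical helper `split_working_copy_options` (neither is part of Sno). It only shows how options could be separated from the working copy location URI using the standard library:

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_working_copy_options(location):
    """Return (location_without_query, options) for a working copy URI.

    Hypothetical helper: Sno does not necessarily parse options this way.
    """
    parts = urlsplit(location)
    # Query params become a plain dict of string options.
    options = dict(parse_qsl(parts.query))
    bare_location = urlunsplit(parts._replace(query=""))
    return bare_location, options


# "spatial_index" is an assumed option name, used only for illustration.
location, options = split_working_copy_options(
    "mysql://user@localhost:3306/my_database?spatial_index=true"
)
use_spatial_index = options.get("spatial_index", "false") == "true"
```

Keeping the options in the URI means a single git config value describes the whole working copy, at the cost of having to strip them again before connecting.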

rcoup · 2021-04-08T12:45:34Z

sno/working_copy/mysql.py

+class GeometryType(UserDefinedType):
+    """UserDefinedType so that V2 geometry is adapted to MySQL binary format."""
+
+    # TODO: is "axis-order=long-lat" always the correct behaviour? It makes all the tests pass.


https://mysqlserverteam.com/axis-order-in-spatial-reference-systems/
Seems like that's the native storage ordering?

Apparently it is, but I don't think that's relevant. MySQL wants to be told what to do on import from WKB / export to WKB - either lat-long, long-lat, or whatever the CRS standard officially defines (probably lat-long, at least for EPSG).

I think the order in which MySQL actually stores them is mostly hidden from us - MySQL knows which number is lat and which is long, and can import and export according to the rule we provide. That's what matters to us.

What I don't really know yet is why all our test data is apparently in long-lat ordering, which is apparently the opposite of the EPSG standard. I think there is a de facto standard that is the opposite of the actual standard, but I don't know if that is something we can rely on in general.

I thought about this some more. Discussion of the basic issue here - you are doubtless more familiar with it than I am:

http://docs.geotools.org/latest/userguide/library/referencing/order.html

Internally, we store geometries using the GPKG format, which officially uses what it calls the "de facto" standard: an axis ordering of long-lat / East-North / XY. (It is XY in that it is across then up, but all standards are XY in the sense that we always call the first variable X.) The GPKG format contains an embedded WKB. The WKB standard seems to leave the axis order open to interpretation: either the axes are in the "de facto" ordering (long-lat, East-North), or they are in the order defined by the CRS (which would generally be the EPSG standard: lat-long, North-East). This second ordering seems to be mandated by ISO 19128:2005, which seems not to have been widely adopted, since it is the opposite of the de facto standard and nobody seems to have found a way to gradually migrate every system to the new standard without breaking the de facto one.

It seems that all of the libraries we have used so far simply assume the "de facto" standard (although perhaps I should double check this). MySQL is the first one which tries by default to conform to the new standard: it assumes that the WKB uses the ordering defined by the CRS. Since our WKB comes from a GPKG, it definitely does not; it is in GPKG / de facto / long-lat / East-North ordering. That is why we need to explicitly tell MySQL which order it should expect in the WKB that we use to get data into and out of MySQL.

So, I think I can delete this comment, or at least, add some background instead of a TODO. Maybe I should add something to the DATASETS_v2 doc to note that, since Kart stores everything using GPKG geometries, it always uses GPKG / "de-facto" / long-lat / East-North ordering. I think this is true of all the JSON / GEOJSON that Kart outputs too.
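
To make the point above concrete, here is a small sketch, not the code from this PR. It builds a WKB point in long-lat order and the SQL expressions that tell MySQL 8.0 which order to expect, using the optional `axis-order` argument of `ST_GeomFromWKB` and `ST_AsBinary`. The helper names, the `:wkb` placeholder and the default SRID are assumptions made for illustration:

```python
import struct


def wkb_point(x, y):
    """Little-endian WKB for a 2D point: byte order flag, geometry type 1, X, Y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def mysql_import_expr(param=":wkb", srid=4326):
    """SQL expression reading WKB whose first axis is longitude (hypothetical helper)."""
    return f"ST_GeomFromWKB({param}, {srid}, 'axis-order=long-lat')"


def mysql_export_expr(column):
    """SQL expression writing WKB with longitude first (hypothetical helper)."""
    return f"ST_AsBinary({column}, 'axis-order=long-lat')"


# GPKG / de facto ordering: longitude (X) first, latitude (Y) second.
wkb = wkb_point(174.7762, -41.2865)
import_sql = mysql_import_expr()
export_sql = mysql_export_expr("geom")
```

Without the third argument, MySQL would interpret the two doubles in the order defined by the SRS, which for EPSG:4326 is latitude first.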

rcoup · 2021-04-08T13:46:17Z

sno/working_copy/mysql.py

+                CREATE TRIGGER {self._quoted_trigger_name(dataset, 'ins')}
+                    AFTER INSERT ON {table_identifier}
+                FOR EACH ROW
+                    REPLACE INTO {self.SNO_TRACK} (table_name, pk)


feels like these need casts until we have a wider fix for #398? Unless MySQL auto-casts somehow?

It must be auto-casting - nothing would work otherwise since our big-3 test data (points, polygons, table) all have integer primary keys, but tracking table is strings.
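
The tracking pattern can be tried without a MySQL server. The sketch below uses SQLite as a stand-in, so it only illustrates the idea of an integer primary key landing in a text tracking column; MySQL's implicit conversion rules differ in detail. The table and trigger names are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE points (fid INTEGER PRIMARY KEY, name TEXT);

    -- Tracking table: primary keys of edited rows are stored as text.
    CREATE TABLE track (
        table_name TEXT NOT NULL,
        pk TEXT,
        PRIMARY KEY (table_name, pk)
    );

    CREATE TRIGGER points_ins AFTER INSERT ON points
    FOR EACH ROW
    BEGIN
        REPLACE INTO track (table_name, pk) VALUES ('points', NEW.fid);
    END;
    """
)

conn.execute("INSERT INTO points (fid, name) VALUES (7, 'a')")
conn.execute("INSERT INTO points (fid, name) VALUES (8, 'b')")

# The integer keys were converted to text by the column's type affinity.
tracked = conn.execute(
    "SELECT table_name, pk, typeof(pk) FROM track ORDER BY pk"
).fetchall()
```

An explicit cast in the trigger would make the conversion visible rather than relying on the database to do it silently.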

rcoup · 2021-04-08T13:54:21Z

sno/working_copy/mysql.py

+            for key in mysql_adapter.APPROXIMATED_TYPES_EXTRA_TYPE_INFO:
+                new_col_dict[key] = old_col_dict.get(key)
+
+        # Geometry types don't have to be approximated, except for the Z/M specifiers.


because mysql doesn't support Z/M?

Z and M values are allowed, since MySQL basically stores WKB internally - but AFAICT, there's no way to specify a column as containing only 2, 3, or 4 dimensional geometries. Since Kart's schema does contain this metadata, this schema information is "approximated" - ie a column restricted to n-dimensional geometries where n is ONE OF (2, 3, or 4) in Kart is approximated as a column for n-dimensional geometries, where n is ANY/ALL of (2,3,4) in MySQL.

Added a comment.

Z and M values are not allowed. Should've checked that. New plan - don't support datasets with Z or M values for now. Discussion of other longer-term solutions can be found in the slack channel.
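
A rough sketch of what rejecting such datasets up front could look like. It assumes that a V2 schema column is a dict with `dataType` and `geometryType` keys and that Z/M appear as a suffix such as `POINT Z`; the function name and the use of `ValueError` are hypothetical, not what this PR does:

```python
def check_geometry_supported(schema_columns):
    """Raise ValueError if any geometry column declares Z or M values.

    Assumes each column is a dict like
    {"name": "geom", "dataType": "geometry", "geometryType": "POINT Z"}.
    """
    for col in schema_columns:
        if col.get("dataType") != "geometry":
            continue
        parts = col.get("geometryType", "GEOMETRY").split()
        suffix = parts[1] if len(parts) > 1 else ""
        if suffix in ("Z", "M", "ZM"):
            raise ValueError(
                f"Column {col.get('name')!r} has geometry type "
                f"{col['geometryType']!r}: Z/M values are not supported "
                "by the MySQL working copy"
            )


# A 2D column passes silently; a "POINT Z" column would raise ValueError.
check_geometry_supported(
    [{"name": "geom", "dataType": "geometry", "geometryType": "MULTIPOLYGON"}]
)
```

Failing at checkout time keeps the error close to its cause, instead of surfacing later as a confusing write failure.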

rcoup · 2021-04-08T13:58:29Z

sno/working_copy/mysql_adapter.py

+
+
+V2_TYPE_TO_MYSQL_TYPE = {
+    "boolean": "bit(1)",


wonder if this should be TINYINT(1) for compatibility? Would need a lot of boolean columns for it to make much difference storage-wise...

Can add a CHECK constraint presumably to limit it to 0/1 & NOT NULL?

I'm a good 95% indifferent to the storage implications of these tiny columns - I chose BIT because 1 bit is the same as 1 boolean, at least in terms of its state space. Of course, we can use a different type if needed: can you spell out the compatibility issue you are trying to avoid here?

rcoup · 2021-04-08T14:06:52Z

sno/working_copy/mysql_adapter.py

+MYSQL_TYPE_TO_V2_TYPE = {
+    "bit": "boolean",
+    "tinyint": ("integer", 8),
+    "smallint": ("integer", 16),
+    "int": ("integer", 32),
+    "bigint": ("integer", 64),
+    "float": ("float", 32),
+    "double": ("float", 64),
+    "double precision": ("float", 64),
+    "binary": "blob",
+    "blob": "blob",
+    "char": "text",
+    "date": "date",
+    "datetime": "timestamp",
+    "decimal": "numeric",
+    "geometry": "geometry",
+    "numeric": "numeric",
+    "text": "text",
+    "time": "time",
+    "timestamp": "timestamp",
+    "varchar": "text",
+    "varbinary": "blob",
+}
+
+for prefix in ["tiny", "medium", "long"]:
+    MYSQL_TYPE_TO_V2_TYPE[f"{prefix}blob"] = "blob"
+    MYSQL_TYPE_TO_V2_TYPE[f"{prefix}text"] = "text"


Suggested change

MYSQL_TYPE_TO_V2_TYPE = {
    "bit": "boolean",
    "tinyint": ("integer", 8),
    "smallint": ("integer", 16),
    "int": ("integer", 32),
    "bigint": ("integer", 64),
    "float": ("float", 32),
    "double": ("float", 64),
    "double precision": ("float", 64),
    "binary": "blob",
    "blob": "blob",
    "char": "text",
    "date": "date",
    "datetime": "timestamp",
    "decimal": "numeric",
    "geometry": "geometry",
    "longblob": "blob",
    "longtext": "text",
    "mediumblob": "blob",
    "mediumtext": "text",
    "numeric": "numeric",
    "text": "text",
    "time": "time",
    "timestamp": "timestamp",
    "tinyblob": "blob",
    "tinytext": "text",
    "varchar": "text",
    "varbinary": "blob",
}

Didn't do - rearranged slightly so as not to redefine this constant, but I still didn't write these out in full. I still think it's easier to read when it's shorter / has less redundant information
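
For illustration, one way to build the prefixed entries without writing them out and without mutating the dict after the fact, followed by a lookup that turns a MySQL column type into a V2-style type dict. The mapping is abbreviated here, and the `mysql_type_to_v2` helper and its `dataType` / `size` output keys are assumptions rather than the code in this PR:

```python
import re

# Abbreviated copy of the mapping; prefixed blob/text types are generated inline.
MYSQL_TYPE_TO_V2_TYPE = {
    "bit": "boolean",
    "tinyint": ("integer", 8),
    "int": ("integer", 32),
    "bigint": ("integer", 64),
    "double": ("float", 64),
    "blob": "blob",
    "text": "text",
    "varchar": "text",
    **{f"{prefix}blob": "blob" for prefix in ("tiny", "medium", "long")},
    **{f"{prefix}text": "text" for prefix in ("tiny", "medium", "long")},
}


def mysql_type_to_v2(mysql_type):
    """Map e.g. 'varchar(255)' or 'BIGINT' to a V2-style type dict (hypothetical)."""
    # Drop any length/precision suffix such as "(255)" before the lookup.
    base = re.sub(r"\(.*\)", "", mysql_type).strip().lower()
    v2_type = MYSQL_TYPE_TO_V2_TYPE[base]
    if isinstance(v2_type, tuple):
        data_type, size = v2_type
        return {"dataType": data_type, "size": size}
    return {"dataType": v2_type}
```

For example, `mysql_type_to_v2("varchar(255)")` gives a text type and `mysql_type_to_v2("BIGINT")` gives a 64-bit integer type.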

olsen232 · 2021-04-28T01:50:31Z

> Looks good! How do you want to structure the todo list?

Not sure I quite understand: what are the options? Mostly what I do is:

write some notes about TODOs on the PR itself
leave some TODOs in the code (in this case I've even added a tag MYSQL-PART-2 so I can find them later)
merge to master once the code is through review unless I think it's totally useless on its own or will interfere with an upcoming release (this code seems fine to me to merge to master as long as I also add the pymysql dep, since it more-or-less adds MySQL support - there's some missing bits, but it's not useless)

3D and 4D geometries are not supported at all by MySQL.

olsen232 requested review from rcoup and craigds April 7, 2021 22:08

rcoup reviewed Apr 8, 2021

olsen232 force-pushed the mysql branch 3 times, most recently from 0565783 to d3ad75b · April 14, 2021 03:07

olsen232 force-pushed the mysql branch 4 times, most recently from 30c8699 to 23bccc0 · April 27, 2021 00:27

olsen232 force-pushed the mysql branch 2 times, most recently from 05f5bd5 to 55c9d13 · April 28, 2021 00:17

olsen232 mentioned this pull request Apr 28, 2021

Add dependency on pymysql #414

Closed

olsen232 requested a review from rcoup April 28, 2021 05:11

olsen232 force-pushed the mysql branch 2 times, most recently from a31602b to 6a9c92c · April 29, 2021 04:52

olsen232 mentioned this pull request Apr 29, 2021

Added docs/MYSQL_WC.md #415

Closed

olsen232 added 6 commits April 30, 2021 09:48

Add dependency on pymysql, cryptography (for MySQL)

e5cdf42

Added docs/MYSQL_WC.md

2ac1df1

Basic but working implementation of MySQL working copy

6468abc

MySQL post review 1 - address rcoup comments

abedf45

Remove Z/M approximation code for MySQL, add an error message instead.

a502d62

3D and 4D geometries are not supported at all by MySQL.

Enable MYSQL CI-tests on linux

99dd089

olsen232 force-pushed the mysql branch from 6a9c92c to 99dd089 · April 30, 2021 00:03

olsen232 merged commit 4c2727d into master Apr 30, 2021

olsen232 deleted the mysql branch April 30, 2021 00:30
