-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic but working implementation of MySQL working copy #399
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! How do you want to structure the todo list?
sno/sqlalchemy/create_engine.py
Outdated
# url_query = _append_to_query( | ||
# url.query, {"Application Name": "sno"} | ||
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sno/working_copy/mysql.py
Outdated
|
||
class WorkingCopy_MySql(DatabaseServer_WorkingCopy): | ||
""" | ||
MySQL working copy implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth noting mysql maps "database" to what we refer to as "schema"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sno/working_copy/mysql.py
Outdated
# TODO - MYSQL-PART-2 - We can only create a spatial index if the geometry column is declared | ||
# not-null, but a datasets V2 schema doesn't distinguish between NULL and NOT NULL columns. | ||
# So we don't know if the user would rather have an index, or be able to store NULL values. | ||
return # Find a fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these become working copy options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so! Not completely clear where we put working copy options yet
I suppose as separate git config variables?
The other option is to add them as query params in the working copy location URI - which is also stored in git config in the end.
sno/working_copy/mysql.py
Outdated
class GeometryType(UserDefinedType): | ||
"""UserDefinedType so that V2 geometry is adapted to MySQL binary format.""" | ||
|
||
# TODO: is "axis-order=long-lat" always the correct behaviour? It makes all the tests pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://mysqlserverteam.com/axis-order-in-spatial-reference-systems/
Seems like that's the native storage ordering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it is, but I don't think that's relevant. MySQL wants to be told what to do on import from WKB / export to WKB - either lat-long, long-lat, or whatever the CRS standard officially defines (probably lat-long, at least for EPSG).
I think the order in which MySQL actually stores them is mostly hidden from us - MySQL knows which number is lat and which is long, and can import and export according to the rule we provide. That's what matters to us.
What I don't really know yet is why all our test data is apparently in long-lat ordering, which is apparently the opposite of the EPSG standard. I think there is a de facto standard that is the opposite of the actual standard, but I don't know if that is something we can rely on in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this some more. Discussion of the basic issue here - you are doubtless more familiar with it than I am:
http://docs.geotools.org/latest/userguide/library/referencing/order.html
Internally, we store geometries using the GPKG format, which officially uses what it calls the "de facto" standard - an axis ordering of long-lat / East-North / XY. (It is XY in that it is across then up - but, all standards are XY in that we always call the first variable X). The GPKG format contains an embedded WKB. The WKB standard seems to leave open to interpretation what order the axes are in - either, they are in the "de facto" standard ordering (long-lat, East-North), or, they are in the order defined by the CRS (which would generally be the EPSG standard - lat-long, North-East). This second ordering seems to be mandated by ISO 19128:2005, which seems not to have been widely adopted since it is the opposite of the de facto standard and nobody seems to have thought of an actual way to gradually migrate every system so that it conforms to the new standard without breaking the de facto standard.
Seems like all of the libraries we are using so far simply assume that "de facto" standard - although perhaps I should double check this - but MySQL is the first library which tries by default to conform to new standard, it assumes that the ordering from the CRS standard is used, which, since our WKB is from a GPKG, it definitely is not - instead it is definitely in GPKG / de-facto / long-lat / East-North ordering. That is why we need to explicitly tell MySQL which order it should actually expect in the WKB that we use to get data into and out of MySQL.
So, I think I can delete this comment, or at least, add some background instead of a TODO. Maybe I should add something to the DATASETS_v2 doc to note that, since Kart stores everything using GPKG geometries, it always uses GPKG / "de-facto" / long-lat / East-North ordering. I think this is true of all the JSON / GEOJSON that Kart outputs too.
sno/working_copy/mysql.py
Outdated
CREATE TRIGGER {self._quoted_trigger_name(dataset, 'ins')} | ||
AFTER INSERT ON {table_identifier} | ||
FOR EACH ROW | ||
REPLACE INTO {self.SNO_TRACK} (table_name, pk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like these need casts until we have a wider fix for #398? Unless MySQL auto-casts somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be auto-casting - nothing would work otherwise since our big-3 test data (points, polygons, table) all have integer primary keys, but tracking table is strings.
sno/working_copy/mysql.py
Outdated
for key in mysql_adapter.APPROXIMATED_TYPES_EXTRA_TYPE_INFO: | ||
new_col_dict[key] = old_col_dict.get(key) | ||
|
||
# Geometry types don't have to be approximated, except for the Z/M specifiers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because mysql doesn't support Z/M?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Z and M values are allowed, since MySQL basically stores WKB internally - but AFAICT, there's no way to specify a column as containing only 2, 3, or 4 dimensional geometries. Since Kart's schema does contain this metadata, this schema information is "approximated" - ie a column restricted to n-dimensional geometries where n is ONE OF (2, 3, or 4) in Kart is approximated as a column for n-dimensional geometries, where n is ANY/ALL of (2,3,4) in MySQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Z and M values are not allowed. Should've checked that. New plan - don't support datasets with Z or M values for now. Discussion of other longer-term solutions can be found in the slack channel.
sno/working_copy/mysql_adapter.py
Outdated
|
||
|
||
V2_TYPE_TO_MYSQL_TYPE = { | ||
"boolean": "bit(1)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonder if this should be TINYINT(1)
for compatibility? Would need a lot of boolean columns for it to make much difference storage-wise...
Can add a CHECK constraint presumably to limit it to 0
/1
& NOT NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a good 95% indifferent to the storage implications of these tiny columns - I chose BIT because 1 bit is the same as 1 boolean, at least in terms of its state space. Of course, we can use a different type if needed: can you spell out the compatibility issue you are trying to avoid here?
sno/working_copy/mysql_adapter.py
Outdated
MYSQL_TYPE_TO_V2_TYPE = { | ||
"bit": "boolean", | ||
"tinyint": ("integer", 8), | ||
"smallint": ("integer", 16), | ||
"int": ("integer", 32), | ||
"bigint": ("integer", 64), | ||
"float": ("float", 32), | ||
"double": ("float", 64), | ||
"double precision": ("float", 64), | ||
"binary": "blob", | ||
"blob": "blob", | ||
"char": "text", | ||
"date": "date", | ||
"datetime": "timestamp", | ||
"decimal": "numeric", | ||
"geometry": "geometry", | ||
"numeric": "numeric", | ||
"text": "text", | ||
"time": "time", | ||
"timestamp": "timestamp", | ||
"varchar": "text", | ||
"varbinary": "blob", | ||
} | ||
|
||
for prefix in ["tiny", "medium", "long"]: | ||
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}blob"] = "blob" | ||
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}text"] = "text" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MYSQL_TYPE_TO_V2_TYPE = { | |
"bit": "boolean", | |
"tinyint": ("integer", 8), | |
"smallint": ("integer", 16), | |
"int": ("integer", 32), | |
"bigint": ("integer", 64), | |
"float": ("float", 32), | |
"double": ("float", 64), | |
"double precision": ("float", 64), | |
"binary": "blob", | |
"blob": "blob", | |
"char": "text", | |
"date": "date", | |
"datetime": "timestamp", | |
"decimal": "numeric", | |
"geometry": "geometry", | |
"numeric": "numeric", | |
"text": "text", | |
"time": "time", | |
"timestamp": "timestamp", | |
"varchar": "text", | |
"varbinary": "blob", | |
} | |
for prefix in ["tiny", "medium", "long"]: | |
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}blob"] = "blob" | |
MYSQL_TYPE_TO_V2_TYPE[f"{prefix}text"] = "text" | |
MYSQL_TYPE_TO_V2_TYPE = { | |
"bit": "boolean", | |
"tinyint": ("integer", 8), | |
"smallint": ("integer", 16), | |
"int": ("integer", 32), | |
"bigint": ("integer", 64), | |
"float": ("float", 32), | |
"double": ("float", 64), | |
"double precision": ("float", 64), | |
"binary": "blob", | |
"blob": "blob", | |
"char": "text", | |
"date": "date", | |
"datetime": "timestamp", | |
"decimal": "numeric", | |
"geometry": "geometry", | |
"longblob": "blob", | |
"longtext": "text", | |
"mediumblob": "blob", | |
"mediumtext": "text", | |
"numeric": "numeric", | |
"text": "text", | |
"time": "time", | |
"timestamp": "timestamp", | |
"tinyblob": "blob", | |
"tinytext": "text", | |
"varchar": "text", | |
"varbinary": "blob", | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't do - rearranged slightly so as not to redefine this constant, but I still didn't write these out in full. I still think it's easier to read when it's shorter / has less redundant information
0565783
to
d3ad75b
Compare
30c8699
to
23bccc0
Compare
05f5bd5
to
55c9d13
Compare
Not sure I quite understand: what are the options? Mostly what I do is:
|
a31602b
to
6a9c92c
Compare
3D and 4D geometries are not supported at all by MySQL.
Description
Adds MySQL working copy support to Sno. Checking out, editing, diffing and committing of features themselves and of schema.json is supported. Also added tests - although, they won't run in our CI queue until we set up a MySQL database for them.
Still TODO:
Include the pymysql library and any of its dependencies in our various builds.
Create (/drop) spatial reference systems using the MySQL CREATE SPATIAL REFERENCE SYSTEM command.
Make meta diffs other than schema.json work (eg crs/something.wkt, title, maybe description)
Find a way to make the spatial index work
Add documentation, update changelog
(I accidentally merged this to master once, but ignore that)