Merge pull request #399 from koordinates/mysql

Basic but working implementation of MySQL working copy
koordinates · Apr 30, 2021 · 4c2727d
2 parents 81b68e8 + 99dd089

Showing 16 changed files with 1,098 additions and 5 deletions.
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -37,6 +37,12 @@ jobs:
                   -e SA_PASSWORD=PassWord1
                 ports:
                   - 1433:1433
+              mysql:
+                image: mysql
+                options: >-
+                  -e MYSQL_ROOT_PASSWORD=PassWord1
+                ports:
+                  - 3306:3306
 
           - name: macOS
             id: Darwin

diff --git a/Makefile b/Makefile
@@ -188,6 +188,7 @@ ifeq ($(PLATFORM),Linux)
 # (github actions only supports docker containers on linux)
 ci-test: export KART_POSTGRES_URL ?= postgresql://postgres:@localhost:5432/postgres
 ci-test: export KART_SQLSERVER_URL ?= mssql://sa:PassWord1@localhost:1433/master
+ci-test: export KART_MYSQL_URL ?= mysql://root:PassWord1@localhost:3306
 endif
 
 ci-test:

diff --git a/docs/DATASETS_v2.md b/docs/DATASETS_v2.md
@@ -266,3 +266,5 @@ Geometries are encoded using the Standard GeoPackageBinary format specified in [
    - Geometries with a Z component have an XYZ envelope.
    - Other geometries have an XY envelope.
 5. The `srs_id` is always 0, since this information is not stored in the geometry object but is stored on a per-column basis in `meta/schema.json` in the `geometryCRS` field.
+
+**Note on axis-ordering:** As required by the GeoPackageBinary format, which Kart uses internally for geometry storage, Kart's axis-ordering is always `(longitude, latitude)`. This same axis-ordering is also used in Kart's JSON and GeoJSON output.
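To illustrate the axis-ordering note: in WKB, the geometry encoding that GeoPackageBinary carries after its header, a point's X coordinate is written before its Y coordinate, so with `(longitude, latitude)` ordering the longitude is stored first. The following is a minimal standard-library sketch, assuming a little-endian 2D point; `point_wkb` and `point_coords` are hypothetical helpers, not Kart functions, and the GeoPackageBinary header itself is not built here.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # WKB byte-order flag: 1 = little-endian (NDR)
WKB_POINT = 1  # WKB geometry type code for a 2D Point


def point_wkb(lon, lat):
    """Encode a 2D point as little-endian WKB: X (longitude) first, then Y (latitude)."""
    # Layout: 1-byte byte order, 4-byte type code, then two 8-byte doubles (21 bytes).
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, lon, lat)


def point_coords(wkb):
    """Decode a little-endian 2D WKB point back into its (x, y) pair."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != WKB_LITTLE_ENDIAN or geom_type != WKB_POINT:
        raise ValueError("Expected a little-endian 2D WKB point")
    return x, y


# Wellington, New Zealand: longitude 174.78, latitude -41.29.
print(point_coords(point_wkb(lon=174.78, lat=-41.29)))  # (174.78, -41.29)
```

The first double in the payload (bytes 5 to 12) is the longitude, regardless of how the CRS itself defines its axis order.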
diff --git a/docs/MYSQL_WC.md b/docs/MYSQL_WC.md
@@ -0,0 +1,50 @@
+MySQL Working Copy
+-----------------------
+
+In order to use a [MySQL](https://www.mysql.com/) working copy, you need to have a server running MySQL 8.0 or later. (MySQL 5.6 and later are largely compatible but not officially supported).
+
+### MySQL partitioning
+
+MySQL servers are designed so that they can be used for multiple apps simultaneously without those apps interfering with each other. This is usually achieved by storing data from different apps in different databases.
+
+A MySQL server contains one or more named databases, which in turn contain tables. A user connected to the server can query tables in any database they have access rights to without opening a new connection. Two tables can have the same name as long as they are in different databases.
+
+MySQL has only a single layer of data separation: the *database*. (Contrast this with [PostgreSQL](POSTGIS_WC.md) and [Microsoft SQL Server](SQL_SERVER_WC.md), which have two layers, *database* and *schema*.) A Kart MySQL working copy can share a server with any other app, but it expects to be given its own database to manage, just as Kart expects to manage its own GPKG working copy rather than share it with data from other apps. Managing the database means that Kart is responsible for initialising that database and importing the data in its initial state, then keeping track of any edits made to that data so that they can be committed. The user is expected to modify the data in that database with some other application, as part of making edits to the Kart working copy.
+
+This approach differs from the PostgreSQL and SQL Server working copy types, which only manage a single *schema* within a database.
+
+### MySQL Connection URI
+
+A Kart repository with a MySQL working copy needs to be configured with a `mysql://` connection URI. This URI specifies how to connect to the server and names the database that this Kart repository should manage as its working copy.
+
+Kart needs a connection URI in the following format:
+
+`mysql://[user[:password]@][host][:port]/dbname`
+
+For example, a Kart repo called `airport` might have a URI like the following (3306 is MySQL's default port):
+
+`mysql://kart_user:password@localhost:3306/airport_kart`
+
+To configure a Kart repository to use a particular MySQL database as its working copy, specify the `--workingcopy` flag when creating the repository, for example:
+
+`kart init --workingcopy=mysql://... --import=...`
+
+The database that Kart is given to manage should be either non-existent or empty at the time Kart is configured, but the server should already be running.
+
+The database user needs to have full rights to modify objects in the specified database (for example, via `GRANT ALL PRIVILEGES ON airport_kart.* TO kart_user; FLUSH PRIVILEGES;`).
+
+### MySQL limitations
+
+Most geospatial data can be converted to MySQL format without losing any fidelity, but MySQL has the following limitations.
+
+#### Three and four dimensional geometries
+
+Geometries in MySQL are always two-dimensional (meaning they have an X and a Y co-ordinate, or a longitude and a latitude co-ordinate). Three- or four-dimensional geometries, with Z (altitude) or M (measure) co-ordinates, are not supported in MySQL. As a result, Kart datasets containing three- and four-dimensional geometries cannot currently be checked out as MySQL working copies.
+
+#### Approximated types
+
+There is one type that Kart supports that has no MySQL equivalent - the `interval`. This type is approximated as `TEXT` in the MySQL working copy. See [APPROXIMATED_TYPES](APPROXIMATED_TYPES.md) for more information.
+
+#### CRS definitions
+
+MySQL comes pre-installed with thousands of standard EPSG coordinate reference system definitions. Currently, only the CRS definitions that are already in your MySQL installation are supported - Kart will not create definitions in MySQL to match the custom definitions attached to your Kart datasets. More documentation will be added here when this is supported.
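The two-dimensional limitation described under "Three and four dimensional geometries" could be enforced by inspecting each geometry's WKB type code before checkout. This is a minimal sketch, assuming ISO WKB type codes (where the Z, M and ZM variants add 1000, 2000 and 3000 to the base type). `wkb_dimensions` and `check_mysql_compatible` are hypothetical names, not Kart functions, and Kart may well perform this check against the dataset schema instead.

```python
import struct

# ISO WKB adds 1000 to the base geometry type code for Z, 2000 for M and
# 3000 for ZM, e.g. Point = 1, Point Z = 1001, Point M = 2001, Point ZM = 3001.
DIMENSIONS = {0: "XY", 1: "XYZ", 2: "XYM", 3: "XYZM"}


def wkb_dimensions(wkb):
    """Return the coordinate dimensions ("XY", "XYZ", ...) of an ISO WKB geometry."""
    # Byte 0 is the byte-order flag (1 = little-endian, 0 = big-endian).
    fmt = "<I" if wkb[0] == 1 else ">I"
    (type_code,) = struct.unpack_from(fmt, wkb, 1)
    return DIMENSIONS[type_code // 1000]


def check_mysql_compatible(wkb):
    """Raise ValueError for geometries with Z or M coordinates."""
    dims = wkb_dimensions(wkb)
    if dims != "XY":
        raise ValueError(f"{dims} geometries can't be stored in a MySQL working copy")


check_mysql_compatible(struct.pack("<BIdd", 1, 1, 174.78, -41.29))  # 2D point: passes
try:
    check_mysql_compatible(struct.pack("<BIddd", 1, 1001, 174.78, -41.29, 12.5))
except ValueError as err:
    print(err)  # XYZ geometries can't be stored in a MySQL working copy
```

EWKB (as produced by PostGIS) flags Z and M with high bits of the type code instead, so it would need separate handling.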
diff --git a/kart/sqlalchemy/create_engine.py b/kart/sqlalchemy/create_engine.py
@@ -119,6 +119,29 @@ def _on_checkout(dbapi_connection, connection_record, connection_proxy):
     return engine
 
 
+CANONICAL_MYSQL_SCHEME = "mysql"
+INTERNAL_MYSQL_SCHEME = "mysql+pymysql"
+
+
+def mysql_engine(msurl):
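+    def _on_checkout(mysql_conn, connection_record, connection_proxy):
+        # Runs each time a connection is checked out of the pool.
+        dbcur = mysql_conn.cursor()
+        # Use a numeric offset: named zones such as 'UTC' only work if the
+        # server's time zone tables have been loaded.
+        dbcur.execute("SET time_zone='+00:00';")
+        # Treat "double-quoted" text as identifiers, as standard SQL does.
+        # Note: this replaces the session's sql_mode rather than adding to it.
+        dbcur.execute("SET sql_mode = 'ANSI_QUOTES';")
+        dbcur.close()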
+
+    url = urlsplit(msurl)
+    if url.scheme != CANONICAL_MYSQL_SCHEME:
+        raise ValueError("Expecting mysql://")
+    url_path = url.path or "/"  # Empty path doesn't work with non-empty query.
+    url_query = _append_to_query(url.query, {"program_name": "kart"})
+    msurl = urlunsplit([INTERNAL_MYSQL_SCHEME, url.netloc, url_path, url_query, ""])
+
+    engine = sqlalchemy.create_engine(msurl)
+    sqlalchemy.event.listen(engine, "checkout", _on_checkout)
+
+    return engine
+
+
 CANONICAL_SQL_SERVER_SCHEME = "mssql"
 INTERNAL_SQL_SERVER_SCHEME = "mssql+pyodbc"
 SQL_SERVER_INSTALL_DOC_URL = (
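The URI handling in `mysql_engine`, together with the `mysql://[user[:password]@][host][:port]/dbname` format documented in `docs/MYSQL_WC.md`, can be sketched with only the standard library. This is a hypothetical illustration rather than Kart's code: `parse_mysql_uri` and `to_pymysql_url` are invented names, and `to_pymysql_url` stands in for the internal `_append_to_query` helper, whose body isn't shown in this diff. The fallback port 3306 is MySQL's standard port.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MYSQL_DEFAULT_PORT = 3306  # MySQL's standard TCP port


def parse_mysql_uri(uri):
    """Split mysql://[user[:password]@][host][:port]/dbname into its parts."""
    url = urlsplit(uri)
    if url.scheme != "mysql":
        raise ValueError("Expecting mysql://")
    dbname = url.path.strip("/")
    if not dbname or "/" in dbname:
        raise ValueError("Expecting exactly one path component: the database name")
    return {
        "user": url.username,
        "password": url.password,
        "host": url.hostname,
        "port": url.port or MYSQL_DEFAULT_PORT,
        "dbname": dbname,
    }


def to_pymysql_url(uri, **extra_query):
    """Swap mysql:// for mysql+pymysql:// and merge extra query parameters."""
    url = urlsplit(uri)
    query = dict(parse_qsl(url.query))
    query.update(extra_query)
    # Mirrors mysql_engine: use "/" when the path is empty before adding a query.
    return urlunsplit(["mysql+pymysql", url.netloc, url.path or "/", urlencode(query), ""])


print(to_pymysql_url("mysql://root:PassWord1@localhost:3306", program_name="kart"))
# mysql+pymysql://root:PassWord1@localhost:3306/?program_name=kart
```

Keeping the user-facing `mysql://` scheme separate from the driver-specific `mysql+pymysql://` scheme means the choice of DB-API driver stays an internal detail.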

diff --git a/kart/sqlalchemy/upsert.py b/kart/sqlalchemy/upsert.py
@@ -67,6 +67,39 @@ def compile_upsert_postgresql(upsert_stmt, compiler, **kwargs):
     return compiler.process(insert_stmt)
 
 
+@compiles(Upsert, "mysql")
+def compile_upsert_mysql(upsert_stmt, compiler, **kwargs):
+    # See https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
+    preparer = compiler.preparer
+
+    def list_cols(col_names, prefix=""):
+        return ", ".join([prefix + c for c in col_names])
+
+    values = ", ".join(upsert_stmt.values(compiler))
+    table = preparer.format_table(upsert_stmt.table)
+    all_columns = [preparer.quote(c.name) for c in upsert_stmt.columns]
+    non_pk_columns = [preparer.quote(c.name) for c in upsert_stmt.non_pk_columns]
+
+    # Row aliases in INSERT ... ON DUPLICATE KEY UPDATE were added in MySQL 8.0.19,
+    # and VALUES() in the UPDATE clause is deprecated from 8.0.20 onwards.
+    supports_row_alias = compiler.dialect.server_version_info >= (8, 0, 19)
+
+    result = f"INSERT INTO {table} ({list_cols(all_columns)})"
+    if supports_row_alias:
+        # Alias the incoming row as SOURCE rather than reusing VALUES().
+        # See https://dev.mysql.com/worklog/task/?id=13325
+        result += f" VALUES ({values}) AS SOURCE ({list_cols(all_columns)})"
+        updates = [f"{c} = SOURCE.{c}" for c in non_pk_columns]
+    else:
+        # Pre-8.0.19 (e.g. 5.7): VALUES(col) refers to the value being inserted.
+        result += f" VALUES ({values})"
+        updates = [f"{c} = VALUES({c})" for c in non_pk_columns]
+    result += " ON DUPLICATE KEY UPDATE " + ", ".join(updates)
+
+    return result
+
+
 @compiles(Upsert, "mssql")
 def compile_upsert_mssql(upsert_stmt, compiler, **kwargs):
     # See https://docs.microsoft.com/sql/t-sql/statements/merge-transact-sql
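The two MySQL upsert forms can be compared without SQLAlchemy using a plain string builder. This is a hypothetical sketch, not Kart's `Upsert` compiler: it assumes unquoted, trusted identifiers and PyMySQL-style `%s` placeholders. It gates on MySQL 8.0.19, the release that introduced row aliases in `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
def mysql_upsert_sql(table, columns, pk_columns, server_version):
    """Build INSERT ... ON DUPLICATE KEY UPDATE SQL with %s placeholders."""
    non_pk = [c for c in columns if c not in pk_columns]
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    if tuple(server_version[:3]) >= (8, 0, 19):
        # MySQL 8.0.19+: alias the incoming row and refer to its columns via SOURCE.
        sql += f" AS SOURCE ({col_list})"
        updates = [f"{c} = SOURCE.{c}" for c in non_pk]
    else:
        # Older servers: VALUES(col) is the value that would have been inserted.
        updates = [f"{c} = VALUES({c})" for c in non_pk]
    return sql + " ON DUPLICATE KEY UPDATE " + ", ".join(updates)


print(mysql_upsert_sql("airports", ["id", "name"], ["id"], (8, 0, 23)))
# INSERT INTO airports (id, name) VALUES (%s, %s) AS SOURCE (id, name) ON DUPLICATE KEY UPDATE name = SOURCE.name
print(mysql_upsert_sql("airports", ["id", "name"], ["id"], (5, 7, 33)))
# INSERT INTO airports (id, name) VALUES (%s, %s) ON DUPLICATE KEY UPDATE name = VALUES(name)
```

Only non-primary-key columns appear in the `UPDATE` list, because a duplicate-key match means the primary key values already agree.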

diff --git a/kart/working_copy/__init__.py b/kart/working_copy/__init__.py
@@ -9,6 +9,7 @@ class WorkingCopyType(Enum):
     GPKG = auto()
     POSTGIS = auto()
     SQL_SERVER = auto()
+    MYSQL = auto()
 
     @classmethod
     def from_location(cls, location, allow_invalid=False):
@@ -17,6 +18,8 @@ def from_location(cls, location, allow_invalid=False):
             return WorkingCopyType.POSTGIS
         elif location.startswith("mssql:"):
             return WorkingCopyType.SQL_SERVER
+        elif location.startswith("mysql:"):
+            return WorkingCopyType.MYSQL
         elif location.lower().endswith(".gpkg"):
             return WorkingCopyType.GPKG
         elif allow_invalid:
@@ -27,7 +30,8 @@ def from_location(cls, location, allow_invalid=False):
                 "Try one of:\n"
                 "  PATH.gpkg\n"
                 "  postgresql://[HOST]/DBNAME/DBSCHEMA\n"
-                "  mssql://[HOST]/DBNAME/DBSCHEMA"
+                "  mssql://[HOST]/DBNAME/DBSCHEMA\n"
+                "  mysql://[HOST]/DBNAME"
             )
 
     @property
@@ -44,6 +48,11 @@ def class_(self):
             from .sqlserver import WorkingCopy_SqlServer
 
             return WorkingCopy_SqlServer
+        elif self is WorkingCopyType.MYSQL:
+            from .mysql import WorkingCopy_MySql
+
+            return WorkingCopy_MySql
+
         raise RuntimeError("Invalid WorkingCopyType")
 
 

diff --git a/kart/working_copy/db_server.py b/kart/working_copy/db_server.py
@@ -90,7 +90,7 @@ def _separate_db_schema(cls, db_uri, expected_path_length=2):
         """
         Removes the DBSCHEMA part off the end of a URI's path, and returns the URI and the DBSCHEMA separately.
         Useful since generally, it is not necessary (or even possible) to connect to a particular DBSCHEMA directly,
-        instead, the rest of the URI is used to connect, then the DBSCHEMA is sped
+        instead, the rest of the URI is used to connect, then the DBSCHEMA is specified in every query / command.
         """
         url = urlsplit(db_uri)
         url_path = PurePath(url.path)
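The docstring above describes removing the DBSCHEMA from the end of the URI path so that the remainder can be used to connect. Because the hunk is truncated here, the following is a standalone sketch of that idea written independently of Kart, not the method's actual body. It uses `PurePosixPath` because URL paths always use forward slashes.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def separate_db_schema(db_uri, expected_path_length=2):
    """Split the trailing DBSCHEMA off a URI path; return (uri_without_schema, dbschema)."""
    url = urlsplit(db_uri)
    path = PurePosixPath(url.path)
    # parts includes the root, e.g. ("/", "dbname", "dbschema").
    if len(path.parts) != expected_path_length + 1:
        raise ValueError(f"Expecting {expected_path_length} path components in {db_uri!r}")
    # Rebuild the URI without the final path component (e.g. "/dbname").
    connect_uri = urlunsplit([url.scheme, url.netloc, str(path.parent), url.query, ""])
    return connect_uri, path.name


print(separate_db_schema("postgresql://localhost/gis/kart"))
# ('postgresql://localhost/gis', 'kart')
```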