Merge pull request #399 from koordinates/mysql
Basic but working implementation of MySQL working copy
olsen232 authored Apr 30, 2021
2 parents 81b68e8 + 99dd089 commit 4c2727d
Showing 16 changed files with 1,098 additions and 5 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/build.yml
@@ -37,6 +37,12 @@ jobs:
-e SA_PASSWORD=PassWord1
ports:
- 1433:1433
mysql:
image: mysql
options: >-
-e MYSQL_ROOT_PASSWORD=PassWord1
ports:
- 3306:3306

- name: macOS
id: Darwin
1 change: 1 addition & 0 deletions Makefile
@@ -188,6 +188,7 @@ ifeq ($(PLATFORM),Linux)
# (github actions only supports docker containers on linux)
ci-test: export KART_POSTGRES_URL ?= postgresql://postgres:@localhost:5432/postgres
ci-test: export KART_SQLSERVER_URL ?= mssql://sa:PassWord1@localhost:1433/master
ci-test: export KART_MYSQL_URL ?= mysql://root:PassWord1@localhost:3306
endif

ci-test:
2 changes: 2 additions & 0 deletions docs/DATASETS_v2.md
@@ -266,3 +266,5 @@ Geometries are encoded using the Standard GeoPackageBinary format specified in [
- Geometries with a Z component have an XYZ envelope.
- Other geometries have an XY envelope.
5. The `srs_id` is always 0, since this information is not stored in the geometry object but is stored on a per-column basis in `meta/schema.json` in the `geometryCRS` field.

**Note on axis-ordering:** As required by the GeoPackageBinary format, which Kart uses internally for geometry storage, Kart's axis-ordering is always `(longitude, latitude)`. This same axis-ordering is also used in Kart's JSON and GeoJSON output.
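
As a small illustration of the axis-ordering note, the sketch below builds a GeoJSON point with the longitude first and the latitude second. The coordinates are illustrative values (roughly Wellington, New Zealand) and the variable names are assumptions, not anything taken from Kart itself.

```python
import json

# Illustrative coordinates only: roughly Wellington, New Zealand.
longitude, latitude = 174.7762, -41.2865

# (longitude, latitude) order, matching the axis-ordering described above.
point_geometry = {"type": "Point", "coordinates": [longitude, latitude]}
geojson = json.dumps(point_geometry)
print(geojson)
```

A consumer that expects `(latitude, longitude)` would need to swap the pair after reading it.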
50 changes: 50 additions & 0 deletions docs/MYSQL_WC.md
@@ -0,0 +1,50 @@
MySQL Working Copy
-----------------------

To use a [MySQL](https://www.mysql.com/) working copy, you need a server running MySQL 8.0 or later. (MySQL 5.6 and later are largely compatible but not officially supported.)

### MySQL partitioning

MySQL servers are designed so that they can be used for multiple apps simultaneously without those apps interfering with each other. This is usually achieved by storing data from different apps in different databases.

* A MySQL server contains one or more named databases, which in turn contain tables. A user connected to the server can query tables in any database they have access-rights to without starting a new connection. Two tables can have the same name, as long as they are in different databases.

MySQL has only a single layer of data separation - the *database*. (Contrast this with [PostgreSQL](POSTGIS_WC.md) and [Microsoft SQL Server](SQL_SERVER_WC.md), which have two layers: *database* and *schema*.) A Kart MySQL working copy can share a server with any other app, but it expects to be given its own database to manage (just as Kart expects to manage its own GPKG working copy, not share it with data from other apps). Managing the database means that Kart is responsible for initialising that database and importing the data in its initial state, then keeping track of any edits made to that data so that they can be committed. Kart expects that the user will use some other application to modify the data in that database as part of making edits to a Kart working copy.

This approach differs from other working copy types that only manage a single *schema* within a database.

### MySQL Connection URI

A Kart repository with a MySQL working copy needs to be configured with a `mysql://` connection URI. This URI specifies how to connect to the server and names the database that this Kart repository should manage as its working copy.

Kart needs a connection URL in the following format:

`mysql://[user[:password]@][host][:port]/dbname`

For example, a Kart repo called `airport` might have a URL like the following:

`mysql://kart_user:password@localhost:3306/airport_kart`
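
To make the URI format concrete, here is a minimal sketch that splits a `mysql://` URI into the parts named in the format above, using only the Python standard library. The helper `describe_mysql_uri` is hypothetical and not part of Kart; the example uses MySQL's default port, 3306.

```python
from urllib.parse import urlsplit


def describe_mysql_uri(uri):
    """Split a mysql:// URI into its parts (illustrative helper, not part of Kart)."""
    url = urlsplit(uri)
    if url.scheme != "mysql":
        raise ValueError("Expecting mysql://")
    return {
        "user": url.username,
        "password": url.password,
        "host": url.hostname,
        "port": url.port,
        # The path carries a single component: the database Kart should manage.
        "dbname": url.path.lstrip("/") or None,
    }


parts = describe_mysql_uri("mysql://kart_user:password@localhost:3306/airport_kart")
print(parts)
```

Unlike the PostgreSQL and SQL Server URIs, there is no trailing schema component, because the database itself is the unit that Kart manages.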

To configure a Kart repository to use a particular MySQL database as its working copy, specify the `--workingcopy` flag when creating the repository, for example:

`kart init --workingcopy=mysql://... --import=...`

The database that Kart is given to manage should be either non-existent or empty at the time Kart is configured, but the server should already be running.

The database user needs to have full rights to modify objects in the specified database (e.g. via `GRANT ALL PRIVILEGES ON airport_kart.* TO kart_user; FLUSH PRIVILEGES;`).

### MySQL limitations

Most geospatial data can be converted to MySQL format without losing any fidelity, but the MySQL working copy does have the following limitations.

#### Three and four dimensional geometries

Geometries in MySQL are always two-dimensional (meaning they have an X and a Y co-ordinate, or a longitude and a latitude co-ordinate). Three- or four-dimensional geometries, with Z (altitude) or M (measure) co-ordinates, are not supported in MySQL. As a result, Kart datasets containing three- and four-dimensional geometries cannot currently be checked out as MySQL working copies.
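
One way to see whether a geometry would hit this limitation is to inspect its type code. The sketch below is an assumption-laden illustration rather than Kart's own check: it handles plain ISO WKB only (not the GeoPackageBinary header that Kart stores, and not EWKB flag bits), and the function name `wkb_has_z_or_m` is hypothetical.

```python
import struct


def wkb_has_z_or_m(wkb: bytes) -> bool:
    """Return True if an ISO WKB geometry declares Z and/or M ordinates."""
    # Byte 0 is the byte-order marker: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack(byte_order + "I", wkb[1:5])
    # ISO WKB adds 1000 (Z), 2000 (M) or 3000 (ZM) to the 2D type code.
    return geom_type // 1000 in (1, 2, 3)


# Hand-built example geometries: a 2D point (type 1) and a Point Z (type 1001).
point_2d = struct.pack("<BIdd", 1, 1, 174.78, -41.29)
point_z = struct.pack("<BIddd", 1, 1001, 174.78, -41.29, 12.0)

print(wkb_has_z_or_m(point_2d), wkb_has_z_or_m(point_z))
```

A dataset whose geometries report Z or M ordinates in this way is one that cannot currently be checked out as a MySQL working copy.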

#### Approximated types

There is one type that Kart supports that has no MySQL equivalent - the `interval`. This type is approximated as `TEXT` in the MySQL working copy. See [APPROXIMATED_TYPES](APPROXIMATED_TYPES.md) for more information.

#### CRS definitions

MySQL comes pre-installed with thousands of standard EPSG coordinate reference system definitions. Currently, only the CRS definitions that are already in your MySQL installation are supported - Kart will not create definitions in MySQL to match the custom definitions attached to your Kart datasets. More documentation will be added here when this is supported.
23 changes: 23 additions & 0 deletions kart/sqlalchemy/create_engine.py
@@ -119,6 +119,29 @@ def _on_checkout(dbapi_connection, connection_record, connection_proxy):
    return engine


CANONICAL_MYSQL_SCHEME = "mysql"
INTERNAL_MYSQL_SCHEME = "mysql+pymysql"


def mysql_engine(msurl):
    def _on_checkout(mysql_conn, connection_record, connection_proxy):
        dbcur = mysql_conn.cursor()
        dbcur.execute("SET time_zone='UTC';")
        dbcur.execute("SET sql_mode = 'ANSI_QUOTES';")

    url = urlsplit(msurl)
    if url.scheme != CANONICAL_MYSQL_SCHEME:
        raise ValueError("Expecting mysql://")
    url_path = url.path or "/"  # Empty path doesn't work with non-empty query.
    url_query = _append_to_query(url.query, {"program_name": "kart"})
    msurl = urlunsplit([INTERNAL_MYSQL_SCHEME, url.netloc, url_path, url_query, ""])

    engine = sqlalchemy.create_engine(msurl)
    sqlalchemy.event.listen(engine, "checkout", _on_checkout)

    return engine
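
The URL rewriting done by `mysql_engine` can be tried in isolation. The sketch below reproduces only that step with the standard library, so it needs neither SQLAlchemy nor a running server. `append_to_query` is a hypothetical stand-in for Kart's `_append_to_query`, whose body is not shown in this diff, and `to_internal_mysql_url` is an illustrative name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_to_query(query, extra):
    """Hypothetical stand-in for Kart's _append_to_query (not shown in the diff)."""
    pairs = parse_qsl(query, keep_blank_values=True)
    pairs.extend(extra.items())
    return urlencode(pairs)


def to_internal_mysql_url(msurl):
    """Swap the canonical mysql:// scheme for the driver-specific one."""
    url = urlsplit(msurl)
    if url.scheme != "mysql":
        raise ValueError("Expecting mysql://")
    url_path = url.path or "/"  # An empty path doesn't work with a non-empty query.
    url_query = append_to_query(url.query, {"program_name": "kart"})
    return urlunsplit(["mysql+pymysql", url.netloc, url_path, url_query, ""])


internal_url = to_internal_mysql_url("mysql://root:PassWord1@localhost:3306")
print(internal_url)
```

Keeping `mysql://` as the user-facing scheme means the choice of driver stays an internal detail that can change without affecting stored configuration.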


CANONICAL_SQL_SERVER_SCHEME = "mssql"
INTERNAL_SQL_SERVER_SCHEME = "mssql+pyodbc"
SQL_SERVER_INSTALL_DOC_URL = (
33 changes: 33 additions & 0 deletions kart/sqlalchemy/upsert.py
@@ -67,6 +67,39 @@ def compile_upsert_postgresql(upsert_stmt, compiler, **kwargs):
    return compiler.process(insert_stmt)


@compiles(Upsert, "mysql")
def compile_upsert_mysql(upsert_stmt, compiler, **kwargs):
    # See https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
    preparer = compiler.preparer

    def list_cols(col_names, prefix=""):
        return ", ".join([prefix + c for c in col_names])

    values = ", ".join(upsert_stmt.values(compiler))
    table = preparer.format_table(upsert_stmt.table)
    all_columns = [preparer.quote(c.name) for c in upsert_stmt.columns]
    non_pk_columns = [preparer.quote(c.name) for c in upsert_stmt.non_pk_columns]

    # Row aliases (VALUES ... AS alias) were added in MySQL 8.0.19, so compare the
    # full version rather than only the major version.
    server_version = tuple(compiler.dialect.server_version_info[:3])
    supports_row_alias = server_version >= (8, 0, 19)

    if supports_row_alias:
        # 8.0.19 and later - don't use VALUES() again to refer to earlier VALUES.
        # Instead, alias them. See https://dev.mysql.com/worklog/task/?id=13325
        result = f"INSERT INTO {table} ({list_cols(all_columns)}) "
        result += f" VALUES ({values}) AS SOURCE ({list_cols(all_columns)})"
        result += " ON DUPLICATE KEY UPDATE "
        result += ", ".join([f"{c} = SOURCE.{c}" for c in non_pk_columns])

    else:
        # Earlier versions - reuse VALUES() to refer to earlier VALUES.
        result = f"INSERT INTO {table} ({list_cols(all_columns)}) "
        result += f" VALUES ({values})"
        result += " ON DUPLICATE KEY UPDATE "
        result += ", ".join([f"{c} = VALUES({c})" for c in non_pk_columns])  # 5.7

    return result
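
To show roughly what SQL the two branches produce, here is a standalone sketch that builds the same statement shapes from plain strings, without SQLAlchemy. The function `build_mysql_upsert`, its parameters, and the `%s` placeholders are assumptions made for illustration, and identifiers are left unquoted for brevity. The version threshold reflects the MySQL documentation, which describes row aliases as available from 8.0.19.

```python
def build_mysql_upsert(table, columns, pk_columns, server_version):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE statement (illustrative only)."""
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    non_pk = [c for c in columns if c not in pk_columns]

    sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    if tuple(server_version[:3]) >= (8, 0, 19):
        # Newer servers: alias the inserted row and refer to it by name.
        sql += f" AS SOURCE ({col_list})"
        updates = ", ".join(f"{c} = SOURCE.{c}" for c in non_pk)
    else:
        # Older servers: refer back to the inserted values with VALUES().
        updates = ", ".join(f"{c} = VALUES({c})" for c in non_pk)
    return f"{sql} ON DUPLICATE KEY UPDATE {updates}"


new_style = build_mysql_upsert("airports", ["id", "name"], ["id"], (8, 0, 23))
old_style = build_mysql_upsert("airports", ["id", "name"], ["id"], (5, 7, 30))
print(new_style)
print(old_style)
```

In both forms only the non-primary-key columns appear in the `UPDATE` clause, since the primary key is what identifies the duplicate row.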


@compiles(Upsert, "mssql")
def compile_upsert_mssql(upsert_stmt, compiler, **kwargs):
# See https://docs.microsoft.com/sql/t-sql/statements/merge-transact-sql
11 changes: 10 additions & 1 deletion kart/working_copy/__init__.py
@@ -9,6 +9,7 @@ class WorkingCopyType(Enum):
    GPKG = auto()
    POSTGIS = auto()
    SQL_SERVER = auto()
    MYSQL = auto()

    @classmethod
    def from_location(cls, location, allow_invalid=False):
@@ -17,6 +18,8 @@ def from_location(cls, location, allow_invalid=False):
            return WorkingCopyType.POSTGIS
        elif location.startswith("mssql:"):
            return WorkingCopyType.SQL_SERVER
        elif location.startswith("mysql:"):
            return WorkingCopyType.MYSQL
        elif location.lower().endswith(".gpkg"):
            return WorkingCopyType.GPKG
        elif allow_invalid:
@@ -27,7 +30,8 @@ def from_location(cls, location, allow_invalid=False):
                "Try one of:\n"
                " PATH.gpkg\n"
                " postgresql://[HOST]/DBNAME/DBSCHEMA\n"
                " mssql://[HOST]/DBNAME/DBSCHEMA"
                " mssql://[HOST]/DBNAME/DBSCHEMA\n"
                " mysql://[HOST]/DBNAME"
            )

    @property
@@ -44,6 +48,11 @@ def class_(self):
            from .sqlserver import WorkingCopy_SqlServer

            return WorkingCopy_SqlServer
        elif self is WorkingCopyType.MYSQL:
            from .mysql import WorkingCopy_MySql

            return WorkingCopy_MySql

        raise RuntimeError("Invalid WorkingCopyType")


2 changes: 1 addition & 1 deletion kart/working_copy/db_server.py
@@ -90,7 +90,7 @@ def _separate_db_schema(cls, db_uri, expected_path_length=2):
"""
Removes the DBSCHEMA part off the end of a URI's path, and returns the URI and the DBSCHEMA separately.
Useful since generally, it is not necessary (or even possible) to connect to a particular DBSCHEMA directly,
instead, the rest of the URI is used to connect, then the DBSCHEMA is sped
instead, the rest of the URI is used to connect, then the DBSCHEMA is specified in every query / command.
"""
url = urlsplit(db_uri)
url_path = PurePath(url.path)