Implement deletion policy #464

Merged
merged 91 commits into from
Aug 1, 2024
Changes from 85 commits
559c8dc
start working on periodic task for cleanup
matthiasschaub Jun 10, 2024
45ee063
feat(db): update files with uuid from qr-code
matthiasschaub Jun 10, 2024
a582b9d
refactor(qr-code): rename function to avoid alias
matthiasschaub Jun 11, 2024
96287e2
refactor(db): read qr-code in insert file function
matthiasschaub Jun 11, 2024
25cd8cc
feat(bbox): add centroid and string functions
matthiasschaub Jun 11, 2024
8504a55
feat(db): write attributes of uploaded file to db
matthiasschaub Jun 11, 2024
c4f42ad
test(tasks): test cleanup old map frames
matthiasschaub Jun 11, 2024
f1cc306
refactor(db): remove unused update files func
matthiasschaub Jun 11, 2024
adf3e6a
refactor(cleanup): remove uuid argument
matthiasschaub Jun 11, 2024
f1368a3
feat(celery/config): schedule cleanup task
matthiasschaub Jun 11, 2024
5fd6b93
refactor: store request metadata in db
matthiasschaub Jun 12, 2024
09154d7
docs: improve docstring
matthiasschaub Jun 12, 2024
ae2827c
feat: cleanup blobs without consent
matthiasschaub Jun 12, 2024
94d7b79
refactor: use celery group primitive to run tasks
matthiasschaub Jun 13, 2024
8319584
feat: use celery chain primitive to run cleanup
matthiasschaub Jun 13, 2024
48ea54a
feat(db): cleanup upload files in database
matthiasschaub Jun 13, 2024
f5a44ad
refactor: rename cleanup to cleanup_map_frames
matthiasschaub Jun 13, 2024
a953754
feat: only delete blobs by uuid
matthiasschaub Jun 18, 2024
a805133
test(models): add validation of centroid
Gigaszi Jun 23, 2024
56444b7
fix(test): round centroid to right count of digits
Gigaszi Jun 23, 2024
04253bb
feat(db): change centroid in blob table to lat and lon column
Gigaszi Jun 23, 2024
8ab3d7e
refactor(db): rename uuid to map_frame_uuid in blob table
Gigaszi Jun 23, 2024
3d73ac9
WIP: deltetion bbox
Gigaszi Jul 1, 2024
5a5ffae
start working on periodic task for cleanup
matthiasschaub Jun 10, 2024
4076d58
feat(db): update files with uuid from qr-code
matthiasschaub Jun 10, 2024
78259fb
refactor(qr-code): rename function to avoid alias
matthiasschaub Jun 11, 2024
cb12c42
refactor(db): read qr-code in insert file function
matthiasschaub Jun 11, 2024
1985e52
feat(bbox): add centroid and string functions
matthiasschaub Jun 11, 2024
4ee783a
feat(db): write attributes of uploaded file to db
matthiasschaub Jun 11, 2024
5071337
test(tasks): test cleanup old map frames
matthiasschaub Jun 11, 2024
825d3f8
refactor(db): remove unused update files func
matthiasschaub Jun 11, 2024
fa27814
refactor(cleanup): remove uuid argument
matthiasschaub Jun 11, 2024
29d483f
feat(celery/config): schedule cleanup task
matthiasschaub Jun 11, 2024
c64397d
refactor: store request metadata in db
matthiasschaub Jun 12, 2024
9078295
docs: improve docstring
matthiasschaub Jun 12, 2024
5d35870
feat: cleanup blobs without consent
matthiasschaub Jun 12, 2024
b8bc176
refactor: use celery group primitive to run tasks
matthiasschaub Jun 13, 2024
db6b8c7
feat: use celery chain primitive to run cleanup
matthiasschaub Jun 13, 2024
e066017
feat(db): cleanup upload files in database
matthiasschaub Jun 13, 2024
78107c4
refactor: rename cleanup to cleanup_map_frames
matthiasschaub Jun 13, 2024
df33e7c
feat: only delete blobs by uuid
matthiasschaub Jun 18, 2024
38188dd
test(models): add validation of centroid
Gigaszi Jun 23, 2024
780fb34
fix(test): round centroid to right count of digits
Gigaszi Jun 23, 2024
5630c0e
feat(db): change centroid in blob table to lat and lon column
Gigaszi Jun 23, 2024
f04a0a9
refactor(db): rename uuid to map_frame_uuid in blob table
Gigaszi Jun 23, 2024
271dfac
delete bbox
matthiasschaub Jul 1, 2024
9665beb
test: re-enable paramatrized tests
matthiasschaub Jul 3, 2024
06720e8
fix: syntax error in SQL query
matthiasschaub Jul 9, 2024
f27db34
test: fix teardown of db fixtures
matthiasschaub Jul 15, 2024
cb1fc22
Merge branch 'delete-policy' of https://github.com/GIScience/sketch-m…
Gigaszi Jul 16, 2024
41b57de
refactor: move cleanup intervall to config variable
matthiasschaub Jul 16, 2024
cdd15ac
docs: improve comments
matthiasschaub Jul 16, 2024
03c7a61
test: support legacy map frames
matthiasschaub Jul 16, 2024
e0fc804
build: increase HTTP timeout for Poetry requests
matthiasschaub Jul 17, 2024
b7c3025
docs: update configuration docs about API keys
matthiasschaub Jul 17, 2024
706f550
docs(conf): simplify sample config
matthiasschaub Jul 17, 2024
f7ce4a5
refactor: fix variable name
matthiasschaub Jul 17, 2024
c1a2661
build: update dependencies
matthiasschaub Jul 17, 2024
4384a89
docs: update celery run command to be Mac compatible
matthiasschaub Jul 17, 2024
9343d13
refactor: update oqapi url
matthiasschaub Jul 17, 2024
a614696
style: run ruff
matthiasschaub Jul 17, 2024
61c0169
test: VCR ignore requests to neptune.ai
matthiasschaub Jul 17, 2024
154a0ee
test: update vcr config and add vcr decorators
matthiasschaub Jul 17, 2024
a75e090
fix(wip): re-enable quality report and rm vcr cassettes
matthiasschaub Jul 17, 2024
cd99656
test: make vcr cassette decorate work and rm unused fixture
matthiasschaub Jul 17, 2024
7155757
test: fix mock test missing attribute status
matthiasschaub Jul 17, 2024
827d694
test: remove unnecessary open/close of db conn
matthiasschaub Jul 17, 2024
9743008
test: fix mock task attrib status & do not close db conn
matthiasschaub Jul 17, 2024
204d9c7
fix: disable quality report generation
matthiasschaub Jul 17, 2024
3d45ecb
test: disable replace image content by vcr
matthiasschaub Jul 17, 2024
f059719
test: re-create VCR cassettes
matthiasschaub Jul 17, 2024
df921a0
build: remove obsolete docker-compose version
matthiasschaub Jul 31, 2024
f6d3793
build: rename docker-compose.yaml to compose.yaml
matthiasschaub Jul 31, 2024
92c6b17
fix(ui): disable quality map check
matthiasschaub Jul 31, 2024
f91216d
docs: add command to connect to db
matthiasschaub Jul 31, 2024
186e817
docs: add setup docs for apple mac m2
matthiasschaub Jul 16, 2024
41514cb
refactor(ui): update consent text on /digitize page
matthiasschaub Jun 12, 2024
ae89ab1
update consent text and made it opt-in
matthiasschaub Jul 8, 2024
dc00a82
build: limit resources of docker compose services
matthiasschaub Jun 4, 2024
f6549fa
test: re-approve approval tests
matthiasschaub Jul 31, 2024
ffad4a7
docs: update dev docs w/ info about db
matthiasschaub Jul 31, 2024
77d985b
refactor: remove unused db client functions
matthiasschaub Jul 31, 2024
669d237
refactor(osm-quality-report): return empty BytesIO
matthiasschaub Jul 31, 2024
0b02f98
Merge remote-tracking branch 'origin/main' into delete-policy
matthiasschaub Jul 31, 2024
6421e85
docs: update path to proj in IDE setup section
matthiasschaub Jul 31, 2024
61a95d0
remove .gitattributes
matthiasschaub Jul 31, 2024
1a77c0e
docs: reference config.py in config docs
matthiasschaub Jul 31, 2024
4018819
fix: update ohsome quality api url
matthiasschaub Aug 1, 2024
9cd8642
tests(vcr): ignore requests to arcgis.com
matthiasschaub Aug 1, 2024
fbb8e7c
tests: re-create vcr cassettes
matthiasschaub Aug 1, 2024
637bd16
update gitignore to ignore celerybeat-schedule files
matthiasschaub Aug 1, 2024
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
tests/fixtures/cassette/ filter=lfs diff=lfs merge=lfs -text
2 changes: 2 additions & 0 deletions Dockerfile
Expand Up @@ -10,6 +10,8 @@ RUN npm run build


FROM condaforge/mambaforge:23.3.1-0
# HTTP request timeout. Default is 30 seconds.
ENV POETRY_REQUESTS_TIMEOUT=60

RUN apt-get update \
&& apt-get install -y --no-upgrade \
Expand Down
1 change: 0 additions & 1 deletion docker-compose.yaml → compose.yaml
@@ -1,4 +1,3 @@
version: "3.9"
services:
flask:
# Web app
Expand Down
25 changes: 4 additions & 21 deletions config/sample.config.toml
@@ -1,23 +1,6 @@
data-dir = "/some/absolute/path"
user-agent = "sketch-map-tool"
broker-url = "redis://localhost:6379"
result-backend = "db+postgresql://smt:smt@localhost:5432"
wms-url-osm = "https://maps.heigit.org/osm-carto/service?SERVICE=WMS&VERSION=1.1.1"
wms-layers-osm = "heigit:osm-carto@2xx"
wms-url-esri-world-imagery = "https://maps.heigit.org/sketch-map-tool/service?SERVICE=WMS&VERSION=1.1.1"
wms-url-esri-world-imagery-fallback = "https://maps.heigit.org/sketch-map-tool/service?SERVICE=WMS&VERSION=1.1.1"
wms-layers-esri-world-imagery = "world_imagery"
wms-layers-esri-world-imagery-fallback = "world_imagery_fallback"
wms-read-timeout = 600
max-nr-simultaneous-uploads = 25
max_pixel_per_image = 100000000
# required configuration variables
neptune_api_token = "h0dHBzOi8aHR06E0Z...jMifQ"
neptune_project = "HeiGIT/SketchMapTool"
neptune_model_id_yolo_osm_cls = "SMT-CLR-1"
neptune_model_id_yolo_esri_cls = "SMT-CLR-3"
neptune_model_id_yolo_osm_obj = "SMT-OSM-9"
neptune_model_id_yolo_esri_obj = "SMT-ESRI-1"
neptune_model_id_sam = "SMT-SAM-1"
model_type_sam = "vit_b"
esri-api-key = ""
log-level = "INFO"
# required configuration variables for docker compose setup
# broker-url = "redis://redis:6379"
# result-backend = "db+postgresql://smt:smt@postgres:5432"
14 changes: 14 additions & 0 deletions docs/configuration.md
Expand Up @@ -21,6 +21,20 @@ All lot of configuration values come with defaults. Required configuration value
- `neptune_api_token`
- `esri-api-key`

### ArcGIS ESRI

To get an ArcGIS/ESRI API key, sign up for the [ArcGIS Location Platform](https://location.arcgis.com/sign-up/)
and follow [this tutorial](https://developers.arcgis.com/documentation/security-and-authentication/api-key-authentication/tutorials/create-an-api-key/).

> Note: Keep the referrer field empty.

### neptune.ai

Ask the team for an invite to the Sketch Map Tool project on neptune.ai.

To get the API key, go to "Project Metadata" and copy the key from the example code.


## Configuration for Docker Compose

For running the services using Docker Compose set broker URL and result backend to:
Expand Down
22 changes: 16 additions & 6 deletions docs/development-setup.md
Expand Up @@ -3,15 +3,17 @@
For contributing to this project please also read the [Contribution Guideline](/CONTRIBUTING.md).

> Note: To just run the Sketch Map Tool locally, provide the required [configuration](/docs/configuration.md)
> and use Docker Compose: `docker compose up -d`
> and use Docker Compose: `docker compose up -d`.

## Prerequisites (Requirements)

- Python: `>=3.11`
- [Mamba](https://github.com/conda-forge/miniforge#install): `>=1.4`
- Node: `>=14`

This project uses [Mamba](https://github.com/conda-forge/miniforge#install) for environment and dependencies management. Please make sure it is installed on your system: [Installation Guide](https://github.com/conda-forge/miniforge#install). Instead of Mamba, Conda can also be used.
This project uses [Mamba](https://github.com/conda-forge/miniforge#install) for environment and dependencies management.
Please make sure it is installed on your system: [Installation Guide](https://github.com/conda-forge/miniforge#install).
Instead of Mamba, Conda can also be used.

> Actually, Mamba and Poetry together are used to manage environment and dependencies.
> But only Mamba is required to be present on the system.
Expand Down Expand Up @@ -80,7 +82,7 @@ Please refer to the [configuration documentation](/docs/configuration.md).
```bash
mamba activate smt
docker start smt-postgres smt-redis
celery --app sketch_map_tool.tasks worker --beat --concurrency 4 --loglevel=INFO
celery --app sketch_map_tool.tasks worker --beat --pool solo --loglevel=INFO
```

### 2. Start Flask (Web App)
Expand All @@ -105,7 +107,7 @@ ruff format

### Tests

Provide required [configuration variables](/docs/configuration.md#required-configuration) in `config/test.config.toml`.
Provide required [configuration variables](/docs/configuration.md#required-configuration) in `config/test.config.toml`. Be sure *not* to set `broker-url` and `result-backend`.

To execute all tests run:
```bash
Expand All @@ -114,7 +116,7 @@ pytest

To get live logs, INFO log level and ignore verbose logging messages of VCR run:
```bash
pytest -s --log-level="INFO" --log-disable="vcr"
pytest --capture=no --log-level="INFO" --log-disable="vcr"
```

The integration test suite utilizes the [Testcontainers framework](https://testcontainers.com/)
Expand Down Expand Up @@ -171,14 +173,22 @@ Bundle the code with:
npm run build
```

## Database

To connect to the Postgres database when it runs as a Docker container started with the aforementioned Docker run command, run:
`psql -h localhost -d smt -U smt -p 5432 -W`.

If you run the database as a Docker Compose service, run:
`psql -h localhost -d smt -U smt -p 5444 -W`.

## Setup in an IDE

If you set up sketch-map-tool in an IDE like PyCharm, please make sure that your IDE does not set up a Poetry-managed project/virtual environment.
Go through the setup steps above in the terminal and change the interpreter settings in the IDE to point to the mamba/conda environment.

Also make sure the environment variable `PROJ_LIB` points to the `proj` directory of the mamba/conda environment:
```bash
PROJ_LIB=/home/$USERDIR/mambaforge/envs/smt/share/proj
PROJ_LIB=/home/$USERDIR/miniforge3/envs/smt/share/proj
```

## Setup on an Apple Mac with M2 chip
Expand Down
7 changes: 4 additions & 3 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion scripts/celery.sh
@@ -1,3 +1,3 @@
#!/bin/bash
# Run celery
poetry run celery --app sketch_map_tool.tasks worker --beat --concurrency 4 --loglevel=INFO
poetry run celery --app sketch_map_tool.tasks worker --beat --pool solo --loglevel=INFO
7 changes: 7 additions & 0 deletions sketch_map_tool/__init__.py
Expand Up @@ -40,6 +40,13 @@
"worker_send_task_events": True, # send task-related events to be monitored
# Avoid errors due to cached db connections going stale through inactivity
"database_short_lived_sessions": True,
# Cleanup map frames and uploaded files stored in the database
"beat_schedule": {
"cleanup": {
"task": "sketch_map_tool.tasks.cleanup_map_frames",
"schedule": timedelta(hours=3),
},
},
}


Expand Down
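The `beat_schedule` entry above controls how often the cleanup task fires (every 3 hours). That is separate from the retention period (`cleanup-map-frames-interval`, default `"12 months"`), which decides how old a map frame must be before it is cleaned. A minimal standard-library sketch of that distinction follows; the `is_due_for_cleanup` helper and the 365-day approximation of "12 months" are assumptions for illustration and are not part of this PR.

```python
from datetime import datetime, timedelta, timezone

# Shape of the schedule entry from the diff; Celery beat would read this dict.
beat_schedule = {
    "cleanup": {
        "task": "sketch_map_tool.tasks.cleanup_map_frames",
        "schedule": timedelta(hours=3),  # how often the task fires
    },
}

# Assumption: "12 months" approximated as 365 days for this illustration.
RETENTION = timedelta(days=365)


def is_due_for_cleanup(created: datetime, now: datetime) -> bool:
    """Hypothetical helper: True if a map frame is older than the retention period."""
    return created < now - RETENTION


now = datetime(2024, 8, 1, tzinfo=timezone.utc)
runs_per_day = timedelta(days=1) // beat_schedule["cleanup"]["schedule"]
print(runs_per_day)  # 8
print(is_due_for_cleanup(datetime(2023, 6, 10, tzinfo=timezone.utc), now))  # True
print(is_due_for_cleanup(datetime(2024, 6, 10, tzinfo=timezone.utc), now))  # False
```

Running the task frequently while keeping a long retention period means a frame is cleaned within a few hours of crossing the threshold, without shortening how long it is kept.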
1 change: 1 addition & 0 deletions sketch_map_tool/config.py
Expand Up @@ -11,6 +11,7 @@
"user-agent": "sketch-map-tool",
"broker-url": "redis://localhost:6379",
"result-backend": "db+postgresql://smt:smt@localhost:5432",
"cleanup-map-frames-interval": "12 months",
"wms-url-osm": "https://maps.heigit.org/osm-carto/service?SERVICE=WMS&VERSION=1.1.1",
"wms-layers-osm": "heigit:osm-carto@2xx",
"wms-url-esri-world-imagery": "https://maps.heigit.org/sketch-map-tool/service?SERVICE=WMS&VERSION=1.1.1",
Expand Down
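The new `cleanup-map-frames-interval` key is added to the defaults, so it only needs to appear in a config file when a deployment wants a different retention period. The loader in `sketch_map_tool/config.py` is not shown in this diff, so the following is only a sketch of the usual "defaults overridden by user values" pattern; `resolve_config` is a hypothetical name.

```python
# Assumption: defaults and user-supplied values are plain dicts; the real loader
# in sketch_map_tool/config.py is not part of this diff.
DEFAULTS = {
    "broker-url": "redis://localhost:6379",
    "result-backend": "db+postgresql://smt:smt@localhost:5432",
    "cleanup-map-frames-interval": "12 months",
}


def resolve_config(user_values: dict) -> dict:
    """Hypothetical helper: user values override defaults key by key."""
    return {**DEFAULTS, **user_values}


config = resolve_config({"cleanup-map-frames-interval": "6 months"})
print(config["cleanup-map-frames-interval"])  # 6 months
print(config["broker-url"])  # redis://localhost:6379
```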
139 changes: 123 additions & 16 deletions sketch_map_tool/database/client_celery.py
@@ -1,12 +1,19 @@
import logging
from io import BytesIO
from uuid import UUID

import psycopg2
from psycopg2.errors import UndefinedTable
from psycopg2.extensions import connection

from sketch_map_tool import __version__
from sketch_map_tool.config import get_config_value
from sketch_map_tool.exceptions import CustomFileNotFoundError
from sketch_map_tool.exceptions import (
CustomFileDoesNotExistAnymoreError,
CustomFileNotFoundError,
)
from sketch_map_tool.helpers import N_
from sketch_map_tool.models import Bbox, Layer, PaperFormat

db_conn: connection | None = None

Expand All @@ -25,29 +32,125 @@ def close_connection():
db_conn.close()


def insert_map_frame(file: BytesIO, uuid: UUID):
"""Insert map frame as blob into the database with the uuid as primary key.
def insert_map_frame(
file: BytesIO,
uuid: UUID,
bbox: Bbox,
format_: PaperFormat,
orientation: str,
layer: Layer,
):
"""Insert map frame alongside map generation parameters into the database.

The map frame is later on needed for georeferencing the uploaded photo or scan of
a sketch map.
The UUID is the primary key.
The map frame is needed for georeferencing the uploaded files (sketch maps).
"""
create_query = """
CREATE TABLE IF NOT EXISTS map_frame(
uuid UUID PRIMARY KEY,
file BYTEA
)
"""
insert_query = "INSERT INTO map_frame(uuid, file) VALUES (%s, %s)"
CREATE TABLE IF NOT EXISTS map_frame(
uuid UUID PRIMARY KEY,
file BYTEA,
bbox VARCHAR,
lat FLOAT,
lon FLOAT,
format VARCHAR,
orientation VARCHAR,
layer VARCHAR,
version VARCHAR,
ts TIMESTAMP WITH TIME ZONE DEFAULT now()
)
"""
insert_query = """
INSERT INTO map_frame (
uuid,
file,
bbox,
lat,
lon,
format,
orientation,
layer,
version
)
VALUES (
%s,
%s,
%s,
%s,
%s,
%s,
%s,
%s,
%s)
"""
with db_conn.cursor() as curs:
curs.execute(create_query)
curs.execute(insert_query, (str(uuid), file.read()))
curs.execute(
insert_query,
(
str(uuid),
file.read(),
str(bbox),
bbox.centroid[0],
bbox.centroid[1],
str(format_),
orientation,
layer,
__version__,
),
)
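The insert above stores `str(bbox)` in the `bbox` column and maps `bbox.centroid[0]` to `lat` and `bbox.centroid[1]` to `lon`. The `Bbox` model itself is not part of this diff, so the class below is a hypothetical stand-in: the field names, the plain midpoint calculation, and the string format are all assumptions, chosen only to show the (lat, lon) ordering the insert relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bbox:
    """Hypothetical stand-in for sketch_map_tool.models.Bbox."""

    lon_min: float
    lat_min: float
    lon_max: float
    lat_max: float

    @property
    def centroid(self) -> tuple[float, float]:
        # Returned as (lat, lon) to match how the insert maps index 0 and 1.
        return (
            (self.lat_min + self.lat_max) / 2,
            (self.lon_min + self.lon_max) / 2,
        )

    def __str__(self) -> str:
        # Assumed format: comma-separated corner coordinates.
        return f"{self.lon_min},{self.lat_min},{self.lon_max},{self.lat_max}"


bbox = Bbox(lon_min=8.0, lat_min=49.0, lon_max=9.0, lat_max=50.0)
lat, lon = bbox.centroid
print(lat, lon)  # 49.5 8.5
print(str(bbox))  # 8.0,49.0,9.0,50.0
```

Keeping `lat` and `lon` in their own columns means the approximate location survives cleanup even though `file` and `bbox` are set to NULL.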


def delete_map_frame(uuid: UUID):
"""Delete map frame of the associated UUID from the database."""
query = "DELETE FROM map_frame WHERE uuid = %s"
def cleanup_map_frames():
"""Cleanup map frames which are old and without consent.

Only set file and bbox to null. Keep other metadata.
This function is called by a periodic celery task.
"""
query = """
UPDATE
map_frame
SET
file = NULL,
bbox = NULL
WHERE
ts < NOW() - INTERVAL %s
AND NOT EXISTS (
SELECT
*
FROM
blob
WHERE
map_frame.uuid = blob.map_frame_uuid
AND consent = TRUE);
"""
with db_conn.cursor() as curs:
try:
curs.execute(query, [get_config_value("cleanup-map-frames-interval")])
except UndefinedTable:
logging.info("Table `map_frame` does not exist yet. Nothing to do.")
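The cleanup query combines two conditions: the map frame is older than the retention period, and no uploaded file linked to it carries consent. The sketch below reproduces that logic against an in-memory SQLite database so it runs without PostgreSQL. Using SQLite, computing the cutoff in Python, the reduced table layout, and approximating "12 months" as 365 days are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE map_frame (uuid TEXT PRIMARY KEY, file BLOB, bbox TEXT, ts TEXT);
    CREATE TABLE blob (map_frame_uuid TEXT, consent INTEGER);
    """
)
now = datetime(2024, 8, 1, tzinfo=timezone.utc)
old = (now - timedelta(days=400)).isoformat()
recent = (now - timedelta(days=10)).isoformat()
conn.executemany(
    "INSERT INTO map_frame VALUES (?, ?, ?, ?)",
    [
        ("old-no-consent", b"png", "8,49,9,50", old),
        ("old-with-consent", b"png", "8,49,9,50", old),
        ("recent", b"png", "8,49,9,50", recent),
    ],
)
conn.executemany(
    "INSERT INTO blob VALUES (?, ?)",
    [("old-no-consent", 0), ("old-with-consent", 1)],
)

# Assumption: "12 months" approximated as 365 days; the cutoff is computed in
# Python because SQLite has no INTERVAL type.
cutoff = (now - timedelta(days=365)).isoformat()
conn.execute(
    """
    UPDATE map_frame
    SET file = NULL, bbox = NULL
    WHERE ts < ?
      AND NOT EXISTS (
        SELECT 1 FROM blob
        WHERE blob.map_frame_uuid = map_frame.uuid AND blob.consent = 1
      )
    """,
    (cutoff,),
)
cleaned = [
    row[0]
    for row in conn.execute(
        "SELECT uuid FROM map_frame WHERE file IS NULL ORDER BY uuid"
    )
]
print(cleaned)  # ['old-no-consent']
```

Because the condition is `NOT EXISTS`, an old map frame with no uploads at all is cleaned as well; only frames with at least one consenting upload are kept.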


def cleanup_blob(map_frame_uuids: list[UUID]):
"""Cleanup uploaded files (sketch maps) without consent.

Only set file and name to null. Keep metadata.
This function is called after digitization.
"""
query = """
UPDATE
blob
SET
file = NULL,
file_name = NULL
WHERE
map_frame_uuid = %s
AND consent = FALSE;
"""
with db_conn.cursor() as curs:
curs.execute(query, [str(uuid)])
try:
curs.executemany(query, [(str(uuid),) for uuid in map_frame_uuids])
except UndefinedTable:
logging.info("Table `blob` does not exist yet. Nothing to do.")
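DB-API `executemany` runs the statement once per element of its second argument, so a query with a single placeholder needs one single-element parameter tuple per UUID. A sketch of that parameter shape follows, again with in-memory SQLite standing in for PostgreSQL (an assumption); the reduced table layout and the `?` placeholder style are specific to this illustration.

```python
import sqlite3
from uuid import UUID, uuid4

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blob (map_frame_uuid TEXT, file BLOB, file_name TEXT, consent INTEGER)"
)
with_consent, without_a, without_b = uuid4(), uuid4(), uuid4()
conn.executemany(
    "INSERT INTO blob VALUES (?, ?, ?, ?)",
    [
        (str(with_consent), b"jpg", "a.jpg", 1),
        (str(without_a), b"jpg", "b.jpg", 0),
        (str(without_b), b"jpg", "c.jpg", 0),
    ],
)


def cleanup_blob(map_frame_uuids: list[UUID]) -> None:
    """Sketch: null file and file_name for uploads without consent."""
    query = """
        UPDATE blob
        SET file = NULL, file_name = NULL
        WHERE map_frame_uuid = ? AND consent = 0
    """
    # One parameter tuple per UUID, so the statement runs once per map frame.
    conn.executemany(query, [(str(uuid),) for uuid in map_frame_uuids])


cleanup_blob([with_consent, without_a, without_b])
remaining = conn.execute(
    "SELECT COUNT(*) FROM blob WHERE file IS NOT NULL"
).fetchone()[0]
print(remaining)  # 1
```

The row with consent keeps its file even though its UUID is in the list, because the `consent = 0` filter is applied in the query rather than by the caller.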


def select_file(id_: int) -> bytes:
Expand All @@ -57,6 +160,10 @@ def select_file(id_: int) -> bytes:
curs.execute(query, [id_])
raw = curs.fetchone()
if raw:
if raw[0] is None:
raise CustomFileDoesNotExistAnymoreError(
N_("The file with the id: {ID} does not exist anymore"), {"ID": id_}
)
return raw[0]
else:
raise CustomFileNotFoundError(
Expand Down
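Since cleanup keeps the row and only nulls the file, `select_file` now has to tell apart two cases: no row for the id, and a row whose content was removed. A minimal sketch of that branching follows; the exception classes and the dict that stands in for the table are hypothetical, and the message for the not-found case is invented for illustration.

```python
from typing import Optional


class FileNotFoundInDbError(Exception):
    """Hypothetical stand-in for CustomFileNotFoundError."""


class FileDeletedError(Exception):
    """Hypothetical stand-in for CustomFileDoesNotExistAnymoreError."""


def select_file(rows: dict[int, Optional[bytes]], id_: int) -> bytes:
    """Sketch of the lookup; a dict stands in for the blob table."""
    if id_ not in rows:
        raise FileNotFoundInDbError(f"There is no file with the id: {id_}")
    content = rows[id_]
    if content is None:
        # The row survived cleanup, but its content was set to NULL.
        raise FileDeletedError(f"The file with the id: {id_} does not exist anymore")
    return content


rows = {1: b"data", 2: None}
print(select_file(rows, 1))  # b'data'
for missing_id in (2, 3):
    try:
        select_file(rows, missing_id)
    except (FileDeletedError, FileNotFoundInDbError) as error:
        print(type(error).__name__)
```

Separate exception types let the web layer show a "file was deleted under the retention policy" message instead of a generic not-found error.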