Implement deletion policy #464

Merged
merged 91 commits into from
Aug 1, 2024
Changes from 17 commits
Commits
559c8dc
start working on periodic task for cleanup
matthiasschaub Jun 10, 2024
45ee063
feat(db): update files with uuid from qr-code
matthiasschaub Jun 10, 2024
a582b9d
refactor(qr-code): rename function to avoid alias
matthiasschaub Jun 11, 2024
96287e2
refactor(db): read qr-code in insert file function
matthiasschaub Jun 11, 2024
25cd8cc
feat(bbox): add centroid and string functions
matthiasschaub Jun 11, 2024
8504a55
feat(db): write attributes of uploaded file to db
matthiasschaub Jun 11, 2024
c4f42ad
test(tasks): test cleanup old map frames
matthiasschaub Jun 11, 2024
f1cc306
refactor(db): remove unused update files func
matthiasschaub Jun 11, 2024
adf3e6a
refactor(cleanup): remove uuid argument
matthiasschaub Jun 11, 2024
f1368a3
feat(celery/config): schedule cleanup task
matthiasschaub Jun 11, 2024
5fd6b93
refactor: store request metadata in db
matthiasschaub Jun 12, 2024
09154d7
docs: improve docstring
matthiasschaub Jun 12, 2024
ae2827c
feat: cleanup blobs without consent
matthiasschaub Jun 12, 2024
94d7b79
refactor: use celery group primitive to run tasks
matthiasschaub Jun 13, 2024
8319584
feat: use celery chain primitive to run cleanup
matthiasschaub Jun 13, 2024
48ea54a
feat(db): cleanup upload files in database
matthiasschaub Jun 13, 2024
f5a44ad
refactor: rename cleanup to cleanup_map_frames
matthiasschaub Jun 13, 2024
a953754
feat: only delete blobs by uuid
matthiasschaub Jun 18, 2024
a805133
test(models): add validation of centroid
Gigaszi Jun 23, 2024
56444b7
fix(test): round centroid to right count of digits
Gigaszi Jun 23, 2024
04253bb
feat(db): change centroid in blob table to lat and lon column
Gigaszi Jun 23, 2024
8ab3d7e
refactor(db): rename uuid to map_frame_uuid in blob table
Gigaszi Jun 23, 2024
3d73ac9
WIP: deltetion bbox
Gigaszi Jul 1, 2024
5a5ffae
start working on periodic task for cleanup
matthiasschaub Jun 10, 2024
4076d58
feat(db): update files with uuid from qr-code
matthiasschaub Jun 10, 2024
78259fb
refactor(qr-code): rename function to avoid alias
matthiasschaub Jun 11, 2024
cb12c42
refactor(db): read qr-code in insert file function
matthiasschaub Jun 11, 2024
1985e52
feat(bbox): add centroid and string functions
matthiasschaub Jun 11, 2024
4ee783a
feat(db): write attributes of uploaded file to db
matthiasschaub Jun 11, 2024
5071337
test(tasks): test cleanup old map frames
matthiasschaub Jun 11, 2024
825d3f8
refactor(db): remove unused update files func
matthiasschaub Jun 11, 2024
fa27814
refactor(cleanup): remove uuid argument
matthiasschaub Jun 11, 2024
29d483f
feat(celery/config): schedule cleanup task
matthiasschaub Jun 11, 2024
c64397d
refactor: store request metadata in db
matthiasschaub Jun 12, 2024
9078295
docs: improve docstring
matthiasschaub Jun 12, 2024
5d35870
feat: cleanup blobs without consent
matthiasschaub Jun 12, 2024
b8bc176
refactor: use celery group primitive to run tasks
matthiasschaub Jun 13, 2024
db6b8c7
feat: use celery chain primitive to run cleanup
matthiasschaub Jun 13, 2024
e066017
feat(db): cleanup upload files in database
matthiasschaub Jun 13, 2024
78107c4
refactor: rename cleanup to cleanup_map_frames
matthiasschaub Jun 13, 2024
df33e7c
feat: only delete blobs by uuid
matthiasschaub Jun 18, 2024
38188dd
test(models): add validation of centroid
Gigaszi Jun 23, 2024
780fb34
fix(test): round centroid to right count of digits
Gigaszi Jun 23, 2024
5630c0e
feat(db): change centroid in blob table to lat and lon column
Gigaszi Jun 23, 2024
f04a0a9
refactor(db): rename uuid to map_frame_uuid in blob table
Gigaszi Jun 23, 2024
271dfac
delete bbox
matthiasschaub Jul 1, 2024
9665beb
test: re-enable paramatrized tests
matthiasschaub Jul 3, 2024
06720e8
fix: syntax error in SQL query
matthiasschaub Jul 9, 2024
f27db34
test: fix teardown of db fixtures
matthiasschaub Jul 15, 2024
cb1fc22
Merge branch 'delete-policy' of https://github.com/GIScience/sketch-m…
Gigaszi Jul 16, 2024
41b57de
refactor: move cleanup intervall to config variable
matthiasschaub Jul 16, 2024
cdd15ac
docs: improve comments
matthiasschaub Jul 16, 2024
03c7a61
test: support legacy map frames
matthiasschaub Jul 16, 2024
e0fc804
build: increase HTTP timeout for Poetry requests
matthiasschaub Jul 17, 2024
b7c3025
docs: update configuration docs about API keys
matthiasschaub Jul 17, 2024
706f550
docs(conf): simplify sample config
matthiasschaub Jul 17, 2024
f7ce4a5
refactor: fix variable name
matthiasschaub Jul 17, 2024
c1a2661
build: update dependencies
matthiasschaub Jul 17, 2024
4384a89
docs: update celery run command to be Mac compatible
matthiasschaub Jul 17, 2024
9343d13
refactor: update oqapi url
matthiasschaub Jul 17, 2024
a614696
style: run ruff
matthiasschaub Jul 17, 2024
61c0169
test: VCR ignore requests to neptune.ai
matthiasschaub Jul 17, 2024
154a0ee
test: update vcr config and add vcr decorators
matthiasschaub Jul 17, 2024
a75e090
fix(wip): re-enable quality report and rm vcr cassettes
matthiasschaub Jul 17, 2024
cd99656
test: make vcr cassette decorate work and rm unused fixture
matthiasschaub Jul 17, 2024
7155757
test: fix mock test missing attribute status
matthiasschaub Jul 17, 2024
827d694
test: remove unnecessary open/close of db conn
matthiasschaub Jul 17, 2024
9743008
test: fix mock task attrib status & do not close db conn
matthiasschaub Jul 17, 2024
204d9c7
fix: disable quality report generation
matthiasschaub Jul 17, 2024
3d45ecb
test: disable replace image content by vcr
matthiasschaub Jul 17, 2024
f059719
test: re-create VCR cassettes
matthiasschaub Jul 17, 2024
df921a0
build: remove obsolete docker-compose version
matthiasschaub Jul 31, 2024
f6d3793
build: rename docker-compose.yaml to compose.yaml
matthiasschaub Jul 31, 2024
92c6b17
fix(ui): disable quality map check
matthiasschaub Jul 31, 2024
f91216d
docs: add command to connect to db
matthiasschaub Jul 31, 2024
186e817
docs: add setup docs for apple mac m2
matthiasschaub Jul 16, 2024
41514cb
refactor(ui): update consent text on /digitize page
matthiasschaub Jun 12, 2024
ae89ab1
update consent text and made it opt-in
matthiasschaub Jul 8, 2024
dc00a82
build: limit resources of docker compose services
matthiasschaub Jun 4, 2024
f6549fa
test: re-approve approval tests
matthiasschaub Jul 31, 2024
ffad4a7
docs: update dev docs w/ info about db
matthiasschaub Jul 31, 2024
77d985b
refactor: remove unused db client functions
matthiasschaub Jul 31, 2024
669d237
refactor(osm-quality-report): return empty BytesIO
matthiasschaub Jul 31, 2024
0b02f98
Merge remote-tracking branch 'origin/main' into delete-policy
matthiasschaub Jul 31, 2024
6421e85
docs: update path to proj in IDE setup section
matthiasschaub Jul 31, 2024
61a95d0
remove .gitattributes
matthiasschaub Jul 31, 2024
1a77c0e
docs: reference config.py in config docs
matthiasschaub Jul 31, 2024
4018819
fix: update ohsome quality api url
matthiasschaub Aug 1, 2024
9cd8642
tests(vcr): ignore requests to arcgis.com
matthiasschaub Aug 1, 2024
fbb8e7c
tests: re-create vcr cassettes
matthiasschaub Aug 1, 2024
637bd16
update gitignore to ignore celerybeat-schedule files
matthiasschaub Aug 1, 2024
7 changes: 7 additions & 0 deletions sketch_map_tool/__init__.py
Expand Up @@ -40,6 +40,13 @@
"worker_send_task_events": True, # send task-related events to be monitored
# Avoid errors due to cached db connections going stale through inactivity
"database_short_lived_sessions": True,
# Cleanup map frames and uploaded files stored in the database
"beat_schedule": {
"cleanup": {
"task": "sketch_map_tool.tasks.cleanup_map_frames",
"schedule": timedelta(hours=1),
},
},
}
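A later commit in this PR moves the cleanup interval into a config variable. A minimal sketch of how the schedule entry could be built from such a value is shown here. The helper `build_beat_schedule` and its parameter `cleanup_interval_hours` are hypothetical names for illustration and do not appear in the diff.

```python
from datetime import timedelta


def build_beat_schedule(cleanup_interval_hours: float = 1) -> dict:
    """Return a Celery-style ``beat_schedule`` mapping (hypothetical helper)."""
    return {
        "cleanup": {
            # Task path as it appears in the diff.
            "task": "sketch_map_tool.tasks.cleanup_map_frames",
            # Celery beat accepts a timedelta as the run interval.
            "schedule": timedelta(hours=cleanup_interval_hours),
        },
    }


schedule = build_beat_schedule(cleanup_interval_hours=1)
```

Keeping the interval as a plain number makes it easy to read from a config file, while the task name stays in one place.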


131 changes: 119 additions & 12 deletions sketch_map_tool/database/client_celery.py
@@ -1,12 +1,19 @@
import logging
from io import BytesIO
from uuid import UUID

import psycopg2
from psycopg2.errors import UndefinedTable
from psycopg2.extensions import connection

from sketch_map_tool import __version__
from sketch_map_tool.config import get_config_value
from sketch_map_tool.exceptions import CustomFileNotFoundError
from sketch_map_tool.exceptions import (
CustomFileDoesNotExistAnymoreError,
CustomFileNotFoundError,
)
from sketch_map_tool.helpers import N_
from sketch_map_tool.models import Bbox, Layer, PaperFormat

db_conn: connection | None = None

Expand All @@ -25,22 +32,68 @@ def close_connection():
db_conn.close()


def insert_map_frame(file: BytesIO, uuid: UUID):
"""Insert map frame as blob into the database with the uuid as primary key.
def insert_map_frame(
file: BytesIO,
uuid: UUID,
bbox: Bbox,
format_: PaperFormat,
orientation: str,
layer: Layer,
):
"""Insert map frame alongside map generation parameters into the database.

The map frame is later on needed for georeferencing the uploaded photo or scan of
a sketch map.
The UUID is the primary key.
The map frame is needed for georeferencing the uploaded files (sketch maps).
"""
create_query = """
CREATE TABLE IF NOT EXISTS map_frame(
uuid UUID PRIMARY KEY,
file BYTEA
)
"""
insert_query = "INSERT INTO map_frame(uuid, file) VALUES (%s, %s)"
CREATE TABLE IF NOT EXISTS map_frame(
uuid UUID PRIMARY KEY,
file BYTEA,
bbox VARCHAR,
centroid VARCHAR,
format VARCHAR,
orientation VARCHAR,
layer VARCHAR,
version VARCHAR,
ts TIMESTAMP WITH TIME ZONE DEFAULT now()
)
"""
insert_query = """
INSERT INTO map_frame (
uuid,
file,
bbox,
centroid,
format,
orientation,
layer,
version
)
VALUES (
%s,
%s,
%s,
%s,
%s,
%s,
%s,
%s)
"""
with db_conn.cursor() as curs:
curs.execute(create_query)
curs.execute(insert_query, (str(uuid), file.read()))
curs.execute(
insert_query,
(
str(uuid),
file.read(),
str(bbox),
",".join([str(bbox.centroid[0]), str(bbox.centroid[1])]),
str(format_),
orientation,
layer,
__version__,
),
)


def delete_map_frame(uuid: UUID):
Expand All @@ -50,13 +103,67 @@ def delete_map_frame(uuid: UUID):
curs.execute(query, [str(uuid)])


def cleanup_map_frames():
"""Cleanup map frames which are old and without consent.

Only set file to null. Keep metadata.
"""
query = """
UPDATE
map_frame
SET
file = NULL
WHERE
ts < NOW() - INTERVAL '6 months'
AND NOT EXISTS (
SELECT
*
FROM
blob
WHERE
map_frame.uuid = blob.uuid
AND consent = TRUE);
"""
with db_conn.cursor() as curs:
try:
curs.execute(query)
except UndefinedTable:
logging.info("Table `map_frame` does not exist yet. Nothing to do.")
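
The retention rule in `cleanup_map_frames` (older than six months and no consented upload referencing the frame) can be tried out in isolation. The sketch below swaps in SQLite from the Python standard library for PostgreSQL, so `datetime('now', '-6 months')` stands in for `NOW() - INTERVAL '6 months'`. The reduced table layouts and the function `run_demo` are assumptions made for the demo only.

```python
import sqlite3


def cleanup_map_frames(conn: sqlite3.Connection) -> None:
    """Null out the file of old map frames that have no consented upload."""
    conn.execute(
        """
        UPDATE map_frame
        SET file = NULL
        WHERE ts < datetime('now', '-6 months')
          AND NOT EXISTS (
              SELECT 1
              FROM blob
              WHERE blob.uuid = map_frame.uuid
                AND blob.consent = 1
          )
        """
    )


def run_demo() -> dict:
    """Build three map frames and return {uuid: file} after the cleanup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE map_frame (uuid TEXT PRIMARY KEY, file BLOB, ts TEXT)")
    conn.execute("CREATE TABLE blob (id INTEGER PRIMARY KEY, uuid TEXT, consent INTEGER)")
    frames = [
        ("old-no-consent", "-7 months"),
        ("old-with-consent", "-7 months"),
        ("recent", "-1 day"),
    ]
    for uuid, offset in frames:
        conn.execute(
            "INSERT INTO map_frame VALUES (?, ?, datetime('now', ?))",
            (uuid, b"x", offset),
        )
    # One upload with consent, one without.
    conn.execute("INSERT INTO blob (uuid, consent) VALUES ('old-with-consent', 1)")
    conn.execute("INSERT INTO blob (uuid, consent) VALUES ('old-no-consent', 0)")
    cleanup_map_frames(conn)
    return dict(conn.execute("SELECT uuid, file FROM map_frame"))


result = run_demo()
```

Only the file column is cleared; the row and its metadata stay in place, which matches the docstring's "Only set file to null. Keep metadata."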


def cleanup_blobs():
"""Cleanup uploaded files (sketch maps) without consent.

Only set file and name to null. Keep metadata.
"""
# TODO: Wait one day until deletion or check celery task status?
Review comment (Member): either implement ts interval of 24 hours or delete only for one uuid

Reply (Collaborator Author): Implemented delete by map frame uuid.

query = """
UPDATE
blob
SET
file = NULL,
file_name = NULL
WHERE
consent = FALSE;
"""
with db_conn.cursor() as curs:
try:
curs.execute(query)
except UndefinedTable:
logging.info("Table `blob` does not exist yet. Nothing to do.")
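
The review thread on `cleanup_blobs` settles on deleting by map frame UUID rather than across the whole table. The final implementation is not visible in this part of the diff, so the following is only a guess at its general shape, again using SQLite as a stand-in for PostgreSQL. The `map_frame_uuids` parameter and the returned row count are assumptions.

```python
import sqlite3


def cleanup_blobs(conn: sqlite3.Connection, map_frame_uuids: list[str]) -> int:
    """Null out file and file_name of non-consented uploads for the given frames.

    Returns the number of rows that were updated.
    """
    if not map_frame_uuids:
        return 0
    # One placeholder per UUID keeps the query parameterized.
    placeholders = ", ".join("?" for _ in map_frame_uuids)
    cursor = conn.execute(
        f"""
        UPDATE blob
        SET file = NULL, file_name = NULL
        WHERE consent = 0
          AND uuid IN ({placeholders})
        """,
        map_frame_uuids,
    )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blob ("
    "id INTEGER PRIMARY KEY, uuid TEXT, file_name TEXT, file BLOB, consent INTEGER)"
)
conn.executemany(
    "INSERT INTO blob (uuid, file_name, file, consent) VALUES (?, ?, ?, ?)",
    [
        ("frame-a", "a1.png", b"1", 0),  # no consent, targeted frame
        ("frame-a", "a2.png", b"2", 1),  # consent given, must be kept
        ("frame-b", "b1.png", b"3", 0),  # no consent, but other frame
    ],
)
updated = cleanup_blobs(conn, ["frame-a"])
remaining = [row[0] for row in conn.execute("SELECT file_name FROM blob ORDER BY id")]
```

Scoping the update to specific UUIDs avoids clearing uploads that a still-running task may need, which is the concern raised in the TODO comment.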


def select_file(id_: int) -> bytes:
"""Get an uploaded file stored in the database by ID."""
query = "SELECT file FROM blob WHERE id = %s"
with db_conn.cursor() as curs:
curs.execute(query, [id_])
raw = curs.fetchone()
if raw:
if raw[0] is None:
raise CustomFileDoesNotExistAnymoreError(
N_("The file with the id: {ID} does not exist anymore"), {"ID": id_}
)
return raw[0]
else:
raise CustomFileNotFoundError(
63 changes: 49 additions & 14 deletions sketch_map_tool/database/client_flask.py
Expand Up @@ -3,16 +3,19 @@

import psycopg2
from flask import g
from psycopg2.errors import UndefinedTable
from psycopg2.extensions import connection
from werkzeug.utils import secure_filename

from sketch_map_tool.config import get_config_value
from sketch_map_tool.definitions import REQUEST_TYPES
from sketch_map_tool.exceptions import (
CustomFileDoesNotExistAnymoreError,
CustomFileNotFoundError,
UUIDNotFoundError,
)
from sketch_map_tool.helpers import N_
from sketch_map_tool.helpers import N_, to_array
from sketch_map_tool.upload_processing import read_qr_code


def open_connection():
Expand Down Expand Up @@ -85,35 +88,62 @@ def set_async_result_ids(request_uuid, map_: dict[REQUEST_TYPES, str]):
_insert_id_map(request_uuid, map_)


def insert_files(files, consent: bool) -> list[int]:
"""Insert uploaded files as blob into the database and return primary keys"""
def insert_files(files, consent: bool) -> tuple[list[int], list[str], list[str]]:
"""Insert uploaded files as blob into the database and return ID, UUID and name.

UUID is derived from decoding the qr-code.
"""
create_query = """
CREATE TABLE IF NOT EXISTS blob(
id SERIAL PRIMARY KEY,
uuid UUID,
Review comment (Member): rename to something like map_frame_uuid or map_uuid

Reply (Contributor): done

file_name VARCHAR,
file BYTEA,
consent BOOLEAN,
ts TIMESTAMP WITH TIME ZONE DEFAULT now()
)
"""
insert_query = (
"INSERT INTO blob(file_name, file, consent) VALUES (%s, %s, %s) RETURNING id"
)
insert_query = """
INSERT INTO blob (
uuid,
file_name,
file,
consent)
VALUES (
%s,
%s,
%s,
%s)
RETURNING
id,
uuid,
file_name
"""
db_conn = open_connection()
with db_conn.cursor() as curs:
curs.execute(create_query)
ids = []
file_ids = []
uuids = []
file_names = []
for file in files:
file_content = file.read()
qr_code_content = read_qr_code(to_array(file_content))
curs.execute(
insert_query,
(
qr_code_content["uuid"],
secure_filename(file.filename),
file.read(),
file_content,
consent,
),
)
ids.append(curs.fetchone()[0])
return ids
result = curs.fetchone()
if result is None:
raise ValueError()
file_ids.append(result[0])
uuids.append(result[1])
file_names.append(result[2])
return file_ids, uuids, file_names
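
The change from `file.read()` to a shared `file_content` variable matters because a stream can only be consumed once: a second `read()` on the same upload returns empty bytes. A minimal sketch of that behavior follows. `fake_read_qr_code` and `prepare_row` are hypothetical stand-ins for illustration, not functions from the project.

```python
from io import BytesIO


def fake_read_qr_code(content: bytes) -> dict:
    """Hypothetical stand-in for the QR-code decoder; returns a fixed UUID."""
    return {"uuid": "00000000-0000-0000-0000-000000000000", "size": len(content)}


def prepare_row(file: BytesIO, file_name: str, consent: bool) -> tuple:
    """Read the upload once and reuse the bytes for decoding and storage."""
    file_content = file.read()
    qr_code_content = fake_read_qr_code(file_content)
    return (qr_code_content["uuid"], file_name, file_content, consent)


upload = BytesIO(b"fake image bytes")
row = prepare_row(upload, "sketch.png", False)
# The stream position is now at the end, so nothing is left to read.
leftover = upload.read()
```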


def select_file(id_: int) -> bytes:
Expand Down Expand Up @@ -153,14 +183,14 @@ def select_file_name(id_: int) -> str:
)


def select_map_frame(uuid: UUID) -> bytes:
def select_map_frame(uuid: UUID) -> tuple[bytes, str, str]:
"""Select map frame of the associated UUID."""
query = "SELECT file FROM map_frame WHERE uuid = %s"
query = "SELECT file, bbox, layer FROM map_frame WHERE uuid = %s"
db_conn = open_connection()
with db_conn.cursor() as curs:
try:
curs.execute(query, [str(uuid)])
except psycopg2.errors.UndefinedTable:
except UndefinedTable:
raise CustomFileNotFoundError(
N_(
"In this Sketch Map Tool instance no sketch map has been "
Expand All @@ -170,7 +200,12 @@ def select_map_frame(uuid: UUID) -> bytes:
)
raw = curs.fetchone()
if raw:
return raw[0]
if raw[0] is None:
raise CustomFileDoesNotExistAnymoreError(
N_("The file with the id: {UUID} does not exist anymore"),
{"UUID": uuid},
)
return raw
else:
raise CustomFileNotFoundError(
N_(
4 changes: 4 additions & 0 deletions sketch_map_tool/exceptions.py
Expand Up @@ -61,3 +61,7 @@ class UUIDNotFoundError(TranslatableError):

class CustomFileNotFoundError(TranslatableError):
pass


class CustomFileDoesNotExistAnymoreError(TranslatableError):
pass
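
With the new exception, a lookup can end in three ways: the file is returned, the row never existed, or the row exists but its file was cleared by the cleanup task. The sketch below illustrates that distinction with an in-memory dict instead of a database. The `TranslatableError` shown here is a simplified stand-in whose constructor (message plus a `params` mapping) is an assumption, and `describe` is a hypothetical helper.

```python
from typing import Optional


class TranslatableError(Exception):
    """Simplified stand-in for the project's base class (signature assumed)."""

    def __init__(self, message: str, params: Optional[dict] = None):
        super().__init__(message)
        self.message = message
        self.params = params or {}

    def render(self) -> str:
        # Placeholders are filled from a mapping, so params must be a dict.
        return self.message.format(**self.params)


class CustomFileNotFoundError(TranslatableError):
    pass


class CustomFileDoesNotExistAnymoreError(TranslatableError):
    pass


def select_file(rows: dict, id_: int) -> bytes:
    """Return the file for id_, or raise depending on why it is missing."""
    if id_ not in rows:
        raise CustomFileNotFoundError(
            "File with the id: {ID} does not exist", {"ID": id_}
        )
    if rows[id_] is None:
        raise CustomFileDoesNotExistAnymoreError(
            "The file with the id: {ID} does not exist anymore", {"ID": id_}
        )
    return rows[id_]


def describe(rows: dict, id_: int) -> str:
    """Hypothetical helper: turn the outcome of a lookup into a message."""
    try:
        return f"found {len(select_file(rows, id_))} bytes"
    except TranslatableError as error:
        return error.render()


rows = {1: b"data", 2: None}
```

Separating the two error types lets the UI tell a user that a file was removed under the deletion policy rather than reporting a generic "not found".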
13 changes: 13 additions & 0 deletions sketch_map_tool/models.py
Expand Up @@ -17,6 +17,16 @@ class Bbox:
lon_max: float
lat_max: float

@property
Review comment (Member): add test function

Reply (Contributor): done

def centroid(self) -> tuple:
"""The coordinates of the centroid."""
lon_centroid = (self.lon_min + self.lon_max) / 2
lat_centroid = (self.lat_min + self.lat_max) / 2
return (lon_centroid, lat_centroid)

def __str__(self):
return f"{self.lon_min},{self.lat_min},{self.lon_max},{self.lat_max}"
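
The reviewer asked for a test of the new `centroid` property. A self-contained sketch of what such a check could look like is given here, using a reduced copy of `Bbox` with the fields named in the diff. The sample coordinates are made up. Note that the value is the midpoint of the longitude and latitude ranges, not a geodesic centroid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bbox:
    lon_min: float
    lat_min: float
    lon_max: float
    lat_max: float

    @property
    def centroid(self) -> tuple:
        """The coordinates of the centroid as (lon, lat)."""
        lon_centroid = (self.lon_min + self.lon_max) / 2
        lat_centroid = (self.lat_min + self.lat_max) / 2
        return (lon_centroid, lat_centroid)

    def __str__(self) -> str:
        return f"{self.lon_min},{self.lat_min},{self.lon_max},{self.lat_max}"


bbox = Bbox(lon_min=8.0, lat_min=49.0, lon_max=9.0, lat_max=50.0)
# Same string form that insert_map_frame writes to the centroid column.
centroid_text = ",".join(str(value) for value in bbox.centroid)
```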


@dataclass(frozen=True, kw_only=True)
class Size:
Expand Down Expand Up @@ -85,6 +95,9 @@ class PaperFormat:
qr_contents_distances_not_rotated: tuple[int, int]
qr_contents_distance_rotated: int

def __str__(self):
return self.title


@dataclass()
class LiteratureReference: