[WJ-1196] [WJ-1031] Add support for S3 presign URLs to upload blobs #1918

Merged
merged 91 commits into from
Oct 7, 2024
d54c56b
Add new file section to deepwell config.
emmiegit May 5, 2024
5cc26a2
Add file section to configuration.
emmiegit May 6, 2024
da8a533
Change field to seconds, not Duration.
emmiegit May 6, 2024
b8003b2
Begin BlobService::upload_url().
emmiegit May 6, 2024
e68cf45
Add blob_upload table.
emmiegit May 6, 2024
78397fe
Add timestamp for partial upload pruning.
emmiegit May 6, 2024
cf50e9b
Start BlobService changes for presign URL system.
emmiegit May 6, 2024
ac80c13
Add file_pending migration and pending model.
emmiegit May 12, 2024
ee6aedb
Start upload code.
emmiegit May 12, 2024
e9211e6
Rename file_pending -> blob_pending.
emmiegit May 12, 2024
8f852da
Fix compilation.
emmiegit May 12, 2024
f89f871
[WIP] Start division between new and edit file uploads.
emmiegit Jun 12, 2024
7e25aef
Add created_at column to blob_pending.
emmiegit Sep 8, 2024
b9813c9
Use find_by_id() instead of find().
emmiegit Sep 8, 2024
b36b64c
Run rustfmt.
emmiegit Sep 9, 2024
8d967ad
Add FileRevisionService::create_pending().
emmiegit Sep 9, 2024
2d27a68
Add FileRevisionService::get_first().
emmiegit Sep 9, 2024
e11c416
Update comment.
emmiegit Sep 9, 2024
51f3fa0
Rename structs.
emmiegit Sep 9, 2024
657818d
Run rustfmt.
emmiegit Sep 9, 2024
ba27c9f
Add proper StartFileUploadOutput struct.
emmiegit Sep 9, 2024
0284984
Reword column clear again.
emmiegit Sep 9, 2024
bf66088
Update comments.
emmiegit Sep 10, 2024
ddba81d
Remove dead_code suppression.
emmiegit Sep 10, 2024
b0bff98
Add TODOs for incomplete file pruning jobs.
emmiegit Sep 10, 2024
26f345d
Add FileRevisionService::finish_upload().
emmiegit Sep 10, 2024
d682fae
Improve output of finish_new_upload().
emmiegit Sep 10, 2024
f6cc6c4
Delete dummy structs.
emmiegit Sep 10, 2024
74b23ec
Rename types.
emmiegit Sep 10, 2024
cd2b08e
Merge pending jobs.
emmiegit Sep 11, 2024
52a0379
Stub out edits for now.
emmiegit Sep 11, 2024
8020aa1
Fix build errors.
emmiegit Sep 18, 2024
d5db6ac
Fix CHECK constraints.
emmiegit Sep 19, 2024
5bd2568
Implement file_upload_* API methods.
emmiegit Sep 20, 2024
57f4d96
Rename file creation structs.
emmiegit Sep 20, 2024
f4ed066
Rename upload API methods.
emmiegit Sep 20, 2024
3ffb213
Remove unused struct.
emmiegit Sep 20, 2024
1a5342e
Address warnings.
emmiegit Sep 20, 2024
7aa45e2
Update methods for blob create, upload, then file create.
emmiegit Sep 28, 2024
3f36090
Fix file edit processing.
emmiegit Sep 28, 2024
1d1cd73
Pass out blob_created flag.
emmiegit Sep 29, 2024
007de77
Implement file_create() method.
emmiegit Sep 29, 2024
a9132c4
Remove unused method.
emmiegit Sep 29, 2024
db5372c
Remove unused structs.
emmiegit Sep 29, 2024
a63f07d
Run rustfmt.
emmiegit Sep 29, 2024
5c3abed
Don't transform keys or API structs.
emmiegit Sep 29, 2024
922f26a
Add created_by column to migration.
emmiegit Sep 29, 2024
99e8402
Store explicit expires_at timestamp for easy identification.
emmiegit Sep 29, 2024
873e1a6
Increment file revision number on edit.
emmiegit Sep 29, 2024
5a4cc13
Remove unneded Default fill.
emmiegit Sep 29, 2024
6de0b30
Change to uploaded_blob_id in input struct.
emmiegit Sep 30, 2024
c48eab1
Move new blob error placement.
emmiegit Sep 30, 2024
495f082
Add helper for pending blob and check, add cancel_upload().
emmiegit Sep 30, 2024
01344d3
Add blob_cancel API method.
emmiegit Sep 30, 2024
0e76c53
Rename argument for head().
emmiegit Sep 30, 2024
2619af1
Only delete from S3 when cancelling if exists.
emmiegit Sep 30, 2024
3a7097f
Add echo method for testing / documentation.
emmiegit Sep 30, 2024
89f69cf
Add instructions for testing requsts and doing uploads.
emmiegit Sep 30, 2024
e819c1b
Add test for ProvidedValue serialization as well.
emmiegit Sep 30, 2024
1f79fc1
Remove unneeded #[inline] notations.
emmiegit Sep 30, 2024
04ad6fa
Fix ProvidedValue serialization.
emmiegit Sep 30, 2024
b3b85f4
Better error for missing pending blob.
emmiegit Sep 30, 2024
c094229
Start implementing avatar upload with pending blob pattern.
emmiegit Sep 30, 2024
a693c77
Amend column for job todo.
emmiegit Sep 30, 2024
05803b7
Add configuration field for maximum avatar size.
emmiegit Sep 30, 2024
143c5ad
Use new configuration field in avatar update.
emmiegit Sep 30, 2024
f2cc142
Add new error case for blobs being too large.
emmiegit Sep 30, 2024
ca16de5
Remove full BlobService dead_code ignore.
emmiegit Sep 30, 2024
ee601df
Add log lines for empty revisions.
emmiegit Sep 30, 2024
f370f59
Change file length max to constant.
emmiegit Sep 30, 2024
5286307
Add column for already-moved blobs.
emmiegit Sep 30, 2024
3041dc5
Add initial support for already-moved pending blob.
emmiegit Oct 1, 2024
16aaf3f
Add expected_length column to blob_pending.
emmiegit Oct 1, 2024
c984abd
Add expected blob length to database.
emmiegit Oct 3, 2024
2419237
Add CHECK constraint for file size.
emmiegit Oct 3, 2024
3bdaf5d
Check expected file size in finish_upload().
emmiegit Oct 3, 2024
a257e71
Add maximum-blob-size-kb field to config.
emmiegit Oct 3, 2024
e5e4947
Add field to config struct.
emmiegit Oct 3, 2024
1e0735f
Move size check.
emmiegit Oct 3, 2024
4ff8112
Delete blob if found to mismatch.
emmiegit Oct 3, 2024
2b48ce5
Add check for blobs that are too large.
emmiegit Oct 3, 2024
5f11d0f
Create request.py helper script.
emmiegit Oct 4, 2024
c2bbe37
Add better multiline formatting.
emmiegit Oct 4, 2024
007c022
Add example usage of new request.py script.
emmiegit Oct 4, 2024
a852081
Add script for uploading files to local S3.
emmiegit Oct 6, 2024
b443290
Mention upload script in README.
emmiegit Oct 6, 2024
844fcd5
Add wrapper for move_uploaded(), separate transaction.
emmiegit Oct 6, 2024
e925585
Print JSON as JSON.
emmiegit Oct 6, 2024
e555f7c
Add logic for inline JSON responses.
emmiegit Oct 6, 2024
ec9fe3b
Start JSON data on a separate line.
emmiegit Oct 6, 2024
d57b0b4
Bump deepwell version to v2024.10.6
emmiegit Oct 7, 2024
2 changes: 1 addition & 1 deletion deepwell/Cargo.lock

2 changes: 1 addition & 1 deletion deepwell/Cargo.toml
@@ -8,7 +8,7 @@ keywords = ["wikijump", "api", "backend", "wiki"]
categories = ["asynchronous", "database", "web-programming::http-server"]
exclude = [".gitignore", ".editorconfig"]

version = "2024.9.14"
version = "2024.10.6"
authors = ["Emmie Smith <emmie.maeda@gmail.com>"]
edition = "2021"

54 changes: 54 additions & 0 deletions deepwell/README.md
@@ -90,6 +90,60 @@ $ cargo fmt # Ensure code is formatted
$ cargo clippy # Check code for lints
```

### Running requests

When you have a local instance of DEEPWELL running, probably in the development `docker-compose` instance, you may want to run requests against it. You can easily accomplish this with a tool like `curl`. The basic format is:

```sh
$ curl -X POST --json '{"jsonrpc":"2.0","method":"<method here>","params":<json data of request>,"id":<request id>}' http://localhost:2747/jsonrpc
```

Here you pass in the JSONRPC method name and the corresponding JSON data. The ID value distinguishes requests from notifications; see the JSONRPC specification for details.

For instance:

```sh
$ curl -X POST --json '{"jsonrpc":"2.0","method":"echo","params":{"my":["json","data"]},"id":0}' http://localhost:2747/jsonrpc

{"jsonrpc":"2.0","id":0,"result":{"my":["json","data"]}}

$ curl -X POST --json '{"jsonrpc":"2.0","method":"ping","id":0}' http://localhost:2747/jsonrpc

{"jsonrpc":"2.0","id":0,"result":"Pong!"}
```

If you are unfamiliar with JSONRPC, you can read about it [on its website](https://www.jsonrpc.org/specification). For instance, one quirk is that for methods which take an argument that is neither a list nor an object, you specify it as a list of one element.
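
As a rough illustration of the params quirk and of the request/notification distinction, here is a minimal Python sketch using only the standard library. The helper name `build_payload` is hypothetical; it is not part of DEEPWELL or of `scripts/request.py`, and it only shows how a JSONRPC 2.0 envelope could be assembled.

```python
import json

_NO_ID = object()  # sentinel: distinguishes "no id given" from an explicit id


def build_payload(method, params=None, request_id=_NO_ID):
    """Build a JSONRPC 2.0 envelope (hypothetical helper, for illustration)."""
    payload = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        # "params" must be a list or an object, so a bare scalar
        # (string, number, boolean) is wrapped in a one-element list.
        if not isinstance(params, (list, dict)):
            params = [params]
        payload["params"] = params
    if request_id is not _NO_ID:
        # With an "id" this is a request and the server replies.
        # Without one it is a notification and no reply is sent.
        payload["id"] = request_id
    return payload


print(json.dumps(build_payload("echo", {"my": ["json", "data"]}, request_id=0)))
print(json.dumps(build_payload("echo", "hello", request_id=1)))
print(json.dumps(build_payload("ping")))
```

The printed JSON strings can be passed directly to `curl --json` in the commands shown earlier.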

There is also a helper script to assist with making JSONRPC requests, `scripts/request.py`. It requires the popular [`requests`](https://requests.readthedocs.io/) library to be installed.

Example usage:

```sh
$ scripts/request.py echo '{ "my": ["json","data"] }'
OK {'my': ['json', 'data']}

$ scripts/request.py ping
OK Pong!

$ scripts/request.py error
ERR
{'code': 4000,
'data': None,
'message': 'The request is in some way malformed or incorrect'}
```

**NOTE:** When you are uploading files to local minio as part of testing file upload flows, **you must leave the URL unmodified**. The URL uses the host `files` as the S3 provider, which is not a valid host on your development machine. Use `--connect-to` to tell `curl` to connect to the appropriate location instead:

```sh
$ curl --connect-to files:9000:localhost:9000 --upload-file <path-to-file> <s3-presign-url>
```

Alternatively, you can use the helper script:

```sh
$ scripts/upload.sh <path-to-file> <s3-presign-url>
```
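
If `curl` is not available, the same idea can be sketched in Python with the standard library: connect to `localhost:9000`, but send the request exactly as signed, including the original `Host` header. This is a sketch under assumptions, not part of the PR: the function names `plan_upload` and `upload` are hypothetical, and it assumes the local S3 store speaks plain HTTP on port 9000, as in the `curl` example above.

```python
import http.client
from urllib.parse import urlsplit


def plan_upload(presign_url, connect_host="localhost", connect_port=9000):
    """Split a presigned URL into the pieces needed to redirect the connection.

    The URL itself is never modified: the path, query string and Host header
    stay exactly as signed, and only the TCP destination changes.
    """
    parts = urlsplit(presign_url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        "connect": (connect_host, connect_port),
        "host_header": parts.netloc,  # e.g. "files:9000", as in the signed URL
        "target": target,
    }


def upload(path, presign_url):
    """PUT the file at `path` to the presigned URL over the redirected socket."""
    plan = plan_upload(presign_url)
    with open(path, "rb") as file:
        body = file.read()
    conn = http.client.HTTPConnection(*plan["connect"])
    try:
        # Passing an explicit Host header stops http.client from generating
        # one based on the address we actually connect to.
        conn.request(
            "PUT",
            plan["target"],
            body=body,
            headers={"Host": plan["host_header"]},
        )
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()
```

Calling `upload("<path-to-file>", "<s3-presign-url>")` returns the HTTP status code of the response.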

### Database

There are two important directories related to the management of the database (which DEEPWELL can be said to "own"). They are both fairly self-explanatory:
22 changes: 22 additions & 0 deletions deepwell/config.example.toml
@@ -355,6 +355,28 @@ minimum-name-bytes = 3
# Set to 0 to disable.
refill-name-change-days = 90


[file]

# The length of paths used for S3 presigned URLs.
#
# The value doesn't particularly matter so long as it is sufficiently long
# to avoid collisions.
#
# Just to be safe, the generation mechanism is the same as for session tokens.
presigned-path-length = 32

# How long a presigned URL lasts before expiry.
#
# The value should only be a few minutes, and no longer than 12 hours.
presigned-expiration-minutes = 5

# The maximum blob size allowed globally, in KiB.
maximum-blob-size-kb = 1_048_576

# The maximum blob size allowed for user avatars, in KiB.
maximum-avatar-size-kb = 250

[message]

# The maximum size of a message's subject line, in bytes.
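
To make the units of the new `[file]` settings concrete, the following Python sketch repeats the conversion this PR performs in `deepwell/src/config/file.rs` (minutes to seconds, KiB to bytes) on the example values above. The function name `effective_limits` is hypothetical and exists only for illustration; the output keys follow the field names of the `Config` struct.

```python
def effective_limits(file_section):
    """Convert [file] settings into the units used at runtime.

    Minutes become seconds and KiB become bytes; the path length is
    passed through unchanged.
    """
    return {
        "presigned_path_length": file_section["presigned-path-length"],
        "presigned_expiry_secs": file_section["presigned-expiration-minutes"] * 60,
        "maximum_blob_size": file_section["maximum-blob-size-kb"] * 1024,
        "maximum_avatar_size": file_section["maximum-avatar-size-kb"] * 1024,
    }


example = {
    "presigned-path-length": 32,
    "presigned-expiration-minutes": 5,
    "maximum-blob-size-kb": 1_048_576,
    "maximum-avatar-size-kb": 250,
}
limits = effective_limits(example)
print(limits["presigned_expiry_secs"])  # 300
print(limits["maximum_blob_size"])      # 1073741824 (1 GiB)
print(limits["maximum_avatar_size"])    # 256000
```

With the example configuration, presigned URLs expire after 300 seconds, blobs are capped at 1 GiB, and avatars at 256,000 bytes.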
22 changes: 21 additions & 1 deletion deepwell/migrations/20220906103252_deepwell.sql
@@ -411,6 +411,26 @@ CREATE TABLE page_vote (
CHECK ((disabled_at IS NULL) = (disabled_by IS NULL))
);

--
-- Blobs
--

-- Manages blobs that are being uploaded by the user
CREATE TABLE blob_pending (
    external_id TEXT PRIMARY KEY,
    created_by BIGINT NOT NULL REFERENCES "user"(user_id),
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    expected_length BIGINT NOT NULL CHECK (expected_length >= 0),
    s3_path TEXT NOT NULL CHECK (length(s3_path) > 1),
    s3_hash BYTEA, -- NULL means not yet moved, NOT NULL means deleted from s3_path
    presign_url TEXT NOT NULL CHECK (length(presign_url) > 1),

    CHECK (expires_at > created_at), -- expiration time is not in the relative past
    CHECK (length(external_id) = 24), -- default length for a cuid2
    CHECK (s3_hash IS NULL OR length(s3_hash) = 64) -- SHA-512 hash size, if present
);

--
-- Files
--
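
The `CHECK` constraints on `blob_pending` can be read as a small set of invariants. The Python sketch below restates them outside the database, which may help when reasoning about test fixtures. It is an illustration only: `check_blob_pending` is a hypothetical helper, not DEEPWELL code, and the database remains the authority on what is accepted.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def check_blob_pending(row):
    """Return the list of blob_pending CHECK constraints that `row` violates."""
    problems = []
    if len(row["external_id"]) != 24:
        problems.append("external_id must be 24 characters (cuid2 default)")
    if row["expected_length"] < 0:
        problems.append("expected_length must be non-negative")
    if len(row["s3_path"]) <= 1:
        problems.append("s3_path must be longer than 1 character")
    if len(row["presign_url"]) <= 1:
        problems.append("presign_url must be longer than 1 character")
    if row["expires_at"] <= row["created_at"]:
        problems.append("expires_at must be after created_at")
    s3_hash = row.get("s3_hash")
    if s3_hash is not None and len(s3_hash) != 64:
        problems.append("s3_hash must be 64 bytes (SHA-512) when present")
    return problems


now = datetime.now(timezone.utc)
row = {
    "external_id": "a" * 24,  # placeholder with the right length, not a real cuid2
    "expected_length": 1024,
    "s3_path": "example-upload-path",  # placeholder value
    "presign_url": "http://files:9000/example",  # placeholder value
    "created_at": now,
    "expires_at": now + timedelta(minutes=5),
    "s3_hash": None,  # NULL: the blob has not been moved yet
}
print(check_blob_pending(row))

# Once the blob is moved, s3_hash holds its 64-byte SHA-512 digest.
moved = dict(row, s3_hash=hashlib.sha512(b"example contents").digest())
print(check_blob_pending(moved))
```

Both example rows satisfy every constraint, so each call prints an empty list.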
@@ -514,7 +534,7 @@ CREATE TYPE message_recipient_type AS ENUM (
-- A "record" is the underlying message data, with its contents, attachments,
-- and associated metadata such as sender and recipient(s).
CREATE TABLE message_record (
external_id TEXT PRIMARY KEY,
external_id TEXT PRIMARY KEY, -- ID comes from message_draft
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
drafted_at TIMESTAMP WITH TIME ZONE NOT NULL,
retracted_at TIMESTAMP WITH TIME ZONE,
114 changes: 114 additions & 0 deletions deepwell/scripts/request.py
@@ -0,0 +1,114 @@
#!/usr/bin/env python3

import argparse
import json
import os
import sys

import requests


def color_settings(value):
    match value:
        case "auto":
            fd = sys.stdout.fileno()
            return os.isatty(fd)
        case "always":
            return True
        case "never":
            return False


def print_data(data):
    if isinstance(data, str):
        print(data)
    else:
        # Only print on multiple lines if it's "large"
        output = json.dumps(data)
        if len(output) > 16:
            output = json.dumps(data, indent=4)
            print()
        print(output)


def deepwell_request(endpoint, method, data, id=0, color=False):
    r = requests.post(
        endpoint,
        json={
            "jsonrpc": "2.0",
            "method": method,
            "params": data,
            "id": id,
        },
    )

    if color:
        green_start = "\x1b[32m"
        red_start = "\x1b[31m"
        color_end = "\x1b[0m"
    else:
        green_start = ""
        red_start = ""
        color_end = ""

    match r.json():
        case {"jsonrpc": "2.0", "id": id, "result": data}:
            print(f"{green_start}OK {color_end}", end="")
            print_data(data)
            return 0
        case {"jsonrpc": "2.0", "id": id, "error": data}:
            print(f"{red_start}ERR {color_end}", end="")
            print_data(data)
            return 1


if __name__ == "__main__":
    argparser = argparse.ArgumentParser(
        "deepwell-request",
        description="Helper script to run DEEPWELL JSONRPC requests",
    )
    argparser.add_argument(
        "-H",
        "--host",
        default="localhost",
    )
    argparser.add_argument(
        "-p",
        "--port",
        type=int,
        default=2747,
    )
    argparser.add_argument(
        "-s",
        "--https",
        dest="scheme",
        action="store_const",
        const="https",
        default="http",
    )
    argparser.add_argument(
        "-I",
        "--id",
        default=0,
    )
    argparser.add_argument(
        "-C",
        "--color",
        choices=["never", "auto", "always"],
        default="auto",
    )
    argparser.add_argument("method")
    argparser.add_argument("data", nargs="?", type=json.loads, default="{}")
    args = argparser.parse_args()
    enable_color = color_settings(args.color)

    endpoint = f"{args.scheme}://{args.host}:{args.port}/jsonrpc"
    exit_code = deepwell_request(
        endpoint,
        args.method,
        args.data,
        args.id,
        color=enable_color,
    )

    sys.exit(exit_code)
27 changes: 27 additions & 0 deletions deepwell/scripts/upload.sh
@@ -0,0 +1,27 @@
#!/bin/bash
set -eu

#
# Helper script to upload files to a local S3 store for testing upload flows.
#

if [[ $# -ne 2 ]]; then
    echo >&2 "Usage: $0 <path-to-file> <s3-presign-url>"
    exit 1
fi

# Allow either order of arguments, for convenience.
# If it starts with HTTP or HTTPS, we assume it's the presign URL.
if [[ $1 = http:* || $1 = https:* ]]; then
    path="$2"
    url="$1"
else
    path="$1"
    url="$2"
fi

exec \
    curl \
        --connect-to 'files:9000:localhost:9000' \
        --upload-file "$path" \
        "$url"
13 changes: 8 additions & 5 deletions deepwell/src/api.rs
@@ -28,9 +28,9 @@

use crate::config::{Config, Secrets};
use crate::endpoints::{
auth::*, category::*, domain::*, email::*, file::*, file_revision::*, link::*,
locale::*, message::*, misc::*, page::*, page_revision::*, parent::*, site::*,
site_member::*, text::*, user::*, user_bot::*, view::*, vote::*,
auth::*, blob::*, category::*, domain::*, email::*, file::*, file_revision::*,
link::*, locale::*, message::*, misc::*, page::*, page_revision::*, parent::*,
site::*, site_member::*, text::*, user::*, user_bot::*, view::*, vote::*,
};
use crate::locales::Localizations;
use crate::services::blob::MimeAnalyzer;
@@ -174,6 +174,7 @@ async fn build_module(app_state: ServerState) -> anyhow::Result<RpcModule<Server

// Miscellaneous
register!("ping", ping);
register!("echo", echo);
register!("error", yield_error);
register!("version", version);
register!("version_full", full_version);
@@ -260,11 +261,13 @@ async fn build_module(app_state: ServerState) -> anyhow::Result<RpcModule<Server

// Blob data
register!("blob_get", blob_get);
register!("blob_upload", blob_upload);
register!("blob_cancel", blob_cancel);

// Files
register!("file_upload", file_upload);
register!("file_get", file_get);
register!("file_create", file_create);
register!("file_edit", file_edit);
register!("file_get", file_get);
register!("file_delete", file_delete);
register!("file_move", file_move);
register!("file_restore", file_restore);
22 changes: 22 additions & 0 deletions deepwell/src/config/file.rs
@@ -53,6 +53,7 @@ pub struct ConfigFile {
ftml: Ftml,
special_pages: SpecialPages,
user: User,
file: FileSection,
message: Message,
}

@@ -181,6 +182,16 @@ struct User {
minimum_name_bytes: usize,
}

// NOTE: Name conflict with std::fs::File
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(rename_all = "kebab-case")]
struct FileSection {
presigned_path_length: usize,
presigned_expiration_minutes: u32,
maximum_blob_size_kb: i64,
maximum_avatar_size_kb: i64,
}

#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(rename_all = "kebab-case")]
struct Message {
@@ -303,6 +314,13 @@ impl ConfigFile {
refill_name_change_days,
minimum_name_bytes,
},
file:
FileSection {
presigned_path_length,
presigned_expiration_minutes,
maximum_blob_size_kb,
maximum_avatar_size_kb,
},
message:
Message {
maximum_subject_bytes: maximum_message_subject_bytes,
@@ -424,6 +442,10 @@ impl ConfigFile {
))
},
minimum_name_bytes,
presigned_path_length,
presigned_expiry_secs: presigned_expiration_minutes * 60,
maximum_blob_size: maximum_blob_size_kb * 1024,
maximum_avatar_size: maximum_avatar_size_kb * 1024,
maximum_message_subject_bytes,
maximum_message_body_bytes,
maximum_message_recipients,
12 changes: 12 additions & 0 deletions deepwell/src/config/object.rs
@@ -200,6 +200,18 @@ pub struct Config {
/// Minimum length of bytes in a username.
pub minimum_name_bytes: usize,

/// Length of randomly-generated portion of S3 presigned URLs.
pub presigned_path_length: usize,

/// How long S3 presigned URLs will last before expiry.
pub presigned_expiry_secs: u32,

/// Maximum size of a blob globally.
pub maximum_blob_size: i64,

/// Maximum size of a user's avatar image.
pub maximum_avatar_size: i64,

/// Maximum size of the subject line allowed in a direct message.
pub maximum_message_subject_bytes: usize,
