Telegram content proxy #163
Merged
23 commits
9eb8f30 (#102) Web: add a dependency on Telegram link resolver (ForNeVeR)
afbe2f8 (#102) ContentProxy: add a FileCache (ForNeVeR)
800546a (#102) ContentProxy: finally, make it compile (ForNeVeR)
3e7b422 (#102) FileCacheTests: preliminary test API (ForNeVeR)
97db22a (#102) TestFramework: extract the code from TestUtils (ForNeVeR)
4d6bfc5 (#102) ContentProxy: finish working FileCache (ForNeVeR)
a5e26a1 (#102) FileCacheTests: implement an ordering test (ForNeVeR)
6ea4892 (#102) FileCache: cache directory validation tests (ForNeVeR)
cb218b6 (#102) FileCache: additional tests (ForNeVeR)
b50d615 (#102) FileCache: finish the last tests (ForNeVeR)
9100471 (#102) ContentController: test redirect mode (ForNeVeR)
107c4be (#102) ContentController: last test groundwork (ForNeVeR)
e8e8153 (#102) FileCache: async stream optimization (ForNeVeR)
067da2d (#102) ContentController: add last tests (ForNeVeR)
9292428 (#102) ContentController: make it work in manual tests (ForNeVeR)
b02512c (#102) ContentProxy: some small fixes (ForNeVeR)
5d954d6 (#102) ContentProxy: add file names and MIME types (ForNeVeR)
3977248 (#102) FileCache: support older versions of Windows (ForNeVeR)
fb5dc3a Docs: a slight improvement (ForNeVeR)
a58f54e (#102) FileCache: drop redundant rec (ForNeVeR)
b2cccee (#102) FileCache: improve the workarounds for the older versions of W… (ForNeVeR)
2861ee8 (#102) ContentProxy: redesign the attribute optionality (ForNeVeR)
7936682 (#102) Settings: update the example (ForNeVeR)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,218 @@ | ||
namespace Emulsion.ContentProxy | ||
|
||
open System | ||
open System.IO | ||
open System.Net.Http | ||
open System.Security.Cryptography | ||
open System.Text | ||
open System.Threading | ||
|
||
open JetBrains.Collections.Viewable | ||
open Serilog | ||
open SimpleBase | ||
|
||
open Emulsion.Settings | ||
|
||
type DownloadRequest = { | ||
Uri: Uri | ||
CacheKey: string | ||
Size: uint64 | ||
} | ||
|
||
module Base58 = | ||
/// Suggested by @ttldtor. | ||
let M4N71KR = Base58(Base58Alphabet "123456789qwertyuiopasdfghjkzxcvbnmQWERTYUPASDFGHJKLZXCVBNM") | ||
|
||
module FileCache = | ||
let EncodeFileName(sha256: SHA256, cacheKey: string): string = | ||
cacheKey | ||
|> Encoding.UTF8.GetBytes | ||
|> sha256.ComputeHash | ||
|> Base58.M4N71KR.Encode | ||
|
||
let TryDecodeFileNameToSha256Hash(fileName: string): byte[] option = | ||
try | ||
Some <| (Base58.M4N71KR.Decode fileName).ToArray() | ||
with | ||
| :? ArgumentException -> None | ||
|
||
let IsMoveAndDeleteModeEnabled = | ||
// NOTE: On older versions of Windows (known to reproduce on the windows-2019 GitHub Actions image), the following | ||
// scenario may fail: | ||
// | ||
// - open a file with FileShare.Delete (e.g. for download) | ||
// - delete the file (e.g. during the cache cleanup) | ||
// - try to create a file with the same name again | ||
// | ||
// According to this article | ||
// (https://boostgsoc13.github.io/boost.afio/doc/html/afio/FAQ/deleting_open_files.html), this is impossible | ||
// since the file will occupy its disk name until the last handle is closed. | ||
// | ||
// In practice, this is allowed (checked at least on Windows 10 20H2 and the windows-2022 GitHub Actions image), but | ||
// some tests are known to be broken on older versions of Windows (windows-2019). | ||
// | ||
// As a workaround, let's rename the file to a random name before deleting it. | ||
// | ||
// This workaround may be removed after these older versions of Windows go out of support. | ||
OperatingSystem.IsWindows() | ||
|
||
type FileCache(logger: ILogger, | ||
settings: FileCacheSettings, | ||
httpClientFactory: IHttpClientFactory, | ||
sha256: SHA256) = | ||
|
||
let error = Signal<Exception>() | ||
|
||
let getFilePath(cacheKey: string) = | ||
Path.Combine(settings.Directory, FileCache.EncodeFileName(sha256, cacheKey)) | ||
|
||
let readFileOptions = | ||
FileStreamOptions(Mode = FileMode.Open, Access = FileAccess.Read, Options = FileOptions.Asynchronous, Share = (FileShare.Read ||| FileShare.Delete)) | ||
|
||
let writeFileOptions = | ||
FileStreamOptions(Mode = FileMode.CreateNew, Access = FileAccess.Write, Options = FileOptions.Asynchronous, Share = FileShare.None) | ||
|
||
let getFromCache(cacheKey: string) = async { | ||
let path = getFilePath cacheKey | ||
return | ||
if File.Exists path then | ||
Some(new FileStream(path, readFileOptions)) | ||
else | ||
None | ||
} | ||
|
||
let enumerateCacheFiles() = | ||
let entries = Directory.EnumerateFileSystemEntries settings.Directory | ||
if FileCache.IsMoveAndDeleteModeEnabled then | ||
entries |> Seq.filter(fun p -> not(p.EndsWith ".deleted")) | ||
else | ||
entries | ||
|
||
let deleteFileSafe (fileInfo: FileInfo) = async { | ||
if FileCache.IsMoveAndDeleteModeEnabled then | ||
fileInfo.MoveTo(Path.Combine(fileInfo.DirectoryName, $"{Guid.NewGuid().ToString()}.deleted")) | ||
fileInfo.Delete() | ||
else | ||
fileInfo.Delete() | ||
} | ||
|
||
let assertCacheDirectoryExists() = async { | ||
Directory.CreateDirectory settings.Directory |> ignore | ||
} | ||
|
||
let assertCacheValid() = async { | ||
enumerateCacheFiles() | ||
|> Seq.iter(fun entry -> | ||
let entryName = Path.GetFileName entry | ||
|
||
if not <| File.Exists entry | ||
then failwith $"Cache directory invalid: contains a subdirectory \"{entryName}\"." | ||
|
||
match FileCache.TryDecodeFileNameToSha256Hash entryName with | ||
| Some hash when hash.Length = sha256.HashSize / 8 -> () | ||
| _ -> | ||
failwith ( | ||
$"Cache directory invalid: contains an entry \"{entryName}\" which doesn't correspond to a " + | ||
"base58-encoded SHA-256 hash." | ||
) | ||
) | ||
} | ||
|
||
let ensureFreeCache size = async { | ||
if size > settings.FileSizeLimitBytes || size > settings.TotalCacheSizeLimitBytes then | ||
return false | ||
else | ||
do! assertCacheDirectoryExists() | ||
do! assertCacheValid() | ||
|
||
let allEntries = enumerateCacheFiles() |> Seq.map FileInfo | ||
|
||
// Now, sort the entries from newest to oldest, and start deleting, if required, at the point where we see | ||
// that there are too many files: | ||
let entriesByPriority = | ||
allEntries | ||
|> Seq.sortByDescending(fun info -> info.LastWriteTimeUtc) | ||
|> Seq.toArray | ||
|
||
let mutable currentSize = 0UL | ||
for info in entriesByPriority do | ||
currentSize <- currentSize + Checked.uint64 info.Length | ||
if currentSize + size > settings.TotalCacheSizeLimitBytes then | ||
logger.Information("Deleting a cache item \"{FileName}\" ({Size} bytes)", info.Name, info.Length) | ||
do! deleteFileSafe info | ||
|
||
return true | ||
} | ||
|
||
let download(uri: Uri): Async<Stream> = async { | ||
let! ct = Async.CancellationToken | ||
|
||
use client = httpClientFactory.CreateClient() | ||
let! response = Async.AwaitTask <| client.GetAsync(uri, ct) | ||
return! Async.AwaitTask <| response.EnsureSuccessStatusCode().Content.ReadAsStreamAsync() | ||
} | ||
|
||
let downloadIntoCacheAndGet uri cacheKey: Async<Stream> = async { | ||
let! ct = Async.CancellationToken | ||
let! stream = download uri | ||
let path = getFilePath cacheKey | ||
logger.Information("Saving {Uri} to path {Path}…", uri, path) | ||
|
||
do! async { // to limit the cachedFile scope | ||
use cachedFile = new FileStream(path, writeFileOptions) | ||
do! Async.AwaitTask(stream.CopyToAsync(cachedFile, ct)) | ||
logger.Information("Download successful: \"{Uri}\" to \"{Path}\".", uri, path) | ||
} | ||
|
||
let! file = getFromCache cacheKey | ||
return upcast Option.get file | ||
} | ||
|
||
let cancellation = new CancellationTokenSource() | ||
let processRequest request: Async<Stream> = async { | ||
logger.Information("Cache lookup for content {Uri} (cache key {CacheKey})", request.Uri, request.CacheKey) | ||
match! getFromCache request.CacheKey with | ||
| Some content -> | ||
logger.Information("Cache hit for content {Uri} (cache key {CacheKey})", request.Uri, request.CacheKey) | ||
return content | ||
| None -> | ||
logger.Information("No cache hit for content {Uri} (cache key {CacheKey}), will download", request.Uri, request.CacheKey) | ||
let! shouldCache = ensureFreeCache request.Size | ||
if shouldCache then | ||
logger.Information("Resource {Uri} (cache key {CacheKey}, {Size} bytes) will fit into cache, caching", request.Uri, request.CacheKey, request.Size) | ||
let! result = downloadIntoCacheAndGet request.Uri request.CacheKey | ||
logger.Information("Resource {Uri} (cache key {CacheKey}, {Size} bytes) downloaded", request.Uri, request.CacheKey, request.Size) | ||
return result | ||
else | ||
logger.Information("Resource {Uri} (cache key {CacheKey}) won't fit into cache, directly downloading", request.Uri, request.CacheKey) | ||
let! result = download request.Uri | ||
return result | ||
} | ||
|
||
let processLoop(processor: MailboxProcessor<_ * AsyncReplyChannel<_>>) = async { | ||
while true do | ||
let! request, replyChannel = processor.Receive() | ||
try | ||
let! result = processRequest request | ||
replyChannel.Reply(Some result) | ||
with | ||
| ex -> | ||
logger.Error(ex, "Exception while processing the file download queue") | ||
error.Fire ex | ||
replyChannel.Reply None | ||
} | ||
let processor = MailboxProcessor.Start(processLoop, cancellation.Token) | ||
|
||
interface IDisposable with | ||
member _.Dispose() = | ||
cancellation.Dispose() | ||
(processor :> IDisposable).Dispose() | ||
|
||
member _.Download(uri: Uri, cacheKey: string, size: uint64): Async<Stream option> = | ||
processor.PostAndAsyncReply(fun chan -> ({ | ||
Uri = uri | ||
CacheKey = cacheKey | ||
Size = size | ||
}, chan)) | ||
|
||
member _.Error: ISource<Exception> = error |
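The eviction order used by `ensureFreeCache` may be easier to follow in isolation. The following is a minimal sketch, not the code from this PR: it assumes a hypothetical `Entry` record in place of `FileInfo` and a hypothetical pure function `selectForEviction` that only computes which entries would be deleted, without touching the file system.

```fsharp
open System

/// Hypothetical stand-in for FileInfo: only the fields the eviction pass reads.
type Entry = { Name: string; Length: uint64; LastWriteTimeUtc: DateTime }

/// Returns the entries that would be deleted to admit `incoming` bytes,
/// walking the cache from newest to oldest.
let selectForEviction (limit: uint64) (incoming: uint64) (entries: Entry list) : Entry list =
    // The running total keeps counting entries that are marked for eviction,
    // so once the limit is crossed every older entry is evicted as well.
    let step (runningSize: uint64, evicted: Entry list) (entry: Entry) =
        let newSize = runningSize + entry.Length
        if newSize + incoming > limit then (newSize, entry :: evicted)
        else (newSize, evicted)

    entries
    |> List.sortByDescending (fun e -> e.LastWriteTimeUtc)
    |> List.fold step (0UL, [])
    |> snd
    |> List.rev

// Illustrative data only: three cached files, 120 bytes in total.
let sampleEntries =
    [ { Name = "oldest"; Length = 50UL; LastWriteTimeUtc = DateTime(2022, 8, 1) }
      { Name = "middle"; Length = 30UL; LastWriteTimeUtc = DateTime(2022, 8, 2) }
      { Name = "newest"; Length = 40UL; LastWriteTimeUtc = DateTime(2022, 8, 3) } ]

let evictedNames =
    selectForEviction 100UL 40UL sampleEntries |> List.map (fun e -> e.Name)
```

With a 100-byte limit and a 40-byte incoming file, `evictedNames` is `["middle"; "oldest"]`: the newest entry fits next to the incoming file, and everything older is dropped.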
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
namespace Emulsion.ContentProxy | ||
|
||
open System.Net.Http | ||
|
||
type SimpleHttpClientFactory() = | ||
interface IHttpClientFactory with | ||
member this.CreateClient _ = new HttpClient() |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,4 +8,6 @@ type TelegramContent = { | |
ChatUserName: string | ||
MessageId: int64 | ||
FileId: string | ||
FileName: string | ||
MimeType: string | ||
} |
91 changes: 91 additions & 0 deletions
Emulsion.Database/Migrations/20220828133844_ContentFileNameAndMimeType.fs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
// <auto-generated /> | ||
namespace Emulsion.Database.Migrations | ||
|
||
open System | ||
open Emulsion.Database | ||
open Microsoft.EntityFrameworkCore | ||
open Microsoft.EntityFrameworkCore.Infrastructure | ||
open Microsoft.EntityFrameworkCore.Migrations | ||
|
||
[<DbContext(typeof<EmulsionDbContext>)>] | ||
[<Migration("20220828133844_ContentFileNameAndMimeType")>] | ||
type ContentFileNameAndMimeType() = | ||
inherit Migration() | ||
|
||
override this.Up(migrationBuilder:MigrationBuilder) = | ||
migrationBuilder.AddColumn<string>( | ||
name = "FileName" | ||
,table = "TelegramContents" | ||
,``type`` = "TEXT" | ||
,nullable = true | ||
,defaultValue = "file.bin" | ||
) |> ignore | ||
|
||
migrationBuilder.AddColumn<string>( | ||
name = "MimeType" | ||
,table = "TelegramContents" | ||
,``type`` = "TEXT" | ||
,nullable = true | ||
,defaultValue = "application/octet-stream" | ||
) |> ignore | ||
|
||
migrationBuilder.Sql @" | ||
drop index TelegramContents_Unique; | ||
|
||
create unique index TelegramContents_Unique | ||
on TelegramContents(ChatUserName, MessageId, FileId, FileName, MimeType) | ||
" |> ignore | ||
|
||
|
||
override this.Down(migrationBuilder:MigrationBuilder) = | ||
migrationBuilder.DropColumn( | ||
name = "FileName" | ||
,table = "TelegramContents" | ||
) |> ignore | ||
|
||
migrationBuilder.DropColumn( | ||
name = "MimeType" | ||
,table = "TelegramContents" | ||
) |> ignore | ||
|
||
migrationBuilder.Sql @" | ||
drop index TelegramContents_Unique; | ||
|
||
create unique index TelegramContents_Unique | ||
on TelegramContents(ChatUserName, MessageId, FileId) | ||
" |> ignore | ||
|
||
|
||
override this.BuildTargetModel(modelBuilder: ModelBuilder) = | ||
modelBuilder | ||
.HasAnnotation("ProductVersion", "5.0.10") | ||
|> ignore | ||
|
||
modelBuilder.Entity("Emulsion.Database.Entities.TelegramContent", (fun b -> | ||
|
||
b.Property<Int64>("Id") | ||
.IsRequired(true) | ||
.ValueGeneratedOnAdd() | ||
.HasColumnType("INTEGER") |> ignore | ||
b.Property<string>("ChatUserName") | ||
.IsRequired(false) | ||
.HasColumnType("TEXT") |> ignore | ||
b.Property<string>("FileId") | ||
.IsRequired(false) | ||
.HasColumnType("TEXT") |> ignore | ||
b.Property<string>("FileName") | ||
.IsRequired(false) | ||
.HasColumnType("TEXT") |> ignore | ||
b.Property<Int64>("MessageId") | ||
.IsRequired(true) | ||
.HasColumnType("INTEGER") |> ignore | ||
b.Property<string>("MimeType") | ||
.IsRequired(false) | ||
.HasColumnType("TEXT") |> ignore | ||
|
||
b.HasKey("Id") |> ignore | ||
|
||
b.ToTable("TelegramContents") |> ignore | ||
|
||
)) |> ignore | ||
|
Probably leak.
We have several places with `while true` in async code already, and they seem non-problematic for now. I think I'll leave that as-is, but will keep an eye on it.
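For reference, here is a minimal sketch of how a `while true` loop inside a `MailboxProcessor` is tied to a cancellation token. It is not taken from this PR: the `doubler` agent and its message type are hypothetical, and the explicit `Cancel()` call before disposal is an assumption about how such a loop would be asked to stop.

```fsharp
open System
open System.Threading

let cts = new CancellationTokenSource()

// Hypothetical agent: replies with the doubled input and loops forever,
// in the same shape as a request/reply processing loop.
let doubler (inbox: MailboxProcessor<int * AsyncReplyChannel<int>>) = async {
    while true do
        let! n, reply = inbox.Receive()
        reply.Reply(n * 2)
}

// The token passed to Start governs the whole loop.
let agent = MailboxProcessor.Start(doubler, cts.Token)

// A round trip through the agent; answer is 42.
let answer = agent.PostAndReply(fun channel -> (21, channel))

// Request cancellation before disposing, so the loop is asked to stop.
cts.Cancel()
(agent :> IDisposable).Dispose()
cts.Dispose()
```

Cancellation in F# `async` is cooperative, so the loop is expected to stop at a bind point such as `let!` rather than in the middle of a synchronous section. Disposing a `CancellationTokenSource` does not by itself request cancellation.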