Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

91 add endpoint for image proxying #100

Closed
wants to merge 4 commits into from

Conversation

mickvandijke
Copy link
Member

@mickvandijke mickvandijke commented Mar 29, 2023

closes #91 .

With this PR we can let clients proxy torrent description images through the backend.

How it works

The following endpoint has been added to the backend /proxy/image/{original_image_url_urlencoded}

This endpoint accepts the optional header "Authentication": "Bearer TOKEN"

Authenticated users

When an authenticated user requests an image by its url from the backend proxy, the backend will first check whether the image is already in a local cache and then serve it to the user. If it is not in the local cache already, the backend will request the image from the supplied url and then store it in the local cache.

Unauthenticated users

Guests can only request images that are already in the cache.

Quota

Too mitigate spam, the proxy endpoint is limited by a quota per user. Whenever an authenticated user requests an image that is not already in the local cache, the size of the requested image will be deducted from the user's quota.

When the quota is met, the user will only be able to request images that are already in the cache.

The quota resets after a certain time period that can be set in the config.

Error feedback

In the case of an error, such as (but not limited to):

  • Invalid image url
  • Too big images
  • User quota met

The backend will return an image with the error text in red.

Configuration

[cache]
image_cache_max_request_timeout_ms = 1000 // max timeout when requesting an image from its external url
image_cache_capacity = 128_000_000 // max total cache size in bytes
image_cache_entry_size_limit = 4_000_000 // max size for a single image in the cache
image_cache_user_quota_period_seconds = 3600 // time between quota resets in seconds
image_cache_user_quota_bytes = 64_000_000 // image size quota in bytes per user

@mickvandijke mickvandijke linked an issue Mar 29, 2023 that may be closed by this pull request
@mickvandijke mickvandijke force-pushed the 91-add-endpoint-for-image-proxying branch from ac26149 to 3e11c83 Compare April 6, 2023 20:45
Copy link
Member

@josecelano josecelano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @WarmBeer looks good. I would extract some methods. I've added some suggestions but nothing critical to stopping it from merging.

Comment on lines +42 to +48
let image_cache_manager_config = ImageCacheManagerConfig {
max_image_request_timeout_ms,
max_image_size,
user_quota_period_seconds,
user_quota_bytes,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @WarmBeer, why don't you use torrust_index_backend::config::Cache, something like:

let cache_config = cfg.settings.get_cache_configuration().await;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @josecelano , I see the image cache as an isolated plugin to the backend. This is why I thought that it should own its own config struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you mean it could have its crate without specifying how the configuration should be stored in a toml file, do you? I suppose that kind of extension should also provide a specification for a config file section. But maybe they could also use an independent config file. Anyway, for the time being, config.settings.get_cache_configuration could return an ImageCacheManagerConfig. I think that's easy to change in the future if you extract the extension.

src/cache/cache.rs Show resolved Hide resolved
EntrySizeLimitExceedsTotalCapacity,
BytesExceedEntrySizeLimit,
CacheCapacityIsTooSmall,
CacheFull,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer It seems CacheFull is not being used. Anyway, it's full for ...? I assume it's full for adding a new entry.

}

// With a limit for individual entry sizes.
pub fn with_entry_size_limit(esl: usize) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer Generally, I prefer to be explicit with entry_size_limit instead of esl. Unless the abbreviation is really common, like cfg, ...

new
}

// With bot a total capacity limit and an individual entry size limit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both instead of bot

Comment on lines +67 to +70
pub fn met(&self) -> bool {
self.usage >= self.max_usage
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer I think "reached" is more common than "met", but it's OK.


/// Get an image from the url and insert it into the cache if it isn't cached already.
/// Unauthenticated users can only get already cached images.
pub async fn get_image_by_url(&self, url: &str, opt_user: Option<UserCompact>) -> Result<Bytes, Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer some people use "maybe_" prefix instead of "opt_". It makes more sense to me. "opt" sounds like a configuration option. However, "maybe_" is only used in Rust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @josecelano , I can't find a naming convention for this in Rust anywhere. I have come across code using the opt prefix some time ago and I just stuck with that I guess.

There is a Reddit thread that discusses this naming convention: https://www.reddit.com/r/rust/comments/9hpyam/question_what_names_do_you_use_for_option_and/ .

Both prefixes maybe and opt seem to be used.

I don't have a preference for any over the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @josecelano ,

Thank you for the review! I have given my thoughts and put a thumbs up were I fully agree with you and will update things. I think however that we should merge this now and then I will make these changes later on as other tasks take priority currently. Is that ok?

Yes, it's OK.

Comment on lines +147 to +148
// TODO: Update the cache on a separate thread, so that the client does not have to wait.
// Update image cache.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer, maybe you can split this function into two parts here. You can return the bytes, return the images to the client and later call another method with the rest of the tasks: update cache, quota, etcetera. I think that's very common in web frameworks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also extract a method for each task: request a remote image, update cache, update quota, etcetera.

pub image_cache_user_quota_period_seconds: u64,
pub image_cache_user_quota_bytes: usize,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TorrustConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarmBeer rename to TorrustIndexBackendConfig? However, I think it's not configuration but a kind of AppContainer with tall the application service dependencies.

Comment on lines +69 to +78
Err(e) => unsafe {
// Handling status codes in the frontend other tan OK is quite a pain.
// Return OK for now.
let (_status_code, error_image_bytes): (StatusCode, Bytes) = match e {
Error::UrlIsUnreachable => (StatusCode::GATEWAY_TIMEOUT, ERROR_IMAGE_URL_IS_UNREACHABLE.clone()),
Error::UrlIsNotAnImage => (StatusCode::BAD_REQUEST, ERROR_IMAGE_URL_IS_NOT_AN_IMAGE.clone()),
Error::ImageTooBig => (StatusCode::BAD_REQUEST, ERROR_IMAGE_TOO_BIG.clone()),
Error::UserQuotaMet => (StatusCode::TOO_MANY_REQUESTS, ERROR_IMAGE_USER_QUOTA_MET.clone()),
Error::Unauthenticated => (StatusCode::UNAUTHORIZED, ERROR_IMAGE_UNAUTHENTICATED.clone()),
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @WarmBeer why is this unsafe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @josecelano ,

All the error image bytes are stored in mutable static variables. Reading these variables is unsafe in Rust, because you don't know what other threads do with them. Since the variables are instantiated in a once_cell, we can consider them safe as there are no other places where they would be mutated.

I do not like using static and unsafe code, so I will update this in the future.

@mickvandijke
Copy link
Member Author

Hey @josecelano ,

Thank you for the review! I have given my thoughts and put a thumbs up were I fully agree with you and will update things. I think however that we should merge this now and then I will make these changes later on as other tasks take priority currently. Is that ok?

Comment on lines +157 to +166
#[tokio::test]
async fn set_multiple_bytes_cache_with_capacity_and_entry_size_limit_should_have_len_of_two() {
let mut bytes_cache = BytesCache::with_capacity_and_entry_size_limit(12, 6).unwrap();
let bytes: Bytes = Bytes::from("abcdef");

assert!(bytes_cache.set("1".to_string(), bytes.clone()).await.is_ok());
assert!(bytes_cache.set("2".to_string(), bytes).await.is_ok());

assert_eq!(bytes_cache.len().await, 2)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @WarmBeer, I'd refactor this test to:

    #[tokio::test]
    async fn given_a_bytes_cache_with_a_capacity_and_entry_size_limit_it_should_allow_adding_new_entries_if_the_limit_is_not_exceeded() {
 
        let bytes: Bytes = Bytes::from("abcdef");

        let mut bytes_cache = BytesCache::with_capacity_and_entry_size_limit(bytes.len() * 2, bytes.len()).unwrap();

        // Add first entry (6 bytes)
        assert!(bytes_cache.set("key1".to_string(), bytes.clone()).await.is_ok());

        // Add second entry (6 bytes)
        assert!(bytes_cache.set("key2".to_string(), bytes).await.is_ok());

        // Both entries were added because we did not reach the limit
        assert_eq!(bytes_cache.len().await, 2)
    }

If what we are testing is the cache limit.

Comment on lines +42 to +48
let image_cache_manager_config = ImageCacheManagerConfig {
max_image_request_timeout_ms,
max_image_size,
user_quota_period_seconds,
user_quota_bytes,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you mean it could have its crate without specifying how the configuration should be stored in a toml file, do you? I suppose that kind of extension should also provide a specification for a config file section. But maybe they could also use an independent config file. Anyway, for the time being, config.settings.get_cache_configuration could return an ImageCacheManagerConfig. I think that's easy to change in the future if you extract the extension.


/// Get an image from the url and insert it into the cache if it isn't cached already.
/// Unauthenticated users can only get already cached images.
pub async fn get_image_by_url(&self, url: &str, opt_user: Option<UserCompact>) -> Result<Bytes, Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @josecelano ,

Thank you for the review! I have given my thoughts and put a thumbs up were I fully agree with you and will update things. I think however that we should merge this now and then I will make these changes later on as other tasks take priority currently. Is that ok?

Yes, it's OK.

@da2ce7 da2ce7 added the Needs Rebase Base Branch has Incompatibilities label Apr 20, 2023
@da2ce7 da2ce7 force-pushed the 91-add-endpoint-for-image-proxying branch 2 times, most recently from 8dd997b to 83afc51 Compare April 25, 2023 15:32
@da2ce7 da2ce7 removed the Needs Rebase Base Branch has Incompatibilities label Apr 25, 2023
mickvandijke added a commit that referenced this pull request May 4, 2023
916d869 refactor: remove patch versions from added packages (Warm Beer)
d5c5487 refactor: cargo fmt (Warm Beer)
005817f feat: [#91] added image proxy with cache (Warm Beer)

Pull request description:

  Rebased and processed @josecelano 's feedback: #100
@josecelano
Copy link
Member

@WarmBeer created a new PR for this feature and it was already merged.

@josecelano josecelano closed this May 6, 2023
@da2ce7 da2ce7 deleted the 91-add-endpoint-for-image-proxying branch September 21, 2023 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add endpoint for image proxying
3 participants