[contents] Add server side caching for all requests (If-Modified-Since) #889
I would suggest adding the following:
diff --git a/lib/contents.php b/lib/contents.php
index de36bbd..5493123 100644
--- a/lib/contents.php
+++ b/lib/contents.php
@@ -1,13 +1,26 @@
<?php
-function getContents($url, $header = array(), $opts = array()){
+function getContents($url, $header = array(), $opts = array(), $cache_opts = array()){
+ $cache_opts = array_merge(array(
+ 'path' => CACHE_DIR . '/server',
+ 'purge_cache_time' => 86400, // 24 hours
+ 'force_cache' => false,
+ ), $cache_opts);
+
// Initialize cache
$cache = Cache::create('FileCache');
- $cache->setPath(CACHE_DIR . '/server');
- $cache->purgeCache(86400); // 24 hours (forced)
+ $cache->setPath($cache_opts['path']);
+ $cache->purgeCache($cache_opts['purge_cache_time']);
$params = [$url];
$cache->setParameters($params);
+ if ($cache_opts['force_cache']) {
+ $result = $cache->loadData();
+ if (!is_null($result)) {
+ return $result;
+ }
+ }
+
debugMessage('Reading contents from "' . $url . '"');
$ch = curl_init($url);
One use case is fetching YouTube videos (to get upload dates) from large unordered playlists. In combination with certain other changes, it can fully fix #647 (which was incorrectly closed and should be reopened).
Example usage (run it more than once to see the latency difference):
<?php
ini_set('display_errors', '1');
error_reporting(E_ALL);
define('DEBUG', true);
require_once __DIR__ . '/lib/RssBridge.php';
define('CACHE_DIR', __DIR__ . '/cache');
cache::setDir(__DIR__ . '/caches/');
echo getContents("https://www.youtube.com/watch?v=3QwR8FBhq3Q", [], [], [
'force_cache' => true,
'path' => CACHE_DIR . '/youtube',
'purge_cache_time' => 60*60*24*2 // 48 hours
]);
Please correct me if I'm wrong. What you are asking for is basically a function similar to
Thanks for getSimpleHTMLDOMCached. Didn't know about that. Ignore my previous suggestion.
This commit adds a cache for 'getContents' to '/cache/server'. All contents are cached by default (even in debug mode). If debug mode is enabled, the cached data is overwritten on each request.
In normal mode RSS-Bridge adds the 'If-Modified-Since' header with the timestamp from the previously cached data (if available) to the request.
Find more information on 'If-Modified-Since' here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
If the server responds with "304 Not Modified", the cached data is returned. If the server responds with "200 OK", the received data is written to the cache (creating a new cache file if one doesn't exist yet). Behavior for all other response codes is unchanged. Servers that don't support the 'If-Modified-Since' header will respond with "200 OK".
For servers that respond with "304 Not Modified", the required bandwidth will decrease and RSS-Bridge will respond faster.
Files in the cache are forcefully removed after 24 hours.
Notice: Only a few servers actually support 'If-Modified-Since'. Thus, most bridges won't be affected by this change.
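The 304/200 handling described in this commit message can be sketched roughly as follows. This is not the code from the commit: the function 'getContentsConditional', the injected '$fetch' callable, and the array-based cache are hypothetical stand-ins chosen so the example runs without network access or RSS-Bridge itself. Debug mode (which skips the header and always overwrites the cache) is left out for brevity.

```php
<?php
// Hypothetical sketch; names are illustrative and NOT RSS-Bridge's API.
// $fetch: callable(string $url, array $headers) returning
//         ['code' => int, 'body' => string].
// $cache: plain array keyed by URL, holding ['time' => int, 'body' => string].
function getContentsConditional(string $url, callable $fetch, array &$cache, int $now): string
{
    $headers = [];
    if (isset($cache[$url])) {
        // Ask the server to send the body only if it changed since we cached it.
        $headers[] = 'If-Modified-Since: '
            . gmdate('D, d M Y H:i:s', $cache[$url]['time']) . ' GMT';
    }

    $response = $fetch($url, $headers);

    if ($response['code'] === 304 && isset($cache[$url])) {
        // Not modified: serve the cached copy.
        return $cache[$url]['body'];
    }
    if ($response['code'] === 200) {
        // Fresh data: create or overwrite the cache entry.
        $cache[$url] = ['time' => $now, 'body' => $response['body']];
    }
    // Any other response code is passed through without touching the cache.
    return $response['body'];
}

// Stub "server": answers 200 without a conditional header, 304 with one.
$server = function (string $url, array $headers): array {
    return $headers === []
        ? ['code' => 200, 'body' => 'fresh content']
        : ['code' => 304, 'body' => ''];
};

$cache = [];
$first = getContentsConditional('https://example.com/feed', $server, $cache, 1000);
$second = getContentsConditional('https://example.com/feed', $server, $cache, 2000);
echo $first, PHP_EOL;  // prints "fresh content" (200, written to cache)
echo $second, PHP_EOL; // prints "fresh content" (304, served from cache)
```

Injecting the fetcher keeps the decision logic separate from cURL, which makes it easy to exercise both the 200 and 304 paths in isolation.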
This PR adds a cache for 'getContents' to '/cache/server'. All contents are cached by default (even in debug mode). If debug mode is enabled, the cached data is overwritten on each request.
In normal mode RSS-Bridge adds the 'If-Modified-Since' header with the timestamp from the previously cached data (if available) to the request.
Find more information on 'If-Modified-Since' here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
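As a minimal sketch of how such a header could be built from a cache timestamp: the helper name 'buildIfModifiedSinceHeader' is hypothetical and not taken from this PR; only PHP's built-in 'gmdate' is assumed.

```php
<?php
// Hypothetical helper (not part of RSS-Bridge): format a Unix timestamp
// as an HTTP-date for the If-Modified-Since request header.
function buildIfModifiedSinceHeader(int $timestamp): string
{
    // HTTP dates are always expressed in GMT, hence gmdate() rather than date().
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Timestamp 0 is the Unix epoch.
echo buildIfModifiedSinceHeader(0), PHP_EOL;
// prints "If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT"
```

The resulting string is in the form cURL expects for entries passed via CURLOPT_HTTPHEADER.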
For servers that respond with "304 Not Modified", the required bandwidth will decrease and RSS-Bridge will respond faster.
Files in the cache are forcefully removed after 24 hours.
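The forced 24-hour cleanup could look something like the sketch below. It is an assumption about the general approach (delete files by modification time), not the FileCache implementation; 'purgeOldFiles' and its parameters are hypothetical.

```php
<?php
// Hypothetical helper (not RSS-Bridge's FileCache): delete regular files in
// $dir whose last modification is more than $maxAge seconds before $now.
// Returns the number of files removed.
function purgeOldFiles(string $dir, int $maxAge, ?int $now = null): int
{
    $now = $now ?? time();
    $removed = 0;

    // glob() may return false on error; treat that as "no files".
    foreach (glob(rtrim($dir, '/') . '/*') ?: [] as $file) {
        if (is_file($file) && $now - filemtime($file) > $maxAge) {
            if (unlink($file)) {
                $removed++;
            }
        }
    }

    return $removed;
}
```

Calling it with a directory path and 86400 would correspond to the 24-hour limit mentioned above.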
Notice: Only a few servers actually support 'If-Modified-Since'. Thus, most bridges won't be affected by this change.
I have only tested a few bridges (maybe 10) and so far only "Bastamag" and "Bundesbank" are responding with "304 Not Modified".
I did some timing for "Bundesbank" on 10 consecutive requests (debug mode enabled, but not skipping 'If-Modified-Since'). Compared to the current master it shows an improvement of two seconds (approx. 7 seconds on master, 5 seconds on this PR). Bridges that load more content might show even better results.
Let me know if you have any suggestions for improvement.