Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching doesn't seem to work properly #290

Closed
agneevX opened this issue Sep 25, 2021 · 17 comments · Fixed by #295
Closed

Caching doesn't seem to work properly #290

agneevX opened this issue Sep 25, 2021 · 17 comments · Fixed by #295
Assignees
Labels
🐞 bug Something isn't working
Milestone

Comments

@agneevX
Copy link
Contributor

agneevX commented Sep 25, 2021

I have caching/prefetching set up pretty generously, however it seems that some queries are slow to look up when it should already be cached.

A624CA4B-024E-4E27-823D-9216D620C325

logLevel: warn

upstream:
  default:
    - https://8.8.8.8/dns-query
    - https://8.8.4.4/dns-query

caching:
  minTime: 5m
  maxTime: 30m
  prefetching: true
  prefetchExpires: 24h
  prefetchThreshold: 2
@0xERR0R
Copy link
Owner

0xERR0R commented Sep 25, 2021

Do you have any log entries or do you use csv or database logger? You can only observe a longer time (80ms), but it doesn't mean that this query wasn't cached.

@agneevX
Copy link
Contributor Author

agneevX commented Sep 25, 2021

I don't use any type of logging due to this being run on a Raspberry Pi w/ SD card.

@agneevX
Copy link
Contributor Author

agneevX commented Sep 25, 2021

Does that mean high response times are possible even though it's cached? Shouldn't it be sub 1ms?

@0xERR0R
Copy link
Owner

0xERR0R commented Sep 25, 2021

I think, it depends on your network and server load. I have 1 - 80ms for cached entries. I use only udp dns. If you use doh, there is more overhead for encryption

@kwitsch
Copy link
Collaborator

kwitsch commented Sep 25, 2021

I tested the logging issue with a test stack.
test-stack blocky-config
The min TTL is set to one minute and the same url is requested every two minutes.

It appears that after the prefetch initialisation every third request isen't delivered from cache.
docker log snippet

@agneevX
Copy link
Contributor Author

agneevX commented Sep 26, 2021

@kwitsch thanks for the investigation! That’s a neat app you have there 👍

@kwitsch
Copy link
Collaborator

kwitsch commented Sep 26, 2021

@0xERR0R thanks, the app is used in my current DNS stack to keep often but infrequently used domains cached.

@0xERR0R
Copy link
Owner

0xERR0R commented Sep 26, 2021

I tried to reproduce this behavior. There is a small gap between the log entry expiration and prefetch loading. Let me explain, how it works:
Blocky has a cache with cached entries. Each entry has an expiration time (which is the TTL for the response). If an entry expires, a prefetch will be performed (the entry will be refreshed). In the cache implementation, there is an async routine, which runs every 15s. It looks for expired entries, removes the from the cache and triggers the prefetch.

If you query for the log entry, which is already expired but was not collected for the remove yet, the entry will be resolved regularry without caching.

Here an example:

Config:

caching:
  minTime: 10s
  maxTime: 10s
  prefetching: true
  prefetchExpires: 2h
  prefetchThreshold: 2

Every entry will be cached exact 10 s.

I extended the code with log output, it the prefetch is triggered.

Now run dig youtube.com every second:

[2021-09-26 16:51:06]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=229 question=A (youtube.com.) response_code=NOERROR response_reason=RESOLVED (https://8.8.8.8:443/dns-query)
[2021-09-26 16:51:07]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:08]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:09]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:10]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:11]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:12]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:13]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:14]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:15]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:16]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=32 question=A (youtube.com.) response_code=NOERROR response_reason=RESOLVED (https://8.8.8.8:443/dns-query)
[2021-09-26 16:51:17]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:18]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:19]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:20]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:21]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 16:51:22]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED

You can see, after 10 seconds, the entry was reloaded, because the cache entry was expired but prefetch was not executed because the clean up job was not triggered.

Now I run the query every 15 seconds:

[2021-09-26 17:02:52]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 17:02:57]  INFO PREFETCH
[2021-09-26 17:03:07]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 17:03:12]  INFO PREFETCH
[2021-09-26 17:03:22]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 17:03:27]  INFO PREFETCH
[2021-09-26 17:03:37]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED
[2021-09-26 17:03:42]  INFO PREFETCH
[2021-09-26 17:03:52]  INFO queryLog: query resolved answer=A (142.250.181.238) client_ip=127.0.0.1 client_names=127.0.0.1 duration_ms=0 question=A (youtube.com.) response_code=NOERROR response_reason=CACHED

Now you can see, that the prefetching is always triggered and the query is always cached

@kwitsch
Copy link
Collaborator

kwitsch commented Sep 26, 2021

Maybe a configuration option for the prefetching lookup timer would be an option.

Would there be an downside beside enhanced CPU time consumption if the timer would be set to one second?

@agneevX
Copy link
Contributor Author

agneevX commented Sep 26, 2021

It looks for expired entries, removes the from the cache and triggers the prefetch.

What if it's removed from cache only after a successful prefetch? There could be issues with connectivity.


I'll also say an option to serveExpired should be added that continues to serve expired entries until it's able to refresh it.

@kwitsch
Copy link
Collaborator

kwitsch commented Sep 26, 2021

I agree that the proposed solution from @agneevX would fix the symptom if this config is set.
However this won't fix the base problem of the refresh gap as it servers an expired record.
If an expired record is served but wasn't able to refresh it should be removed after some failed prefetch cycles.
Otherwise there would accumulate dead records in the cache.

@agneevX
Copy link
Contributor Author

agneevX commented Sep 26, 2021

Otherwise there would accumulate dead records in the cache.

Isn't this what the Unbound option serve-expired does? I think serving expired entries is better than retrying, which if there's no connectivity will lead to SERVFAILs with a lengthy delay.

@0xERR0R
Copy link
Owner

0xERR0R commented Sep 26, 2021

I think we should return values from cache even the entry is expired (for all or maybe only for prefetched) . In this case the ttl should be 0.

@0xERR0R
Copy link
Owner

0xERR0R commented Sep 26, 2021

I think we should return values from cache even the entry is expired (for all or maybe only for prefetched) . In this case the ttl should be 0.

I checked the cache library -> this feature is missing (it does not return expired entries event there are some of them in the cache and the clean up job has not removed them yet).

So I see following options:

  • Set the clean up period from 15s to 1s -> I think the performance impact is very low
  • Patching/Extending the caching library in order to be able to access expired but not removed entries
  • Patching/Extending the caching library: hook function to access entries which expire soon (e.g. next 30 s)
  • Creating a new background task which periodically (e.g. every 30s) checks if there are some cache entries, which should be prefetched and are about to expire soon (in the next 60 s). These entries should be refreshed.

@kwitsch
Copy link
Collaborator

kwitsch commented Sep 27, 2021

I would prefer solution 2.
It should be the most balanced regarding CPU time and possible cache misses.

@0xERR0R 0xERR0R added this to the 0.17 milestone Sep 27, 2021
@0xERR0R 0xERR0R added the 🐞 bug Something isn't working label Sep 27, 2021
@0xERR0R 0xERR0R self-assigned this Sep 27, 2021
@agneevX
Copy link
Contributor Author

agneevX commented Sep 30, 2021

If prefetching is not used, is a stale, 0 TTL query still served?

@0xERR0R
Copy link
Owner

0xERR0R commented Sep 30, 2021

Cached answer will be served until next cleanup run (typically 0 - 15 sec). This is the trade-off of this solution. I think it is ok, if a cached query is expired, 0 TTL will be returned for a short time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants