[Feature] Use Redis as primary database #898

shizunge · 2023-02-26T05:12:41Z

Currently Blocky supports "blocking" and "fast" start strategy. (And there is a bug of "fast" strategy in 0.20 fixed in #804 )

Both result in a downtime if blocky re-starts and there is only a single blocky instance.

For blocking, no DNS is available.
For fast, DNS is available, but not blocking.

Does it make sense that we save the processed blocking list to Redis and load them at blocky starting (when Redis is configured). I assume loading the processed blocking list from Redis is much faster than download and process them from internet. This helps reduce the down time between blocky restarts.

kwitsch · 2023-02-26T15:24:16Z

Thought about it as well. 👍
Would also speedup processing lists in a multi instance environment.

0xERR0R · 2023-02-27T07:04:07Z

Using Redis as storage would definitively solve some problems.

I have the following concerns/thoughts/ideas:

Currently all blocky instances are independent. Each instance loads the lists from the network and keeps them in an optimised cache. This cache supports TTL. This cache is also used for the search (lookup). If we use Redis to store the lists, then we need the memory twice: Blocky holds the list in memory (each instance) and Redis.

Just fantasising further: we could remove Blocky's cache altogether and just use Redis as the list cache. Redis already supports TTL and search. I.e. when a request comes in, blocky makes a Redis request and checks if the name is on the list. Similarly, we can also do the response cache and prefetching completely in Redis. Of course, there would be some latency with each request, but in a gigabit network this should be negligible. For this, we would not need a cache implementation in Blocky, no more sync between the instances (Redis pub/sub), no more duplicate data storage. We could also keep more things in Redis later, e.g. statistics or configuration.

To roughly sketch it out: Download part periodically downloads the list and stores it directly in Redis. For each request it checks (in Redis) if the result is already cached. If not, check against blacklist (in Redis), lookup via upstream, store result in Redis.

I think Redis is a solid and mature system and it offers many features that are well suited for blocky and we should think about whether it might be a viable way forward. Of course, we also have features that can't be mapped with Redis (e.g. regex search), so that needs to be looked at in more detail.

What do you think about it?

kwitsch · 2023-02-27T08:54:54Z

@0xERR0R just to be sure: You propose to remove the internal blocky caches in favor of external Redis ones?

I would be in for it since it would solve some problems(as you already said).

Things to consider:

backwards compatibility for setups without Redis
Redis cluster support
speedtest to evaluate response times of Redis vs blocky internal cache
way to migrate regex mappings to Redis query syntax

0xERR0R · 2023-02-27T09:18:57Z

@0xERR0R just to be sure: You propose to remove the internal blocky caches in favor of external Redis ones?

Yes, it is just an idea. Blocky uses currently some 3rd party systems (postgres, prometheus, redis, ...), why not to use the power of redis and remove "duplicate" caching code from blocky? I think, blocky should do one "thing" really good and for all other things we can use other tools.

I would be in for it since it would solve some problems(as you already said).

Things to consider:

backwards compatibility for setups without Redis

Redis cluster support

speedtest to evaluate response times of Redis vs blocky internal cache

way to migrate regex mappings to Redis query syntax

Of course, this is a complex topic and it is something for blocky v1.0. We should make POC (proof of concept) to check the feasibility and evaluate the performance. And if it is ok, we should check, how can we migrate all features to Redis and maybe use new features (for example we can put the config in redis, this will allow the check the config on-the-fly).

If we make redis mandatory, this will also lead in a more complex installation (for non dockerized platforms, like edge devices), but I think it is ok if we can provide more features.

kwitsch · 2023-02-27T10:00:31Z

Agreed 👍

I'll try to make a POC for speed testing the query cache.
If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

0xERR0R · 2023-02-27T10:05:30Z

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

Do you mean the query result cache (cachingResolver)? I can try to test the speed for blocking resolver (put some blacklists into redis and change the lookup logic)

kwitsch · 2023-02-27T10:07:52Z

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

Do you mean the query result cache (cachingResolver)? I can try to test the speed for blocking resolver (put some blacklists into redis and change the lookup logic)

Yes, in my opinion the query result cache would be most time critical regarding the overall speed.

shizunge · 2023-02-27T19:32:34Z

We are signficant extending the scope of the bug I original want to solve.

Making blocky instances are independent sounds attractive, even though I am not currently use it on an edge device. I don't mind setup the extra components, yet someone else may do so.

On the other hand, if blokcy must depend on some external services, why it must be redis? Why not sqlite 3 (no need external service) or postgresql?

kwitsch · 2023-02-27T21:39:17Z

We need a fast object storage and not an SQL database.
There is a significant difference between them.

Besides SQLite is a really bad idea if used in a multi threaded environment.

ThinkChaos · 2023-02-27T23:15:38Z

I don't have Redis at all in my setup and would like to avoid making it mandatory if possible. I do like the idea of removing some caching logic from blocky.

This module seems like it would be a good fit so we can outsource the cache to Redis, with an in memory fallback already implemented: https://pkg.go.dev/github.com/go-redis/cache/v8

EDIT: For regex rules, that module doesn't have anything, but neither does Redis.
Maybe only supporting wildcards ( *.domain.tld) would be enough to cover most use cases. Redis supports that, and we could make a PR to the module.

0xERR0R · 2023-02-28T06:27:54Z

We are signficant extending the scope of the bug I original want to solve.

Making blocky instances are independent sounds attractive, even though I am not currently use it on an edge device. I don't mind setup the extra components, yet someone else may do so.

On the other hand, if blokcy must depend on some external services, why it must be redis? Why not sqlite 3 (no need external service) or postgresql?

I would say your issue was just a trigger for a new discussion. I have been thinking about this for a long time and we have a lot of features where redis would fit well: We can use redis as cache, as list storage with fast search and as event bus (we do it already partially but only as fallback in background). And we could implement new features (e.g. adding new blacklist entry on-the-fly).

0xERR0R · 2023-02-28T06:31:59Z

I don't have Redis at all in my setup and would like to avoid making it mandatory if possible. I do like the idea of removing some caching logic from blocky.

This module seems like it would be a good fit so we can outsource the cache to Redis, with an in memory fallback already implemented: https://pkg.go.dev/github.com/go-redis/cache/v8

EDIT: For regex rules, that module doesn't have anything, but neither does Redis. Maybe only supporting wildcards ( *.domain.tld) would be enough to cover most use cases. Redis supports that, and we could make a PR to the module.

I think, the first step should be the evaluation. Performance, overall speed etc. We should also think about, how we can implement existing features with redis (e.g. client lookup, config etc). We should also think about usage without redis (if it can be possible or maybe it is possible but without some features). redis-cache looks interesting. I found also https://github.com/ledisdb/ledisdb, if it is 100% compatible, maybe we can use it as embedded database for users who don't have redis.

0xERR0R · 2023-02-28T06:34:36Z

We can discuss redis topic here: #905

github-actions · 2023-06-06T04:06:53Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2023-06-11T04:06:55Z

This issue was closed because it has been stalled for 5 days with no activity.

kwitsch added the 🔨 enhancement New feature or request label Feb 26, 2023

kwitsch changed the title ~~[Feature] Minimize down time between restarts~~ [Feature] Store/restore block lists in Redis Feb 26, 2023

kwitsch changed the title ~~[Feature] Store/restore block lists in Redis~~ [Feature] Use Redis as primary database Feb 27, 2023

kwitsch self-assigned this Feb 27, 2023

kwitsch removed their assignment Mar 7, 2023

github-actions bot added the Stale label Jun 6, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Use Redis as primary database #898

[Feature] Use Redis as primary database #898

shizunge commented Feb 26, 2023

kwitsch commented Feb 26, 2023

0xERR0R commented Feb 27, 2023

kwitsch commented Feb 27, 2023 •

edited

Loading

0xERR0R commented Feb 27, 2023

kwitsch commented Feb 27, 2023

0xERR0R commented Feb 27, 2023

kwitsch commented Feb 27, 2023

shizunge commented Feb 27, 2023

kwitsch commented Feb 27, 2023

ThinkChaos commented Feb 27, 2023 •

edited

Loading

0xERR0R commented Feb 28, 2023

0xERR0R commented Feb 28, 2023

0xERR0R commented Feb 28, 2023

github-actions bot commented Jun 6, 2023

github-actions bot commented Jun 11, 2023

[Feature] Use Redis as primary database #898

[Feature] Use Redis as primary database #898

Comments

shizunge commented Feb 26, 2023

kwitsch commented Feb 26, 2023

0xERR0R commented Feb 27, 2023

kwitsch commented Feb 27, 2023 • edited Loading

0xERR0R commented Feb 27, 2023

kwitsch commented Feb 27, 2023

0xERR0R commented Feb 27, 2023

kwitsch commented Feb 27, 2023

shizunge commented Feb 27, 2023

kwitsch commented Feb 27, 2023

ThinkChaos commented Feb 27, 2023 • edited Loading

0xERR0R commented Feb 28, 2023

0xERR0R commented Feb 28, 2023

0xERR0R commented Feb 28, 2023

github-actions bot commented Jun 6, 2023

github-actions bot commented Jun 11, 2023

kwitsch commented Feb 27, 2023 •

edited

Loading

ThinkChaos commented Feb 27, 2023 •

edited

Loading