Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Use Redis as primary database #898

Closed
shizunge opened this issue Feb 26, 2023 · 15 comments
Closed

[Feature] Use Redis as primary database #898

shizunge opened this issue Feb 26, 2023 · 15 comments
Labels
🔨 enhancement New feature or request Stale

Comments

@shizunge
Copy link
Contributor

Currently Blocky supports "blocking" and "fast" start strategy. (And there is a bug of "fast" strategy in 0.20 fixed in #804 )

Both result in a downtime if blocky re-starts and there is only a single blocky instance.

  • For blocking, no DNS is available.
  • For fast, DNS is available, but not blocking.

Does it make sense that we save the processed blocking list to Redis and load them at blocky starting (when Redis is configured). I assume loading the processed blocking list from Redis is much faster than download and process them from internet. This helps reduce the down time between blocky restarts.

@kwitsch
Copy link
Collaborator

kwitsch commented Feb 26, 2023

Thought about it as well. 👍
Would also speedup processing lists in a multi instance environment.

@kwitsch kwitsch added the 🔨 enhancement New feature or request label Feb 26, 2023
@kwitsch kwitsch changed the title [Feature] Minimize down time between restarts [Feature] Store/restore block lists in Redis Feb 26, 2023
@0xERR0R
Copy link
Owner

0xERR0R commented Feb 27, 2023

Using Redis as storage would definitively solve some problems.

I have the following concerns/thoughts/ideas:

Currently all blocky instances are independent. Each instance loads the lists from the network and keeps them in an optimised cache. This cache supports TTL. This cache is also used for the search (lookup). If we use Redis to store the lists, then we need the memory twice: Blocky holds the list in memory (each instance) and Redis.

Just fantasising further: we could remove Blocky's cache altogether and just use Redis as the list cache. Redis already supports TTL and search. I.e. when a request comes in, blocky makes a Redis request and checks if the name is on the list. Similarly, we can also do the response cache and prefetching completely in Redis. Of course, there would be some latency with each request, but in a gigabit network this should be negligible. For this, we would not need a cache implementation in Blocky, no more sync between the instances (Redis pub/sub), no more duplicate data storage. We could also keep more things in Redis later, e.g. statistics or configuration.

To roughly sketch it out: Download part periodically downloads the list and stores it directly in Redis. For each request it checks (in Redis) if the result is already cached. If not, check against blacklist (in Redis), lookup via upstream, store result in Redis.

I think Redis is a solid and mature system and it offers many features that are well suited for blocky and we should think about whether it might be a viable way forward. Of course, we also have features that can't be mapped with Redis (e.g. regex search), so that needs to be looked at in more detail.

What do you think about it?

@kwitsch
Copy link
Collaborator

kwitsch commented Feb 27, 2023

@0xERR0R just to be sure: You propose to remove the internal blocky caches in favor of external Redis ones?

I would be in for it since it would solve some problems(as you already said).

Things to consider:

  • backwards compatibility for setups without Redis
  • Redis cluster support
  • speedtest to evaluate response times of Redis vs blocky internal cache
  • way to migrate regex mappings to Redis query syntax

@0xERR0R
Copy link
Owner

0xERR0R commented Feb 27, 2023

@0xERR0R just to be sure: You propose to remove the internal blocky caches in favor of external Redis ones?

Yes, it is just an idea. Blocky uses currently some 3rd party systems (postgres, prometheus, redis, ...), why not to use the power of redis and remove "duplicate" caching code from blocky? I think, blocky should do one "thing" really good and for all other things we can use other tools.

I would be in for it since it would solve some problems(as you already said).

Things to consider:

  • backwards compatibility for setups without Redis
  • Redis cluster support
  • speedtest to evaluate response times of Redis vs blocky internal cache
  • way to migrate regex mappings to Redis query syntax

Of course, this is a complex topic and it is something for blocky v1.0. We should make POC (proof of concept) to check the feasibility and evaluate the performance. And if it is ok, we should check, how can we migrate all features to Redis and maybe use new features (for example we can put the config in redis, this will allow the check the config on-the-fly).

If we make redis mandatory, this will also lead in a more complex installation (for non dockerized platforms, like edge devices), but I think it is ok if we can provide more features.

@kwitsch
Copy link
Collaborator

kwitsch commented Feb 27, 2023

Agreed 👍

I'll try to make a POC for speed testing the query cache.
If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

@kwitsch kwitsch changed the title [Feature] Store/restore block lists in Redis [Feature] Use Redis as primary database Feb 27, 2023
@kwitsch kwitsch self-assigned this Feb 27, 2023
@0xERR0R
Copy link
Owner

0xERR0R commented Feb 27, 2023

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

Do you mean the query result cache (cachingResolver)? I can try to test the speed for blocking resolver (put some blacklists into redis and change the lookup logic)

@kwitsch
Copy link
Collaborator

kwitsch commented Feb 27, 2023

I'll try to make a POC for speed testing the query cache. If the results won't be satisfying we should reconsider this idea as everything else is less critical regarding speed.

Do you mean the query result cache (cachingResolver)? I can try to test the speed for blocking resolver (put some blacklists into redis and change the lookup logic)

Yes, in my opinion the query result cache would be most time critical regarding the overall speed.

@shizunge
Copy link
Contributor Author

We are signficant extending the scope of the bug I original want to solve.

Making blocky instances are independent sounds attractive, even though I am not currently use it on an edge device. I don't mind setup the extra components, yet someone else may do so.

On the other hand, if blokcy must depend on some external services, why it must be redis? Why not sqlite 3 (no need external service) or postgresql?

@kwitsch
Copy link
Collaborator

kwitsch commented Feb 27, 2023

We need a fast object storage and not an SQL database.
There is a significant difference between them.

Besides SQLite is a really bad idea if used in a multi threaded environment.

@ThinkChaos
Copy link
Collaborator

ThinkChaos commented Feb 27, 2023

I don't have Redis at all in my setup and would like to avoid making it mandatory if possible. I do like the idea of removing some caching logic from blocky.

This module seems like it would be a good fit so we can outsource the cache to Redis, with an in memory fallback already implemented: https://pkg.go.dev/github.com/go-redis/cache/v8

EDIT: For regex rules, that module doesn't have anything, but neither does Redis.
Maybe only supporting wildcards ( *.domain.tld) would be enough to cover most use cases. Redis supports that, and we could make a PR to the module.

@0xERR0R
Copy link
Owner

0xERR0R commented Feb 28, 2023

We are signficant extending the scope of the bug I original want to solve.

Making blocky instances are independent sounds attractive, even though I am not currently use it on an edge device. I don't mind setup the extra components, yet someone else may do so.

On the other hand, if blokcy must depend on some external services, why it must be redis? Why not sqlite 3 (no need external service) or postgresql?

I would say your issue was just a trigger for a new discussion. I have been thinking about this for a long time and we have a lot of features where redis would fit well: We can use redis as cache, as list storage with fast search and as event bus (we do it already partially but only as fallback in background). And we could implement new features (e.g. adding new blacklist entry on-the-fly).

@0xERR0R
Copy link
Owner

0xERR0R commented Feb 28, 2023

I don't have Redis at all in my setup and would like to avoid making it mandatory if possible. I do like the idea of removing some caching logic from blocky.

This module seems like it would be a good fit so we can outsource the cache to Redis, with an in memory fallback already implemented: https://pkg.go.dev/github.com/go-redis/cache/v8

EDIT: For regex rules, that module doesn't have anything, but neither does Redis. Maybe only supporting wildcards ( *.domain.tld) would be enough to cover most use cases. Redis supports that, and we could make a PR to the module.

I think, the first step should be the evaluation. Performance, overall speed etc. We should also think about, how we can implement existing features with redis (e.g. client lookup, config etc). We should also think about usage without redis (if it can be possible or maybe it is possible but without some features). redis-cache looks interesting. I found also https://github.com/ledisdb/ledisdb, if it is 100% compatible, maybe we can use it as embedded database for users who don't have redis.

@0xERR0R
Copy link
Owner

0xERR0R commented Feb 28, 2023

We can discuss redis topic here: #905

@kwitsch kwitsch removed their assignment Mar 7, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jun 6, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jun 6, 2023
@github-actions
Copy link
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔨 enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

4 participants