feat: support multiple hosts files #998

Merged
ThinkChaos merged 6 commits into main from feat/multiple-hosts-sources
Jul 7, 2023

Conversation

ThinkChaos
Collaborator

Fixes #867

@codecov

codecov bot commented Apr 18, 2023

Codecov Report

Patch coverage: 93.08% and project coverage change: +0.22 🎉

Comparison is base (7c07de7) 93.55% compared to head (a23326d) 93.78%.

❗ Current head a23326d differs from pull request most recent head c09049b. Consider uploading reports for the commit c09049b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #998      +/-   ##
==========================================
+ Coverage   93.55%   93.78%   +0.22%     
==========================================
  Files          63       65       +2     
  Lines        5323     5373      +50     
==========================================
+ Hits         4980     5039      +59     
+ Misses        268      260       -8     
+ Partials       75       74       -1     
Impacted Files Coverage Δ
cmd/root.go 61.29% <0.00%> (ø)
server/server.go 79.03% <71.42%> (+0.08%) ⬆️
config/bytes_source.go 88.13% <88.13%> (ø)
lists/list_cache.go 95.65% <91.66%> (-2.89%) ⬇️
resolver/hosts_file_resolver.go 97.01% <91.66%> (-2.99%) ⬇️
lists/sourcereader.go 95.23% <95.23%> (ø)
config/config.go 78.01% <97.50%> (+1.51%) ⬆️
cmd/healthcheck.go 100.00% <100.00%> (ø)
config/blocking.go 100.00% <100.00%> (+11.36%) ⬆️
config/caching.go 87.50% <100.00%> (+37.50%) ⬆️
... and 7 more

... and 1 file with indirect coverage changes


Comment on lines +160 to +167
consumersGrp, ctx := jobgroup.WithContext(ctx)
defer consumersGrp.Close()

producersGrp := jobgroup.WithMaxConcurrency(consumersGrp, r.cfg.Loading.Concurrency)
defer producersGrp.Close()

producers := parcour.NewProducersWithBuffer[*HostsFileEntry](producersGrp, consumersGrp, producersBuffCap)
defer producers.Close()
Collaborator Author

Basically a JobGroup is a scope for goroutines: the defer grp.Close() ensures no goroutine continues running after the function returns. Groups can be nested: here producersGrp is a child of consumersGrp.
These are ideas from structured concurrency (highly recommend that post, it's one of the most influential CS articles I've read).

Having a clear scope for goroutines also allows for clear failure propagation:

  • when explicitly checking for goroutine errors with Wait:
    • if any goroutines returned errors, it returns them (just like errgroup.Group)
    • if any goroutines panicked, it propagates the panic to the current goroutine
  • when we leave the function without explicitly waiting for goroutines, Close handles the failures:
    • if any goroutines returned errors, it propagates them to the parent JobGroup (or panics if there is no parent)
    • if any goroutines panicked, it propagates the panic to the current goroutine

producers is an abstraction built on JobGroups. It simplifies the pattern we have here: producing hosts file entries from sources in parallel and consuming them from a single goroutine.

The fact that producers uses the JobGroups we created means we know for sure it won't leave anything running since we're closing them before returning, and we can customize how those goroutines run.
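To make the scoping idea concrete, here is a minimal sketch of the same producer/consumer shape using only the standard library. It does not use the jobgroup or parcour APIs; `entry`, `loadAll`, and the buffer size are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// entry is a hypothetical stand-in for *HostsFileEntry.
type entry struct {
	IP   string
	Host string
}

// loadAll runs one producer goroutine per source, with at most maxConcurrency
// of them producing at a time, and consumes every entry from the calling
// goroutine. maxConcurrency must be at least 1.
// The consumer loop only ends after every producer has finished, so no
// producer is still running when loadAll returns.
func loadAll(sources [][]entry, maxConcurrency int) map[string]string {
	entries := make(chan entry, 16) // small buffer between producers and consumer
	sem := make(chan struct{}, maxConcurrency)

	var producers sync.WaitGroup
	for _, src := range sources {
		producers.Add(1)
		go func(src []entry) {
			defer producers.Done()
			sem <- struct{}{}        // acquire a concurrency slot
			defer func() { <-sem }() // release it when this producer is done
			for _, e := range src {
				entries <- e
			}
		}(src)
	}

	// Close the channel once all producers are done, which ends the loop below.
	go func() {
		producers.Wait()
		close(entries)
	}()

	hosts := make(map[string]string)
	for e := range entries { // single consumer: the map needs no locking
		hosts[e.Host] = e.IP
	}
	return hosts
}

func main() {
	hosts := loadAll([][]entry{
		{{IP: "127.0.0.1", Host: "localhost"}},
		{{IP: "10.0.0.2", Host: "nas.lan"}, {IP: "10.0.0.3", Host: "printer.lan"}},
	}, 2)

	names := make([]string, 0, len(hosts))
	for name := range hosts {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Unlike a JobGroup, this sketch does not propagate errors or panics from the producers; it only shows the "nothing outlives the function" guarantee.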


processingLinkJobs := len(links)
producersGrp := jobgroup.WithMaxConcurrency(unlimitedGrp, b.processingConcurrency)
Collaborator Author

This limit applies to the whitelist and blacklist independently.
With the new code it should be easy to make both share the same limit. That seems like more user-friendly behavior to me since there's a single option in the config. Do you think I should make that change?

Owner

I think one limit for both is good enough

Collaborator Author

Do you mean the current code is fine?
TBH it feels like we're not really respecting the user's choice here. But it's already the case, so if you want to keep it as is, it's fine by me.
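For reference, a shared limit could look roughly like the sketch below: both lists draw from one semaphore, so the configured value bounds the total number of running jobs rather than each list separately. This uses only the standard library, not jobgroup; `limiter`, `runGroup`, and `peakConcurrency` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// limiter is a counting semaphore that several groups of jobs can share.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// runGroup starts one goroutine per job. Each goroutine waits for a free
// slot in lim before running its job, and frees the slot afterwards.
func runGroup(wg *sync.WaitGroup, lim limiter, jobs []func()) {
	for _, job := range jobs {
		wg.Add(1)
		go func(job func()) {
			defer wg.Done()
			lim <- struct{}{}
			defer func() { <-lim }()
			job()
		}(job)
	}
}

// peakConcurrency runs two groups of jobsPerList jobs against one shared
// limiter and returns the highest number of jobs seen running at once.
func peakConcurrency(limit, jobsPerList int) int {
	var mu sync.Mutex
	running, peak := 0, 0

	job := func() {
		mu.Lock()
		running++
		if running > peak {
			peak = running
		}
		mu.Unlock()

		time.Sleep(time.Millisecond) // simulate downloading and parsing a source

		mu.Lock()
		running--
		mu.Unlock()
	}

	jobs := make([]func(), jobsPerList)
	for i := range jobs {
		jobs[i] = job
	}

	lim := newLimiter(limit)
	var wg sync.WaitGroup
	runGroup(&wg, lim, jobs) // e.g. the blacklist sources
	runGroup(&wg, lim, jobs) // e.g. the whitelist sources
	wg.Wait()

	return peak
}

func main() {
	fmt.Println("peak concurrent jobs:", peakConcurrency(4, 10))
}
```

With independent limits, each call to `runGroup` would get its own limiter, and the peak could reach twice the configured value.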

Comment on lines -39 to 50
if err := r.parseHostsFile(context.Background()); err != nil {
r.log().Errorf("disabling hosts file resolving due to error: %s", err)
downloader: lists.NewDownloader(cfg.Loading.Downloads, bootstrap.NewHTTPTransport()),
}

r.cfg.Filepath = "" // don't try parsing the file again
} else {
go r.periodicUpdate()
err := cfg.Loading.StartPeriodicRefresh(r.loadSources, func(err error) {
r.log().WithError(err).Errorf("could not load hosts files")
})
if err != nil {
return nil, err
}
Collaborator Author

The hosts file resolver will now behave like the blocking resolver and have a startStrategy.
This is different from before: we were never trying again to parse the file if it failed at initialization.

I think this makes more sense now that we support remote files and multiple sources. Do you agree?

Owner

Yes, it definitely makes more sense. This should be documented properly to avoid confusion.

Collaborator Author

This would also make more sense in the changelog IMO. I'm not resolving this yet: if that's not easy to do, I can add it to the docs.
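The behavior change discussed in this thread (a failed initial load no longer disables the resolver, because later refreshes try again) can be sketched as follows. `startPeriodicRefresh` here is a hypothetical helper with an assumed signature, not the `StartPeriodicRefresh` method from this PR, and it ignores the different start strategies.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// startPeriodicRefresh is a hypothetical helper for illustration only.
// It runs refresh once synchronously, then again every period until ctx is
// cancelled. Failures are reported through onErr and never stop later runs.
func startPeriodicRefresh(ctx context.Context, period time.Duration, refresh func() error, onErr func(error)) {
	if err := refresh(); err != nil {
		onErr(err)
	}

	go func() {
		ticker := time.NewTicker(period)
		defer ticker.Stop()

		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				if err := refresh(); err != nil {
					onErr(err)
				}
			}
		}
	}()
}

// retryDemo simulates a source that fails on the first load and works on
// every later one. It returns how many errors were reported and whether a
// later refresh succeeded.
func retryDemo() (reported int, recovered bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // stops the refresh goroutine when the demo is over

	attempts := 0
	loaded := make(chan struct{})
	refresh := func() error {
		attempts++
		switch attempts {
		case 1:
			return errors.New("source unavailable")
		case 2:
			close(loaded) // signal the first successful load
		}
		return nil
	}

	startPeriodicRefresh(ctx, 5*time.Millisecond, refresh, func(error) { reported++ })

	select {
	case <-loaded:
		recovered = true
	case <-time.After(2 * time.Second):
	}
	return reported, recovered
}

func main() {
	reported, recovered := retryDemo()
	fmt.Println("errors reported:", reported, "recovered:", recovered)
}
```

The old behavior corresponds to returning early after the first failure instead of starting the ticker loop.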

@0xERR0R 0xERR0R added this to the v0.22 milestone May 4, 2023
@0xERR0R 0xERR0R added the 🔨 enhancement New feature or request label May 4, 2023
@ThinkChaos
Collaborator Author

Will finish this up this weekend :)

@ThinkChaos ThinkChaos force-pushed the feat/multiple-hosts-sources branch from 5bfda51 to 3486779 Compare May 7, 2023 22:14
@ThinkChaos
Collaborator Author

Didn't have time to do the docs or run the tests locally, I should have time tomorrow for that.

I refactored how we do config options migration cause it was quite verbose and copy-pasty. This will cause conflicts in the Redis rework branch, but I think it should be relatively easy to fix.
The "x and y are both configured" and "please use x instead" logs are automatically handled once you define the migrations with the new DSL.
The pattern that branch introduces of returning an error from RedisConfig.validateConfig can be supported by making validation a second pass after migration.

Also moving all the options into a single Deprecated struct is nice cause it prevents old code from compiling, so you're guaranteed to catch all usage of old options.
That's how I noticed the old HTTP port option was used in cmd/root.go.
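As a rough illustration of why grouping old options in a `Deprecated` struct helps, consider the sketch below. It is not the migration DSL from this PR: `Config`, `migrate`, the field names, the option names in the messages, and the choice to ignore the old value when both are set are all assumptions made for the example.

```go
package main

import "fmt"

// Config is a hypothetical, trimmed-down configuration.
type Config struct {
	HTTPPorts []string

	// Deprecated groups every retired option. Code that still reads a
	// top-level cfg.HTTPPort no longer compiles, so stale usages are caught.
	Deprecated Deprecated
}

// Deprecated holds old options as pointers: nil means the user did not set
// the option, which a plain string could not distinguish from "".
type Deprecated struct {
	HTTPPort *string
}

// migrate copies deprecated options that are set to their replacements and
// returns the warnings that should be logged.
func migrate(cfg *Config) []string {
	var warnings []string

	if old := cfg.Deprecated.HTTPPort; old != nil {
		if len(cfg.HTTPPorts) > 0 {
			// Assumption for this sketch: the new option wins.
			warnings = append(warnings, "httpPort and ports.http are both configured; ignoring httpPort")
		} else {
			cfg.HTTPPorts = []string{*old}
			warnings = append(warnings, "httpPort is deprecated; please use ports.http instead")
		}
	}

	return warnings
}

func main() {
	port := "4000"
	cfg := Config{Deprecated: Deprecated{HTTPPort: &port}}

	for _, w := range migrate(&cfg) {
		fmt.Println("warning:", w)
	}
	fmt.Println("HTTP ports:", cfg.HTTPPorts)
}
```

A validation pass that returns errors could then run on the migrated `Config`, as suggested for the Redis rework.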

@ThinkChaos ThinkChaos force-pushed the feat/multiple-hosts-sources branch from 5d4e62b to 66c6124 Compare May 12, 2023 02:09
@ThinkChaos
Collaborator Author

ThinkChaos commented May 12, 2023

Just pushed the docs, and cleaned up the commits.

You can see the rendered docs here.

config/config.go Outdated
Comment on lines 289 to 293
Concurrency uint `yaml:"concurrency" default:"4"`
MaxErrorsPerSource int `yaml:"maxErrorsPerSource" default:"5"`
RefreshPeriod Duration `yaml:"refreshPeriod" default:"4h"`
StartStrategy StartStrategyType `yaml:"startStrategy" default:"blocking"`
Downloads DownloaderConfig `yaml:"downloads"`
Collaborator Author

Some of the defaults were changed here.
I don't think we currently have a way to manually write changelog text but if there's an easy way to do it it could be nice to mention this.

@0xERR0R
Owner

0xERR0R commented Jun 19, 2023

Are there any open points, or is it ready to be merged? Could you please rebase it? There is a conflict in go.mod.

@ThinkChaos ThinkChaos force-pushed the feat/multiple-hosts-sources branch from a23326d to c09049b Compare June 19, 2023 23:54
@kwitsch
Collaborator

kwitsch commented Jun 20, 2023

I like the new implementation for config deprecations. ♥️

Sadly, I currently have very little time, so I won't be able to test it. It would help if there were a proper Go development suite for Android. 😕

@ThinkChaos
Collaborator Author

I'd prefer to merge this without squashing if possible, but I think the repo settings prevent that.
Would you want to change the settings to allow that, or should I just squash and merge?

@0xERR0R
Owner

0xERR0R commented Jul 6, 2023

> I'd prefer to merge this without squashing if possible, but I think the repo settings prevent that. Would you want to change the settings to allow that, or should I just squash and merge?

I think you only need to rebase your branch on master. Squashing the commits is not required.

Deprecated settings use pointers to allow knowing if they are actually
set in the user config.
They are also nested in a struct, which ensures they aren't still used
since any old code would fail to compile, and makes them easily
discoverable by `migration.Migrate`.
Replace `IsZero` with `IsAboveZero` to help us avoid this mistake again.
@ThinkChaos ThinkChaos force-pushed the feat/multiple-hosts-sources branch from c09049b to 0312237 Compare July 7, 2023 13:04
@ThinkChaos ThinkChaos merged commit 2bd5948 into 0xERR0R:main Jul 7, 2023
@ThinkChaos
Collaborator Author

Thanks, I rebased and there was a go.sum conflict. So I guess that's why GitHub didn't want to do it.
The WebUI didn't mention that, or I missed it. Anyways, merged it :)

@ThinkChaos ThinkChaos deleted the feat/multiple-hosts-sources branch July 7, 2023 13:17
Labels
🔨 enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[Feature Request] Option to get hosts file from http url
3 participants