
A memory-efficient and fast hybrid matcher #639

Merged: 11 commits, Mar 3, 2021

darsvador (Contributor) commented Jan 27, 2021

#587 implemented a fast domain matcher based on an Aho-Corasick (AC) automaton. It is much faster than the original matcher, but its memory usage grows quickly as the number of patterns increases. I reimplemented it as a hybrid matcher, which is about 10% slower than the pure AC automaton approach but still about 25% faster than the original matcher, and it uses less memory than both the original matcher and the pure AC automaton. The figures below compare benchmark results for the hybrid matcher and the original matcher implementations.

[Figures: hybridmatcher_speed, ac_speed, origin_matcher_speed_1, domain_matcher_memory_bench]

darsvador marked this pull request as draft on January 28, 2021 06:28
LazyZhu commented Jan 29, 2021

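The question to discuss now is whether splitting nodes only on `.` is still appropriate. Domain names are a special kind of data: according to the latest figures, com alone accounts for 51%, while each of the remaining TLDs is at around 5%. The gap gets wider the higher you go in the domain rankings; if you are interested, take a look at the top-1m domains data.
Under these conditions, splitting nodes only on `.` does very little to improve lookup efficiency. The solution is to strip the .tld and then split nodes by character; the best fit for this is a radix tree: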

          com
           /  \
          h    world
         / \
        el  y
        / \  \
       l   x  xxx
      / \   \
     o   x   o
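
The proposal above could be sketched roughly as follows. This is only a minimal illustration, not code from this PR or from v2fly: the names `TLDIndex`, `radixNode`, and `splitTLD` are hypothetical, the tree handles exact matches only, and the part before the TLD is stored in forward order, as in the diagram.

```go
package main

import "strings"

// radixNode is a node of a compressed (radix) trie. Hypothetical type,
// used only to illustrate the idea discussed in this thread.
type radixNode struct {
	label    string              // edge label leading into this node
	children map[byte]*radixNode // keyed by the first byte of each child's label
	terminal bool                // true if a stored key ends exactly here
}

func newNode(label string, terminal bool) *radixNode {
	return &radixNode{label: label, terminal: terminal, children: map[byte]*radixNode{}}
}

// commonPrefixLen returns the length of the longest common prefix of a and b.
func commonPrefixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// insert adds key below n, splitting an existing edge when key diverges
// in the middle of that edge's label.
func (n *radixNode) insert(key string) {
	if key == "" {
		n.terminal = true
		return
	}
	child, ok := n.children[key[0]]
	if !ok {
		n.children[key[0]] = newNode(key, true)
		return
	}
	p := commonPrefixLen(key, child.label)
	if p < len(child.label) {
		split := newNode(child.label[:p], false)
		child.label = child.label[p:]
		split.children[child.label[0]] = child
		n.children[key[0]] = split
		child = split
	}
	child.insert(key[p:])
}

// contains reports whether key was inserted below n (exact match only).
func (n *radixNode) contains(key string) bool {
	if key == "" {
		return n.terminal
	}
	child, ok := n.children[key[0]]
	if !ok || !strings.HasPrefix(key, child.label) {
		return false
	}
	return child.contains(key[len(child.label):])
}

// TLDIndex keeps one radix tree per TLD, so the dominant "com" bucket is
// searched character by character instead of label by label.
type TLDIndex struct{ roots map[string]*radixNode }

func NewTLDIndex() *TLDIndex { return &TLDIndex{roots: map[string]*radixNode{}} }

// splitTLD splits "hello.com" into ("hello", "com").
func splitTLD(domain string) (rest, tld string) {
	i := strings.LastIndexByte(domain, '.')
	if i < 0 {
		return "", domain
	}
	return domain[:i], domain[i+1:]
}

// Add stores domain in the tree that belongs to its TLD.
func (t *TLDIndex) Add(domain string) {
	rest, tld := splitTLD(domain)
	root, ok := t.roots[tld]
	if !ok {
		root = newNode("", false)
		t.roots[tld] = root
	}
	root.insert(rest)
}

// Has reports whether exactly this domain was added.
func (t *TLDIndex) Has(domain string) bool {
	rest, tld := splitTLD(domain)
	root, ok := t.roots[tld]
	return ok && root.contains(rest)
}
```

Adding `hello.com`, `helxo.com`, and `hy.com` produces the shape in the diagram: `com` has a child `h`, which in turn splits into `el` and `y`. A matcher that also has to match subdomains would need more than this exact-match lookup.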

darsvador (Contributor, Author) commented
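  1. The implementation in this PR does not actually split on `.`. Instead, it puts both the domain and full rule types into a hash map; substring rules, specifically, are handled by the AC automaton.
  2. Perhaps you could implement your idea and benchmark it.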
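
To make point 1 concrete, here is a minimal sketch of that division of labour. It is an illustration under assumptions, not the code merged in this PR: `HybridMatcher` and its methods are hypothetical names, plain Go maps keyed by string stand in for whatever hashing the real implementation uses, and the AC automaton is a simple byte-level one.

```go
package main

import "strings"

// acNode is one state of a byte-level Aho-Corasick automaton.
type acNode struct {
	next   map[byte]int // goto transitions
	fail   int          // failure link (state index)
	output bool         // a pattern ends here, directly or via failure links
}

// HybridMatcher is a hypothetical type: full and domain rules live in
// hash maps, substring rules live in an Aho-Corasick automaton.
type HybridMatcher struct {
	full   map[string]struct{}
	domain map[string]struct{}
	nodes  []acNode
}

func NewHybridMatcher() *HybridMatcher {
	return &HybridMatcher{
		full:   map[string]struct{}{},
		domain: map[string]struct{}{},
		nodes:  []acNode{{next: map[byte]int{}}}, // state 0 is the root
	}
}

func (m *HybridMatcher) AddFull(s string)   { m.full[s] = struct{}{} }
func (m *HybridMatcher) AddDomain(s string) { m.domain[s] = struct{}{} }

// AddSubstr inserts a non-empty pattern into the automaton's trie.
func (m *HybridMatcher) AddSubstr(p string) {
	if p == "" {
		return
	}
	cur := 0
	for i := 0; i < len(p); i++ {
		nxt, ok := m.nodes[cur].next[p[i]]
		if !ok {
			nxt = len(m.nodes)
			m.nodes = append(m.nodes, acNode{next: map[byte]int{}})
			m.nodes[cur].next[p[i]] = nxt
		}
		cur = nxt
	}
	m.nodes[cur].output = true
}

// step follows failure links from state until a transition on c exists.
func (m *HybridMatcher) step(state int, c byte) int {
	for {
		if nxt, ok := m.nodes[state].next[c]; ok {
			return nxt
		}
		if state == 0 {
			return 0
		}
		state = m.nodes[state].fail
	}
}

// Build computes failure links breadth-first. Call it after the last
// AddSubstr and before the first Match.
func (m *HybridMatcher) Build() {
	var queue []int
	for _, child := range m.nodes[0].next {
		queue = append(queue, child) // depth-1 states fail to the root
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range m.nodes[cur].next {
			f := m.step(m.nodes[cur].fail, c)
			m.nodes[child].fail = f
			m.nodes[child].output = m.nodes[child].output || m.nodes[f].output
			queue = append(queue, child)
		}
	}
}

// Match checks full rules, then domain rules (the host itself or any
// parent domain), then substring rules.
func (m *HybridMatcher) Match(host string) bool {
	if _, ok := m.full[host]; ok {
		return true
	}
	s := host
	for {
		if _, ok := m.domain[s]; ok {
			return true
		}
		i := strings.IndexByte(s, '.')
		if i < 0 {
			break
		}
		s = s[i+1:]
	}
	state := 0
	for i := 0; i < len(host); i++ {
		state = m.step(state, host[i])
		if m.nodes[state].output {
			return true
		}
	}
	return false
}
```

The point of the split is that only substring rules pay the memory cost of automaton states, while the (typically far more numerous) full and domain rules cost one map entry each.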

darsvador marked this pull request as ready for review on March 3, 2021 08:30
kslr merged commit a31a8e6 into v2fly:master on Mar 3, 2021

kslr (Contributor) commented Mar 3, 2021

Thanks for your work.

Loyalsoldier added a commit that referenced this pull request Mar 16, 2021
* update geoip, geosite

* Chore: bump google.golang.org/grpc from 1.35.0 to 1.36.0 (#711)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Chore: bump github.com/miekg/dns from 1.1.39 to 1.1.40 (#712)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add /opt to assets location (#715)

* Add definition for transport layer chained proxy

* Regenerate protobuf for transport layer chain proxy

* Added Transport Layer Chained Proxy Support

* Fix dependency cycle caused by core import in internet package

* Fix forced outbound tag not set correctly

* Disable routing for platform initialized detour

* Added Auto generated file

* don't build tagged outbound dial on configure setting

* Fix for context with empty content

* Fix ALPN being set to h2 by default when using TCP (#716)

* Deprecate legacy VMess header with a planned decommission (#717)

* Zero Security imaginary security level

* Regenerate protobuf for Zero Security imaginary security level

* Imaginary Security Lever: zero: turn off all security on payload data

* Test for Imaginary Security Level: zero

* Fix panic: index out of range (#727)

* Chore: update dependencies & protobuf (#728)

* A memory-efficient and fast hybrid matcher (#639)

* a faster DomainMatcher implementation

* rename benchmark name

* fix linting errors

* add hybrid matcher

* add rabin-karp algorithm

* rename test & fix linting errors

* add more comment

* format code

* revert `MatcherGroup` match func

* fix linting errors

* Allow the selection of domain matcher

* Apply domain selector choice

* json parsing rule for domain matcher

* output debug message when ACAutomatonDomainMatcher is enabled.

* update version

* update geoip, geosite

* Chore: bump github.com/google/go-cmp from 0.5.4 to 0.5.5 (#732)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* workaround crash when V is not in context

* rename config for NewACAutomatonDomainMatcher to hybrid

* Allow bulk definition of domain matcher at parent level

* fix misbehaving code crash and create bug on transport level front proxy

* fixing misbehaving code in mux that do not propagate context

* create session content in the context if do not exist yet

* Create a name for linear domain matcher

* update version to 4.35.1

* Chore: update protobuf & dependencies (#748)

* Chore: bump actions/stale from v3.0.17 to v3.0.18 (#752)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* DNS: refine Android bootstrap DNS logic (#767)

* Chore: bump github.com/pires/go-proxyproto from 0.4.2 to 0.5.0 (#751)

Bumps [github.com/pires/go-proxyproto](https://github.com/pires/go-proxyproto) from 0.4.2 to 0.5.0.
- [Release notes](https://github.com/pires/go-proxyproto/releases)
- [Commits](pires/go-proxyproto@v0.4.2...v0.5.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Grpc Gun Transport (#757)

* introduce grpc transport structure

* fix package name inconsistency

* grpc gun transport dialer and listener

* add selective build tag

* add grpc:gun listener

* add grpc:gun config

* add generated files

* various bug fix for gun:grpc transport

* Cache dialed connections

* grpc:gun Use V2Ray Managed Dial function

* Update destination.pb.go

* Update gun.go

* GunSettings -> GunConfig

* gu -> gs

* add grpc alias

Co-authored-by: RPRX <63339210+rprx@users.noreply.github.com>
Co-authored-by: kslr <kslrwang@gmail.com>

* fix applied wrong name, and wrong varible name

* Add grpcSettings (alias of gunSettings)

* update geoip, geosite

* loopback outbound, allow you to redirect connection to the dispatcher again (#770)

* Added Loop back proxy

* Added json processing for lo proxy

* Fix bug for lo proxy

* Fix bug for lo proxy

* rename the outbound name

* Loopback: update naming and fix lint issues

* Chore: change lo to loopback

Co-authored-by: kslr <kslrwang@gmail.com>
Co-authored-by: loyalsoldier <10487845+Loyalsoldier@users.noreply.github.com>

* update version

* Chore: format import using goimports (#780)

* Chore: fix lint according to golangci-lint errors (#781)

* Chore: fix lint according to golangci-lint errors
* Chore: regenerate pb.go files

* Add minimal perfect hash domain matcher (#743)

* rename to HybridDomainMatcher & convert domain to lowercase

* refactor code & add open hashing for rolling hash map

* fix lint errors

* update app/dns/dns.go

* convert domain to lowercase in `strmatcher.go`

* keep the original matcher behavior

* add mph domain matcher & conver domain names to loweercase when matching

* fix lint errors

* fix lint errors

* Route: mph add alias hybrid

* FakeDNS: use 198.18.0.0/15 as default IP pool (#779)

* Add remote address to grpc transport layer conn (#783)

* Add remote address to grpc transport layer conn

* go fmt

* Revert "Test: fix http2 dial timeout (#570)" (#778)

* Revert "Test: fix http2 dial timeout (#570)"

This reverts commit 405a051.

* Feat: lower the payload size

* Remove state.NegotiatedProtocolIsMutual

It has been deprecated since Go 1.16 because it shouldn't be used: this value is always true.

* Chore: format code

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kid <44045911+kidonng@users.noreply.github.com>
Co-authored-by: Shelikhoo <xiaokangwang@outlook.com>
Co-authored-by: 秋のかえで <autmaple@protonmail.com>
Co-authored-by: DarthVader <61409963+darsvador@users.noreply.github.com>
Co-authored-by: CalmLong <37164399+CalmLong@users.noreply.github.com>
Co-authored-by: RPRX <63339210+rprx@users.noreply.github.com>
Co-authored-by: kslr <kslrwang@gmail.com>
Co-authored-by: maskedeken <52683904+maskedeken@users.noreply.github.com>