Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use byte regexes to fix compiltation error #629

Merged
merged 2 commits into from
Sep 18, 2021

Conversation

Y0ba
Copy link
Contributor

@Y0ba Y0ba commented Sep 18, 2021

Fixes error error: pattern can match invalid UTF-8 in regular expressions. It still rejects unicode characters with another error error: Unicode not allowed here but that's fine since there is no unicode characters in dns records anyway.

@zonyitoo
Copy link
Collaborator

I didn't notice that Regex has a byte only version. Great work!

@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

Should I add ascii check for |asd.xyz and ||asd.xyz patterns? Currently they accept international domains.

@zonyitoo
Copy link
Collaborator

Good point. I think you should add it for consistency, the whole ACL file should have exactly one syntax constraint.

@zonyitoo
Copy link
Collaborator

zonyitoo commented Sep 18, 2021

Just check every bytes of each lines if they are in 0..128.

u8 has a method is_ascii.

@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

String types have is_ascii too: https://doc.rust-lang.org/std/primitive.str.html#method.is_ascii

@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

Done. I also factored out some code into new struct ParsingRules to reduce code duplication and size of load_from_file function.

@zonyitoo zonyitoo merged commit 731e613 into shadowsocks:master Sep 18, 2021
@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

@zonyitoo I also wonder about case sensitivity. Does ss' internal dns server lowercase dns requests? Maybe add lowercasing for |asd.xyz and ||asd.xyz rules too? It's also possible to lowercase regex-patterns and disable case insensitivity in regex engine to improve performance event further.

@zonyitoo
Copy link
Collaborator

zonyitoo commented Sep 18, 2021

Well, for socks5 and http proxy protocol, they didn't require domain names to be lowercase.

On the server side, the DNS client is using trust-dns-resolver, which I think will translate names to lowercase when sending queries (not sure).

Maybe add lowercasing for |asd.xyz and ||asd.xyz rules too? It's also possible to lowercase regex-patterns and disable case insensitivity in regex engine to improve performance event further.

Sounds good.

@zonyitoo
Copy link
Collaborator

Oh BTW, since ACLs are now limited to ASCII, should we convert IDNs to Punycode before checking if it matches any rules?

@dev4u
Copy link

dev4u commented Sep 18, 2021

多谢小伙伴们快速发布了代码,修复了缺陷。

@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

I can convert |... and ||.... But with regexes it's complicated (if not impossible), for example rule is xn--4gq while 一一 is xn--4gqa, thus 一+ cannot be encoded in punycode in any meaningful way.

@zonyitoo
Copy link
Collaborator

Should we convert domains to lowercase ASCII when handling check_target_bypassed?

@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

DNS server always operates with ASCII. There is no need to convert anything. And if trust-dns-resolver lowercases incoming dns requests there is no need to lowercase them too. But I wonder is there a way to add auto test for that?

@Y0ba Y0ba deleted the fix-regex branch September 18, 2021 17:47
@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

So, ss' DNS server already explicitly converts names to ascii. I can add here conversion to lowercase just in case.

fn check_name_in_proxy_list(acl: &AccessControl, name: &Name) -> Option<bool> {
if name.is_fqdn() {
// remove the last dot from FQDN
let mut name = name.to_ascii();
name.pop();
acl.check_host_in_proxy_list(&name)
} else {
// unconditionally use default for PQDNs
Some(acl.is_default_in_proxy_list())
}
}

@Y0ba
Copy link
Contributor Author

Y0ba commented Sep 18, 2021

Should we convert domains to lowercase ASCII when handling check_target_bypassed?

Ooh. Now I see what did you mean. There are two places with acl checks. First is check_name_in_proxy_list and second is check_target_bypassed. check_target_bypassed isn't related to dns and gets hosts directly from connection queries. Yea, there should be both punycode and lowercase conversions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants