How are (erroneous) non-ASCII ALPN strings handled? #16
Comments
This is a great question, thanks for bringing this up! For example, say the first ALPN value is 0xAB 0xCD. To handle this edge case, we could simply place "99" in the JA4 string to denote an unknown non-ASCII ALPN value. Or, for non-ASCII ALPN values, we could take the high nibble of the first byte (A) and the low nibble of the last byte (D), so the ALPN value in the JA4 string would be "ad". I like the latter option. What do you think?
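A minimal sketch of the second option, to make the nibble arithmetic concrete. The function name `alpn_marker` and the definition of "printable" (bytes 0x21 through 0x7E) are assumptions made for illustration, not taken from the JA4 code; the "00" fallback for a missing ALPN comes from the spec text quoted in this issue.

```python
from typing import Optional


def is_printable_ascii(byte: int) -> bool:
    # Assumption: "printable" means visible ASCII, 0x21..0x7E (space excluded).
    return 0x21 <= byte <= 0x7E


def alpn_marker(first_alpn: Optional[bytes]) -> str:
    """Return the two-character ALPN part of a JA4 string (hypothetical helper)."""
    if not first_alpn:
        # No ALPN extension or no ALPN values: the spec text says to print "00".
        return "00"
    first, last = first_alpn[0], first_alpn[-1]
    if is_printable_ascii(first) and is_printable_ascii(last):
        # Ordinary case: first and last characters of the first ALPN value.
        return chr(first) + chr(last)
    # Proposed fallback: high nibble of the first byte, low nibble of the last byte.
    return f"{first >> 4:x}{last & 0x0F:x}"


print(alpn_marker(b"\xab\xcd"))  # the 0xAB 0xCD example from this comment
print(alpn_marker(b"h2"))
```

With this fallback the ALPN part always stays two characters long, so the overall JA4 string length does not change.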
Either way is fine with me. If you like the latter option that works.
As suggested in FoxIO-LLC/ja4#16 use first high-nibble and the last low-nibble for non printable ALPN values. Fixes: 19401
As suggested in FoxIO-LLC/ja4#16 use first high-nibble and the last low-nibble for non printable ALPN values. Fixes: 19401 (cherry picked from commit 48cd7f9)
Sample capture file: |
@john-althouse The nibble approach sees no difference between |
* rust: Support tshark v4.2.0
* rust: Handle non-ASCII ALPN strings

Related issue: #16
This should be fixed now with recent changes to Rust and Python.
ALPN Extension Value:
"The first and last characters of the ALPN (Application-Layer Protocol Negotiation) first value... If there are no ALPN values or no ALPN extension then we print “00” as the value in the fingerprint."
The ALPN first value is technically not a string. It is an "IANA-registered, opaque, non-empty byte string". As described further, it is the "precise set of octet values that identifies the protocol. This could be the UTF-8 encoding of the protocol name."
Currently all registered values, aside from some reserved values, are ASCII. However, if fuzzed or erroneous data contains non-ASCII bytes, how should they be displayed?
If we treat it as "probably UTF-8" based on the recommendation, does "first and last characters" refer to UTF-8 characters, possibly multibyte? Or does it refer to octets?
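To illustrate why the distinction matters, here is a small sketch comparing the two readings. The sample value is hypothetical (it is not a registered protocol ID), and both helper names are made up for this example.

```python
from typing import Tuple


def first_last_octets(value: bytes) -> Tuple[int, int]:
    # Reading 1: "characters" means raw octets of the opaque byte string.
    return value[0], value[-1]


def first_last_chars(value: bytes) -> Tuple[str, str]:
    # Reading 2: "characters" means UTF-8 code points, possibly multibyte.
    # Raises UnicodeDecodeError if the value is not valid UTF-8.
    text = value.decode("utf-8")
    return text[0], text[-1]


# Hypothetical ALPN value: U+00E9, "2", U+00E9 encoded as UTF-8 (5 octets).
sample = "\u00e92\u00e9".encode("utf-8")

print([hex(b) for b in first_last_octets(sample)])  # two single octets
print(first_last_chars(sample))                     # two 2-byte characters
```

Under the octet reading the first and last units are 0xC3 and 0xA9, which are two halves of the same encoded character; under the character reading they are both U+00E9. A value that is not valid UTF-8 has no answer at all under the second reading.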
If a handshake is encountered in which the first and last octets of the ProtocolName opaque byte string are not printable ASCII characters (whether because someone registers a UTF-8 Identification Sequence or because of errors in the capture), how should it be handled? Should the non-printable bytes be escaped, changing the length of the JA4 string? Should they be replaced with single characters like '?' or with a (multibyte) UTF-8 REPLACEMENT CHARACTER? Should the ALPN portion of the JA4 string be replaced with "00" as in the cases where it is missing?
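For comparison, a rough sketch of what each of these options would produce for the same input. The function `render_options`, its dictionary keys, and the 0x21 to 0x7E definition of "printable" are all assumptions for illustration; none of this is from the JA4 implementations.

```python
from typing import Dict

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, three octets when encoded as UTF-8


def is_printable_ascii(byte: int) -> bool:
    # Assumption: "printable" means visible ASCII, 0x21..0x7E.
    return 0x21 <= byte <= 0x7E


def render_options(protocol_name: bytes) -> Dict[str, str]:
    """Render the first/last octets under each option raised in the question.

    Assumes a non-empty ProtocolName, as the ALPN definition requires.
    """
    ends = (protocol_name[0], protocol_name[-1])
    all_printable = all(is_printable_ascii(b) for b in ends)

    def substitute(replacement: str) -> str:
        return "".join(chr(b) if is_printable_ascii(b) else replacement for b in ends)

    return {
        # Escaping keeps the information but changes the field length.
        "escaped": "".join(
            chr(b) if is_printable_ascii(b) else f"\\x{b:02x}" for b in ends
        ),
        "question_mark": substitute("?"),
        "replacement_char": substitute(REPLACEMENT_CHAR),
        # Same marker as a missing ALPN, so the two cases become indistinguishable.
        "treated_as_missing": substitute("") if all_printable else "00",
    }


for name, rendered in render_options(b"\xab\xcd").items():
    print(f"{name}: {rendered!r} ({len(rendered.encode('utf-8'))} octets)")
```

Only the '?' and "00" options keep the ALPN part at two ASCII characters, and both discard the original byte values.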