Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix token and uri parsers to disallow empty results #111

Merged

Conversation

acfoltzer
Copy link
Contributor

This fixes cases where the parser would accept non-compliant request lines including empty methods and paths.

For the token grammar, the spec is:

  token          = 1*tchar

1* is shorthand for one-or-more, so the empty string is not a valid token.

For the path component of the request line, the spec we're concerned with the absolute-path grammar:

  absolute-path = 1*( "/" segment )

While segment might be empty, there must be at least a "/" for it to be syntactically valid.

I've added tests for these cases and their combination, and had to update the expected error of one of the existing URI tests which now fails sooner due to the empty path.

src/lib.rs Show resolved Hide resolved
@acfoltzer acfoltzer marked this pull request as ready for review March 16, 2022 19:06
@acfoltzer
Copy link
Contributor Author

acfoltzer commented Mar 16, 2022

This appears to be performance neutral:

~/src/httparse on  acfoltzer/multiple-space-status-line-delimiters is 📦 v1.6.0 via 🦀 v1.59.0 took 13s
❯ cargo +nightly bench --bench parse
   Compiling httparse v1.6.0 (/home/acfoltzer/src/httparse)
    Finished bench [optimized] target(s) in 7.24s
     Running benches/parse.rs (target/release/deps/parse-80e05693670d5e82)

running 4 tests
test bench_httparse       ... bench:         240 ns/iter (+/- 18) = 3000 MB/s
test bench_httparse_short ... bench:          47 ns/iter (+/- 0) = 1446 MB/s
test bench_pico           ... bench:         342 ns/iter (+/- 5) = 2105 MB/s
test bench_pico_short     ... bench:          42 ns/iter (+/- 0) = 1619 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 16.41s


~/src/httparse on  master is 📦 v1.6.0 via 🦀 v1.59.0 took 23s
❯ cargo +nightly bench --bench parse
   Compiling httparse v1.6.0 (/home/acfoltzer/src/httparse)
    Finished bench [optimized] target(s) in 7.18s
     Running benches/parse.rs (target/release/deps/parse-80e05693670d5e82)

running 4 tests
test bench_httparse       ... bench:         239 ns/iter (+/- 10) = 3012 MB/s
test bench_httparse_short ... bench:          48 ns/iter (+/- 0) = 1416 MB/s
test bench_pico           ... bench:         326 ns/iter (+/- 3) = 2208 MB/s
test bench_pico_short     ... bench:          40 ns/iter (+/- 0) = 1700 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out; finished in 13.29s

(tested with Ubuntu 20.04 on a 18C/36T Xeon)

@acfoltzer
Copy link
Contributor Author

Criterion shows this to be within the noise after much more sampling, so I feel increasingly confident this is a performance-neutral fix

req/req                 time:   [331.62 ns 333.11 ns 334.56 ns]
                        thrpt:  [2.0043 GiB/s 2.0130 GiB/s 2.0221 GiB/s]
                 change:
                        time:   [-0.5625% +0.4717% +1.2605%] (p = 0.36 > 0.05)
                        thrpt:  [-1.2448% -0.4694% +0.5656%]
                        No change in performance detected.

req_short/req_short     time:   [68.329 ns 68.730 ns 69.152 ns]
                        thrpt:  [937.79 MiB/s 943.54 MiB/s 949.09 MiB/s]
                 change:
                        time:   [-1.6173% -1.1715% -0.7101%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7151% +1.1853% +1.6439%]
                        Change within noise threshold.

resp/resp               time:   [336.06 ns 337.24 ns 338.61 ns]
                        thrpt:  [1.9226 GiB/s 1.9304 GiB/s 1.9371 GiB/s]
                 change:
                        time:   [-0.7660% -0.1202% +0.5012%] (p = 0.72 > 0.05)
                        thrpt:  [-0.4987% +0.1204% +0.7719%]
                        No change in performance detected.

resp_short/resp_short   time:   [73.375 ns 73.788 ns 74.284 ns]
                        thrpt:  [1.1409 GiB/s 1.1486 GiB/s 1.1550 GiB/s]
                 change:
                        time:   [-1.7031% -1.0090% -0.4162%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4180% +1.0193% +1.7326%]
                        Change within noise threshold.

This fixes cases where the parser would accept non-compliant request lines including empty methods
and paths.

For the `token` grammar, [the spec][spec-token] is:

```
  token          = 1*tchar
```

`1*` is shorthand for one-or-more, so the empty string is not a valid `token`.

For the path component of the request line, [the spec][spec-path] we're concerned with the
`absolute-path` grammar:

```
  absolute-path = 1*( "/" segment )
```

While `segment` might be empty, there must be at least a `"/"` for it to be syntactically valid.

I've added tests for these cases and their combination, and had to update the expected error of one
of the existing URI tests which now fails sooner due to the empty path.

[spec-token]: https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#tokens
[spec-path]: https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#uri.references
@acfoltzer acfoltzer force-pushed the acfoltzer/empty-methods-paths branch from 615f4be to 442e2e2 Compare March 18, 2022 17:12
Copy link
Owner

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clean, thanks for the bench results!

@seanmonstar seanmonstar merged commit 5654959 into seanmonstar:master Mar 18, 2022
@acfoltzer acfoltzer deleted the acfoltzer/empty-methods-paths branch March 18, 2022 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants