speed up parsing for short ipv4s in std::net::Ipv4Addr::from_str #97118

nkconnor · 2022-05-17T16:29:14Z

This aims to improve parsing speed for short and empty strings.

There is a proof of concept based on Criterion here: https://github.com/nkconnor/rust-ipv4-optimization/blob/main/benches/ipv4_fromstr_benchmark.rs

On my machine it offers about a 15X speed-up in the happy case (len() < 7). The parser today takes a more similar duration for "1.1.1" as it does for "1.1.1.1".

Optimizing for this case is especially helpful for situations where input is a non-null string type. For example, a CSV library may offer non-null string type for an IP Address column. IP Address is a very common element in "big data" and this optimization is likely to be appreciated in such contexts.

Criterion Output:

Main Parse OK time: [21.145 ns 21.181 ns 21.222 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

Main Parse <7 time: [15.268 ns 15.286 ns 15.304 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

Patch Parse OK time: [21.309 ns 21.506 ns 21.843 ns]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe

Patch Parse <7 time: [998.09 ps 999.02 ps 1.0001 ns]

rust-highfive · 2022-05-17T16:29:17Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api -T-libs to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

rust-highfive · 2022-05-17T16:29:18Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon.

Please see the contribution instructions for more information.

nkconnor · 2022-05-18T22:18:05Z

i'd also like to suggest we move both of these length checks into fn read_ipv4_addr(&mut self) -> Option<Ipv4Addr>. That will short circuit for the IpAddr::from_str case as well.

JohnCSimon · 2022-07-24T05:23:58Z

ping from triage:
@nkconnor - looks like this PR isn't ready for review

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@rustbot author

nkconnor · 2022-07-24T12:08:28Z

ping from triage:

@nkconnor - looks like this PR isn't ready for review

FYI: when a PR is ready for review, send a message containing

@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@rustbot author

@JohnCSimon why isn't it ready for review?

JohnCSimon · 2022-07-24T17:21:28Z

From your comment it looked like you still had more to add to this PR - my mistake, then.
@rustbot review

nkconnor · 2022-07-25T14:46:53Z

makes sense, I was hoping to get feedback on the comment prior to moving the checks. I didn't see a reason for this check to be in the place that it is.

nkconnor · 2022-08-09T16:25:39Z

r? libs

Mark-Simulacrum · 2022-08-09T17:17:32Z

Can you say more about "the happy case" here? It looks like too short IPs previously went through a longer attempt to parse, but my (maybe wrong) assumption is that such IPs are fairly rare in any data set of IPs.

Are we expecting users to attempt parsing and then fallback on some alternative?

i'd also like to suggest we move both of these length checks into fn read_ipv4_addr(&mut self) -> Option. That will short circuit for the IpAddr::from_str case as well.

Presuming this is the same behavior-wise (probably the case, but would need to check call sites), that seems reasonable to me.

nkconnor · 2022-08-09T18:28:37Z

Can you say more about "the happy case" here? It looks like too short IPs previously went through a longer attempt to parse, but my (maybe wrong) assumption is that such IPs are fairly rare in any data set of IPs.

That's right, the happy case is where a too short IP is parsed and the parser exits early.

I think IPs with 0 < length < 7 are probably more rare than IPs with length == 0. I was motivated to look at this because we were processing a very large dataset and the question came up whether to do a length check before attempting parsing. Since the parser was slow for the too short IPs, we added the length check. Our dataset had cases of length==0 and 0 < length < 7.

If we kept the assumption that such IPs are fairly rare, could we say that the patch is still worthwhile? The criterion output shows a small but supposably measurable performance decline for parsing valid IPs.

Are we expecting users to attempt parsing and then fallback on some alternative?

I would expect this to happen sometimes. In my case, we were provided with a third-party string which may/may not be a valid IP and then writing it in a structured format to a storage archive.

Somewhat related, I took a peek at how url parses a Url. They don't use std's IP parsers. It's a little convoluted but they do attempt parsing and then fallback to a domain name: https://docs.rs/url/latest/src/url/host.rs.html#77-119.

i'd also like to suggest we move both of these length checks into fn read_ipv4_addr(&mut self) -> Option. That will short circuit for the IpAddr::from_str case as well.

Presuming this is the same behavior-wise (probably the case, but would need to check call sites), that seems reasonable to me.

I believe it is. This read_ipv4_addr is also used within parsing for IPv6s and SocketAddrs.

Mark-Simulacrum · 2022-08-09T18:30:39Z

OK, this PR seems fine then -- let's update it to adjust read_ipv4_addr then.

rustbot · 2022-08-09T18:58:55Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

nkconnor · 2022-08-09T19:29:59Z

some tests appear to be failing, looking into it. Also, tested the case of an empty string and the stats are different than above (which tested "1.1.1" .. 0<length<7)

current       time:   [4.6269 ns 4.6319 ns 4.6373 ns]
patch         time:   [3.4966 ns 3.4984 ns 3.5004 ns]

nkconnor · 2022-08-09T19:39:22Z

I think tests fail with SocketAddrs because the Parser state includes the port info, and the max length check (<=15) isn't valid when looking at that too. I would have to propagate AddrKind down to the parsing function to determine the max length, or leave the max length check where it is.

nkconnor · 2022-08-09T20:19:43Z

this could/should fail too (ipv6s can be shorter, I think), but it's one example of how we could be consistent in doing length checks across the net::parser::AddrKinds. Not sure which direction to go.. I will come back to this unless you have suggestion.

@rustbot author

rust-log-analyzer · 2022-08-09T21:20:56Z

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
thread '<unnamed>' panicked at 'static string', library/std/src/thread/tests.rs:190:9
.....................................................
failures:

---- net::ip::tests::ip_properties stdout ----
thread 'net::ip::tests::ip_properties' panicked at 'called `Result::unwrap()` on an `Err` value: AddrParseError(Ip)', library/std/src/net/ip/tests.rs:315:5

failures:
    net::ip::tests::ip_properties


test result: FAILED. 931 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 10.32s

error: test failed, to rerun pass '-p std --lib'
Build completed unsuccessfully in 0:01:38

bors · 2022-08-29T08:12:03Z

☔ The latest upstream changes (presumably #101143) made this pull request unmergeable. Please resolve the merge conflicts.

JohnCSimon · 2022-11-27T04:28:03Z

@nkconnor
Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this.
Note: if you do please open the PR BEFORE you push to it, else you won't be able to reopen - this is a quirk of github.
Thanks for your contribution.

@rustbot label: +S-inactive

add optimized parsing rule for known bad ipv4

9b02d8b

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 17, 2022

rust-highfive assigned kennytm May 17, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 17, 2022

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2022

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 24, 2022

rust-highfive assigned Mark-Simulacrum and unassigned kennytm Aug 9, 2022

apply ipv4 parser length checks to more situations

4783dc5

This comment has been minimized.

Sign in to view

fix parser length checks to depend on the type of addr being parsed

2aab9af

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 9, 2022

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 2, 2022

JohnCSimon closed this Nov 27, 2022

rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Nov 27, 2022

speed up parsing for short ipv4s in std::net::Ipv4Addr::from_str #97118

speed up parsing for short ipv4s in std::net::Ipv4Addr::from_str #97118

Uh oh!

Conversation

nkconnor commented May 17, 2022

Uh oh!

rust-highfive commented May 17, 2022

Uh oh!

rust-highfive commented May 17, 2022

Uh oh!

nkconnor commented May 18, 2022

Uh oh!

JohnCSimon commented Jul 24, 2022

Uh oh!

nkconnor commented Jul 24, 2022

Uh oh!

JohnCSimon commented Jul 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nkconnor commented Jul 25, 2022

Uh oh!

nkconnor commented Aug 9, 2022

Uh oh!

Mark-Simulacrum commented Aug 9, 2022

Uh oh!

nkconnor commented Aug 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Mark-Simulacrum commented Aug 9, 2022

Uh oh!

rustbot commented Aug 9, 2022

Uh oh!

This comment has been minimized.

nkconnor commented Aug 9, 2022

Uh oh!

nkconnor commented Aug 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nkconnor commented Aug 9, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-log-analyzer commented Aug 9, 2022

Uh oh!

bors commented Aug 29, 2022

Uh oh!

JohnCSimon commented Nov 27, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

JohnCSimon commented Jul 24, 2022 •

edited

Loading

nkconnor commented Aug 9, 2022 •

edited

Loading

nkconnor commented Aug 9, 2022 •

edited

Loading

nkconnor commented Aug 9, 2022 •

edited

Loading