Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-447] Parsing strings into floating-point performs badly #43064

Closed
lilyball mannequin opened this issue Jan 3, 2016 · 9 comments
Closed

[SR-447] Parsing strings into floating-point performs badly #43064

lilyball mannequin opened this issue Jan 3, 2016 · 9 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. Double Area → standard library: The `Double` type performance standard library Area: Standard library umbrella String Area → standard library: The `String` type

Comments

@lilyball
Copy link
Mannequin

lilyball mannequin commented Jan 3, 2016

Previous ID SR-447
Radar rdar://problem/23114536
Original Reporter @lilyball
Type Bug
Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug, Performance
Assignee None
Priority Medium

md5: 9c2a4c230f784781f1a8d22b417b5029

relates to:

Issue Description:

The implementations of init?(_ text: String) for the various floating-point formats relies on calling strtod_l (or the appropriate variant thereof). This requires calling String.withCString, which makes a copy of the string (so it can add the NUL terminator). This means that every single parse of a floating-point number incurs an allocation and copy, which can be prohibitively expensive in a tight loop.

@lhoward
Copy link
Contributor

lhoward commented Jan 3, 2016

@lilyball
Copy link
Mannequin Author

lilyball mannequin commented Jan 3, 2016

For reference, Rust's floating-point parser is an implementation of the 1990 paper How to Read Floating Point Numbers Accurately by William D. Clinger, which might be a good resource for implementing our own real parser.

@gottesmm
Copy link
Contributor

gottesmm commented Jan 3, 2016

In other lives I have looked into this and from what I understand grisu is the best solution but is more complicated to implement, while dragon4 is simple and in most cases is "good" enough.

We should ask Steve Canon what he thinks about this.

EDIT: I just pinged him.

@lilyball
Copy link
Mannequin Author

lilyball mannequin commented Jan 3, 2016

@gottesmm Aren't those algorithms for converting from floating point -> string? We're talking here about converting from string -> floating point.

Granted, we probably should investigate floating point -> string as well. What Swift does right now isn't great (which is to say, it doesn't produce the shortest possible string that unambiguously refers to a given floating-point value).

Incidentally, a while ago Rust switched to using a combination of Grisu3 and Dragon4 (comments say Grisu3 is exact and fast but incomplete, whereas Dragon4 is complete but slow, so I assume it starts with Grisu3 and falls back to Dragon4 when necessary). You can see this in rust-lang/rust#24612. There's also some more info in the related issue rust-lang/rust#24556.

There's another algorithm that's mentioned there called Martin Gay's algorithm which is apparently what Python uses. I think it's supposed to be good, but it's apparently incredibly complex, and it requires allocation (which disqualifies it from Rust as Rust's float -> string routines live in libcore, which doesn't have allocation). I don't know how its speed compares to the others, but since it is apparently so complex, unless we want to take the reference C implementation and use it as-is, it's probably a lot easier to stick with Grisu3 + Dragon4 as Rust did.

I'm going to file a separate issue on this.

Edit: Looking around for references, I actually can't find an algorithm by a "Martin Gay", but I did find one that sounds like the same thing by a "David M. Gay".

@lilyball
Copy link
Mannequin Author

lilyball mannequin commented Jan 3, 2016

Filed #43071 to cover float -> string conversion.

@gottesmm
Copy link
Contributor

gottesmm commented Jan 3, 2016

Doing grisu and falling back to dragon4 sounds right to my memory. TBH I think that this bug report should handle the whole back/forth. We want someone to do this work together as one before we say that we are done.

@lilyball
Copy link
Mannequin Author

lilyball mannequin commented Jan 3, 2016

They're related works, but are the parsing algorithms actually related to the formatting algorithms in any way? I know the dtoa library offers both, but I imagine that's just because that means it's a one-stop shop for float <-> string conversions.

I agree that it would be great to get both done, but I don't think they need to be done at the same time. Which is to say, parsing string -> float shouldn't block float -> string, or vice versa.

@gottesmm
Copy link
Contributor

gottesmm commented Jan 3, 2016

I think we are having a philosophical debate here for which there is no correct answer.

Lets leave it up to whomever decides to implement this.

@tbkka
Copy link
Contributor

tbkka commented Mar 16, 2018

I don't see any problem with the Swift standard library invoking `strtod` from the C standard library. strtod is generally quite good; I believe most libc implementations nowadays use David M. Gay's implementation of Clinger's algorithm (part of his dtoa library) which is both guaranteed accurate and usually quite fast.

However, using `String.withCString` is indeed a lot of overhead, so I would focus on that. Since the strings in question are almost always quite short, copying them should not be a performance problem per se. The most likely culprits are the heap allocations and the character-set conversion. I would suggest allocating a small buffer on the stack and trying to translate the string to ASCII there. Only if that fails (because the string is longer than expected) would I fall back to `withCString`. This approach would eliminate heap allocation in the usual case.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@AnthonyLatsis AnthonyLatsis added String Area → standard library: The `String` type Double Area → standard library: The `Double` type labels May 26, 2023
@Azoy Azoy closed this as completed May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. Double Area → standard library: The `Double` type performance standard library Area: Standard library umbrella String Area → standard library: The `String` type
Projects
None yet
Development

No branches or pull requests

5 participants