Implementing a string method that works for iterators and generators #57180
base/strings/util.jl
collected = collect(x)
if !(isa(collected, AbstractVector) && all(x -> isa(x, Char), collected))
    throw(MethodError(String, (x,)))
end
return String(collected::AbstractVector{<:AbstractChar})
You cannot guarantee that `collected` is an `AbstractVector{<:AbstractChar}`, even if it only contains `Char`.
Also, I'm not too fond of throwing a method error here. The method does exist, but the data may not be applicable. Consider an iterator that may return `Char` or `Nothing`. In this case, it will only throw a method error sometimes.
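To illustrate the data dependence described in this comment, here is a small sketch. The names `letters_or_nothing` and `only_chars` are hypothetical and exist only for this example; they are not part of the PR.

```julia
# Hypothetical generator: yields a `Char` for letters and `nothing` otherwise.
letters_or_nothing(s) = (isletter(c) ? c : nothing for c in s)

# The same element check as in the diff; its result depends only on the data.
only_chars(itr) = all(x -> x isa Char, itr)

ok = only_chars(letters_or_nothing("Hello"))   # every element is a Char
bad = only_chars(letters_or_nothing("Hi!"))    # '!' is replaced by `nothing`
```

The same iterator type passes for one input and fails for another, so an error raised from this check says something about the values rather than about which methods exist.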
What do you think about `collected = collect(Char, itr)`? That would avoid the `if` statement with the `all(isa)` call, which requires an extra pass over the data.
The call to `collect` guarantees an `Array{Char,N}` return value, where `N == 1` if the iterator is 1-dimensional and `N == 0` if the "iterator" is a number.
The `String_iterator` and the call of `IteratorSize` would be avoided altogether by this:
String(itr) = String(collect(Char, itr))
String(x::AbstractArray{Char}) = throw(MethodError(String, (x, )))
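A minimal sketch of the `collect(Char, itr)` approach suggested above. The helper name `string_from_iter` is hypothetical (it is not part of the PR or of Base); it is used so the example does not add methods to `String` itself.

```julia
# Hypothetical helper; the PR would attach this behavior to `String` instead.
# `collect(Char, itr)` converts each element to `Char` while collecting,
# so no separate `all(isa)` pass over the data is needed.
string_from_iter(itr) = String(collect(Char, itr))

# Iterator with a known length.
s1 = string_from_iter(Iterators.take("Hello, world", 5))

# Lazy map over a string; `c + 1` on a `Char` yields the next `Char`.
s2 = string_from_iter(Iterators.map(c -> c + 1, "Hello"))

# Filtered generator (length not known in advance), with a non-ASCII character.
s3 = string_from_iter(c for c in "héllo" if c != 'l')
```

Elements that cannot be converted to `Char` make `collect` throw during the conversion, so the failure comes from the data conversion rather than from a hand-written check.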
base/strings/util.jl
julia> String(Iterators.take("Hello, world", 5))
"Hello" # Takes the first 5 characters of the string and converts it to a string.
"""
String(x) = String_iterator(x, IteratorSize(x))
The name `String_iterator` does not follow the guidelines for Julia names. As you want to create an internal name (not exported), it could be `_string_iterator`, for example.
- **String(x::AbstractIterator)**
- Converts an iterator into a string.
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator.
- Ensures that the result is a valid string representation composed solely of characters (`Char`).
Let's just make this a concise one-liner, which is more efficient than your version too (since it needs to make fewer intermediate copies):
String(x) = sprint(join, x)
- **String(x::AbstractIterator)**
- Converts an iterator into a string.
- Throws a `MethodError` if the iterator contains invalid data types (non-Char types) or if it is an infinite iterator.
- Ensures that the result is a valid string representation composed solely of characters (`Char`).
- **String(any_iterable)**
- prints an iterable object into a string using `join`.
But wait, do we want this? This has completely different semantics from the existing `String` function. For example, with this change we get:
julia> String(Int8[101, 102])
"101102"
julia> String(UInt8[101, 102])
"ef"
To me this seems like the wrong meaning of the type constructor `String`. This should only be used to convert things that are already a little bit 'string-like' and should not be conflated with `print`, which this implementation suggests.
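The two meanings contrasted in this comment can be reproduced with functions that already exist in Base, without defining any new `String` method. This is only a sketch of the difference; the variable names are illustrative.

```julia
# Printing semantics (what `sprint(join, x)` does): each element is printed,
# so the integers appear in decimal form.
printed = sprint(join, Int8[101, 102])

# Existing constructor semantics: the bytes are taken as UTF-8 code units,
# and 101, 102 are the code units of 'e' and 'f'.
from_bytes = String(UInt8[101, 102])
```

With the one-liner in place, `Int8` and `UInt8` vectors holding the same numbers would produce these two different strings, which is the inconsistency the comment points out.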
Okay, that makes sense. This seems more like `write` then:
String(x) = sprint(io -> foreach(c -> write(io, c), x))
The `print` behavior was from looking at the implementation of `String(::AbstractVector{<:AbstractChar})`, but that can probably also be calling `write`, since it is a case where `print` is defined to be a call to `write`.
I don't think this is a good idea. For example, `Char(0x2ee8)` and `String([0x2ee8])` should be consistent. Also, the dependency on little-/big-endianness is unexpected.
I think the prototype should be like ... `String(collect(Char, x))`, as I mentioned before.
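A short sketch of why a `write`-based definition would make integers and characters inconsistent. It uses only existing Base functions; the variable names are illustrative.

```julia
c = Char(0x2ee8)

# Writing a `Char` emits its UTF-8 encoding (three code units for U+2EE8).
from_char = sprint(io -> write(io, c))

# Writing a `UInt16` emits its two raw bytes in the host's byte order.
from_int = sprint(io -> write(io, 0x2ee8))

# The same two bytes, obtained by reinterpreting the integer in memory.
raw_bytes = collect(reinterpret(UInt8, [0x2ee8]))
```

`from_int` holds the integer's in-memory bytes, so its content differs from `from_char` and depends on the machine's endianness. Converting through `Char` first, as in `String(collect(Char, x))`, avoids both problems.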
Okay, so the problem with making just one-liners is that the error messages end up being loose for cyclic iterators, or even integers in some cases. So I'm sticking with the ... `String(collect(Char, x))` prototype.
new implementation without parsing twice
adding new tests for non-ascii characters
Fixes #57072, allowing queries like `String(Iterators.map(c->c+1, "Hello, world"))` to work. Also adds tests for the same :)