Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make replace with count=0 a no-op #22325

Merged
merged 13 commits into from
Jul 4, 2017
Merged

make replace with count=0 a no-op #22325

merged 13 commits into from
Jul 4, 2017

Conversation

rfourquet
Copy link
Member

Before we had replace("aaa", 'a', 'z', 0) == "zzz" which can seem unintuitive. This PR changes this to a no-op, while still allowing a negative count to mean unlimited replacements.

@@ -401,7 +402,7 @@ If `pat` is a regular expression and `r` is a `SubstitutionString`, then capture
references in `r` are replaced with the corresponding matched text.
"""
replace(s::AbstractString, pat, f, n::Integer) = replace(String(s), pat, f, n)
replace(s::AbstractString, pat, r) = replace(s, pat, r, 0)
replace(s::AbstractString, pat, r) = replace(s, pat, r, -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably mention this in the docstring

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok (Note that it's not mentioned in Python's docstring either)

Copy link
Contributor

@tkelman tkelman Jun 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is python relevant? The docstring above says "If n is provided, replace at most n occurrences." So if there's a special meaning to negative inputs, best to describe that. We don't do as many sign-based behavior changes as Python does, like negative indexing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that was not so relevant, only as a mature language which doesn't mention the behavior for negative inputs, which is same as here, i.e. replace all. Will add that.

@rfourquet
Copy link
Member Author

I realize that the default value of 0 was mentioned in the docstring, and that this is a breaking change (even if it was not said explicitly that a 0 meant change all). Would a note in the "NEWS.md" be enough?

@ararslan ararslan added breaking This change will break code needs news A NEWS entry is required for this change labels Jun 10, 2017
@DrKrar
Copy link
Contributor

DrKrar commented Jun 10, 2017

i like to suggest anther behavior.
n: taken to be number of maximum replacement except n=0 only would be equivalent to replace all or n= Inf.
negative value of n do reverse replacement for max of abs(n). i.e. from the end.

@rfourquet
Copy link
Member Author

@DrKrar that could be a good idea, except that I prefer that n=0 means no replacement: even if when called by hand, no-one would use this special case, "it can be useful if n is the result of a computation", as @nalimilan said.
Also, I think Julia in general uses a different named function (whith last in the name) to start an operation from the end of a collection.

@@ -359,6 +359,7 @@ _replace(io, repl::Function, str, r, pattern) =
print(io, repl(SubString(str, first(r), last(r))))

function replace(str::String, pattern, repl, limit::Integer)
limit == 0 && return str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, maybe rename the argument so that it's consistent with the docstring (or change the docstring)?

@nalimilan
Copy link
Member

Actually, I wonder whether we couldn't use typemax(Int) as a default rather than -1. That would be a more natural extension of the definition "replace up to n occurrences", and wouldn't require any special explanation in the docstring. Negative values could then be an error, which could catch bugs (just like it works with indexing).

@rfourquet
Copy link
Member Author

rfourquet commented Jun 11, 2017

I really like the idea of typemax(Int), will update accordingly (converting this parameter to Int when calling the main method (which does the work) can also limit the number compiled functions).
Not sure if negative value should be error or similar to the zero case.

@rfourquet rfourquet removed the needs news A NEWS entry is required for this change label Jun 11, 2017
@rfourquet
Copy link
Member Author

I added a deprecation warning when count is negative (seems more safe) and a NEW entry.

@nalimilan
Copy link
Member

I don't think it's worth all the additional complexity of the deprecation, especially because it involves adding lots of code outside of deprecated.jl, which is unusual. n < 0 was a no-op anyway, and its' not even like it could be considered as an extension of the n == 0 case since 0 was used for "unlimited". So let's just note that this previously undocumented behavior is now an error. People may even be grateful that this caught bugs that went unnoticed.

@rfourquet
Copy link
Member Author

I was unprecise, the deprecation I added concerns also n==0, which is breaking otherwise, and can go unnoticed. Do you propose to error on n<=0 (and re-enable n==0 in a later release) or n<0?

@@ -358,7 +358,10 @@ _replace(io, repl, str, r, pattern) = print(io, repl)
_replace(io, repl::Function, str, r, pattern) =
print(io, repl(SubString(str, first(r), last(r))))

function replace(str::String, pattern, repl, limit::Integer)
function replace_new(str::String, pattern, repl, count::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why restrict this to Int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what's done in other contexts too: accept Integer at the API level, but convert to Int in the method which does the work, a readons could be to limit the number of compiled functions. But I can see that there is no reason to error out if e.g. a very big BigInt is passed. I could use clamp(n, 0, typemax(Int) instead of Int(n) in the caller (which is in deprecated now), assuming it's never necessary to replace more than typemax(Int) times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that once the deprecation is removed, the "API level" will be this function, and only Int will be accepted... So if you want this you need to add another layer which will remain. But really I'm not sure it's worth it, e.g. search doesn't use that trick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. But this replace_new also only accepts a String, not an AbstractString, so this is the replace(s::AbstractString, pat, f) = replace_new(String(s), pat, f, typemax(Int)) (still in strings/util.jl, below replace_new) which will have to be modified to
replace(s::AbstractString, pat, f, count::Integer=typemax(Int)) = replace(String(s), pat, f, clamp(count, typemin(Int), typemax(Int)) % Int).

@@ -1417,6 +1417,20 @@ function LibGit2.set_remote_url(path::AbstractString, url::AbstractString; remot
LibGit2.set_remote_url(path, remote, url)
end

function replace(s::AbstractString, pat, f, n::Integer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add a comment explaining that replace_new needs to be renamed to replace when removing this deprecation (and make sure tests will fail if that's not the case).

replace(s::AbstractString, pat, f, n::Integer) = replace(String(s), pat, f, n)
replace(s::AbstractString, pat, r) = replace(s, pat, r, 0)
replace(s::AbstractString, pat, f) = replace_new(String(s), pat, f, typemax(Int))
# change this to the following when `replace` is removed from deprecated:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this kind of comment is that nobody is going to notice it when removing deprecation. I'd say let's go with ::Integer, and once the deprecation has been removed, feel free to open a PR to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't been addressed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's the second time since yesterday that my comment is not sent, with a "You can't perform this action at this time" message. My last unsent message was:

I added GenericString to the tests of replace, so if the deprecation is removed without taking this comment into account, the tests will fail. What I can also do is to open a PR to remove the deprecation right after we merge this.

IOW, the comments cannot be silently ignored and it would be wrong to ignore it even if there was not this conversion-to-Int change (and I can open the subsequent PR in advance for the next release), but I can update if you still prefer that I postpone this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'd rather leave this out, but let's hear what others think about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I left it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a TODO note in base/deprecated.jl referring to this and saying that it should be removed. That way there's a trail of breadcrumbs when doing the normal deprecations deletion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was already instructions in "base/deprecated.jl", but I updated to be even more clear.

depwarn(string("`replace(s, pat, r, count)` with `count <= 0` is deprecated, use ",
"`replace(s, pat, r, typemax(Int))` or replace(s, pat, r)` instead"),
:replace)
Base.replace(s, pat, f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base. not needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok removed.

function replace(s::AbstractString, pat, f, n::Integer)
if n <= 0
depwarn(string("`replace(s, pat, r, count)` with `count <= 0` is deprecated, use ",
"`replace(s, pat, r, typemax(Int))` or replace(s, pat, r)` instead"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing opening backtick.

replace(s::AbstractString, pat, f, n::Integer) = replace(String(s), pat, f, n)
replace(s::AbstractString, pat, r) = replace(s, pat, r, 0)
replace(s::AbstractString, pat, f) = replace_new(String(s), pat, f, typemax(Int))
# change this to the following when `replace` is removed from deprecated:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't been addressed.

@nalimilan
Copy link
Member

Looks good to me, modulo this discussion on which I'd like more opinions.

:replace)
replace(s, pat, f)
else
Base.replace_new(String(s), pat, f, clamp(n, 0, typemax(Int)) % Int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base. shouldn't be needed here either, deprecated.jl is included within the Base module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I should have checked

@rfourquet rfourquet force-pushed the rf/str-replace-0count branch 2 times, most recently from 6e12155 to d271b78 Compare June 21, 2017 10:54
@rfourquet
Copy link
Member Author

Rebased, I will merge in a few days if no objection.

@nalimilan
Copy link
Member

If we go with a keyword argument in #22324, should we also do the same here? A deprecation is painful enough that we don't want to repeat this process later. Would need to check whether the keyword argument would hurt performance.

@rfourquet
Copy link
Member Author

I would say yes we should. But I don't see how to avoid performances loss with a keyword (when the keyword is not used, performance is not hurt, and using the keyword costs about 90ns on my machine, which represents about 50% overhead for a string of length 10). If keyword uses are really expected to become faster, then a temporary slow down may be acceptable (using the count argument is probably rare).

@nalimilan
Copy link
Member

OK, let's go with this then, a 50% overhead is clearly too much.

@rfourquet
Copy link
Member Author

50% overhead is clearly too much.

What about for non-string arguments then? would you say the slowness of using keywords is more acceptable because at least it's not a regression (as it didn't exist before)?

I feel that if the API for replace on non-string is decided with pairs and keywords (as it's now in #22324), then eventually the string version should accept a similar API.

@nalimilan
Copy link
Member

It's hard to tell, but I would think that strings are generally smaller than arrays, so the overhead is worse. Also there's a real advantage to using a keyword argument for the array case (due to the pair API), but not really for strings.

@rfourquet
Copy link
Member Author

I was thinking of adding the pair API to strings too!

@nalimilan
Copy link
Member

Why not, that's good for consistency, but we can keep the existing one if it's more efficient.

@rfourquet rfourquet force-pushed the rf/str-replace-0count branch from d271b78 to 4405d63 Compare July 4, 2017 10:54
provided, replace at most `n` occurrences. As with search, the second argument may be a
Search for the given pattern `pat` in `s`, and replace each occurrence with `r`.
If `count` is provided, replace at most `count` occurrences.
As with search, the second argument may be a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search would be good as a cross reference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rfourquet
Copy link
Member Author

CI failure seems unrelated: it concerns libgit2, and a similar problem appears in other unrelated PRs.

@rfourquet rfourquet changed the title replace with count=0 is a no-op make replace with count=0 a no-op Jul 4, 2017
@rfourquet rfourquet merged commit bf83397 into master Jul 4, 2017
@rfourquet rfourquet deleted the rf/str-replace-0count branch July 4, 2017 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants