Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make chop and chomp return SubString #18339

Merged
merged 4 commits into from
Oct 8, 2016

Conversation

TotalVerb
Copy link
Contributor

This is unfortunate:

julia> const str = "x" ^ 100000 * "\n";

julia> @benchmark split(str, '\n')[1]
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     8
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  192.00 bytes
  allocs estimate:  4
  minimum time:     2.99 μs (0.00% GC)
  median time:      3.21 μs (0.00% GC)
  mean time:        3.26 μs (0.00% GC)
  maximum time:     8.44 μs (0.00% GC)

julia> @benchmark chomp(str)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  97.78 kb
  allocs estimate:  3
  minimum time:     9.35 μs (0.00% GC)
  median time:      10.81 μs (0.00% GC)
  mean time:        13.47 μs (10.56% GC)
  maximum time:     790.80 μs (94.19% GC)

I suspect that the same reason that split returns SubString applies to chop and chomp also. The clear benefit of returning a view is performance and allocation avoidance; the disadvantages are that the underlying string cannot be freed, and accessing a SubString is slightly slower. However, in chop and chomp's case, the disadvantage is incredibly slight—an extra byte or so that can't be freed—so I believe it would make sense to follow split here.

Note that Perl, the only other language I know with chop and chomp, has these as in-place operations—which is even faster than creating a SubString, but not appropriate for Julia's conventionally immutable Strings.

On another note, I wonder whether it might be worthwhile to make String into String{T}, so that a string's underlying data can be an array view—which would make SubString substantially faster.

@stevengj stevengj added the strings "Strings!" label Sep 3, 2016
@nalimilan
Copy link
Member

I think @StefanKarpinski has a plan regarding SubString and strings sharing the same underlying array.

@stevengj
Copy link
Member

stevengj commented Sep 5, 2016

LGTM (needs a rebase).

@TotalVerb
Copy link
Contributor Author

TotalVerb commented Sep 5, 2016

Rebased. On another note, I noticed the doctest:

julia> a = string("March")
"March"
julia> chop(a)
"Marc"

seems to use string("March") for no apparent reason. I think this is a little misleading since it seems to imply copying the string is needed to use chop. I can fix this in a separate PR.

@tkelman
Copy link
Contributor

tkelman commented Sep 6, 2016

The return type should be tested for if you don't want it to regress accidentally. The string call is probably a bad find-replace from utf8.

@tkelman tkelman added the needs tests Unit tests are required for this change label Sep 6, 2016
@stevengj
Copy link
Member

stevengj commented Sep 6, 2016

I would just fix the doctest in this PR, @TotalVerb.

@StefanKarpinski
Copy link
Member

LGTM as well.

@TotalVerb
Copy link
Contributor Author

I made the doctest fix locally. Do I need to rebuild helpdb or something? @stevengj

@stevengj
Copy link
Member

stevengj commented Sep 6, 2016

You change the docstring in the source, run ./julia doc/genstdlib.jl to re-generate the manual, and then commit the changes.

@TotalVerb
Copy link
Contributor Author

OK, is this all right? Thanks for the help, @stevengj.

@stevengj
Copy link
Member

stevengj commented Sep 6, 2016

LGTM.

@tkelman tkelman removed the needs tests Unit tests are required for this change label Sep 7, 2016
"March"

julia> chop(a)
"Marc"
```
"""
chop(s::AbstractString) = s[1:end-1]
chop(s::AbstractString) = SubString(s, 1, endof(s)-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the doctests still pass with this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doctests don't pass, but I don't know if that's this change's fault. Failures seem unrelated to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like most of the failures are due to line numbers being not the same as expected?

https://gist.github.com/TotalVerb/d81e7fff617d315f51ecd019aac4f553

@TotalVerb
Copy link
Contributor Author

Bump.

@@ -81,14 +80,21 @@ Remove a single trailing newline from a string.
"""
function chomp(s::AbstractString)
i = endof(s)
if (i < 1 || s[i] != '\n') return s end
if (i < 1 || s[i] != '\n') return SubString(s, 1, i) end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very important but I think that typically we use short circuiting ... && return ... for this. But it was like this before the PR too.

@KristofferC
Copy link
Member

This got lgtm from both Stefan and Steven so should be OK to merge?

@StefanKarpinski StefanKarpinski merged commit e94007e into JuliaLang:master Oct 8, 2016
Copy link
Member

@kmsquire kmsquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

News item?

@kmsquire
Copy link
Member

kmsquire commented Oct 8, 2016

Sorry, that should have been a comment.

@TotalVerb
Copy link
Contributor Author

Sure, I'll write up a NEWS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants