Improve performance of single-character string splitting.
Helps #661.
StefanKarpinski committed Apr 2, 2012
1 parent 96c5328 commit 6bc8a03
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions base/regex.jl
@@ -130,12 +130,14 @@ function split(s::String, regex::Regex, include_empty::Bool, limit::Integer)
 end

 split(s::String, x::String, incl::Bool, limit::Integer) =
+    strwidth(x) == 1 ? split(s, x[1], incl, limit) :
     split(s, Regex(strcat("\\Q",x)), incl, limit)

 split(s::String, regex::Regex, include_empty::Bool) =
     split(s, regex, include_empty, 0)

 split(s::String, x::String, incl::Bool) =
+    strwidth(x) == 1 ? split(s, x[1], incl) :
     split(s, Regex(strcat("\\Q",x)), incl)

 replace(s::String, regex::Regex, repl::String, limit::Integer) =
@@ -145,7 +147,9 @@ replace(s::String, regex::Regex, repl::String) =
     join(split(s, regex, true, 0), repl)

 replace(s::String, x::String, repl::String, limit::Integer) =
+    strwidth(x) == 1 ? replace(s, x[1], repl, limit) :
     replace(s, Regex(strcat("\\Q",x)), repl, limit)

 replace(s::String, x::String, repl::String) =
+    strwidth(x) == 1 ? replace(s, x[1], repl) :
     replace(s, Regex(strcat("\\Q",x)), repl)

4 comments on commit 6bc8a03

@JeffBezanson

I don't think strwidth is the right function here; probably strlen.

@StefanKarpinski

Took me a second to remember why I used strwidth: I wanted to call the single-character version only if strlen(x)==1 and x[1] is an ASCII character. I realize now that testing those two conditions together is better than using strwidth, which will scan the entire string needlessly. Even better would be something like strstr, except that it doesn't handle NULs correctly. Probably best to use memchr to find the first character and then just use a loop to check whether the rest of the string matches.
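
For illustration, the memchr-plus-loop idea could be sketched roughly as follows. This is not code from the commit: it is written against present-day Julia rather than the 2012 API, `find_literal` is a hypothetical name, and `findnext` over the byte vector stands in for a real `memchr` call. Because it works on lengths rather than a terminator, embedded NUL bytes are treated like any other byte.

```julia
# Hypothetical helper: return the 1-based byte index of the first occurrence of
# `needle` in `haystack` at or after `start` (assumed >= 1), or 0 if there is none.
function find_literal(haystack::AbstractVector{UInt8},
                      needle::AbstractVector{UInt8},
                      start::Int = 1)
    n = length(needle)
    n == 0 && return start
    first_byte = needle[1]
    last_start = length(haystack) - n + 1   # last index where a match can begin
    i = start
    while i <= last_start
        # Stand-in for memchr: locate the next occurrence of the first byte.
        j = findnext(b -> b == first_byte, haystack, i)
        (j === nothing || j > last_start) && return 0
        # Plain loop to check that the rest of the needle matches.
        matched = true
        for k in 2:n
            if haystack[j + k - 1] != needle[k]
                matched = false
                break
            end
        end
        matched && return j
        i = j + 1
    end
    return 0
end

# Usage: search the raw bytes of a string that contains a NUL.
idx = find_literal(codeunits("a\0b,c"), codeunits("b,"))
```
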

@JeffBezanson

strwidth doesn't do that; it gives the number of columns needed to display the string.

@StefanKarpinski

Ah, yeah. That's also true. length(x) == 1 would be both faster and actually correct.
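
To make the difference concrete, here is a small sketch in present-day Julia, where (as an assumption about naming, not something stated in this thread) `textwidth` plays the role of the old `strwidth` and `length` counts characters. `takes_fast_path_by_width` and `takes_fast_path_by_length` are hypothetical names used only for this comparison.

```julia
# Hypothetical predicates mirroring the two candidate checks from the thread.
takes_fast_path_by_width(x::AbstractString)  = textwidth(x) == 1   # what the commit tests
takes_fast_path_by_length(x::AbstractString) = length(x) == 1      # what the thread settles on

wide  = "日"         # one character that occupies two display columns
combo = "a\u0301"    # 'a' followed by a combining acute accent: two characters, one column

for x in (",", wide, combo)
    println(repr(x), ": textwidth = ", textwidth(x), ", length = ", length(x))
end

# A wide single character misses the fast path under the width test (slower, still correct).
# A base letter plus combining mark passes the width test even though it is two characters,
# so dispatching on x[1] alone would split on the wrong delimiter.
```
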