Skip to content
This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Added UTF-8 support #3

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Added UTF-8 support #3

wants to merge 1 commit into from

Conversation

xrstf
Copy link

@xrstf xrstf commented Dec 12, 2011

I'm using FineDiff for a project using UTF-8, so I added basic support to it. The commit contains mb_strcspn() and mb_strspn() and a lot of reformatting, so you might only want to pull the additional functions and replace all string functions with their mb_ equivalents.

Speedwise it might be clever to create an extra UTF-8 version of the class, since the manually written mb_ functions won't be as fast as their PHP versions (and were, in fact, written to work instead of being fast).

@katanacrimson
Copy link

re: private method splittochars, why not use preg_split instead? If I remember right, it should work with unicode.

Also worth noting, it's considered bad form by many to submit PRs from your master branch, in case you weren't aware. Most devs prefer PRs sent from feature branches.

@xrstf
Copy link
Author

xrstf commented Dec 12, 2011

It should with the /u flag. Feel free to use that one. I didn't do an extensive search for my helpers, I just implemented what came first to my mind. :-)

$str = mb_substr($str, $start);
$dels = self::splitToChars($delimiters);

foreach ($dels as $idx => $del) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of array_walk() here with preg_quote would likely be faster in the end

@katanacrimson
Copy link

Kinda wish you'd separated the formatting into its own commit, that way I could look at just the commits that actually change stuff and see what you're doing, and see if there's any way I could help improve it at all :

E: GITHUB, STOP EATING MY BACKSLASHES. GRRRR.

@xrstf
Copy link
Author

xrstf commented Dec 12, 2011

Well I did the changes before even thinking about forking and sending a PR. Kinda sucks, I know, but a quick grep for 'str' should give you all the relevant positions in the code that need the 'mb_'. The additional stuff is just at the end.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants