
Reading and saving a project with Xcodeproj breaks emoji characters in the project #196

Closed
kylef opened this issue Sep 30, 2014 · 11 comments


@kylef
Contributor

kylef commented Sep 30, 2014

Before:

[Screenshot from 2014-09-30 at 17:02:42]

Xcodeproj:

require 'xcodeproj'
project = Xcodeproj::Project.open('Palaver.xcodeproj')
project.save

After:

[Screenshot from 2014-09-30 at 17:03:45]

@kylef
Contributor Author

kylef commented Sep 30, 2014

Looks like this comes down to how Xcode reads XML plists (this includes xcproj's conversion to ASCII, since xcproj uses Xcode's private API). Outside the scope of Xcodeproj.

Convert ASCII plist to XML using plutil
$ /usr/bin/plutil -convert xml1 Palaver.xcodeproj/project.pbxproj -o Palaver.xml

The resulting XML document has F09F9A982E6D/🚘.m for the filename, which is correct.
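To double-check that F09F9A982E6D really is the UTF-8 encoding of 🚘.m, here is a small Ruby sketch. The helper name utf8_hex is hypothetical; it simply hex-encodes a string's UTF-8 bytes in the same uppercase form used for the filename.

```ruby
# Hex-encode the UTF-8 bytes of a string (uppercase, no separators).
def utf8_hex(str)
  str.encode('UTF-8').unpack1('H*').upcase
end

filename = "\u{1F698}.m" # U+1F698 ONCOMING AUTOMOBILE followed by ".m"
puts utf8_hex(filename)
```

The first four bytes (F0 9F 9A 98) are the emoji; 2E 6D is ".m".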

Xcode

Reading this file back with Xcode breaks the XML Unicode representation of 🚘 (as in the screenshot above).

xcproj

Using xcproj to touch the file breaks the Unicode 🚘 as well.


Saving the file as ASCII from XML changes the filename to üöò.m/EFA3BFC3BCC3B6C3B22E6D (note that !$*UTF8*$! is found in the header of the ASCII plist).
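The garbled bytes are consistent with the original UTF-8 bytes F0 9F 9A 98 being read as Mac OS Roman and re-encoded as UTF-8: in Mac OS Roman, 0xF0 is the Apple logo (U+F8FF, a private-use character that often doesn't render, which may be why only üöò.m is visible), 0x9F is ü, 0x9A is ö and 0x98 is ò. This is a reading of the hex, not something the thread confirms. The sketch below reproduces it with a deliberately partial table; MAC_ROMAN_HIGH and utf8_as_mac_roman are hypothetical names.

```ruby
# Partial Mac OS Roman -> Unicode table covering only the high bytes that
# occur in the UTF-8 encoding of U+1F698 (assumption: a real decoder needs
# all 128 high-byte mappings).
MAC_ROMAN_HIGH = {
  0x98 => 0x00F2, # ò
  0x9A => 0x00F6, # ö
  0x9F => 0x00FC, # ü
  0xF0 => 0xF8FF  # Apple logo (private use area)
}.freeze

# Reinterpret each byte of a UTF-8 string as Mac OS Roman and re-encode the
# result as UTF-8. Bytes missing from the partial table raise KeyError.
def utf8_as_mac_roman(str)
  str.bytes.map do |b|
    cp = b < 0x80 ? b : MAC_ROMAN_HIGH.fetch(b) # ASCII is identical in both
    cp.chr(Encoding::UTF_8)
  end.join
end

garbled = utf8_as_mac_roman("\u{1F698}.m")
puts garbled.unpack1('H*').upcase
```

The printed hex should match the EFA3BFC3BCC3B6C3B22E6D seen in the ASCII plist.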

@kylef
Contributor Author

kylef commented Sep 30, 2014

Seems like it's related to rdar://13565397 filed by @alloy

@fabiopelosin
Member

😢

@0xced
Contributor

0xced commented Sep 30, 2014

It looks like a different bug from rdar://13565397 to me. This time the issue seems to be Cocoa (through NSPropertyListSerialization) not properly decoding the escaped XML entity, i.e. &#128664;.

Here is a snippet demonstrating the issue:

#import <Foundation/Foundation.h>

int main(void)
{
    @autoreleasepool {
        for (NSString *emojiXMLPath in @[ @"emoji.xml", @"emoji_escaped.xml" ])
        {
            // Raw bytes of the plist file
            NSData *emojiXMLData = [NSData dataWithContentsOfFile:emojiXMLPath];
            printf("%s\n", [emojiXMLData.description UTF8String]);

            // NSXMLDocument: <plist> -> <dict> -> second child (the <string> element)
            NSXMLDocument *document = [[NSXMLDocument alloc] initWithData:emojiXMLData options:0 error:NULL];
            printf("NSXMLDocument: %s\n", [[[[[document rootElement] childAtIndex:0] childAtIndex:1] stringValue] UTF8String]);

            // Python's xml.etree.ElementTree for comparison (flush first so the output stays in order)
            NSString *pythonScript = [NSString stringWithFormat:@"python3 -c \"import xml.etree.ElementTree as ET; print('ElementTree: ' + ET.parse('%@').getroot()[0][1].text)\"", emojiXMLPath];
            fflush(stdout);
            system([pythonScript UTF8String]);

            // NSPropertyListSerialization vs. a string literal containing U+1F698
            NSDictionary *dictionary1 = [NSPropertyListSerialization propertyListWithData:emojiXMLData options:(NSPropertyListReadOptions)0 format:NULL error:NULL];
            NSDictionary *dictionary2 = @{ @"emoji": @"\U0001F698" };
            for (NSDictionary *dictionary in @[ dictionary1, dictionary2 ])
            {
                NSString *emoji = dictionary[@"emoji"];
                NSData *emojiData = [emoji dataUsingEncoding:NSUTF8StringEncoding];
                printf("%s: %s -> %s\n", [[emojiXMLPath stringByDeletingPathExtension] UTF8String], [emoji UTF8String], [emojiData.description UTF8String]);
            }
            printf("\n\n");
        }
    }
    return 0;
}

emoji.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>emoji</key>
        <string>🚘</string>
    </dict>
</plist>

emoji_escaped.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>emoji</key>
        <string>&#128664;</string>
    </dict>
</plist>

And here is the corresponding output:

<3c3f786d 6c207665 7273696f 6e3d2231 2e302220 656e636f 64696e67 3d225554 462d3822 3f3e0a3c 21444f43 54595045 20706c69 73742050 55424c49 4320222d 2f2f4170 706c652f 2f445444 20504c49 53542031 2e302f2f 454e2220 22687474 703a2f2f 7777772e 6170706c 652e636f 6d2f4454 44732f50 726f7065 7274794c 6973742d 312e302e 64746422 3e0a3c70 6c697374 20766572 73696f6e 3d22312e 30223e0a 093c6469 63743e0a 09093c6b 65793e65 6d6f6a69 3c2f6b65 793e0a09 093c7374 72696e67 3ef09f9a 983c2f73 7472696e 673e0a09 3c2f6469 63743e0a 3c2f706c 6973743e>
NSXMLDocument: 🚘
ElementTree: 🚘
emoji: 🚘 -> <f09f9a98>
emoji: 🚘 -> <f09f9a98>


<3c3f786d 6c207665 7273696f 6e3d2231 2e302220 656e636f 64696e67 3d225554 462d3822 3f3e0a3c 21444f43 54595045 20706c69 73742050 55424c49 4320222d 2f2f4170 706c652f 2f445444 20504c49 53542031 2e302f2f 454e2220 22687474 703a2f2f 7777772e 6170706c 652e636f 6d2f4454 44732f50 726f7065 7274794c 6973742d 312e302e 64746422 3e0a3c70 6c697374 20766572 73696f6e 3d22312e 30223e0a 093c6469 63743e0a 09093c6b 65793e65 6d6f6a69 3c2f6b65 793e0a09 093c7374 72696e67 3e262331 32383636 343b3c2f 73747269 6e673e0a 093c2f64 6963743e 0a3c2f70 6c697374 3e>
NSXMLDocument: 🚘
ElementTree: 🚘
emoji_escaped:  -> <ef9a98>
emoji_escaped: 🚘 -> <f09f9a98>

Decoding the XML file with Python's xml.etree.ElementTree or with NSXMLDocument works fine. Decoding the same file with NSPropertyListSerialization produces the wrong string: U+F698 (ef9a98 in UTF-8) instead of 🚘.

Also, replacing 🚘 (U+1F698 / &#128664;) with ✋ (U+270B / &#9995;) works fine.

This leads me to think that NSPropertyListSerialization xml decoding is broken for entities in the Supplementary Multilingual Plane. I will further investigate this issue and report my findings here.

@0xced
Contributor

0xced commented Oct 1, 2014

Confirmed: this is a bug in NSPropertyListSerialization/CFPropertyList.

The escaped XML entity parsing in CFPropertyList.c is flawed. It manually decodes the &#ddd; string and accumulates the ddd value into the num variable, which is declared as uint16_t. 💣 GAME OVER 💣

128664 doesn’t fit in a 16-bit variable. In retrospect, this was obvious: looking at the wrong value produced by decoding the XML with NSPropertyListSerialization, we see that the decoded character (ef9a98 as UTF-8) is U+F698, which is just U+1F698 truncated to 16 bits!
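The truncation is easy to reproduce outside CFPropertyList. The Ruby sketch below is not the CFPropertyList.c code: decode_decimal_entity is a hypothetical helper that accumulates the digits of &#ddd; one at a time and, when asked for 16 bits, masks with 0xFFFF after each step to simulate a uint16_t accumulator wrapping modulo 65536.

```ruby
# Accumulate the decimal digits of an "&#ddd;" entity into an integer.
# With bits: 16 the value is masked after every step, mimicking an
# unsigned 16-bit accumulator.
def decode_decimal_entity(entity, bits: 32)
  digits = entity[/\A&#(\d+);\z/, 1] or raise ArgumentError, "not a decimal entity: #{entity}"
  mask = (1 << bits) - 1
  digits.each_char.reduce(0) { |num, ch| (num * 10 + ch.to_i) & mask }
end

%w[&#9995; &#128664;].each do |entity|
  wide   = decode_decimal_entity(entity)
  narrow = decode_decimal_entity(entity, bits: 16)
  printf("%-10s 32-bit: U+%04X  16-bit: U+%04X -> %s\n",
         entity, wide, narrow, narrow.chr(Encoding::UTF_8).unpack1('H*'))
end
```

✋ (9995) is below 65536, so both columns agree, which matches the observation that U+270B works fine. For 128664 the 16-bit column wraps to U+F698, whose UTF-8 encoding is the ef9a98 seen in the NSPropertyListSerialization output.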

Unfortunately, this CFPropertyList code can’t be fixed by swizzling, since it’s inlined in the parseStringTag C function.

There doesn’t seem to be a simple solution to this problem, since the escaping was introduced to work around another bug in Xcode’s reading of its project.pbxproj plist! For reference, c9baf1b is the commit that introduced the escaping.

@segiddins
Member

@0xced wow! Do you think we should radar that uint overflow?

@0xced
Contributor

0xced commented Oct 1, 2014

I reported it as rdar://18512876. Feel free to dupe.

I have a workaround coming in xcproj, stay tuned.

@0xced
Contributor

0xced commented Oct 1, 2014

Please try the latest version of xcproj, 0xced/xcproj@ca611b1 (on the develop branch). It enables a custom XML plist parser (when needed) that doesn’t suffer from the NSPropertyListSerialization bug.
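The thread doesn't show xcproj's parser itself. As a rough illustration of what a non-truncating entity decoder has to do, here is a hypothetical Ruby helper (unescape_xml_entities, not xcproj code) that decodes decimal and hexadecimal character references into full-width integers and emits them as UTF-8, plus the five predefined XML entities.

```ruby
# The five predefined XML entities.
PREDEFINED_ENTITIES = {
  'lt' => '<', 'gt' => '>', 'amp' => '&', 'apos' => "'", 'quot' => '"'
}.freeze

# Replace character references and predefined entities with UTF-8 text.
# Ruby Integers don't overflow, so code points above U+FFFF survive intact;
# invalid code points such as surrogates raise RangeError from Integer#chr.
def unescape_xml_entities(text)
  text.gsub(/&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);/) do
    ref = Regexp.last_match(1)
    if ref.start_with?('#x')
      ref[2..-1].to_i(16).chr(Encoding::UTF_8)
    elsif ref.start_with?('#')
      ref[1..-1].to_i.chr(Encoding::UTF_8)
    else
      PREDEFINED_ENTITIES.fetch(ref) { raise ArgumentError, "unknown entity: #{ref}" }
    end
  end
end

puts unescape_xml_entities('&#128664;.m &#x1F698; &#9995; &amp;')
```

The key design point is simply that the accumulator is wide enough for any Unicode code point (up to U+10FFFF), unlike the uint16_t in CFPropertyList.c.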

This should solve the issue when using xcproj to touch the Xcode project after saving it as XML. Please test and report whether it works as expected so that I can release a new version of xcproj.

Note that the custom XML plist parser hasn’t been tested much with data/date/number/boolean values, but Xcode projects don’t use these types anyway.

@neonichu
Member

@0xced
Contributor

0xced commented Dec 4, 2015

For the record, rdar://18512876 is fixed in OS X 10.11.

@alloy
Member

alloy commented Dec 4, 2015

Nice
