Reading and saving a project with Xcodeproj breaks emoji characters in the project #196
Comments
Looks like this is down to Xcode's reading of XML plists (which includes xcproj's conversion to ASCII, since it uses Xcode's private API). Out of scope for Xcodeproj.

Convert ASCII plist to XML using plutil
$ /usr/bin/plutil -convert xml1 Palaver.xcodeproj/project.pbxproj -o Palaver.xml
The resulting XML document has

Xcode
Reading this file back with Xcode breaks the XML Unicode for 🚘 (as in the above screenshot).

xcproj
Using xcproj to touch the file breaks the Unicode 🚘 too. Saving the file as ASCII from XML changes the filename to üöò.m/ |
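The garbled filename is consistent with the emoji's UTF-8 bytes being reinterpreted in a single-byte legacy encoding. The sketch below only illustrates that hypothesis; the choice of Mac OS Roman is an assumption, and nothing in the thread confirms that this is what Xcode does internally. It uses Python's built-in `mac_roman` codec.

```python
# Hypothesis (an assumption, not confirmed by the thread): the UTF-8 bytes of
# the emoji get decoded as Mac OS Roman somewhere along the way.
emoji = "\U0001F698"                    # 🚘 ONCOMING AUTOMOBILE
utf8_bytes = emoji.encode("utf-8")      # four bytes, since U+1F698 is outside the BMP

# Reinterpret each UTF-8 byte as one Mac OS Roman character.
misread = utf8_bytes.decode("mac_roman")

print(utf8_bytes.hex())
# The first byte (0xF0) maps to a private-use code point in Mac OS Roman;
# the remaining three bytes map to ordinary accented letters.
print([f"U+{ord(ch):04X}" for ch in misread])
print(misread[1:])
```

If the hypothesis holds, the last three characters printed are the ones seen in the broken filename.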
Seems like it's related to rdar://13565397 filed by @alloy |
😢 |
It looks like a different bug than rdar://13565397 to me. This time the issue seems to be Cocoa (through NSPropertyListSerialization) not properly decoding the escaped XML entity, i.e.

Here is a snippet demonstrating the issue:

for (NSString *emojiXMLPath in @[ @"emoji.xml", @"emoji_escaped.xml" ])
{
    // Dump the raw bytes of the plist file.
    NSData *emojiXMLData = [NSData dataWithContentsOfFile:emojiXMLPath];
    printf("%s\n", [emojiXMLData.description UTF8String]);

    // Decode the <string> value with two generic XML parsers.
    NSXMLDocument *document = [[NSXMLDocument alloc] initWithData:emojiXMLData options:0 error:NULL];
    printf("NSXMLDocument: %s\n", [[[[[[document rootElement] childAtIndex:0] childAtIndex:1] stringValue] description] UTF8String]);
    NSString *pythonScript = [NSString stringWithFormat:@"python -c \"import xml.etree.ElementTree as ET; print 'ElementTree: ' + ET.parse('%@').getroot()[0][1].text\"", emojiXMLPath];
    system([pythonScript fileSystemRepresentation]);

    // Compare the plist decoder's result with a known-good literal.
    NSDictionary *dictionary1 = [NSPropertyListSerialization propertyListWithData:emojiXMLData options:(NSPropertyListReadOptions)0 format:NULL error:NULL];
    NSDictionary *dictionary2 = @{ @"emoji": @"\U0001F698" };
    for (NSDictionary *dictionary in @[ dictionary1, dictionary2 ])
    {
        NSString *emoji = dictionary[@"emoji"];
        NSData *emojiData = [emoji dataUsingEncoding:NSUTF8StringEncoding];
        printf("%s: %s -> %s\n", [[emojiXMLPath stringByDeletingPathExtension] UTF8String], [emoji UTF8String], [emojiData.description UTF8String]);
    }
    printf("\n\n");
}

emoji.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>emoji</key>
<string>🚘</string>
</dict>
</plist>

emoji_escaped.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>emoji</key>
<string>&#128664;</string>
</dict>
</plist>

And here is the corresponding output:

Decoding the XML file with Python + xml.etree.ElementTree or NSXMLDocument works fine. Decoding the same file with NSPropertyListSerialization produces a wrong string: U+F698 (or ef9a98 as UTF-8) instead of 🚘. Also, replacing 🚘 (U+1F698 /

This leads me to think that NSPropertyListSerialization's XML decoding is broken for entities in the Supplementary Multilingual Plane. I will investigate this issue further and report my findings here. |
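For anyone who wants to reproduce the "works fine" half of this comparison without Xcode, here is a minimal Python 3 sketch. It assumes the escaped file uses the decimal character reference `&#128664;` (128664 is the value discussed in the thread), and it adds `plistlib` only as a second, independent decoder; neither parser is part of Xcodeproj or xcproj.

```python
import plistlib
import xml.etree.ElementTree as ET

# Same shape as emoji_escaped.xml. The entity form is an assumption based on
# the decimal value 128664 mentioned in the thread.
ESCAPED_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>emoji</key>
<string>&#128664;</string>
</dict>
</plist>
"""

# Generic XML parser: <plist> -> <dict> -> second child (<string>).
from_etree = ET.fromstring(ESCAPED_PLIST)[0][1].text

# Python's own plist reader, used here only as a cross-check.
from_plistlib = plistlib.loads(ESCAPED_PLIST)["emoji"]

for label, value in (("ElementTree", from_etree), ("plistlib", from_plistlib)):
    print(f"{label}: U+{ord(value):04X} -> {value.encode('utf-8').hex()}")
```

Both decoders should report the full code point U+1F698 rather than a 16-bit value.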
Confirmed: this is a bug in NSPropertyListSerialization/CFPropertyList. The escaped XML entity parsing in CFPropertyList.c is flawed. It manually decodes the

Obviously, 128664 doesn’t fit in a 16-bit variable. In retrospect, this was obvious: looking at the wrong value produced by decoding the XML with NSPropertyListSerialization, we see that the result (ef9a98 as UTF-8) is U+F698, which is just U+1F698 truncated to 16 bits!

Unfortunately this CFPropertyList code is not fixable by swizzling since it’s inlined in the

It looks like there is no simple solution to this problem, since the escaping was introduced to work around another bug in Xcode reading its project.pbxproj plist! For reference: c9baf1b is the commit that introduced the escaping. |
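The truncation arithmetic can be checked in a few lines. This is only a worked example of the numbers quoted above, not a reproduction of the CFPropertyList.c code; the surrogate-pair computation is added to show what a decoder with 16-bit code units would have to produce instead.

```python
CODE_POINT = 128664                      # decimal value of the escaped entity

# What a 16-bit variable keeps: only the low 16 bits survive.
truncated = CODE_POINT & 0xFFFF

# What a decoder with 16-bit code units needs instead: a UTF-16 surrogate pair.
offset = CODE_POINT - 0x10000
high = 0xD800 + (offset >> 10)           # top 10 bits of the offset
low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits of the offset

print(f"original  : U+{CODE_POINT:04X}")
print(f"truncated : U+{truncated:04X} -> utf-8 {chr(truncated).encode('utf-8').hex()}")
print(f"surrogates: {high:04X} {low:04X}")
```

The truncated value lands in the Private Use Area, which is why the broken character has no visible glyph of its own.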
@0xced wow! Do you think we should radar that? |
I reported it as rdar://18512876. Feel free to dupe. I have a workaround coming in |
Please try with the latest version of 0xced/xcproj@ca611b1 (on the develop branch). This will enable a custom XML plist parser (when needed) which doesn’t suffer from the NSPropertyListSerialization bug. This should solve this issue when using

Note that the custom XML plist parser has not been tested much with regard to data/date/number/boolean, but Xcode projects don’t use these types anyway. |
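To give a rough idea of what a parser limited to the types Xcode projects use might look like, here is a minimal Python sketch. It is not xcproj's implementation; the function names `parse_plist_node` and `parse_xml_plist` and the sample data are hypothetical, and only `<dict>`, `<array>` and `<string>` are handled.

```python
import xml.etree.ElementTree as ET


def parse_plist_node(node):
    """Convert a <dict>, <array> or <string> element to a Python object."""
    if node.tag == "string":
        return node.text or ""
    if node.tag == "array":
        return [parse_plist_node(child) for child in node]
    if node.tag == "dict":
        children = list(node)
        # A plist <dict> alternates <key> elements and value elements.
        keys, values = children[0::2], children[1::2]
        return {k.text or "": parse_plist_node(v) for k, v in zip(keys, values)}
    # data/date/number/boolean are deliberately left out of this sketch.
    raise ValueError(f"unsupported plist element: <{node.tag}>")


def parse_xml_plist(data):
    """Parse XML plist bytes; the root <plist> wraps a single top-level value."""
    root = ET.fromstring(data)
    return parse_plist_node(root[0])


# Hypothetical sample: an escaped emoji inside a file name.
SAMPLE = b"""<plist version="1.0">
<dict>
<key>files</key>
<array>
<string>&#128664;.m</string>
</array>
</dict>
</plist>"""

project = parse_xml_plist(SAMPLE)
print(project)
```

Because the underlying XML parser resolves the character reference itself, the emoji comes through as a single code point and no 16-bit intermediate is involved.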
This has been fixed by #203 💥 Fixture is here: https://github.com/CocoaPods/Xcodeproj/tree/master/spec/fixtures/Sample%20Project/Emoji.xcodeproj |
For the record, rdar://18512876 is fixed in OS X 10.11. |
Nice |