XML::Atom has a bizarre API where by default, text is returned as a string of
UTF-8 bytes without the Unicode flag set. XML::RSS::Feed doesn't do this.
To make the output of XML::Feed the same in both cases, XML::Feed should
probably use "{ local $XML::Atom::ForceUnicode = 1; ... }" around each read
access to the XML::Atom object's accessor functions, resulting in a
switch to Unicode output that matches XML::RSS::Feed.
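
A minimal sketch of how the `local` scoping proposed above behaves. `Stub::Atom` is a hypothetical stand-in written only for this illustration (it is not XML::Atom, and its `title` accessor is invented); the point is that the flag is on only inside the block and is restored afterwards.

```perl
use strict;
use warnings;
use Encode ();

{
    # Hypothetical stand-in for a module with a package-level flag.
    package Stub::Atom;
    our $ForceUnicode = 0;

    sub title {
        my $bytes = "It\xE2\x80\x99s";    # UTF-8 bytes: I, t, U+2019, s
        return $ForceUnicode ? Encode::decode('UTF-8', $bytes) : $bytes;
    }
}

# Flag is off: the accessor returns a 6-byte string.
my $before = Stub::Atom::title();

my $inside;
{
    # local saves the current value and restores it when the block exits.
    local $Stub::Atom::ForceUnicode = 1;
    $inside = Stub::Atom::title();        # 4-character Unicode string
}

# Flag is off again outside the block.
my $after = Stub::Atom::title();

printf "%d %d %d\n", length $before, length $inside, length $after;
# prints "6 4 6"
```

The restore on block exit is what limits the side effects, although any code called from inside the block still sees the changed value.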
This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
feeds; it ends up "double-escaping" the entries as they're written into the
cache. For instance, U+2019 (decimal 8217) closing single quote goes into the cache file as
the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99", rather than the correct 3-byte
sequence "\xE2\x80\x99"; the effect is as if the string was encoded as
UTF-8, decoded as Latin-1, then encoded as UTF-8 again.
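
The byte sequences above can be reproduced with the core Encode module alone. This is a sketch of the encode, misread, re-encode round trip described in the report, not of IkiWiki's actual code path; `hex_dump` is a helper made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $quote   = "\x{2019}";                      # one character
my $bytes   = encode('UTF-8', $quote);         # three bytes
my $misread = decode('ISO-8859-1', $bytes);    # bytes taken as three Latin-1 characters
my $double  = encode('UTF-8', $misread);       # six bytes

# Helper for this example only: show each byte as two hex digits.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

print hex_dump($bytes),  "\n";    # prints "E2 80 99"
print hex_dump($double), "\n";    # prints "C3 A2 C2 80 C2 99"
```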
Simon
I too have the same problem. And setting $XML::Atom::ForceUnicode = 1;
fixes this for me. But I'm afraid that it's a global variable and I
can't set it in my module AnyEvent::Feed which uses XML::Feed.
Greetings,
Robin
Hmm, I'm not entirely sure what the best way to handle this is - setting
ForceUnicode is kind of a nuclear option which could screw up other
modules in, say, a mod_perl environment.
I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.
On Mon Nov 16 21:02:41 2009, SIMONW wrote:
> Hmm, I'm not entirely sure what the best way to handle this is - setting
> ForceUnicode is kind of a nuclear option which could screw up other
> modules in, say, a mod_perl environment.
>
> I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.
I discovered this solution myself. I'd love to see XML::Atom have an object attribute to force
decoding to utf8. Frankly, it should be enabled by default.
Best,
David
Hi all,
I've been bitten by this bug myself now when trying to combine my
blogs.perl.org's blog feed, which is only provided in Atom (why??), into
the rest of the feeds. The ForceUnicode setting workaround that is
described in this thread works nicely, but there should be a more
permanent solution.
Regards,
-- Shlomi Fish
On Tue Feb 03 14:32:07 2009, smcv@debian.org wrote:
> XML::Atom has a bizarre API where by default, text is returned as a
> string of UTF-8 bytes without the Unicode flag set. XML::RSS::Feed
> doesn't do this.
>
> To make the output of XML::Feed the same in both cases, XML::Feed
> should probably use "{ local $XML::Atom::ForceUnicode = 1; ... }"
> around each read access to the XML::Atom object's accessor functions,
> resulting in a switch to Unicode output that matches XML::RSS::Feed.
>
> This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
> feeds; it ends up "double-escaping" the entries as they're written
> into the cache. For instance, U+2019 (decimal 8217) closing single quote goes into
> the cache file as the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99",
> rather than the correct 3-byte sequence "\xE2\x80\x99"; the effect is
> as if the string was encoded as UTF-8, decoded as Latin-1, then
> encoded as UTF-8 again.
>
> Simon
Does it make sense to discuss this here? Isn't it a bug in XML::Atom?
Or am I misunderstanding?
Dave...
On Thu, 24 Nov 2011 at 06:37:43 -0500, Dave Cross via RT wrote:
> Does it make sense to discuss this here? Isn't it a bug in XML::Atom?
>
> Or am I misunderstanding?
I agree that this needs discussion with the author of XML::Atom. I don't
know how you Cc people "correctly" in RT, it's not a bug tracker I'm
particularly familiar with.
As far as I'm concerned, the bug in X::F is that it doesn't produce the
same data type for RSS and Atom feeds (breaking encapsulation), and the
underlying bugs in X::A that make it hard for X::F to do the right
thing are:
1) produces a byte-string of UTF-8, rather than a Unicode string, by default
(might not be considered to be a bug, since it's documented in
XML::Atom::Feed; or might be considered to be a bug but unfixable, since
that would be an API break)
2) can only be directed to produce Unicode by setting a global variable
(this is an API design problem, rather than not behaving as documented)
Three possible solutions:
* If (1) is considered to be a bug, make XML::Atom::ForceUnicode the default,
and XML::Feed doesn't need any changes; requires changes to X::A only.
* If (1) is as designed or is unfixable, fix (2) instead (e.g. add
$feed->unicode(1) setter) and then change XML::Feed to use it; requires
changes to both X::A and X::F. I'd be inclined to say this one is the
most correct.
* If (1) is as designed, postprocess the XML::Atom output through
Encode::decode('utf-8', $bytes) in XML::Feed; requires changes to X::F only,
but will break if (1) is changed in a later version of X::A.
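
A sketch of how the third option might be made less fragile, assuming the only two cases are "undecoded UTF-8 bytes" and "already a Unicode string". `to_unicode` is a hypothetical helper, not part of XML::Feed or XML::Atom. It decodes only when Perl's internal UTF-8 flag is off, so it would keep working if X::A started returning Unicode strings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a Unicode string whichever form comes in.
sub to_unicode {
    my ($text) = @_;
    # Pass through undef and strings that are already flagged as characters.
    return $text if !defined $text || utf8::is_utf8($text);
    # Otherwise assume the string holds UTF-8 encoded bytes.
    return decode('UTF-8', $text);
}

my $from_bytes = to_unicode("\xE2\x80\x99");    # byte string in
my $from_chars = to_unicode("\x{2019}");        # Unicode string in

print length($from_bytes), " ", length($from_chars), "\n";    # prints "1 1"
```

The guard relies on the internal flag, which is only a heuristic: an unflagged string that was meant as Latin-1 text would be decoded wrongly, so this is a stopgap rather than a substitute for fixing (2).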
Which one is correct is up to you and the author of XML::Atom.
For now, IkiWiki sets "local $XML::Atom::ForceUnicode = 1" around each
invocation of XML::Feed, because we know that it's single-threaded, so the
usual problems with global variables are less of a concern. I realise this
would be unacceptable in a library, though.
S
This is an interesting discussion, but it is also quite old; we need to check how XML::Atom behaves today.
Probably the best way to deal with this without breaking existing code would be to provide a way to enable Unicode in XML::Feed, so that XML::Feed sets local $XML::Atom::ForceUnicode = 1 when needed if it is run in Unicode mode.
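
One way such an opt-in mode could look, as a rough sketch only. `Sketch::Feed`, its `unicode` constructor argument, and the `Stub::Atom` stand-in are all hypothetical names invented here; none of this is the existing XML::Feed or XML::Atom API.

```perl
use strict;
use warnings;
use Encode ();

{
    # Hypothetical stand-in for the Atom backend with its global flag.
    package Stub::Atom;
    our $ForceUnicode = 0;
    sub new { return bless {}, shift }
    sub title {
        my $bytes = "caf\xC3\xA9";    # UTF-8 bytes for "café"
        return $ForceUnicode ? Encode::decode('UTF-8', $bytes) : $bytes;
    }
}

{
    # Hypothetical wrapper with a per-object Unicode mode.
    package Sketch::Feed;
    sub new {
        my ($class, %args) = @_;
        return bless {
            atom    => Stub::Atom->new,
            unicode => $args{unicode} ? 1 : 0,
        }, $class;
    }
    sub title {
        my ($self) = @_;
        # In Unicode mode force the flag for this call only;
        # otherwise keep whatever value the caller already has.
        local $Stub::Atom::ForceUnicode =
            $self->{unicode} ? 1 : $Stub::Atom::ForceUnicode;
        return $self->{atom}->title;
    }
}

my $default = Sketch::Feed->new->title;                  # 5 bytes
my $unicode = Sketch::Feed->new(unicode => 1)->title;    # 4 characters
print length($default), " ", length($unicode), "\n";     # prints "5 4"
```

Existing callers that never pass the option would see no change, which is the backwards-compatibility property the comment above asks for.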
Migrated from rt.cpan.org#43004 (status was 'open')
From smcv@debian.org on 2009-02-03 19:32:07
From elmex@ta-sa.org on 2009-07-07 12:52:13
From simonw@cpan.org on 2009-11-17 02:02:41
From dwheeler@cpan.org on 2010-05-20 19:18:50
From shlomif@cpan.org on 2011-11-24 11:28:26
From davecross@cpan.org on 2011-11-24 11:37:42
From smcv@debian.org on 2011-11-24 12:01:22