Parsing dates in different locales #638

Open
FrancescoMarzullo01 opened this issue Mar 5, 2023 · 7 comments

@FrancescoMarzullo01

I have an RSS feed whose feed-level lastBuildDate is in RFC 822 format, while the entry-level pubDate uses an Italian-localized variant of RFC 822.

<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
	<lastBuildDate>Sun, 23 Mar 2022 16:22:36 GMT</lastBuildDate>
	<item>
		<title> </title>
		<pubDate>dom, 12 mar 2022 15:58:02 GMT</pubDate>
	</item>
	<item>
		<title> </title>
		<pubDate>dom, 11 mar 2022 13:58:02 GMT</pubDate>
	</item>
</channel>
</rss>

By default, the SyndFeedInput class uses the US locale. It is possible to set a different locale using the constructor
SyndFeedInput(final boolean validate, final Locale locale).

However, the problem is that the locale set for SyndFeedInput also affects SyndEntry, and there is no way to set the locale for SyndEntry separately.

One possible solution to this problem is to create two instances of SyndFeedInput, one with the default locale to parse lastBuildDate, and the other with the Italian locale to parse pubDate.
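
To make the mismatch concrete, here is a minimal sketch that uses only the JDK's SimpleDateFormat, not Rome itself. The class name, the helper `tryParse`, and the mask string are assumptions made for illustration. It shows that a single locale can parse one of the two dates from the feed above, but not both.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class SingleLocaleDemo {

    // RFC 822-style mask; an assumption for illustration, not copied from Rome.
    static final String MASK = "EEE, dd MMM yyyy HH:mm:ss z";

    // Returns the parsed date, or null when the text does not fully match
    // the mask under the given locale.
    static Date tryParse(String text, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(MASK, locale);
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(text, position);
        // Treat a partial match as a failure.
        return position.getIndex() == text.length() ? date : null;
    }

    public static void main(String[] args) {
        String lastBuildDate = "Sun, 23 Mar 2022 16:22:36 GMT";
        String pubDate = "dom, 12 mar 2022 15:58:02 GMT";

        // Each locale recognises only its own day and month abbreviations.
        System.out.println("US    lastBuildDate: " + tryParse(lastBuildDate, Locale.US));
        System.out.println("US    pubDate:       " + tryParse(pubDate, Locale.US));
        System.out.println("ITALY lastBuildDate: " + tryParse(lastBuildDate, Locale.ITALY));
        System.out.println("ITALY pubDate:       " + tryParse(pubDate, Locale.ITALY));
    }
}
```

Whichever locale is chosen, one of the two fields comes back as null, which is the situation described for a single SyndFeedInput.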

@PatrickGotthard
Member

That's really weird. I think we don't have a solution for that 🤔

@neroux
Contributor

neroux commented Mar 7, 2023

@FrancescoMarzullo01, are you sure a non-English date is RFC-compliant?

RFC 822 seems to suggest that only English tokens are supported. Going by that, I'd venture to say your dates are not RFC-compliant, I am afraid.

@FrancescoMarzullo01
Author

A non-English date is not RFC-compliant, but Rome provides a date parser that parses all dates based on the locale of SyndFeedInput or WireFeedInput. It would be optimal to use this possibility locally where necessary.

One solution could be to specify additional locales within the Rome properties as "datetime.extra.locales".

DateParser.java

    public static Date parseDate(final String sDate, final Locale locale) {
        Date date = null;
        if (ADDITIONAL_MASKS.length > 0) {
            date = parseUsingMask(ADDITIONAL_MASKS, sDate, locale);
            if (date != null) {
                return date;
            }
        }
        date = parseW3CDateTime(sDate, locale);
        if (date == null) {
            date = parseRFC822(sDate, locale);
        }
        return date;
    }
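
The idea of extra locales can be pictured as a parser that tries the same mask under several locales in order and returns the first full match. The following is only a sketch under stated assumptions: it uses the JDK's SimpleDateFormat rather than Rome's DateParser, and the class `LocaleFallbackParser`, the method `parseWithLocales`, and the mask constant are hypothetical names that do not exist in Rome.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

final class LocaleFallbackParser {

    // RFC 822-style mask; an assumption for illustration, not copied from Rome.
    static final String RFC822_MASK = "EEE, dd MMM yyyy HH:mm:ss z";

    // Tries each locale in order and returns the first date that matches the
    // whole input, or null when no locale matches.
    static Date parseWithLocales(String text, List<Locale> locales) {
        for (Locale locale : locales) {
            SimpleDateFormat format = new SimpleDateFormat(RFC822_MASK, locale);
            ParsePosition position = new ParsePosition(0);
            Date date = format.parse(text, position);
            if (date != null && position.getIndex() == text.length()) {
                return date;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The default locale comes first, followed by the "extra" locales.
        List<Locale> locales = Arrays.asList(Locale.US, Locale.ITALY);
        System.out.println(parseWithLocales("Sun, 23 Mar 2022 16:22:36 GMT", locales));
        System.out.println(parseWithLocales("dom, 12 mar 2022 15:58:02 GMT", locales));
    }
}
```

Putting the default locale first keeps RFC-compliant dates on the usual path, so the extra locales are consulted only when the English tokens fail to match.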

@neroux
Contributor

neroux commented Mar 8, 2023

All right, fair enough, so you are essentially asking if the library could support custom date formats, right?

@FrancescoMarzullo01
Author

Yes, exactly

@PatrickGotthard
Member

PatrickGotthard commented Mar 11, 2023

We could also add locale support to the datetime.extra.masks property like this:

datetime.extra.masks=<mask>|<mask>,de-DE|<mask>,en-EN|<mask>,fr-FR

When no locale is defined (as in the first mask), we use en-US as the default.
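
A rough sketch of how such a property value might be split into mask and locale pairs is shown below. Everything here is hypothetical: the class `LocalizedMasks`, its nested `LocalizedMask` type, and the parsing rules are assumptions for illustration, not Rome code. One wrinkle the sketch has to handle is that masks can themselves contain commas (for example "EEE, dd MMM yyyy"), so it only treats the text after the last comma as a locale when that text looks like a language tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class LocalizedMasks {

    // Hypothetical value type pairing a date mask with the locale to parse it in.
    static final class LocalizedMask {
        final String mask;
        final Locale locale;

        LocalizedMask(String mask, Locale locale) {
            this.mask = mask;
            this.locale = locale;
        }
    }

    // Loose shape check for tags such as "de-DE" or "it"; an assumption, not a
    // full BCP 47 validator.
    private static final Pattern LOCALE_TAG =
            Pattern.compile("[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*");

    // Splits "<mask>|<mask>,de-DE|..." into entries, defaulting to en-US
    // when an entry carries no locale suffix.
    static List<LocalizedMask> parse(String propertyValue) {
        List<LocalizedMask> result = new ArrayList<>();
        for (String entry : propertyValue.split("\\|")) {
            if (entry.trim().isEmpty()) {
                continue;
            }
            int comma = entry.lastIndexOf(',');
            String suffix = comma < 0 ? "" : entry.substring(comma + 1).trim();
            if (comma >= 0 && LOCALE_TAG.matcher(suffix).matches()) {
                String mask = entry.substring(0, comma).trim();
                result.add(new LocalizedMask(mask, Locale.forLanguageTag(suffix)));
            } else {
                // No recognisable locale suffix: the whole entry is the mask.
                result.add(new LocalizedMask(entry.trim(), Locale.US));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String value = "EEE, dd MMM yyyy HH:mm:ss z|EEE, dd MMM yyyy HH:mm:ss z,it-IT";
        for (LocalizedMask entry : parse(value)) {
            System.out.println(entry.locale.toLanguageTag() + " -> " + entry.mask);
        }
    }
}
```

The suffix check remains ambiguous for a mask whose last comma-separated piece happens to look like a language tag, so a real implementation might prefer a separator that cannot occur inside a mask.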

@neroux
Contributor

neroux commented Mar 11, 2023

I understand @FrancescoMarzullo01's use case, but, honestly, I'd argue the correct approach here would be to fix the feed and not make the library jump through hoops to accommodate non-standard formats. There's an RFC for a reason.

3 participants