Persian calendar conversion uses wrong algorithm #4713
Comments
Thank you @roozbehp for highlighting this. As you noted, Calendrical Calculations specifies two Persian calendar methods. The algorithm in section 15.2 uses astronomical calculations:
The other algorithm in 15.3 uses the 2820 cycle. I don't see one in the book that uses a pure 33-year cycle. Is the ICU implementation the standard for that? What are the other pros and cons of that approximation versus the 2820 approximation? When we implemented the calendrical calculations in ICU4X, we endeavored to follow the Reingold algorithms in order to not be the source of truth of any approximations. |
My understanding of the situation is: the Reingold text published an approximation for the Persian calendar, which we are calling the 2820 approximation, and that approximation is wrong in 2025 CE. @roozbehp suggests a different approximation, which works between 1925 and 2090 CE. Here's one possible approach we could take. In the Islamic and Chinese calendars, we implement a table-based fast path, where we ship precomputed year/month data for a certain range (I think +/- 100ish years by default), and fall back to the astronomical calculations for dates outside that range. We could move Persian over to using that framework, and throw out the 2820 approximation. What do you think @roozbehp @Manishearth @echeran? |
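A minimal sketch of the table-plus-fallback idea described above, in Python rather than Rust. Everything here is hypothetical: `YearTable`, `slow_new_year`, and `fallback_log` are illustrative names, and `slow_new_year` is a toy stand-in for the astronomical calculation, not real calendar math. In a real library the table would be generated ahead of time and shipped as data.

```python
fallback_log = []  # records which years needed the slow path


def slow_new_year(year):
    """Toy stand-in for the expensive astronomical calculation (assumption)."""
    fallback_log.append(year)
    return 365 * year + (8 * year) // 33


class YearTable:
    """Precomputed new-year day numbers for first_year..last_year."""

    def __init__(self, first_year, last_year, compute):
        self.first_year = first_year
        self.compute = compute
        # One extra entry so year_length(last_year) stays on the fast path.
        self.new_years = [compute(y) for y in range(first_year, last_year + 2)]

    def new_year(self, year):
        i = year - self.first_year
        if 0 <= i < len(self.new_years):
            return self.new_years[i]  # fast path: table lookup
        return self.compute(year)     # fallback: slow calculation

    def year_length(self, year):
        return self.new_year(year + 1) - self.new_year(year)


table = YearTable(1300, 1500, slow_new_year)
fallback_log.clear()   # ignore the calls made while building the table
table.new_year(1403)   # served from the table
table.new_year(2000)   # outside the table, falls back
print(fallback_log)    # [2000]
```

With this shape, no approximation is involved for in-range years: the table holds exactly what the slow path would return.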
I ran a small test and confirmed that ICU4J does indeed give us Persian date 1403-12-30 for Gregorian date 2025-03-20, which matches the book's
|
Trying to answer the questions raised, as well as adding a few points:
|
IMO this is a misunderstanding of how these calendar algorithms work: For most of them the standard is whatever the calendar authorities say (often a proxy for what happens in the actual skies), because that's actually what people use on the ground. Aside from the Hebrew calendar, most of these things have no "standard" algorithm and rather have a general set of rules. It is rare that the relevant calendar authorities will actually publish an algorithm, and it is sometimes even possible that the authorities will use different algorithms at different times. This is also a reason why I'm not too bothered with ICU4X's Chinese calendar implementation diverging from ICU4C in a couple centuries. There's no right answer for those dates. Basically, whenever there is ground truth to follow, we should follow it.
This seems good.
|
I apologize for sloppily using the word "standard." What I intended to ask was whether the ICU code is the source of truth for this approximation, as opposed to some textbook or Wikipedia page or blog post Roozbeh made or something. I.e., what is the source that goes into the documentation? |
In general I prefer not implementing approximations in the ICU4X code and instead using speedups like we've done with the Chinese calendar. I consider it a bug that the Persian code currently uses an approximation, and the bug is worse given that the approximation fails next year, but any approximation is still a bug. We have a system for implementing fast, non-approximate calendrical calculations, so why not use it? |
ICU4X Working Group discussion:
Conclusion: Shane's version:
LGTM: @sffc @Manishearth Manish's version:
Agreed: @Manishearth, @younies, @echeran |
Thanks a lot for the discussion and the detailed notes. I have a question: what is the theoretical date range we care about here? I have been thinking about compressed tables generated using the (slightly modified) astronomical algorithm from the Calendrical Calculations book (Persian calendar being relatively regular, we just need to keep leap year meta patterns). But I don't think the astronomical methods in the book return meaningful results beyond, say, 9999 CE (which is the end of ISO 8601), if that. Creating tables for all of i32 will take a lot of space and is probably noise anyways. |
BTW, since the Iranian calendar authority website is not always accessible from outside Iran (and it's in Persian anyways), I just created a data file based on it and put it at: |
I think as far in the past as that version of the calendar has been in use (so e.g. even though the Japanese calendar is old, the specific gregorian-based version they use now is relatively new), and my general take is 100 years into the future. |
I still don't know Rust, so I did a prototype for the replacement algorithm in Python. The code is very closely based on the Calendrical Calculations code, but with the 2820-year rule replaced by the 33-year rule and an override table. Its results match the (modified) astronomical algorithm from 1178 to 3000 AP (about 1799 to 3621 CE). If you don't need to go that much into the future, you can just drop some rows from the override table. The code is very short and fast. It just needs a set membership lookup, which I hope Rust can do cheaply. Here's the code (Apache-licensed, since it's based on the Calendrical Calculations code): https://github.com/roozbehp/persiancalendar/blob/main/persiancalendar_fast.py |
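A rough sketch of the "33-year rule plus override table" shape, not a copy of the linked file. It assumes the 33-year rule has the form `(25 * year + 11) % 33 < 8`, and that each override entry is a year the rule marks as leap but the astronomical calculation does not, with the leap day moving to the following year. `NON_LEAP_OVERRIDES` is a hypothetical name, and its single entry is a placeholder; the real table is in `persiancalendar_fast.py`.

```python
# Placeholder override table (assumption). Dropping entries shortens the
# range over which results match the astronomical calculation.
NON_LEAP_OVERRIDES = frozenset({1502})


def is_leap_33(year):
    """Plain 33-year arithmetic rule (assumed form)."""
    return (25 * year + 11) % 33 < 8


def is_leap(year):
    """33-year rule corrected by a set membership lookup."""
    if year in NON_LEAP_OVERRIDES:
        return False              # rule says leap, override says no
    if year - 1 in NON_LEAP_OVERRIDES:
        return True               # the leap day lands one year later
    return is_leap_33(year)


print(is_leap(1403), is_leap(1502), is_leap(1503))  # True False True
```

The only extra cost over the plain rule is two set lookups per call, which a sorted array with binary search would also cover in a language without a built-in hash set.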
Ok, I couldn't sleep so I taught myself some Rust and created a pull request to replace the existing arithmetic rule with the one in ICU and ICU4J. This should fix the immediate concern about March 20, 2025 and will work correctly until the end of 1501 AP (~2122 CE). I can work on getting the override table in after that. |
I have a question. For distant dates, say 1000 or more years in the future, is there a drift between the 33-year and 2820-year cycles relative to what we predict to be the ground truth (the astronomical calculations)? For example, the 33-year cycle has a 100% accuracy rate for the next 100 years, and >90% for at least a few hundred years after that. We know that the 2820-cycle has a ~90% accuracy rate for the next 100 years. But 1000 years in the future, would the 2820 cycle hold its 90% accuracy rate while the 33 cycle drifts away? What I'm getting at is that we could perhaps keep both models and switch to 2820 over a certain bound. |
I did a quick comparison for the period 2403 AP (exactly 1000 years into the future) to 9378 AP (9999 CE, end of ISO 8601). The 33-year algorithm got 2511 leap years wrong, compared to 2996 for the 2820-year algorithm. I also compared the period 1304 AP (introduction of the calendar in 1925 CE) to 2403 AP (exactly a thousand years into the future). That came up with 70 wrong leap years for the 33-year rule and 205 wrong leap years for the 2820-year rule.
Looks like the 2820-year rule is simply worse over any period we may care about, even for years far into the future. |
Thanks for the comparison @roozbehp!
By "wrong" I assume you mean relative to the astronomical calculation? If the calendar correctly determines a leap year, does it imply that the dates in that year (new year and month lengths) will be correct? |
Yes. The “modified” astronomical calculation (using the meridian the Iranian calendar authority uses). That's the only source of truth we have outside of the period that the Iranian calendar authority publishes.
Almost, but not exactly. If a calendar correctly determines a leap year, it would imply that the last day in that year and all the dates in the next year will be correct. It's a very good test for the Persian calendar, since by knowing all the leap years, you can quickly and unambiguously deduce everything else. In the Persian literature about the Persian calendar, certain years being leap or not is the shorthand for correctly calculating the calendar. |
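To illustrate why the leap years determine everything else, here is a small sketch that derives Persian dates from nothing but a set of leap years and one anchor. The anchor (1 Farvardin 1403 AP = 2024-03-20) and the month lengths (six months of 31 days, five of 30, then 29 or 30) are taken as given; `to_persian` is a hypothetical helper that only walks forward from the anchor.

```python
from datetime import date

ANCHOR_YEAR = 1403
ANCHOR_DATE = date(2024, 3, 20)  # 1 Farvardin 1403 AP


def to_persian(g, leap_years):
    """Convert a Gregorian date on or after the anchor to (year, month, day)."""
    offset = (g - ANCHOR_DATE).days
    if offset < 0:
        raise ValueError("this sketch only handles dates after the anchor")
    year = ANCHOR_YEAR
    while offset >= (366 if year in leap_years else 365):
        offset -= 366 if year in leap_years else 365
        year += 1
    if offset < 186:  # months 1-6 have 31 days each
        return year, offset // 31 + 1, offset % 31 + 1
    offset -= 186     # months 7-12 have 30 days (the last may have 29)
    return year, offset // 30 + 7, offset % 30 + 1


correct = {1403}  # 1403 is leap on the ground
wrong = {1404}    # what the 2820-year rule yields instead

print(to_persian(date(2025, 3, 20), correct))  # (1403, 12, 30)
print(to_persian(date(2025, 3, 20), wrong))    # (1404, 1, 1)
```

Getting the leap status of 1403 wrong leaves every date before the last day of 1403 untouched, then shifts the last day and the dates that follow.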
Thanks everybody. I just filed https://unicode-org.atlassian.net/browse/ICU-22736 to sync ICU4C and ICU4J with this. |
Looking at https://github.com/unicode-org/icu4x/blob/main/components/calendar/src/persian.rs, it looks like the “proposed” 2820-year cycle arithmetic calendar has been implemented.
Unfortunately, that algorithm is wrong and fails exactly a year from now, on March 20, 2025. While it should return 30 Esfand 1403 AP, it mistakenly returns 1 Farvardin 1404 AP.
The correct algorithm, as specified in the Calendrical Calculations book, is the astronomical algorithm. That's what is followed on the ground (although with an unspecified location). Since that is too complex to implement, a simple arithmetic algorithm that works at least until ~2090 CE has been implemented in ICU4C and ICU4J. (See the comments at the top of https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/util/PersianCalendar.java.) That simple algorithm should be implemented in ICU4X as soon as possible, since we just entered the year 1403 AP, which is leap on the ground, but which ICU4X miscalculates as non-leap.
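The disagreement can be reproduced in a few lines. This is a sketch under assumptions: `is_leap_2820` follows the arithmetic rule as recalled from Calendrical Calculations, and `is_leap_33` follows the form of the rule believed to be used in ICU4J; neither is copied from the respective source, and the function names are illustrative.

```python
def is_leap_2820(year):
    """2820-year-cycle rule (assumed form, after Calendrical Calculations)."""
    y = year - 474 if year > 0 else year - 473
    cycle_year = y % 2820 + 474
    return ((cycle_year + 38) * 31) % 128 < 31


def is_leap_33(year):
    """33-year-cycle rule (assumed form, after ICU4J's PersianCalendar)."""
    return (25 * year + 11) % 33 < 8


for year in (1403, 1404):
    print(year, is_leap_2820(year), is_leap_33(year))
# 1403 False True
# 1404 True False
```

The 2820-year rule puts the leap day in 1404 instead of 1403, which is what turns 30 Esfand 1403 into 1 Farvardin 1404.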
Here's a quote from an email I sent to @echeran, @sffc, @Manishearth, @mihnita, and others on June 21, 2023: