BC dates #2

Open
garrettgman opened this issue May 27, 2009 · 8 comments

Comments

@garrettgman
Member

Reminder to self: consider BC dates and times (Before Christ / Before Common Era)

  • Garrett
@ianwyllie

Hi. I know this is ancient, but is anyone thinking about this BCE/BC issue, or does lubridate now handle it natively?

@vspinu vspinu removed the Major label Dec 14, 2014
@vspinu
Member

vspinu commented Dec 14, 2014

What is the problem, more concretely? What usage pattern do people have in mind?

@vspinu
Member

vspinu commented Sep 30, 2015

I am closing this. If someone insists that this should be done and is useful, please reopen.

@vspinu vspinu closed this as completed Sep 30, 2015
@niklaas

niklaas commented Mar 4, 2018

Well, not that I use this regularly, but I just worked on a dataset that included negative years (i.e., BC) and had considerable difficulty dealing with them. For example, the following doesn't work:

> lubridate::ymd("-2255-01-01")
[1] "2255-01-01"

> lubridate::parse_date_time(-2255, "Y")
[1] NA
Warning message:
All formats failed to parse. No formats found.

The first call returns a Date object with a positive year, and the second fails to parse. That said, the following works:

> lubridate::ymd("0000-01-01") - lubridate::years(2255)
[1] "-2255-01-01"

which made me write a helper function that deals with negative years.
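
The helper itself is not shown in this comment. As a rough sketch only, a function along these lines would cover the signed input above. `parse_signed_ymd` is a hypothetical name, not a lubridate function, and it assumes the minus sign denotes astronomical year numbering (so "-2255" means the year that lubridate prints as -2255). It relies on the subtraction trick shown above.

```r
library(lubridate)

# Hypothetical helper, not part of lubridate: parse "yyyy-mm-dd" strings that
# may carry a leading minus sign on the year.
parse_signed_ymd <- function(x) {
  negative <- startsWith(x, "-")
  parts <- strsplit(sub("^-", "", x), "-", fixed = TRUE)
  yr <- vapply(parts, function(p) as.integer(p[1]), integer(1))
  md <- vapply(parts, function(p) paste(p[2], p[3], sep = "-"), character(1))
  # Parse month and day in year 0000, then shift by the signed year.
  # Caveat: "02-29" gives NA whenever the target year is not a leap year.
  ymd(paste0("0000-", md)) + years(ifelse(negative, -yr, yr))
}

res <- parse_signed_ymd(c("-2255-01-01", "0014-08-19"))
year(res)
# -2255    14
```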

@aurelberra

I can confirm that for anyone working on ancient periods (e.g. Greek and Roman periods) this feature would be very useful. This problem/choice is currently a deal breaker for Tidyverse and lubridate enthusiasts using R in Digital Humanities projects and courses. I'd be happy to test the modified functions.

@tayflo

tayflo commented Jan 1, 2021

Hi! Thanks for the package and all the work around it. If someone considers implementing BCE dates, or handling them with the package as it is (1.7.9), here are some thoughts about the problems caused by a phantom year zero.

Dealing with (non existing) "Year zero"

Explanations

If anyone is considering handling "before common era" dates in lubridate, be aware that Year zero doesn't exist. For historians, Year -1 is followed directly by Year 1 (see, for example, the Wikipedia chronology), and that can cause a few problems. Note also that this convention belongs to the Julian calendar, which may conflict in minor ways with the Gregorian calendar we use nowadays; Wikipedia has more details.

Quick clarification for people unfamiliar with those notations:

  • "CE" stands for "Common Era", a "de-christianisation" of the long-standing and still widely used "AD", "Anno Domini" (so CE dates can be seen as "positive years").
  • "BCE" stands for "Before Common Era", equivalent to "BC", "Before Christ" ("negative years").
    E.g. Year 2021 (happy new year btw!) would be "2021 CE". Socrates died in 399 BCE.

For instance, lubridate follows ISO 8601 (version 8601:2004, I presume? BCE dates could be handled with ISO 8601:2019, but the free-access part of the document is unclear about it), which starts at 0000-01-01, that is, 1 January of 1 BCE (Year -1).

This notation is confusing because it suggests that "0000-01-01" is Year 0 and that "-001-01-01" is Year -1, when it is actually Year -2, and it can cause problems when computing durations (see the code below).

That aside, if encountered, "0 CE/AD" or "0 BCE/BC" should probably be parsed into Year -1.
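
The offset described above comes down to simple arithmetic: ISO year = 1 - BCE year. A minimal sketch in base R, where `bce_to_iso_year` and `iso_to_era_year` are hypothetical names used only for illustration:

```r
# Hypothetical helpers (not part of lubridate): convert between historical
# era years (no year zero) and ISO 8601 years (where 0000 is 1 BCE).
bce_to_iso_year <- function(bce_year) {
  stopifnot(all(bce_year >= 1))   # "0 BCE" is deliberately rejected here
  1L - as.integer(bce_year)       # 1 BCE -> 0, 2 BCE -> -1, 63 BCE -> -62
}

iso_to_era_year <- function(iso_year) {
  iso_year <- as.integer(iso_year)
  ifelse(iso_year <= 0L,
         paste(1L - iso_year, "BCE"),
         paste(iso_year, "CE"))
}

bce_to_iso_year(c(1, 2, 63))
# 0  -1 -62
iso_to_era_year(c(-62, 0, 1, 2021))
# "63 BCE"  "1 BCE"   "1 CE"    "2021 CE"
```

This sketch rejects a year "0" outright; mapping it to Year -1 as suggested above would be a one-line change.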

References: Wikipedia (ISO 8601, Year zero, 1 BC, Common Era...)

Some code to make my point

(Licensed under WTFPL: Do What The Fuck You Want to)

pacman::p_load(lubridate)
pacman::p_version(lubridate)
#> [1] '1.7.9'

a <- ymd("0001-01-01")
a
#> [1] "0001-01-01"
# Year 1, no problem

b <- ymd("0000-01-01") - years(1)
b
#> [1] "-001-01-01"
# Is it Year -1?
# No, it's Year -2 (2 BCE), even though it prints as -001-01-01,
# since ymd("0000-01-01") is already Year -1 (1 BCE).

# The problem appears if we compute duration between the two
as.duration(a - b)
#> [1] "63158400s (~2 years)"
# But if b really were 1st January of Year -1, as printed, there would be
# only one year between it and 1st January of Year 1, since year zero doesn't exist.

Let's illustrate with Augustus dates:

  • birth: 23 September 63 BCE
  • death: 19 August 14 CE
  • age at death: 75
aug_birth <- ymd("0000-09-23") - years(63)
aug_death <- ymd("0014-08-19")
age <- aug_death - aug_birth
as.duration(age)
#> [1] "2426889600s (~76.9 years)"
# That's one year too many!

# The correct writing would be:
aug_birth <- ymd("0000-09-23") - years(63 - 1)

So a correct helper function to parse BCE yyyy-mm-dd dates would be:

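parse_bce_ymd <- function(str) {
  # Capture the 4-digit BCE year and the "mm-dd" part separately,
  # so that the separator is not duplicated when the date is rebuilt
  regex <- "(\\d{4})-(\\d{2}-\\d{2})"
  match <- stringr::str_match(str, regex)
  years_n <- as.integer(match[, 2]) - 1 # Beware the -1 here: 1 BCE is ISO year 0000
  month_day <- match[, 3]
  date <- ymd(paste0("0000-", month_day)) - years(years_n)
  return(date)
}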
# Test the function.
aug_birth <- parse_bce_ymd("0063-09-23")
aug_death <- ymd("0014-08-19")
age <- aug_death - aug_birth
as.duration(age)
#> [1] "2395353600s (~75.9 years)"
# Yay that's correct!

Still, lubridate prints the BCE date with a year number one less in absolute value than the "real" one (that is, one year ahead here), as if a year zero existed, which is misleading.

aug_birth
#> [1] "-062-09-23"
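
One way to sidestep the misleading printout is to format such dates with an explicit era label. The following is only a sketch: `format_era` is a hypothetical helper, not a lubridate function, and it assumes the Date follows the numbering shown above (0000 = 1 BCE). Month names come from base R's `month.name`, so they are in English.

```r
library(lubridate)

# Hypothetical helper, not part of lubridate: render a Date with an era label.
format_era <- function(date) {
  y <- year(date)
  era_year <- ifelse(y <= 0, 1 - y, y)   # undo the year-zero offset for BCE
  era <- ifelse(y <= 0, "BCE", "CE")
  sprintf("%d %s %d %s",
          as.integer(day(date)), month.name[month(date)],
          as.integer(era_year), era)
}

aug_birth <- ymd("0000-09-23") - years(62)
format_era(aug_birth)
# "23 September 63 BCE"
```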

@aurelberra

In view of the last comments (and the widespread use of R and Tidyverse packages in digital humanities projects), do you think the issue could be re-opened, @vspinu?

@vspinu vspinu reopened this Jan 24, 2021
@Jmuccigr

If I understand correctly, there is a further problem: lubridate (and R more generally) seems to use a proleptic Gregorian calendar. In other words, century years not divisible by 400 are not treated as leap years, even though they were in fact leap years between 12 CE and the adoption of the Gregorian calendar in 1582.

For example, lubridate fails to parse February 29, 300, even though 300 was a leap year (though it wouldn't be in a Gregorian calendar):

> ymd("03000229")
[1] NA
Warning message:
 1 failed to parse. 

So will R:

> as.Date("300-02-29")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format
> as.Date("400-02-29")
[1] "0400-02-29"

You can also see this by stepping through R's day numbers (days since 1970-01-01) across the leap day:

yr <- -609898
i <- 0
while (i < 4) {
    # Older R versions require an explicit origin for numeric input
    cat(yr + i, ': ', as.character(as.Date(yr + i, origin = "1970-01-01")), "\n")
    i <- i + 1
}
   
-609898 :  300-02-27 
-609897 :  300-02-28 
-609896 :  300-03-01 
-609895 :  300-03-02 

This means, for example, that you can't reliably get the difference between dates if the span crosses a century year that was a Julian leap year, such as 299 to 301. You also won't get correct days of the week for pretty much the entire Julian period (my issue right now).
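
One possible workaround for the Julian leap-year problem described above is to do the arithmetic on Julian Day Numbers instead of R Dates. The sketch below uses the standard integer formula for Julian-calendar dates with astronomical year numbering (0 = 1 BCE). The function names are hypothetical; nothing here is provided by base R or lubridate.

```r
# Hypothetical helper: Julian Day Number of a date given in the Julian calendar.
julian_calendar_to_jdn <- function(year, month, day) {
  a <- (14 - month) %/% 12          # 1 for January and February, else 0
  y <- year + 4800 - a
  m <- month + 12 * a - 3           # March = 0, ..., February = 11
  day + (153 * m + 2) %/% 5 + 365 * y + y %/% 4 - 32083
}

# R's Date counts days from 1970-01-01 (Gregorian), which is JDN 2440588.
julian_calendar_to_date <- function(year, month, day) {
  jdn <- julian_calendar_to_jdn(year, month, day)
  as.Date(jdn - 2440588, origin = "1970-01-01")
}

# 29 February 300 (Julian) is the day number R labels as 300-03-01
as.numeric(julian_calendar_to_date(300, 2, 29))
# -609896

# Day of week, 0 = Sunday: 29 February 300 (Julian) was a Thursday
(julian_calendar_to_jdn(300, 2, 29) + 1) %% 7
# 4

# Days from 1 January 299 to 1 January 301, counting the Julian leap day
julian_calendar_to_jdn(301, 1, 1) - julian_calendar_to_jdn(299, 1, 1)
# 731
```

The resulting Date still prints with proleptic Gregorian labels, so it is only suitable for differences and weekdays, not for display.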
