Stamped date format using lubridate is selecting the wrong fixed format #545

Closed
ekstroem opened this issue May 21, 2017 · 6 comments

Comments

@ekstroem
Contributor

I am trying to format date output using stamp from the lubridate package. I would like the end format to be similar to

Sunday, November 1, 23:15

My problem is getting the unabbreviated month to be printed

library(lubridate)
x <- ymd_hm("2017-11-20 15:15")

Now if I use stamp as below then I almost get the right output

stamp("Sunday, November 30, 23:15")(x)
Multiple formats matched: "%A, %B %d, %y:%H"(1), "%A, %B %y, %d:%H"(1), "Sunday, %B %d, %y:%H"(1), "Sunday, %B %y, %d:%H"(1)
Using: "%A, %B %d, %y:%H"
[1] "Monday, November 20, 17:15"

However, November is interpreted as fixed text and the guessed format uses the year and hour in place of the hour and minutes. If I add the orders argument then I can try to force the order of the inputs

stamp("Sunday, Nov 30, 23:15", orders="AbdHM")(x)
Multiple formats matched: "%A, %Om %d, %H:%M"(0), "%A, %b %d, %H:%M"(1)
Using: "%A, %Om %d, %H:%M"
[1] "Monday, 11 20, 15:15"

Here the output is correct except I get the month as a number and not as text. Note that stamp does provide the correct format but ends up using the not-quite-correct-format despite being given an identical match.

What can I do to force stamp to use the exact format that I supplied?

[There are other options than using stamp. I just cannot figure out why the code above isn't working]
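For reference, one such option that skips format guessing entirely is base R's format() with an explicit format string. This is only a minimal sketch, not a fix for stamp: the day and month names it prints depend on the current LC_TIME locale, and the time zone "UTC" is an assumption made here so the example is reproducible.

```r
# Build the same instant as in the report, using base R only.
x <- as.POSIXct("2017-11-20 15:15", tz = "UTC")

# Spell out the format instead of letting it be guessed:
# %A = full weekday name, %B = full month name, %d = day, %H:%M = time.
out <- format(x, "%A, %B %d, %H:%M")
print(out)
```

In an English or C locale this prints the full weekday and month names followed by the day and the time.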

@cderv
Contributor

cderv commented May 22, 2017

Here is my understanding of the situation, based on a few small tests.
I am using the latest development version of lubridate.

library(lubridate)
#> 
#> Attachement du package : 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
devtools::session_info("lubridate")$packages
#>  package   * version    date       source                           
#>  lubridate * 1.6.0.9009 2017-05-22 Github (hadley/lubridate@2608730)
#>  magrittr    1.5        2014-11-22 CRAN (R 3.3.2)                   
#>  Rcpp        0.12.10    2017-03-19 CRAN (R 3.3.3)                   
#>  stringi     1.1.5      2017-04-07 CRAN (R 3.3.3)                   
#>  stringr     1.2.0      2017-02-18 CRAN (R 3.3.2)

I chose a month different from the one in the stamp template (October here) and saved my LC_TIME locale setting for later.

x <- ymd_hm("2017-10-20 15:15")
(ori <- Sys.getlocale("LC_TIME"))
#> [1] "French_France.1252"

The results depend on my locale setting.

# In French as my locale
stamp("Dimanche, Novembre 30, 23:15")(x)
#> Multiple formats matched: "%A, %B %d, %y:%H"(1), "%A, %B %y, %d:%H"(1), "Dimanche, %B %d, %y:%H"(1), "Dimanche, %B %y, %d:%H"(1)
#> Using: "%A, %B %d, %y:%H"
#> [1] "vendredi, octobre 20, 17:15"
# In English, different from my locale
stamp("Sunday, November 30, 23:15")(x)
#> Error in stamp("Sunday, November 30, 23:15"): Couldn't guess formats of: Sunday, November 30, 23:15
# Passing the English locale to stamp
stamp("Sunday, November 30, 23:15", locale = "English")(x)
#> Multiple formats matched: "%A, %B %d, %y:%H"(0), "%A, %B %y, %d:%H"(0), "Sunday, %B %d, %y:%H"(0), "Sunday, %B %y, %d:%H"(0)
#> Using: "%A, %B %d, %y:%H"
#> [1] "vendredi, octobre 20, 17:15"
# Changing my locale completely
Sys.setlocale("LC_TIME", "English")
#> [1] "English_United States.1252"
stamp("Sunday, November 30, 23:15")(x)
#> Multiple formats matched: "%A, %B %d, %y:%H"(1), "%A, %B %y, %d:%H"(1), "Sunday, %B %d, %y:%H"(1), "Sunday, %B %y, %d:%H"(1)
#> Using: "%A, %B %d, %y:%H"
#> [1] "Friday, October 20, 17:15"

What is your locale setting? Could it be the source of the problem?


Regarding the second problem, forcing orders in stamp, here are a few test lines to help understand it:

# Getting my original locale back
Sys.setlocale("LC_TIME", ori)
#> [1] "French_France.1252"
# Does not work because stamp can't guess the format
stamp("Sunday, Nov 30, 23:15", orders="AbdHM")(x)
#> Error in stamp("Sunday, Nov 30, 23:15", orders = "AbdHM"): Couldn't guess formats of: Sunday, Nov 30, 23:15
# stamp guesses the format but selects the wrong one, `%Om`, which gives the month back as a number
stamp("Sunday, Nov 30, 23:15", orders="AbdHM", locale = "English")(x)
#> Multiple formats matched: "%A, %Om %d, %H:%M"(0), "%A, %b %d, %H:%M"(0)
#> Using: "%A, %Om %d, %H:%M"
#> [1] "vendredi, 10 20, 15:15"
# Changing my locale gives the desired result
Sys.setlocale("LC_TIME", "English")
#> [1] "English_United States.1252"
stamp("Sunday, Nov 30, 23:15", orders="AbdHM")(x)
#> Multiple formats matched: "%A, %b %d, %H:%M"(1), "%A, %Om %d, %H:%M"(0)
#> Using: "%A, %b %d, %H:%M"
#> [1] "Friday, Oct 20, 15:15"

This behaviour comes from the training step of the stamp function. The internal .trained_format function is not affected by the locale argument of stamp. Formats are trained with the system LC_TIME locale, which is why you get this behaviour.

Maybe we could change this behaviour, as lubridate:::.trained_format has an unused locale argument. It would be better if the training were consistent with the date provided to stamp.

@vspinu
Member

vspinu commented May 22, 2017

This is a bug, and it comes from the fact that format doesn't accept a locale argument. We need to wrap it in a system call that sets the locale before formatting.
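The idea of setting the locale around the format call can be sketched in base R as below. This is not the code from the actual fix; format_in_locale is a hypothetical helper name introduced here for illustration, and the sketch assumes the requested locale is installed on the system ("C" always is).

```r
# Hypothetical helper (not part of lubridate): format a date-time with
# LC_TIME temporarily switched, then restore the previous setting.
format_in_locale <- function(x, fmt, locale) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the caller's locale even if format() fails.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  format(x, fmt)
}

x <- as.POSIXct("2017-10-20 15:15", tz = "UTC")
before <- Sys.getlocale("LC_TIME")
res <- format_in_locale(x, "%A, %B %d, %H:%M", "C")
print(res)
#> [1] "Friday, October 20, 15:15"
```

The on.exit() call is what keeps the global locale untouched for the rest of the session, which is the property the stamp locale argument is expected to have.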

@vspinu vspinu closed this as completed in 8504af8 May 22, 2017
@vspinu
Member

vspinu commented May 22, 2017

Should be working now:

> Sys.setlocale("LC_TIME", "it_IT.utf8")
[1] "it_IT.utf8"
> formater <- stamp("Sunday, November 30, 23:15", locale = "en_DK.utf8")
Multiple formats matched: "%A, %B %d, %y:%H"(0), "%A, %B %y, %d:%H"(0), "Sunday, %B %d, %y:%H"(0), "Sunday, %B %y, %d:%H"(0)
Using: "%A, %B %d, %y:%H"
> x <- ymd_hm(c("2017-01-20 15:15", "2017-02-10 10:10"))
> formater(x)
[1] "Friday, January 20, 17:15"  "Friday, February 10, 17:10"

Thanks for investigating this.
Could you please check again with your data?

@vspinu
Member

vspinu commented May 22, 2017

Note that only the locale issue is fixed. There is no easy way to fix the misinterpretation of %H:%M. You need to supply your own orders, which currently don't quite work, as you reported:

> formater <- stamp("Sunday, November 30, 23:15", "ABdHM", locale = "en_DK.utf8")
Multiple formats matched: "%A, %Om %d, %H:%M"(0), "%A, %B %d, %H:%M"(0)
Using: "%A, %Om %d, %H:%M"
> x <- ymd_hm(c("2017-01-20 15:15", "2017-02-11 10:10"))
> formater(x)
[1] "Friday, 01 20, 15:15"   "Saturday, 02 11, 10:10"
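On the %H:%M misinterpretation: the template fragment "23:15" is ambiguous because two digits, a colon and two digits fit both hour:minute and two-digit-year:hour. A small base R sketch (the "UTC" time zone is an assumption for reproducibility) shows how the guessed "%y:%H" produces the 17:15 seen in the original report, where the year is 2017 and the hour is 15:

```r
x <- as.POSIXct("2017-11-20 15:15", tz = "UTC")

# Format that was guessed: two-digit year, then hour.
guessed <- format(x, "%y:%H")
# Format that was intended: hour, then minute.
intended <- format(x, "%H:%M")

print(c(guessed = guessed, intended = intended))
#>  guessed intended 
#>  "17:15"  "15:15"
```

Both strings look like plausible clock times, which is why the wrong guess is easy to miss.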

@vspinu vspinu reopened this May 22, 2017
@ekstroem
Contributor Author

Ah good (in the sense that it wasn't just me being baffled)!

With the updated version of lubridate I now get the following (my locale is C, and I'm running OSX).

stamp("Sunday, November 30, 23:15")(x)
Multiple formats matched: "%A, %B %d, %y:%H"(1), "%A, %B %y, %d:%H"(1), "Sunday, %B %d, %y:%H"(1), "Sunday, %B %y, %d:%H"(1)
Using: "%A, %B %d, %y:%H"
[1] "Monday, November 20, 17:15"

The date is okay; the time is off because of the year and hour, just as @vspinu mentioned. If I use the abbreviated month, %b, I get the following

stamp("Sunday, Nov 30, 23:15", orders="AbdHM")(x)
Multiple formats matched: "%A, %b %d, %H:%M"(1), "%A, %Om %d, %H:%M"(0)
Using: "%A, %b %d, %H:%M"
[1] "Monday, Nov 20, 15:15"

where all is good. Following my initial approach I get something that is almost right using B instead of b.

stamp("Sunday, Nov 30, 23:15", orders="ABdHM")(x)
Using: "%A, %Om %d, %H:%M"
[1] "Monday, 11 20, 15:15"

This is quite surprising (in a positive way) since I'm specifying an order that doesn't match the format and it still guesses the month although in numbers.

If I force the time zone and run

stamp("Sunday, November 1, 16:15", orders="ABdHM")(force_tz(x,  "Europe/Copenhagen"))
Multiple formats matched: "%A, %B %d, %H:%M"(1), "%A, %Om %d, %H:%M"(0)
Using: "%A, %B %d, %H:%M"
[1] "Monday, November 20, 15:15"

then I get the desired output format. Also works with

stamp("Sunday, November 30, 23:15", "ABdHM", locale = "en_DK.utf8")

as mentioned above.

Cheers!

@vspinu vspinu closed this as completed in e82460c May 22, 2017
@vspinu
Member

vspinu commented May 22, 2017

This is quite surprising (in a positive way)

Well. This is a bug which is now fixed. It works now as expected with no need to reset the global locale:

> Sys.setlocale("LC_TIME", "it_IT.utf8")
[1] "it_IT.utf8"
> 
> formater <- stamp("Sunday, November 30, 23:15", "ABdHM", locale = "C")
Multiple formats matched: "%A, %B %d, %H:%M"(1), "%A, %Om %d, %H:%M"(0)
Using: "%A, %B %d, %H:%M"
> x <- ymd_hm(c("2017-01-20 15:15", "2017-02-11 10:10"))
> formater(x)
[1] "Friday, January 20, 15:15"    "Saturday, February 11, 10:10"

The locale issue wasn't stamp specific, so yes, your report revealed a bug in parsing as well. Thanks.


3 participants