reading Date for French #37

Closed
boussouf opened this issue Feb 26, 2020 · 2 comments

@boussouf
Hello,

pubDate values in French format are sometimes not parsed by the tidyRSS function.
Unfortunately, the returned column is NA, so we lose this information.

Example: View(tidyfeed("https://www.valdemarne.fr/rss.xml"))

Thank you

@RobertMyles
Owner

Thanks for the bug report, @boussouf. That's an interesting problem, see here. I've been using anytime() to parse dates, whereas before v2 I used lubridate. I might go back to lubridate, although this would have failed under previous versions of tidyRSS too.

library(xml2)
library(httr)
library(magrittr)
library(anytime)

rss <- "https://www.valdemarne.fr/rss.xml"

GET(rss) %>% 
  read_xml() %>% 
  xml_find_all("channel") %>% 
  xml_find_all("item") %>% 
  xml_find_all("pubDate") %>% 
  xml_text()
#>  [1] "Vendredi, 13 Mars, 2020 - 12:22"    "Lundi, 24 Février, 2020 - 15:42"   
#>  [3] "Vendredi, 21 Février, 2020 - 14:17" "Vendredi, 21 Février, 2020 - 11:45"
#>  [5] "Mardi, 18 Février, 2020 - 11:00"    "Vendredi, 14 Février, 2020 - 13:22"
#>  [7] "Mardi, 11 Février, 2020 - 16:24"    "Mardi, 11 Février, 2020 - 16:01"   
#>  [9] "Mardi, 11 Février, 2020 - 14:20"    "Mardi, 11 Février, 2020 - 11:40"


GET(rss) %>% 
  read_xml() %>% 
  xml_find_all("channel") %>% 
  xml_find_all("item") %>% 
  xml_find_all("pubDate") %>% 
  xml_text() %>% 
  anytime()
#>  [1] NA NA NA NA NA NA NA NA NA NA

GET(rss) %>% 
  read_xml() %>% 
  xml_find_all("channel") %>% 
  xml_find_all("item") %>% 
  xml_find_all("pubDate") %>% 
  xml_text() %>% 
  lubridate::parse_date_time("dmy hm", locale = "fr_FR.UTF-8")
#> Warning: hms, hm and ms usage is deprecated, please use HMS, HM or MS instead.
#> Deprecated in version '1.5.6'.
#>  [1] "2020-03-13 12:22:00 UTC" "2020-02-24 15:42:00 UTC"
#>  [3] "2020-02-21 14:17:00 UTC" "2020-02-21 11:45:00 UTC"
#>  [5] "2020-02-18 11:00:00 UTC" "2020-02-14 13:22:00 UTC"
#>  [7] "2020-02-11 16:24:00 UTC" "2020-02-11 16:01:00 UTC"
#>  [9] "2020-02-11 14:20:00 UTC" "2020-02-11 11:40:00 UTC"

Created on 2020-02-26 by the reprex package (v0.3.0)

For the moment, I don't have a quick fix for this, though I'll have something in version 2.0.1. That will be here on GH in the next few weeks but will take a while to get to CRAN as I don't want to spam them with releases.
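
In the meantime, a possible stopgap is to post-process the raw pubDate strings yourself. The sketch below is only an illustration under stated assumptions: `parse_french_pubdate()` is a hypothetical helper, not part of tidyRSS, and it assumes the dates always follow the "Weekday, day Month, year - HH:MM" shape seen in this feed. It uses base R only and does not depend on an fr_FR locale being installed.

```r
# Hypothetical helper (not part of tidyRSS): parse French pubDate strings
# shaped like "Vendredi, 13 Mars, 2020 - 12:22" using base R only.
parse_french_pubdate <- function(x, tz = "UTC") {
  # ASCII-only prefixes that uniquely identify each French month name,
  # so accented letters never need to be matched directly
  month_prefixes <- c("janv", "f", "mars", "avr", "mai", "juin",
                      "juil", "ao", "sept", "oct", "nov", "d")
  # Capture groups: day, month name, year, hour, minute (weekday is skipped)
  pattern <- paste0("^[^,]*,\\s*([0-9]{1,2})\\s+([^ ,]+),\\s*([0-9]{4})",
                    "\\s*-\\s*([0-9]{1,2}):([0-9]{2})\\s*$")
  parts <- regmatches(x, regexec(pattern, x))
  iso <- vapply(parts, function(p) {
    # Full match plus five groups are expected; anything else becomes NA
    if (length(p) != 6) return(NA_character_)
    month <- which(startsWith(tolower(p[3]), month_prefixes))
    if (length(month) != 1) return(NA_character_)
    sprintf("%s-%02d-%02d %02d:%s",
            p[4], month, as.integer(p[2]), as.integer(p[5]), p[6])
  }, character(1))
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M", tz = tz)
}

# Example with two values from the feed above and one unparseable string
parse_french_pubdate(c("Vendredi, 13 Mars, 2020 - 12:22",
                       "Lundi, 24 F\u00e9vrier, 2020 - 15:42",
                       "not a date"))
```

Strings that do not match the assumed shape come back as NA rather than raising an error, which mirrors how the column currently behaves. The time zone defaults to UTC here only because the feed gives no zone information; that default is an assumption.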

@RobertMyles RobertMyles self-assigned this Feb 26, 2020