get_stats19 #239

Closed
weijia2013 opened this issue Jun 21, 2024 · 3 comments · Fixed by #248

@weijia2013

When I run:

get_stats19(year = 2005, type = "accidents", data_dir = "XXX")

(with my real directory replaced by "XXX" here), I get this error:

No files of that type found for that year.
No files found. Check the stats19 website on data.gov.uk
Files identified:
Error in if (data_already_exists) { : argument is of length zero

But when I run:
get_stats19(year = 2005 - 2024, type = "accidents", data_dir = "XXX")

the data for 1979 to 2023 downloads, and it includes the 2005 data.

Why would this be happening?
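
A note on the second call: in R, `2005 - 2024` is a subtraction, not a range, so `year` receives the single value -19; a range is written `2005:2024`. The sketch below shows the difference, and one plausible reason the call still returns data: a pattern such as "-19" can match a file name containing "-1979". The file names here are made up for illustration and are not taken from the package.

```r
# Subtraction versus range in R
years_subtracted = 2005 - 2024   # a single number, -19
years_range = 2005:2024          # an integer vector of 20 years

# Hypothetical file names, for illustration only
example_files = c(
  "road-safety-data-accidents-1979-2023.csv",
  "road-safety-data-accidents-2022.csv"
)

# grepl() coerces the number -19 to the pattern "-19", which matches "-1979"
matches_subtracted = grepl(pattern = years_subtracted, x = example_files)

# The pattern "2005" appears in neither file name
matches_2005 = grepl(pattern = 2005, x = example_files)
```

If the real file names follow a similar shape, this would explain why the subtraction happens to select the combined file while `year = 2005` selects nothing.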

@Robinlovelace
Member

I am not sure, but I agree it could be clearer, and I will look into fixing it. The main issue is that per-year files only exist for the last 5 years; for anything earlier we should default to the huge 1979-2023 dataset. Thanks for reporting. Does that plan sound good to you, so that setting year = 2005 gives you the data from 1979 onwards? Any further feedback or ideas for a fix are welcome.
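
The plan above could look something like the following sketch. The helper name `resolve_file_year` and the 2018 cutoff are assumptions for illustration (2018 is the cutoff mentioned later in this thread); this is not the package's actual code.

```r
# Hypothetical helper: map requested years to the year used to look up a file.
# Years before `first_single_year` fall back to 1979, the start of the
# combined 1979 onwards dataset.
resolve_file_year = function(years, first_single_year = 2018) {
  stopifnot(is.numeric(years), length(years) > 0)
  ifelse(years < first_single_year, 1979, years)
}

single_lookup = resolve_file_year(2005)
mixed_lookup = resolve_file_year(c(2017, 2018, 2022))
```

With a fallback like this, the caller would still need to filter the combined dataset down to the requested year after reading it.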

@weijia2013
Author

That sounds like a good plan. Thanks for the update.

@Robinlovelace Robinlovelace self-assigned this Jun 21, 2024
@BlaiseKelly

BlaiseKelly commented Jun 23, 2024

I think this error comes from the find_file_name function in the utils module. It searches for a file name that mentions the specific year, which, as Robin says, doesn't exist for anything before 2018.

find_file_name = function(years = NULL, type = NULL) {
  result = unlist(stats19::file_names, use.names = FALSE)
  if(!is.null(years)) {
    if(min(years) >= 2016) {
      result = result[!grepl(pattern = "1979", x = result)]
    }
    result = result[!grepl(pattern = "adjust", x = result)]
    result = result[grepl(pattern = years, x = result)]
  }

  # see https://github.com/ITSLeeds/stats19/issues/21
  if(!is.null(type)) {
    type = gsub(pattern = "cas", replacement = "ics-cas", x = type)
    result_type = result[grep(pattern = type, result, ignore.case = TRUE)]
    if(length(result_type) > 0) {
      result = result_type
    } else {
      if(is.null(years)) {
       stop("No files of that type found", call. = FALSE)
      } else {
        message("No files of that type found for that year.")
      }
    }
  }

  if(length(result) < 1) {
    message("No files found. Check the stats19 website on data.gov.uk")
  }
  unique(result)
}
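
For a single pre-2018 year, the year filter in the function above leaves nothing behind, which would explain the messages in the original report. The sketch below reproduces that step with made-up file names (they are assumptions, not the contents of `stats19::file_names`) and shows why an empty result can then trigger `argument is of length zero` in a later `if()`. The package's actual downstream code is not shown in this thread, so `file.exists()` is only a stand-in for whatever check it performs.

```r
# Hypothetical file names, for illustration only
example_files = c(
  "road-safety-data-accidents-1979-2023.csv",
  "road-safety-data-accidents-2022.csv"
)

# The same kind of year filter as in find_file_name(): no name mentions 2005
result = example_files[grepl(pattern = 2005, x = example_files)]

# A check built on an empty vector is itself empty (logical(0)) ...
data_already_exists = file.exists(result)

# ... and if() refuses a zero-length condition, so we capture the error
err = tryCatch(
  if (data_already_exists) TRUE else FALSE,
  error = function(e) e
)
conditionMessage(err)
```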

I changed it to the version below. Only the first part differs, but I am including the full function so it can be copied and pasted.

find_file_name = function(years = NULL, type = NULL) {
  result = unlist(stats19::file_names, use.names = FALSE)
  if(!is.null(years)) {
    if(min(years) >= 2018) {
      # single-year files exist: keep names mentioning any requested year
      result = result[grepl(pattern = paste(years, collapse = "|"), x = result)]
    } else {
      # no single-year files before 2018: fall back to the 1979 onwards files
      result = result[!grepl(pattern = "adjust", x = result)]
      result = result[grepl(pattern = "1979", x = result)]
    }
  }

  # see https://github.com/ITSLeeds/stats19/issues/21
  if(!is.null(type)) {
    type = gsub(pattern = "cas", replacement = "ics-cas", x = type)
    result_type = result[grep(pattern = type, result, ignore.case = TRUE)]
    if(length(result_type) > 0) {
      result = result_type
    } else {
      if(is.null(years)) {
        stop("No files of that type found", call. = FALSE)
      } else {
        message("No files of that type found for that year.")
      }
    }
  }

  if(length(result) < 1) {
    message("No files found. Check the stats19 website on data.gov.uk")
  }
  unique(result)
}

This change was included in last week's pull request, which I am using on my machine, but it is failing the automated checks. If anyone has a chance to take a look, that would be appreciated.

I also wondered: since so much effort has gone into making use of the local store of downloaded data, it might be helpful to add an extra step that splits the 1979-2023 dataset into years, so that each file is saved as type_year.RDS and other functions work from those files. It would add a little time to the import step, but could it speed up analysis of multiple years?
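
A minimal sketch of that idea, using base R only. The column name `accident_year`, the toy data frame, and the helper name `split_by_year` are assumptions for illustration; the file naming follows the type_year.RDS pattern suggested above.

```r
# Hypothetical helper: write one RDS file per year from a combined data frame
split_by_year = function(x, type, data_dir, year_col = "accident_year") {
  stopifnot(is.data.frame(x), year_col %in% names(x))
  yearly = split(x, x[[year_col]])
  paths = file.path(data_dir, paste0(type, "_", names(yearly), ".RDS"))
  for (i in seq_along(yearly)) {
    saveRDS(yearly[[i]], paths[i])
  }
  invisible(paths)
}

# Toy stand-in for the combined 1979 onwards dataset
toy = data.frame(
  accident_index = c("a1", "a2", "a3"),
  accident_year = c(2005, 2005, 2006)
)
out_dir = tempfile("stats19_split_")
dir.create(out_dir)
paths = split_by_year(toy, type = "accidents", data_dir = out_dir)

# Reading a single year back no longer requires loading every year
accidents_2005 = readRDS(file.path(out_dir, "accidents_2005.RDS"))
```

The trade-off is the one described above: the split costs time once at import, and later reads touch only the years that are needed.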

@Robinlovelace Robinlovelace linked a pull request Jul 31, 2024 that will close this issue
Robinlovelace added a commit that referenced this issue Jul 31, 2024
Set year to 1979 if earlier than 2018, close #239