500 internal server error + bad JSON elements #1858

Open
btskinner opened this issue Jul 13, 2021 · 2 comments

@btskinner

I've been updating the rscorecard R package and have run into a couple of issues. Both involve the same call that has worked in the past. Here are the two things I'm seeing:

500 Internal Server Error

When I make this call with rscorecard:

df <- sc_init() %>% 
    sc_filter(control == 1, region == 1:2, ccbasic == 1:24) %>% 
    sc_select(unitid, instnm, md_earn_wne_p10) %>% 
    sc_year(2009) %>%
    sc_get()

which translates to

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.ownership=1&school.region_id__range=1..2&school.carnegie_basic__range=1..24&_fields=id,school.name,2009.earnings.10_yrs_after_entry.median&_page=0&_per_page=100&api_key=<HIDDEN>
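To make that translation concrete, here is a minimal base-R sketch of how the filter, field, and paging arguments map onto the query string. The helper build_sc_url() is hypothetical and is not part of rscorecard. The name mappings (control to school.ownership, region to school.region_id, ccbasic to school.carnegie_basic, md_earn_wne_p10 to the 2009 earnings field) are taken from the URL shown above. The sketch also assumes that a vector such as 1:2 becomes a __range=min..max parameter.

```r
# Hypothetical helper (not part of rscorecard): assemble a College Scorecard
# query URL from filters that already use developer-friendly names.
build_sc_url <- function(filters, fields, page = 0, per_page = 100,
                         api_key = "<HIDDEN>") {
  base <- "https://api.data.gov/ed/collegescorecard/v1/schools.json"
  # Single values become name=value; vectors become name__range=min..max.
  # Assumption: vectors are contiguous ranges (c(1, 3) would become 1..3).
  filter_parts <- vapply(names(filters), function(nm) {
    v <- filters[[nm]]
    if (length(v) > 1) {
      paste0(nm, "__range=", min(v), "..", max(v))
    } else {
      paste0(nm, "=", v)
    }
  }, character(1))
  query <- c(filter_parts,
             paste0("_fields=", paste(fields, collapse = ",")),
             paste0("_page=", page),
             paste0("_per_page=", per_page),
             paste0("api_key=", api_key))
  paste0(base, "?", paste(query, collapse = "&"))
}

url <- build_sc_url(
  filters = list(school.ownership = 1,
                 school.region_id = 1:2,
                 school.carnegie_basic = 1:24),
  fields = c("id", "school.name", "2009.earnings.10_yrs_after_entry.median")
)
print(url)
```

With these inputs the sketch reproduces the URL above character for character, which makes it easy to compare against what the package actually sends.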

I get a page with this message:

[Screenshot of the error page, taken 2021-07-13 at 12:37 PM]

This might be related to this error reported on the rscorecard GitHub repo.

Bad JSON elements

When I change the call to use data for 2010 instead of 2009, I get extra elements at the end of the pull. They cause rscorecard to break, which is my issue, but since the code has worked in the past, something new is happening. Here's the API call (notice that I'm requesting _page=2, which returns the last 83 elements of the 283-element pull):

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.ownership=1&school.region_id__range=1..2&school.carnegie_basic__range=1..24&_fields=id,school.name,2010.earnings.10_yrs_after_entry.median&_page=2&_per_page=100&api_key=<HIDDEN>
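The "last 83 elements" figure follows from the paging parameters. _page is zero-indexed in these URLs (the first call uses _page=0), so a 283-element pull at _per_page=100 spans pages 0, 1, and 2, and page 2 holds whatever remains. Here is a small base-R sketch of that arithmetic, which could serve as a sanity check on what each page returns. The function name results_on_page() is hypothetical.

```r
# Hypothetical helper: number of results expected on a zero-indexed page,
# given the total hit count reported in metadata and the page size.
results_on_page <- function(total, per_page, page) {
  max(0, min(per_page, total - page * per_page))
}

total <- 283
per_page <- 100
n_pages <- ceiling(total / per_page)   # pages 0, 1, 2
sizes <- vapply(seq_len(n_pages) - 1,
                function(p) results_on_page(total, per_page, p),
                numeric(1))
print(sizes)  # 100 100 83
```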

Here's the result (I've cut it to the last 10 elements to save space and marked the cut with ...):

{
  "metadata": {
    "page": 2,
    "total": 283,
    "per_page": 100
  },
  "results": [
    ...
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Pennsylvania College of Technology",
      "id": 366252
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Suffolk County Community College",
      "id": 366395
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Carroll Community College",
      "id": 405872
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Pennsylvania Highlands Community College",
      "id": 414911
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Lancaster County Career and Technology Center",
      "id": 418533
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "York County Community College",
      "id": 420440
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Community College of Baltimore County",
      "id": 434672
    },
    {
      "UNITID": 475565,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    },
    {
      "UNITID": 479956,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    },
    {
      "UNITID": 480064,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    }
  ]
}

The last three elements have an extra key, UNITID, and null values for all of the requested fields. This causes an error in my rscorecard pull. Again, that's my issue, but it hasn't been a problem in the past.
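One way a client could guard against elements like these before building a data frame is to flag any element whose keys go beyond the requested _fields, or whose id is null. This is a minimal base-R sketch, not rscorecard's actual handling. The results list is hand-built from three of the elements shown above, with JSON null represented as NULL. The names requested and is_malformed() are hypothetical.

```r
# Three elements from the page-2 results above, as parsed R lists
# (JSON null represented as NULL).
results <- list(
  list(`2010.earnings.10_yrs_after_entry.median` = NULL,
       school.name = "York County Community College", id = 420440),
  list(`2010.earnings.10_yrs_after_entry.median` = NULL,
       school.name = "Community College of Baltimore County", id = 434672),
  list(UNITID = 475565, id = NULL, school.name = NULL,
       `2010.earnings.10_yrs_after_entry.median` = NULL)
)

# The _fields requested in the API call
requested <- c("id", "school.name", "2010.earnings.10_yrs_after_entry.median")

# Treat an element as malformed if it carries keys that were not requested
# (such as UNITID) or if its id is null. [[ ]] avoids $'s partial matching.
is_malformed <- function(el) {
  extra_keys <- setdiff(names(el), requested)
  length(extra_keys) > 0 || is.null(el[["id"]])
}

bad <- vapply(results, is_malformed, logical(1))
clean <- results[!bad]
print(bad)  # FALSE FALSE TRUE
```

Dropping whole elements removes the stray rows as well as the stray key. Whether that is appropriate depends on whether those UNITIDs are meant to be part of the result.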

Next steps

These issues only started happening recently; I'm guessing they're related to the big changes to the API in April. Is this something that needs to be addressed on your end, or on my end with better error handling? Either way, thanks for your work on this. I'm also happy to send more info.

@brownpl
Contributor

brownpl commented Jul 23, 2021

Thank you for this detailed report. We had a data update last week and after that, I am not able to replicate the 500 error you report at the top. Do you want to give that a whirl and let me know if it remains an issue for you?

I have also added a fix for the non-standard objects being returned. This is currently going through our testing and should be available on production with our next data update, which is looking like it will be in the next week or two. If there are data delays, I'll go ahead and push up the fix independently for you. I'll add a note when it is live for you.

@btskinner
Author

I'm not sure about the 500 error either. Usually I assume that it is just me or my network, but another person reported a similar 500 error, so I passed it along. I'll give it some more testing when I have a chance.

As for the non-standard objects, I added this bit of code to rscorecard:

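# Drop any column whose name isn't a developer-friendly name in the
# data dictionary (e.g., the stray UNITID); dev_to_var() returns NULL for those
for (i in names(df)) {
    if (is.null(dev_to_var(i))) { df[, i] <- NULL }
}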

where dev_to_var() looks up each developer-friendly name returned by the call (now a column name in the data frame df) in the data dictionary and returns NULL if the name isn't there. Any column whose name isn't found (like UNITID, which isn't a developer-friendly name in the API) is removed. The remaining developer-friendly names are then converted to the more standard variable names and returned to the user.
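Here is a self-contained base-R sketch that runs the same column-dropping loop against a mock setup. The dev_to_var() defined here is a hypothetical stand-in for rscorecard's internal function, backed by a three-entry mock dictionary. The instnm and md_earn_wne_p10 names come from the sc_select() call above. The data frame is hand-built to mimic a parsed page that includes a stray UNITID column.

```r
# Hypothetical stand-in for rscorecard's internal dev_to_var(): look up a
# developer-friendly name in a tiny mock dictionary and return the standard
# variable name, or NULL if the name isn't in the dictionary.
mock_dictionary <- c(
  "id" = "unitid",
  "school.name" = "instnm",
  "2010.earnings.10_yrs_after_entry.median" = "md_earn_wne_p10"
)
dev_to_var <- function(dev_name) {
  if (dev_name %in% names(mock_dictionary)) mock_dictionary[[dev_name]] else NULL
}

# Mock data frame resembling parsed page-2 results, including UNITID
df <- data.frame(
  id = c(434672, NA),
  school.name = c("Community College of Baltimore County", NA),
  `2010.earnings.10_yrs_after_entry.median` = c(NA_real_, NA_real_),
  UNITID = c(NA, 475565),
  check.names = FALSE
)

# Drop columns whose names aren't in the dictionary (UNITID here)
for (i in names(df)) {
  if (is.null(dev_to_var(i))) { df[, i] <- NULL }
}

# Convert the remaining developer-friendly names to standard variable names
names(df) <- vapply(names(df), dev_to_var, character(1), USE.NAMES = FALSE)
print(names(df))  # "unitid" "instnm" "md_earn_wne_p10"
```

Note that this drops only the UNITID column. The row that came from the malformed element is still present, with NA in every remaining column.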

So, FWIW, I think this bit of code will take care of the problem, or at least this instance of it.
