500 internal server error + bad JSON elements #1858

btskinner · 2021-07-13T16:53:26Z

I've been updating the rscorecard R package and have run into a couple of issues. Both involve the same call that has worked in the past. Here are the two things I'm seeing:

500 Internal Server error

When I make this call with rscorecard:

df <- sc_init() %>% 
    sc_filter(control == 1, region == 1:2, ccbasic == 1:24) %>% 
    sc_select(unitid, instnm, md_earn_wne_p10) %>% 
    sc_year(2009) %>%
    sc_get()

which translates to

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.ownership=1&school.region_id__range=1..2&school.carnegie_basic__range=1..24&_fields=id,school.name,2009.earnings.10_yrs_after_entry.median&_page=0&_per_page=100&api_key=<HIDDEN>

I get a page with this message

This might be related to this error reported on the rscorecard GitHub repo.

Bad JSON elements

When I change the call to use data for 2010 instead of 2009, I get extra elements at the end of the pull. It's causing rscorecard to break, which is my issue, but since the code has worked in the past, something new is happening. Here's the API call (notice that I'm calling _page=2, which returns the last 83 elements of the 283-element pull):

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.ownership=1&school.region_id__range=1..2&school.carnegie_basic__range=1..24&_fields=id,school.name,2010.earnings.10_yrs_after_entry.median&_page=2&_per_page=100&api_key=<HIDDEN>
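
The page arithmetic behind that note can be checked with a quick sketch. It assumes pages are zero-indexed, which is consistent with the _page=0 in the first call but is not stated anywhere in this thread; the numbers are the metadata values from the response.

```r
# Page arithmetic for the 2010 pull, assuming zero-indexed pages
total    <- 283   # "total" in the response metadata
per_page <- 100   # "per_page" in the response metadata

n_pages   <- ceiling(total / per_page)      # pages needed to cover the pull
last_page <- n_pages - 1                    # index of the final page
n_last    <- total - last_page * per_page   # elements on the final page

c(n_pages = n_pages, last_page = last_page, n_last = n_last)
```

Under that assumption, _page=2 is the third and final page and should hold 83 elements.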

Here's the result (I've cut the result to the last 10 elements to save space and placed a ... to mark the cuts):

{
  "metadata": {
    "page": 2,
    "total": 283,
    "per_page": 100
  },
  "results": [
    ...
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Pennsylvania College of Technology",
      "id": 366252
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Suffolk County Community College",
      "id": 366395
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Carroll Community College",
      "id": 405872
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Pennsylvania Highlands Community College",
      "id": 414911
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Lancaster County Career and Technology Center",
      "id": 418533
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "York County Community College",
      "id": 420440
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Community College of Baltimore County",
      "id": 434672
    },
    {
      "UNITID": 475565,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    },
    {
      "UNITID": 479956,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    },
    {
      "UNITID": 480064,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    }
  ]
}

The last three elements have an extra key, UNITID, and null values for everything else. This causes an error in my rscorecard pull. Again, that's my issue, but it isn't something that's been a problem in the past.
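
One client-side way to screen out such elements is sketched below. This is an illustration only, not what rscorecard does: the results list is a hand-built stand-in for the parsed "results" array (two elements copied from the response above, with JSON null written as NULL), and is_standard() is a hypothetical helper.

```r
# Hand-built stand-in for the parsed "results" array; JSON null is NULL here
results <- list(
    list(`2010.earnings.10_yrs_after_entry.median` = NULL,
         school.name = "Community College of Baltimore County",
         id = 434672),
    list(UNITID = 475565, id = NULL, school.name = NULL,
         `2010.earnings.10_yrs_after_entry.median` = NULL)
)

# Fields requested through _fields in the call
requested <- c("id", "school.name", "2010.earnings.10_yrs_after_entry.median")

# Hypothetical check: no unrequested keys and a non-null id
is_standard <- function(x) {
    all(names(x) %in% requested) && !is.null(x[["id"]])
}

clean <- Filter(is_standard, results)
length(clean)
```

Only the Community College of Baltimore County element passes the check; the UNITID element is dropped.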

Next steps

These issues only recently started happening, I'm guessing with the big changes to the API in April. Is this something that needs to be addressed on your end or on my end with better error handling? Either way, thanks for your work on this. I'm also happy to send more info.
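
For the error-handling side, a generic retry wrapper is one option. The sketch below is base R only; with_retry() and flaky_fetch() are hypothetical names, not part of rscorecard, and retrying only helps if the 500 is transient, which has not been established here.

```r
# Retry a zero-argument fetch function a few times before giving up
with_retry <- function(fetch, max_tries = 3, wait = 0) {
    for (attempt in seq_len(max_tries)) {
        result <- tryCatch(fetch(), error = function(e) e)
        if (!inherits(result, "error")) return(result)
        if (attempt < max_tries) Sys.sleep(wait)   # pause between attempts
    }
    stop("request failed after ", max_tries, " tries: ", conditionMessage(result))
}

# Hypothetical fetch that fails twice, as a 500 would, then succeeds
calls <- 0
flaky_fetch <- function() {
    calls <<- calls + 1
    if (calls < 3) stop("500 Internal Server Error")
    "ok"
}

out <- with_retry(flaky_fetch)
```

In real use, fetch would wrap the actual HTTP request and wait would be set to a few seconds.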


brownpl · 2021-07-23T16:27:20Z

Thank you for this detailed report. We had a data update last week and after that, I am not able to replicate the 500 error you report at the top. Do you want to give that a whirl and let me know if it remains an issue for you?

I have also added a fix for the non-standard objects being returned. This is currently going through our testing and should be available on production with our next data update, which is looking like it will be in the next week or two. If there are data delays, I'll go ahead and push up the fix independently for you. I'll add a note when it is live for you.

btskinner · 2021-07-23T18:33:51Z

I'm not sure about the 500 error either. Usually I assume that it is just me or my network, but another person reported a similar 500 error, so I passed it along. I'll give it some more testing when I have a chance.

As far as the non-standard objects, I added this bit of code to rscorecard,

# Drop any returned column whose name is not in the data dictionary
for (i in names(df)) {
    if (is.null(dev_to_var(i))) { df[, i] <- NULL }
}

where dev_to_var() looks up each developer-friendly name found in the call return (now a column name in the data frame df) in the data dictionary. If the name isn't found (like UNITID, which isn't a developer-friendly name in the API), the column is dropped before the remaining developer-friendly names are converted to the more standard variable names and returned to the user.

So, FWIW, I think this bit of code will take care of the problem, or at least this one.
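
A self-contained sketch of the same idea follows. The real dev_to_var() is not shown in this thread, so dev_to_var_sketch() and dict are hypothetical stand-ins; the three dictionary pairs are the ones visible in the calls above, and the toy data frame mimics one good row and one malformed row.

```r
# Hypothetical stand-in for the data dictionary lookup
dict <- c("id" = "unitid",
          "school.name" = "instnm",
          "earnings.10_yrs_after_entry.median" = "md_earn_wne_p10")

# Hypothetical simplified dev_to_var(): returns the variable name, or NULL
# when the developer-friendly name is not in the dictionary
dev_to_var_sketch <- function(dev_name) {
    key <- sub("^[0-9]{4}\\.", "", dev_name)   # drop a leading year like "2010."
    if (key %in% names(dict)) dict[[key]] else NULL
}

# Toy data frame with one stray column, as in the malformed response
df <- data.frame(id = c(434672, NA),
                 school.name = c("Community College of Baltimore County", NA),
                 UNITID = c(NA, 475565),
                 check.names = FALSE)

for (i in names(df)) {
    if (is.null(dev_to_var_sketch(i))) { df[[i]] <- NULL }
}
names(df)
```

Note that this removes only the stray column. The row that came from the malformed element is still present, with NA in the remaining columns.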
