Add function to count data sets #101

Open
tomschenkjr opened this issue Sep 27, 2016 · 0 comments

Comments

@tomschenkjr
Contributor

Counting the number of data sets on a Socrata data portal is more nuanced than it seems at first. Socrata portals include filtered views and other subsets that aren't really distinct "data sets".

@levyj created a method of counting the number of data sets that we use in the City of Chicago:

  1. Use https://data.cityofchicago.org/resource/7eck-a4hy.json?$select=type,count(type)&$group=type to download the catalog, aggregated to view type.
  2. Sum the Tabular and Blob counts.
  3. Determine the number of Mondara datasets with https://data.cityofchicago.org/browse?tags=map_layer and add that count to the Tabular + Blob total.
  4. This final total amounts to Tabular datasets + Maps in KML/Shapefile format + Maps in Mondara format.
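
The arithmetic in steps 2 and 4 can be sketched in base R. This is only a sketch under assumptions: the function name `count_data_sets` is hypothetical and not part of RSocrata, the step 1 result is assumed to be a data frame with columns `type` and `count_type`, and the view type labels `tabular` and `blob` are assumed spellings that should be checked against the real catalog. The numbers are made up.

```r
# Hypothetical helper, not part of RSocrata.
# type_counts: data frame with assumed columns `type` and `count_type`.
# map_layer_count: the count obtained in step 3.
count_data_sets <- function(type_counts, map_layer_count,
                            keep_types = c("tabular", "blob")) {
  # Counts may arrive as character strings, so coerce to numeric.
  counts <- as.numeric(type_counts$count_type)
  # Match view types case-insensitively against the types to keep.
  keep <- tolower(type_counts$type) %in% tolower(keep_types)
  sum(counts[keep]) + map_layer_count
}

# Made-up numbers for illustration only.
example_counts <- data.frame(
  type = c("tabular", "blob", "filter", "chart"),
  count_type = c("10", "5", "20", "7"),
  stringsAsFactors = FALSE
)
total <- count_data_sets(example_counts, map_layer_count = 3)
```

Filtered views and charts are left out of the sum, which is the point of the method: only the types named in step 2, plus the map layers from step 3, count as distinct data sets.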

It would be great to implement a function that performs all of these steps. It would help keep data set counts consistent and could set a de facto standard for how they are counted. It would also save time, since the current method relies on manual work in steps 3 and 4.

RSocrata contains the necessary functions to accomplish this task. Step 1 can be done with a particular read.socrata() call, and step 3 can leverage ls.socrata(). The filtering and summation can be done using base R functions.
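
As a rough illustration of how step 3 might be automated, the sketch below counts the entries in a portal listing that carry a given tag. It rests on assumptions: `count_tagged` is a hypothetical name, the listing is assumed to have a `keyword` list column holding each data set's tags, and `map_layer` is assumed to appear there in the same form as in the browse URL. A hand-built data frame stands in for a real ls.socrata() result.

```r
# Hypothetical helper, not part of RSocrata.
count_tagged <- function(listing, tag = "map_layer") {
  # `keyword` is assumed to be a list column of character vectors of tags.
  has_tag <- vapply(listing$keyword,
                    function(tags) tag %in% tags,
                    logical(1))
  sum(has_tag)
}

# Stand-in for a portal listing, with made-up entries.
listing <- data.frame(title = c("A", "B", "C"), stringsAsFactors = FALSE)
listing$keyword <- list(c("map_layer", "parks"), character(0), "map_layer")
n_map_layers <- count_tagged(listing)
```

With RSocrata loaded, passing the result of ls.socrata("https://data.cityofchicago.org") instead of the stand-in would be the natural next step, provided the real listing matches the assumed structure.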
