Add data export functionality to project list #44

Closed
4 tasks
TylerMatteo opened this issue Jul 9, 2024 · 15 comments · Fixed by #102

@TylerMatteo
Contributor

TylerMatteo commented Jul 9, 2024

Acceptance Criteria:

  • Add data export button and modal to project list panels for CDs and CCDs as per designs (desktop and mobile).
  • Export buttons link to pre-prepared data downloads provided by DE
  • Note that designs include radio buttons and checkboxes for "all districts". When user selects this and clicks "Export data" they should get the entire CPDB dataset
  • Designs include checkboxes for CSV or SHP. Because we only have CSVs for now, those checkboxes should be omitted.

Blocked by

@TylerMatteo
Contributor Author

@damonmcc I believe last time we talked about data export, we were still considering using pre-filtered datasets of projects exported by DE. Are those already being published somewhere we can link to?

@damonmcc
Member

@TylerMatteo do you think links to files in a DO bucket would work? in my experience, they'd work

I'll double-check that the csv files aren't zipped and that they open easily in excel (there was an issue with long wkt values in the geometry column)
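One way to run that check before publishing, sketched under assumptions: Excel caps a single cell at 32,767 characters, so a long WKT string in a geometry column is a plausible cause of the trouble. The function name and the column names in the usage note are hypothetical, not taken from the DE pipeline.

```python
import csv
import io

# Excel's documented per-cell limit is 32,767 characters.
EXCEL_CELL_LIMIT = 32_767


def find_oversized_cells(csv_text, limit=EXCEL_CELL_LIMIT):
    """Return (row_number, column_name, length) for each cell longer than limit."""
    # Raise the csv module's own field cap so very long WKT strings can be read.
    csv.field_size_limit(10_000_000)
    reader = csv.DictReader(io.StringIO(csv_text))
    oversized = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if isinstance(value, str) and len(value) > limit:
                oversized.append((row_number, column, len(value)))
    return oversized
```

Calling `find_oversized_cells(text)` on a file's contents returns an empty list when every cell fits; any hits name the row and column (for example a hypothetical `geometry` column) that would need to be dropped or simplified.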

@damonmcc damonmcc self-assigned this Jul 10, 2024
@damonmcc
Member

damonmcc commented Jul 11, 2024

@TylerMatteo a few notes:

  • at the moment we upload a zip called projects_in_geographies.zip which has all csvs (DO link to our nightly build)
  • we could upload an unzipped folder in order to have a link for each csv file
  • the district numbers in filenames aren't zero-padded but could be if that'd be helpful

[Screenshot: current folder structure]

@TylerMatteo
Contributor Author

@damonmcc Yeah we'll need the links for each file. Links to DO Spaces should work, as long as they're publicly available. Looking at our DB, it looks like we do 0 pad community district IDs but do not pad council districts (maybe @TangoYankee can correct me on that if I'm wrong). What about data dictionaries? Is the schema of these files the same as one of the files in the main CPDB product?

This will probably be sprint P for us, at the earliest, so no huge rush. Just trying to groom for when we can pull it in.

@TangoYankee
Collaborator

> 0 pad community district IDs but do not pad council districts

Yes, this is derived from the source data. The id column in the source community district file is "BoroCD". It is always three characters: the first character is the borough id and the other two are the community district id. When the community district id would be "1", it is zero-padded to "01"; e.g., Manhattan 1 is stored as "101". In contrast, the City Council District id is stored in "countdist". These ids are written as if they were numbers, so city council district "1" is written without the padding "0".

I prefer the style with the padding zero because it gives the id a fixed length, making it easier to enforce specific structures. However, for this first iteration, I defaulted to taking the data "as-is". So as a rule, data formats in our API will mirror the data formats in the source data (with just enough exceptions to bite you if you rely on the assumption without double-checking).
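
To make the two conventions concrete, here is a minimal sketch. The function names are hypothetical and this is not the API's actual code; the borough range of 1 to 5 is an assumption based on the city's five boroughs.

```python
def community_district_id(borough_id: int, district_number: int) -> str:
    """Build a three-character BoroCD-style id: borough digit + two-digit district."""
    if not 1 <= borough_id <= 5:
        raise ValueError("borough_id must be between 1 and 5")
    if not 1 <= district_number <= 99:
        raise ValueError("district_number must be between 1 and 99")
    # The district part is zero-padded so the id always has a fixed length of three.
    return f"{borough_id}{district_number:02d}"


def council_district_id(district_number: int) -> str:
    """City council ids are written like plain numbers, with no padding."""
    if district_number < 1:
        raise ValueError("district_number must be positive")
    return str(district_number)


def split_boro_cd(boro_cd: str) -> tuple[str, str]:
    """Split a BoroCD-style id into (borough id, padded district id)."""
    if len(boro_cd) != 3 or not boro_cd.isdigit():
        raise ValueError("expected exactly three digits")
    return boro_cd[0], boro_cd[1:]
```

With these, Manhattan 1 becomes `"101"` while council district 1 stays `"1"`, which is the mismatch any filename scheme has to account for.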

@damonmcc
Member

damonmcc commented Aug 29, 2024

@TangoYankee

there are now csv files in edm-publishing/publish/db-cpdb/24prelim/projects_in_geographies/

City Council District filenames have zero-padding, Community District filenames do not have zero-padding

@damonmcc
Member

damonmcc commented Aug 29, 2024

DE had some ideas for longer-term approaches:

  • generating signed links that expire (in minutes?)
  • linking to csvs in folders that are more towards the end of our pipeline, we're calling them "package" folders and CPDB's are in product-datasets/cpdb/

but no problem using the current "publish" folders for now
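
For reference, the expiring-link idea can be sketched generically with an HMAC over the path and expiry time. This is an illustration only: S3-compatible stores have their own presigned URL scheme, and the parameter names `expires` and `signature`, the function names, and the example URL are all assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def _sign(path: str, expires: int, secret: bytes) -> str:
    # The signature covers both the path and the expiry, so neither can be altered.
    message = f"{path}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, secret: bytes, ttl_seconds: int, now=None) -> str:
    """Append an expiry timestamp and signature to a URL that has no query string."""
    issued_at = int(time.time()) if now is None else int(now)
    expires = issued_at + ttl_seconds
    signature = _sign(urlparse(base_url).path, expires, secret)
    return f"{base_url}?{urlencode({'expires': expires, 'signature': signature})}"


def is_valid(signed_url: str, secret: bytes, now=None) -> bool:
    """Accept the link only if the signature matches and it has not expired."""
    current = int(time.time()) if now is None else int(now)
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path, expires, secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    matches = hmac.compare_digest(signature.encode(), expected.encode())
    return matches and current < expires
```

A link signed with `ttl_seconds=300` stops validating five minutes after it is issued, and changing the file path invalidates the signature.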

@TangoYankee
Collaborator

> DE had some ideas for longer-term approaches:
>
> * generating signed links that expire (in minutes?)
> * linking to csvs in folders that are more towards the end of our pipeline, we're calling them "package" folders and CPDB's are in `product-datasets/cpdb/`
>
> but no problem using the current "publish" folders for now

Oh, access control is always fun. For the signed links, we would probably want the Capital Planning Explorer to also have rate-limiting/access controls. That raises its own questions around how much friction we can add before we're creating an undue burden for the public to access the data.

Are we going to have folks create accounts or get API keys through Capital Planning Explorer? Do we track IP addresses (still possible to subvert using a DDoS)? Do we use browser cookies? Do we forgo individual accounts and have everyone share one rate limit by putting the rate limit at the application level? That is, Capital Planning Explorer can only make so many requests, regardless of who makes each one. This approach would mean one super-user could box out everyone else.

I don't have any answers now. But the first step is to ask questions
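
To illustrate the trade-off between per-client and application-level limits, here is a small token-bucket sketch. It is not a proposal for the actual implementation; the class names, the `per_client` flag, and the use of an opaque client id (IP, cookie, or API key) are assumptions.

```python
class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated_at = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Uses one bucket per client, or a single shared bucket for the whole application."""

    def __init__(self, capacity: int, refill_per_second: float, per_client: bool):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.per_client = per_client
        self._buckets = {}

    def allow(self, client_id: str, now: float) -> bool:
        # In shared mode every request draws from the same bucket.
        key = client_id if self.per_client else "shared"
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(self.capacity, self.refill_per_second)
        return self._buckets[key].allow(now)
```

With `per_client=False`, one heavy user who spends all the tokens causes the next request from anyone else to be refused until the bucket refills; with `per_client=True`, other clients are unaffected, at the cost of needing some way to identify them.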

@damonmcc
Member

damonmcc commented Sep 3, 2024

> Oh, access control is always fun. For the signed links, we would probably want the Capital Planning Explorer to also have rate-limiting/access controls. That raises its own questions around how much friction we can add before we're creating an undue burden for the public to access the data.
>
> Are we going to have folks create accounts or get API keys through Capital Planning Explorer? Do we track IP addresses (still possible to subvert using a DDoS)? Do we use browser cookies? Do we forgo individual accounts and have everyone share one rate limit by putting the rate limit at the application level? That is, Capital Planning Explorer can only make so many requests, regardless of who makes each one. This approach would mean one super-user could box out everyone else.
>
> I don't have any answers now. But the first step is to ask questions

maybe the file server that hosts files for our Bytes pages would be a better thing to link to than a Digital Ocean S3 bucket then? I think all those links are something like s-media.nyc.gov/agencies/dcp/assets/files/...

I don't know what rate limiting the Microsoft services have in place for those links, but (as far as I know) they've been working well for public access

@TangoYankee
Collaborator

> @TangoYankee
>
> there are now csv files in edm-publishing/publish/db-cpdb/24prelim/projects_in_geographies/
>
> City Council District filenames have zero-padding, Community District filenames do not have zero-padding

Ohhhh, oh no. I just REALLY read this. AE does it the other way: City Council does not have leading zeros. Community District does.

@damonmcc
Member

damonmcc commented Sep 5, 2024

> Ohhhh, oh no. I just REALLY read this. AE does it the other way- City Council does not have leading zeros. Community district does.

lol my bad. on it!

@damonmcc
Member

damonmcc commented Sep 5, 2024

@TangoYankee fixed!

@TangoYankee
Collaborator

@damonmcc the district Ids are looking good. Could I make a couple requests for the folder structure?

  • First request would be to remove the "City Council District" and "Community District" folders, placing all the files as direct children of projects_in_geographies. This simplifies things on the application side because we would no longer need to fumble with an additional path parameter. It also eliminates the space characters that we would need to escape when writing the District%20Folder%20Names.
  • Second request would be to have the zipped projects_in_geographies folder (from which the district csvs are extracted) also live in the projects_in_geographies folder. So, it would be projects_in_geographies/projects_in_geographies.zip. This feels silly. But, it's in support of the application feature which allows folks to download all districts. Keeping the zipped folder in the same directory as the single district files helps us simplify the paths we're looking for. It also helps ensure we're pulling the same version of the data, whether we're serving a single district or a zip of all of them.
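
A rough sketch of the difference on the application side, written in Python for brevity. `BASE_URL` is a placeholder rather than the real bucket address, and the file name `101.csv` is hypothetical; only the folder names and the zip name come from this thread.

```python
from urllib.parse import quote

# Placeholder base address; the real bucket URL is not assumed here.
BASE_URL = "https://example.com/projects_in_geographies"


def nested_url(folder_name: str, file_name: str) -> str:
    """Current layout: an extra folder segment whose spaces must be percent-encoded."""
    return f"{BASE_URL}/{quote(folder_name)}/{quote(file_name)}"


def flat_url(file_name: str) -> str:
    """Requested layout: every file is a direct child of the base folder."""
    return f"{BASE_URL}/{quote(file_name)}"


if __name__ == "__main__":
    print(nested_url("Community District", "101.csv"))
    print(flat_url("101.csv"))
    # The all-districts archive would sit next to the single-district files.
    print(flat_url("projects_in_geographies.zip"))
```

In the flat layout the single-district file and the all-districts zip differ only in the final path segment, so one function covers both downloads.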

cc: @TylerMatteo

@TangoYankee
Collaborator

@TylerMatteo

I have two changes that I would like to make for the CP-Map interface. These changes apply to the "All Districts" option.

  1. I think the default for "all districts" should be "no" rather than "yes". The user got to this modal through a specific district. The export should default to only the selected district. I think making all districts default to "yes" makes users opt-out of data they didn't expect.

  2. It's pragmatic to replace the Radio components with Switch components. Radio components are not yet part of Streetscape but Switches are. I did try to add Radio buttons to Streetscape (Structure radio component export ae-streetscape#78) but hit some friction trying to get the styling to work. Because the choice to include all districts is binary, a Switch should be adequate without requiring additional work on Streetscape.
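
The proposed behaviour boils down to a single boolean that defaults to off. A minimal sketch of that state follows, in Python for illustration rather than the app's actual component code; the class, field, and function names are hypothetical, and the default archive name is taken from the folder discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportSelection:
    """State behind the export modal: one district file plus an opt-in switch."""

    district_file: str
    # Defaults to False so users only get the district they arrived from.
    all_districts: bool = False


def export_file_name(
    selection: ExportSelection,
    all_districts_file: str = "projects_in_geographies.zip",
) -> str:
    """Pick the all-districts archive only when the switch is turned on."""
    if selection.all_districts:
        return all_districts_file
    return selection.district_file
```

Opening the modal from a district therefore exports that district's file unless the user explicitly flips the switch.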

@damonmcc
Member

damonmcc commented Sep 6, 2024

> @damonmcc the district Ids are looking good. Could I make a couple requests for the folder structure?

@TangoYankee done!

TangoYankee added a commit that referenced this issue Sep 9, 2024
Add Export data button to each district project list
Add modal that links to download for selected or all district data

closes #44
TangoYankee added a commit that referenced this issue Sep 10, 2024
Add Export data button to each district project list
Add modal that links to download for selected or all district data

closes #44