Open Data Portal

  1. HQTA Areas: metadata, feature server, or map server
  2. HQTA Stops: metadata, feature server, or map server
  3. CA Transit Routes: metadata, feature server, or map server
  4. CA Transit Stops: metadata, feature server, or map server
  5. CA Average Transit Speeds by Stop-to-Stop Segments: metadata, feature server, or map server
  6. CA Average Transit Speeds by Route and Time of Day: metadata, feature server, or map server
  7. All GTFS datasets metadata/data dictionary

GTFS Schedule Routes & Stops Geospatial Data

Traffic Ops had a request for all transit routes and transit stops to be published in the open data portal.

  1. Update update_vars.py for the current month
  2. In terminal: make create_gtfs_schedule_geospatial_open_data
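
A minimal sketch of the kind of month/date variable this step updates in update_vars.py; the variable name and value here are illustrative, not the repo's actual ones:

```python
# Illustrative only: update_vars.py centralizes the analysis date that the
# publishing scripts share. The name and format below are assumptions.
from datetime import date

ANALYSIS_DATE = date(2025, 1, 15)  # bump to the current month's service date
```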

(Diagram: stops_routes_mermaid)

Metadata Automation Steps and References

  1. Add your dataset to catalog.yml and run gcs_to_esri.
    • In terminal: cd open_data followed by python gcs_to_esri.py
    • The log will show basics like column names and EPSG. Make sure the metadata reflects the same info!
    • Only use EPSG:4326 (WGS84). All open data portal datasets will be in WGS84 (see the reprojection sketch after this list).
    • Download the zipped shapefiles from the Hub to your local filesystem.
  2. If there are new datasets to add or changes to make, make them in metadata.yml and/or data_dictionary.yml.
  3. Changes to metadata.yml (adding new datasets, changing descriptions, updating contact information, etc.) are infrequent. The analysis date is updated automatically and does not have to be edited here.
  4. In terminal: python supplement_meta.py (run this after any metadata.yml changes).
  5. In terminal: python update_data_dict.py
    • Check the log results, which tell you if there are columns missing from data_dictionary.yml. These columns and their descriptions need to be added. Every column in the ESRI layer must have a definition, and where there's an external data dictionary website to cite, provide a definition source (see the column-check sketch after this list).
  6. In terminal: python update_fields_fgdc.py. This populates the fields with data_dictionary.yml values.
    • Only run this if update_data_dict.py reported changes to incorporate.
  7. Run arcgis_pro_script to create the XML metadata files. It's often easier to run via the notebook, but the script exists for better version control and to track feature changes (see the arcpy sketch after this list).
    • Open a notebook in Hub and find the ARCGIS_PATH (your preferred local path for ArcGIS work).
    • Hardcode that path: arcpy.env.workspace = ARCGIS_PATH
    • Download metadata.json and place it in your local path.
    • The exported XML metadata will be in the file geodatabase (gdb) directory.
    • Upload the XML metadata to the Hub in open_data/xml/.
  8. If new datasets were added, open update_vars.py and modify it accordingly.
  9. In terminal: cd open_data/, then python metadata_update_pro.py
    • The overwritten XML is stored in open_data/xml/run_in_esri/.
    • Download the overwritten XML files locally to run in ArcGIS.
  10. Run arcgis_pro_script after importing the updated XML metadata for each feature class.
    • There are steps to create FGDC templates for each dataset to store field information.
    • This only needs to be done once when a new dataset is created.
  11. In terminal: python cleanup.py to clean up old XML files and remove zipped shapefiles.
    • The YAML and XML files that were created or changed get checked into GitHub.
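
For step 1, a minimal sketch of the WGS84 check, assuming geopandas and an illustrative file name (not an actual repo path):

```python
# Sketch only: confirm a layer is in EPSG:4326 (WGS84) before it is published.
import geopandas as gpd

gdf = gpd.read_parquet("ca_transit_routes.parquet")  # illustrative path

if gdf.crs.to_epsg() != 4326:
    gdf = gdf.to_crs("EPSG:4326")

print(gdf.crs)  # should report EPSG:4326
```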
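
For step 5, a hedged sketch of the kind of column check update_data_dict.py reports, assuming data_dictionary.yml maps each dataset name to column: description entries (the actual YAML structure and paths may differ):

```python
# Hedged sketch: flag columns present in the layer but missing from
# data_dictionary.yml. The YAML structure and file names are assumptions.
import yaml
import geopandas as gpd

with open("data_dictionary.yml") as f:
    data_dict = yaml.safe_load(f)

gdf = gpd.read_parquet("ca_transit_stops.parquet")  # illustrative path

documented = set(data_dict.get("ca_transit_stops", {}))
missing = sorted(set(gdf.columns) - documented - {"geometry"})

print(f"Columns missing from data_dictionary.yml: {missing}")
```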
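
For step 7, a sketch of the kind of arcpy export arcgis_pro_script performs; refer to the script itself for the real implementation, and treat the workspace path and export option here as assumptions:

```python
# Sketch only (run locally in ArcGIS Pro, not in the Hub): export each feature
# class's metadata to XML. ARCGIS_PATH is a hypothetical file gdb location.
import arcpy

ARCGIS_PATH = r"C:\Users\me\Documents\ArcGIS\Projects\open_data\open_data.gdb"
arcpy.env.workspace = ARCGIS_PATH

for fc in arcpy.ListFeatureClasses():
    md = arcpy.metadata.Metadata(fc)
    md.exportMetadata(f"{fc}.xml", "FGDC_CSDGM")
```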

Metadata

  • Metadata
  • Data dictionary
  • update_vars contains many of the variables that frequently get updated in the publishing process.
    • Apply standardized column names across published datasets, even if they differ from internal keys (org_id in favor of gtfs_dataset_key, agency in favor of organization_name).
    • Since we do not save multiple versions of published datasets, the columns are renamed prior to exporting the geoparquet as a zipped shapefile (see the rename sketch below).
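
A hedged sketch of that rename-and-export step, assuming geopandas and illustrative file names (the actual mapping and paths live in the repo's scripts):

```python
# Illustrative only: rename internal keys to the standardized published names,
# then write the shapefile that gets zipped for upload.
import geopandas as gpd

RENAME_COLS = {
    "gtfs_dataset_key": "org_id",
    "organization_name": "agency",
}

gdf = gpd.read_parquet("ca_transit_routes.parquet")  # illustrative path
gdf = gdf.rename(columns=RENAME_COLS)
gdf.to_file("ca_transit_routes.shp")  # zip the .shp/.dbf/.shx/.prj set for upload
```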

Open Data Intake Process