
Software and Tools

Billy Charlton edited this page Apr 24, 2017 · 9 revisions

Software plumbing / building blocks

CKAN - The best open-source data portal out there. We use "data warehouse" loosely here to mean that portal: a front-end to your organization's data. The data itself can sit on a database server, or CKAN can serve up Excel files or anything else you want to upload to it. We're going with a database backend for the major datasets because we need flexibility and the datasets (especially from Fast-trips) are pretty big.
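
Because CKAN is a front-end, its catalog can also be queried from a script through CKAN's HTTP "Action API". The following is only a minimal sketch: the portal address `https://data.example.org` and the search term are made up for illustration, and the helper names are not part of CKAN itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical portal address; replace with your own CKAN site.
CKAN_URL = "https://data.example.org"


def package_search_url(query, rows=5, base_url=CKAN_URL):
    """Build the URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def search_datasets(query, rows=5, base_url=CKAN_URL):
    """Return the names of datasets matching `query` (needs network access)."""
    with urlopen(package_search_url(query, rows, base_url)) as response:
        payload = json.load(response)
    # CKAN wraps every answer as {"success": ..., "result": ...}.
    return [pkg["name"] for pkg in payload["result"]["results"]]


if __name__ == "__main__":
    # Only prints the request URL, so this runs without a live portal.
    print(package_search_url("transit ridership"))
```

Calling `search_datasets("transit ridership")` against a real portal would return a list of dataset names; the URL builder is kept separate so it can be checked without network access.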

OpenGeo Suite - A full geospatial database implementation. OpenGeo is merely a "wrapper" around best-of-breed open-source components that are designed to work together. It's fully supported by BoundlessGeo, as in, you can hire them and get support contracts. OpenGeo Suite is free unless you go for the "enterprise" option, which is overkill for our project. The components are:

  • PostgreSQL - Pronounced "post-gress". The most advanced open-source database, and really the only choice if you want a fully geospatial open-source database. PostgreSQL is quite comparable in power to MS SQL Server and Oracle, and it is widely used in production across the web.
  • PostGIS - Pronounced "post-djiss". The geospatial extension for the PostgreSQL database. PostGIS enables a huge variety of server-side geospatial queries and joins, e.g. which parcels are within X miles of a station; how many miles of road are in San Francisco; etc.
  • GeoServer - Serves up PostGIS shapes for importing into web maps. GeoServer produces the geographic "layers" that end up on your map.
  • GeoWebCache - Just speeds things up by caching map tiles and serving the pre-rendered copy if it has already been created.
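
The two PostGIS questions mentioned above map fairly directly onto spatial SQL. Here is a rough sketch, held in Python strings so it can be passed to a database driver. The table and column names (`parcels`, `stations`, `roads`, `geom`, `parcel_id`, `station_id`, `city`) are assumptions for illustration, not our actual schema, and the geometries are assumed to be stored as longitude/latitude (EPSG:4326). Casting to `geography` makes `ST_DWithin` and `ST_Length` work in meters, so miles are converted first.

```python
METERS_PER_MILE = 1609.344

# "Which parcels are within X miles of a station?"
# Placeholders use the %(name)s style understood by drivers such as psycopg2.
PARCELS_NEAR_STATION_SQL = """
SELECT p.parcel_id
FROM parcels AS p
JOIN stations AS s
  ON ST_DWithin(p.geom::geography, s.geom::geography, %(meters)s)
WHERE s.station_id = %(station_id)s;
"""

# "How many miles of road are in a given city?"
ROAD_MILES_SQL = """
SELECT SUM(ST_Length(r.geom::geography)) / %(meters_per_mile)s AS road_miles
FROM roads AS r
WHERE r.city = %(city)s;
"""


def parcels_near_station_params(station_id, miles):
    """Query parameters for PARCELS_NEAR_STATION_SQL, with miles converted to meters."""
    return {"station_id": station_id, "meters": miles * METERS_PER_MILE}


def road_miles_params(city):
    """Query parameters for ROAD_MILES_SQL."""
    return {"city": city, "meters_per_mile": METERS_PER_MILE}
```

With a driver such as psycopg2, a call along the lines of `cursor.execute(PARCELS_NEAR_STATION_SQL, parcels_near_station_params(42, 0.5))` would run the first query; the station id and distance here are placeholders.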

QGIS - Desktop GIS application, similar to Esri ArcGIS. You do not need to switch to QGIS to use the OpenGeo suite of tools, but they do work well together.

Docker - Containerization/virtualization platform. We will only be using Docker during system development to compartmentalize the various systems from each other. It is not currently "best practice" to use Docker in actual production since the platform is new. But it makes development easier. Production will be based on virtual servers or real server hardware.

Visualization toolkits

There are a TON of data visualization toolkits out there!! Here are the ones I'm familiar with.

| Tool | Description |
|---|---|
| Leaflet | Lightweight web-mapping toolkit. Leaflet can be used to embed an interactive map on any web page. It can pull base maps from many different sources, and you can layer data on top of the base map from hard-coded JSON, from PostGIS/GeoServer queries, etc. |
| Seaborn | Static data visualization library for Python. Easily create beautiful scatterplots, heatmaps, and other great 2D static plots. You'll get a static image such as a PNG file, not an interactive experience. Check out the Seaborn gallery! |
| Bokeh | Interactive data visualization library for Python. A little more work but you get a fully interactive data graphic which can be embedded in web pages. Check out the Bokeh gallery! |
| Datashader | A separate library that grew up alongside Bokeh and works well with it. Helps map very large datasets, when Leaflet gets bogged down. |
| Shiny | R-based web visualization toolkit. You can build entire data "dashboard" websites using a Shiny server. |
| D3 | D3 is the awesomest JavaScript data visualization library. Interactive, embeddable, impossibly elegant... and complex. I don't know D3 well but it is extremely well-regarded. |
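
The Leaflet entry mentions layering data from hard-coded JSON; in practice that usually means GeoJSON. Below is a small sketch of producing such a layer from Python. The station names and coordinates are made up for illustration, and `stations_to_geojson` is a hypothetical helper, not part of any of the toolkits above.

```python
import json


def stations_to_geojson(stations):
    """Convert (name, lon, lat) tuples into a GeoJSON FeatureCollection dict."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        }
        for name, lon, lat in stations
    ]
    return {"type": "FeatureCollection", "features": features}


# Made-up sample points, for illustration only.
SAMPLE_STATIONS = [
    ("Station A", -122.41, 37.78),
    ("Station B", -122.40, 37.79),
]

if __name__ == "__main__":
    print(json.dumps(stations_to_geojson(SAMPLE_STATIONS), indent=2))
```

The printed JSON can be pasted into a web page and handed to Leaflet's `L.geoJSON(...)` to draw the points on top of a base map.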