update name version and front page #126

Merged: 1 commit, Aug 9, 2021

8 changes: 4 additions & 4 deletions docs/conf.py
@@ -18,12 +18,12 @@

# -- Project information -----------------------------------------------------

project = 'OWPHydroTools'
copyright = '2021, Jason A. Regina and Austin Raney'
author = 'Jason A. Regina and Austin Raney'

# The full version, including alpha/beta/rc tags
release = '2.1.2'


# -- General configuration ---------------------------------------------------
243 changes: 192 additions & 51 deletions docs/index.rst
@@ -1,4 +1,4 @@
``OWPHydroTools``
=====================

Tools for retrieving hydrological data
@@ -15,19 +15,20 @@ Tools for retrieving hydrological data
hydrotools.events.event_detection
hydrotools.caches


Motivation
----------

We developed OWPHydroTools with data scientists in mind. We attempted to
ensure that the simplest methods, such as ``get``, both accept and return
data structures frequently used in scientific Python. Specifically, this
means that
`pandas.DataFrames <https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe>`__,
`geopandas.GeoDataFrames <https://geopandas.readthedocs.io/en/latest/docs/user_guide/data_structures.html#geodataframe>`__,
and
`numpy.arrays <https://numpy.org/doc/stable/reference/arrays.html#array-objects>`__
are the most frequently encountered data structures when using
OWPHydroTools. Most methods include sensible defaults that cover common
use-cases, but allow customization if required.

We also attempted to adhere to organizational (NOAA-OWP) data standards
@@ -38,67 +39,207 @@ conventions. Our intent is to make retrieving, evaluating, and exporting
data as easy and reproducible as possible for scientists, practitioners
and other hydrological experts.

What's here?
------------

We've taken a grab-and-go approach to installation and usage of
OWPHydroTools. In line with a standard toolbox, you will typically
install just the tool or tools that get your job done, without having to
install all the other tools available. This means a lighter installation
load, and new tools can be added to the toolbox without affecting your
workflows!

Note that we commonly refer to an individual tool in OWPHydroTools as a
subpackage or by its name (e.g. ``nwis_client``). You will find this
terminology in both issues and documentation.

Currently the repository has the following subpackages:

- ``events``: Variety of methods used to perform event-based
  evaluations of hydrometric time series
- ``nwm_client``: Provides methods for retrieving National Water Model
  data from various sources including `Google Cloud
  Platform <https://console.cloud.google.com/marketplace/details/noaa-public/national-water-model>`__
  and
  `NOMADS <https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/>`__
- ``metrics``: Variety of methods used to compute common evaluation
  metrics
- ``nwis_client``: Provides easy-to-use methods for retrieving data
  from the `USGS NWIS Instantaneous Values (IV) Web
  Service <https://waterservices.usgs.gov/rest/IV-Service.html>`__
- ``_restclient``: A generic REST client with a built-in cache that makes
  the construction and retrieval of GET requests painless
- ``caches``: Provides a variety of object caching utilities

UTC Time
--------

Note: the canonical ``pandas.DataFrames`` used by OWPHydroTools use
time-zone naive datetimes that assume UTC time. In general, do not
assume methods are compatible with time-zone aware datetimes or
timestamps. Expect methods to convert time-zone aware datetimes and
timestamps into their time-zone naive counterparts at UTC time.
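
If your own data carry time-zone aware timestamps, one way to match this
convention is to convert them to UTC and then drop the time-zone
information before passing them to OWPHydroTools. The sketch below uses
only standard pandas calls; the ``America/Chicago`` zone and the
timestamps are arbitrary illustrative values, not OWPHydroTools output.

.. code:: python

   import pandas as pd

   # Illustrative time-zone aware timestamps (US Central Daylight Time, UTC-5)
   aware = pd.Series(
       pd.to_datetime(["2021-08-09 06:00", "2021-08-09 07:00"]).tz_localize(
           "America/Chicago"
       )
   )

   # Convert to UTC, then strip the zone to get naive datetimes at UTC time
   naive_utc = aware.dt.tz_convert("UTC").dt.tz_localize(None)
   print(naive_utc)  # values: 2021-08-09 11:00:00 and 2021-08-09 12:00:00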

Usage
-----

Refer to each subpackage's ``README.md`` or documentation for examples
of how to use each tool.

Installation
------------

In accordance with the Python community, we support and advise the use
of virtual environments in any workflow using Python. In the following
installation guide, we use Python's built-in ``venv`` module to create a
virtual environment in which the tools will be installed. This is just
personal preference; any Python virtual environment manager (``conda``,
``pipenv``, etc.) should work just fine.

.. code:: bash

   # Create and activate python environment, requires python >= 3.8
   $ python3 -m venv venv
   $ source venv/bin/activate
   $ python3 -m pip install --upgrade pip

   # Install all tools
   $ python3 -m pip install hydrotools

   # Alternatively you can install a single tool
   # This installs the NWIS Client tool
   $ python3 -m pip install hydrotools.nwis_client

OWPHydroTools Canonical Format
------------------------------

"Canonical" labels are protected and part of a fixed lexicon. Canonical
labels are shared among all ``hydrotools`` subpackages. To encourage
cross-compatibility, subpackage methods should avoid changing or
redefining these columns wherever they appear. Existing canonical labels
are listed below:

- ``value`` [*float32*\ ]: Indicates the real value of an individual
  measurement or simulated quantity.
- ``value_time`` [*datetime64[ns]*\ ]: formerly ``value_date``, this
  indicates the valid time of ``value``.
- ``variable_name`` [*category*\ ]: string category that indicates the
  real-world type of ``value`` (e.g. streamflow, gage height,
  temperature).
- ``measurement_unit`` [*category*\ ]: string category indicating the
  measurement unit (SI or standard) of ``value``.
- ``qualifiers`` [*category*\ ]: string category that indicates any
  special qualifying codes or messages that apply to ``value``.
- ``series`` [*integer32*\ ]: used to disambiguate multiple coincident
  time series returned by a data source.
- ``configuration`` [*category*\ ]: string category used as a label for
  a particular time series, often used to distinguish types of model
  runs (e.g. short\_range, medium\_range, assimilation).
- ``reference_time`` [*datetime64[ns]*\ ]: formerly ``start_date``,
  some reference time for a particular model simulation. Could be
  considered an issue time, start time, end time, or other meaningful
  reference time. Interpretation is simulation or forecast specific.
- ``longitude`` [*category*\ ]: float32 category, WGS84 decimal
  longitude.
- ``latitude`` [*category*\ ]: float32 category, WGS84 decimal latitude.
- ``crs`` [*category*\ ]: string category, Coordinate Reference System,
  typically ``"EPSG:4326"``.
- ``geometry`` [*geometry*\ ]: ``GeoPandas`` compatible ``GeoSeries``
  used as the default "geometry" column.
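
As a rough illustration of how this lexicon maps onto pandas dtypes, the
hypothetical helper below (not part of OWPHydroTools) compares a frame's
columns against a subset of the canonical labels. It assumes
``integer32`` corresponds to the pandas ``int32`` dtype, and the sample
values are made up.

.. code:: python

   import pandas as pd

   # Expected dtypes for a subset of the canonical labels listed above.
   # Hypothetical helper data, not part of OWPHydroTools; "integer32" is
   # assumed to mean pandas' int32.
   CANONICAL_DTYPES = {
       "value": "float32",
       "value_time": "datetime64[ns]",
       "variable_name": "category",
       "measurement_unit": "category",
       "qualifiers": "category",
       "series": "int32",
   }


   def mismatched_canonical_columns(df: pd.DataFrame) -> dict:
       """Return {label: actual dtype} for canonical columns with an unexpected dtype."""
       return {
           label: str(df[label].dtype)
           for label, expected in CANONICAL_DTYPES.items()
           if label in df.columns and str(df[label].dtype) != expected
       }


   # Made-up sample data; "series" is deliberately left as int64
   df = pd.DataFrame(
       {
           "value_time": pd.Series(
               ["2021-08-01 00:00", "2021-08-01 00:15"], dtype="datetime64[ns]"
           ),
           "variable_name": pd.Categorical(["streamflow", "streamflow"]),
           "measurement_unit": pd.Categorical(["ft3/s", "ft3/s"]),
           "value": pd.Series([123.0, 125.5], dtype="float32"),
           "series": pd.Series([0, 0], dtype="int64"),
       }
   )
   print(mismatched_canonical_columns(df))  # {'series': 'int64'}

Columns that are absent are simply skipped, so the check only flags
canonical labels that are present with an unexpected dtype.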

Non-Canonical Column Labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~

"Non-Canonical" labels are subpackage-specific extensions to the
canonical standard. Packages may share these non-canonical labels, but
cross-compatibility is not guaranteed. Examples of non-canonical labels
are given below.

- ``usgs_site_code`` [*category*\ ]: string category indicating the
  USGS Site Code/gage ID
- ``nwm_feature_id`` [*integer32*\ ]: indicates the NWM reach feature
  ID/ComID
- ``nws_lid`` [*category*\ ]: string category indicating the NWS
  Location ID/gage ID
- ``usace_gage_id`` [*category*\ ]: string category indicating the
  USACE gage ID
- ``start`` [*datetime64[ns]*\ ]: datetime returned by
  ``event_detection`` that indicates the beginning of an event
- ``end`` [*datetime64[ns]*\ ]: datetime returned by
  ``event_detection`` that indicates the end of an event

Categorical Data Types
~~~~~~~~~~~~~~~~~~~~~~

OWPHydroTools uses ``pandas.DataFrame`` objects that contain
``pandas.Categorical`` values to increase memory efficiency. Depending
upon your use-case, these values may require special consideration. To
see if a ``DataFrame`` returned by an OWPHydroTools subpackage contains
``pandas.Categorical`` values, you can use ``pandas.DataFrame.info`` like so:

.. code:: python

   # info() prints its summary directly and returns None
   my_dataframe.info()

.. code:: console

   <class 'pandas.core.frame.DataFrame'>
   Int64Index: 5706954 entries, 0 to 5706953
   Data columns (total 7 columns):
    #   Column            Dtype
   ---  ------            -----
    0   value_date        datetime64[ns]
    1   variable_name     category
    2   usgs_site_code    category
    3   measurement_unit  category
    4   value             float32
    5   qualifiers        category
    6   series            category
   dtypes: category(5), datetime64[ns](1), float32(1)
   memory usage: 141.5 MB

Columns with ``Dtype`` ``category`` are ``pandas.Categorical``. In most
cases, the behavior of these columns is indistinguishable from their
primitive types (in this case ``str``). However, there are times when
the use of categories can lead to unexpected behavior, such as when using
``pandas.DataFrame.groupby`` as documented
`here <https://stackoverflow.com/questions/48471648/pandas-groupby-with-categories-with-redundant-nan>`__.
``pandas.Categorical`` values are also incompatible with ``fixed`` format
HDF files (you must use ``format="table"``) and may cause unexpected
behavior when writing to geospatial formats using ``geopandas``.
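
A minimal reproduction of the ``groupby`` surprise, assuming a small
hypothetical frame whose categorical ``usgs_site_code`` column declares a
site code that never appears in the data (the codes and values are
illustrative). With ``observed=False``, the default in older pandas
releases, the unused category still produces a group.

.. code:: python

   import pandas as pd

   # Hypothetical frame: the categorical column declares two site codes,
   # but only the first one actually appears in the data.
   df = pd.DataFrame(
       {
           "usgs_site_code": pd.Categorical(
               ["01646500", "01646500"], categories=["01646500", "02146470"]
           ),
           "value": [10.0, 20.0],
       }
   )

   # With observed=False every declared category becomes a group, so the
   # unused site code appears with a NaN mean.
   means = df.groupby("usgs_site_code", observed=False)["value"].mean()
   print(means)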

Possible solutions include:

Cast ``Categorical`` to ``str``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Casting to ``str`` resolves all of the aforementioned issues, including
writing to geospatial formats.

.. code:: python

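   # astype(str) produces a non-categorical column; apply(str) maps the
   # categories one-to-one and can leave the dtype as category
   my_dataframe['usgs_site_code'] = my_dataframe['usgs_site_code'].astype(str)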

Remove unused categories
^^^^^^^^^^^^^^^^^^^^^^^^

This will remove categories from the ``Series`` for which no values are
actually present.

.. code:: python

   my_dataframe['usgs_site_code'] = my_dataframe['usgs_site_code'].cat.remove_unused_categories()

Use ``observed`` option with ``groupby``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This limits ``groupby`` operations to category values that actually
appear in the ``Series`` or ``DataFrame``.

.. code:: python

   # Select the numeric column; averaging category columns raises a TypeError
   mean_flow = my_dataframe.groupby('usgs_site_code', observed=True)['value'].mean()
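
The three remedies can be compared side by side on the same kind of
small hypothetical frame, where one declared site code has no data. This
is a sketch using only standard pandas methods; each approach should
leave a single group for the one site that actually has values.

.. code:: python

   import pandas as pd

   # Hypothetical frame with one declared but unused site code
   df = pd.DataFrame(
       {
           "usgs_site_code": pd.Categorical(
               ["01646500", "01646500"], categories=["01646500", "02146470"]
           ),
           "value": [10.0, 20.0],
       }
   )

   # Remedy 1: cast to str so the column is no longer categorical
   as_str = df.assign(usgs_site_code=df["usgs_site_code"].astype(str))
   by_str = as_str.groupby("usgs_site_code")["value"].mean()

   # Remedy 2: drop categories that have no values
   pruned = df.assign(
       usgs_site_code=df["usgs_site_code"].cat.remove_unused_categories()
   )
   by_pruned = pruned.groupby("usgs_site_code", observed=False)["value"].mean()

   # Remedy 3: keep the categories, but only group on observed values
   by_observed = df.groupby("usgs_site_code", observed=True)["value"].mean()

   for result in (by_str, by_pruned, by_observed):
       print(len(result), float(result.iloc[0]))  # prints "1 15.0"

Casting trades away the memory savings of categories, while the other two
options keep the categorical dtype intact.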

.. |Unit Testing Status| image:: https://github.com/noaa-owp/hydrotools/actions/workflows/run_unit_tests.yml/badge.svg
.. |OWPHydroTools| image:: https://raw.githubusercontent.com/NOAA-OWP/hydrotools/main/docs/banner.png