API Data Science Cheatsheet
This page is a brief reference guide aimed at data science consumers of the SolarNetwork API: people who are mostly interested in downloading large sets of data from SolarNetwork.
SolarNetwork is composed of two main systems:
- SolarNode — the IoT "edge" computer collecting data from physical devices; data is organized into datum streams (see next section); often referred to as simply "node"
- SolarNet — the cloud-based data repository that nodes post data to, and where APIs are available for querying the data
At its core SolarNetwork uses a general and flexible time-based data model. The smallest unit of information is known as a datum, which is composed of a timestamp (when the data was collected), a unique stream ID, and any number of key/value pairs known as properties. A property could be any measured value, like watts: 1234.
A datum stream is just a collection of datum with the same stream ID over time. You could visualize a datum stream with a stream ID of abc-1234-def-456 like this:
+------------------+ +------------------+ +------------------+
| abc-1234-def-456 | | abc-1234-def-456 | | abc-1234-def-456 |
| 2024-11-05 15:01 | | 2024-11-05 15:02 | | 2024-11-05 15:03 |
+------------------+ -- +------------------+ -- +------------------+ ...
| watts: 1234 | | watts: 987 | | watts: 2345 |
| frequency: 49.9 | | frequency: 50.0 | | frequency: 50.1 |
| status: ok | | status: ok | | status: ok |
+------------------+ +------------------+ +------------------+
☝️ Each box in the diagram represents a datum. The boxes together represent a datum stream.
A stream ID is a randomly assigned and difficult-to-remember value, so SolarNetwork lets you work with a friendlier pair of identifiers: a node ID and source ID combination. A node ID is a simple number, assigned to every SolarNode when it is joined to SolarNetwork. A source ID is just some text assigned by a human managing a node. It can take any form, but by convention SolarNetwork uses a path-like format with / delimiters, like a URL, to help convey meaning, for example site1/building2/floor3/inverter1.
The point is that most APIs accept node and source IDs as query parameters, so you do not have to remember stream ID values. The previous diagram could equally represent the same datum stream using node and source ID values, for example node ID 12345 and source ID s1/b2/f3/inv1, like this:
+------------------+ +------------------+ +------------------+
| 12345 | | 12345 | | 12345 |
| s1/b2/f3/inv1 | | s1/b2/f3/inv1 | | s1/b2/f3/inv1 |
| 2024-11-05 15:01 | | 2024-11-05 15:02 | | 2024-11-05 15:03 |
+------------------+ -- +------------------+ -- +------------------+ ...
| watts: 1234 | | watts: 987 | | watts: 2345 |
| frequency: 49.9 | | frequency: 50.0 | | frequency: 50.1 |
| status: ok | | status: ok | | status: ok |
+------------------+ +------------------+ +------------------+
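As a concrete sketch, such a query might look like the following, using the placeholder node and source IDs from the diagram. It assumes the node's data is public, so the unauthenticated /pub API path can be used; private data uses the /sec path with token authentication, shown later on this page.
# most recent datum for node 12345, source s1/b2/f3/inv1 (placeholder IDs)
curl 'https://data.solarnetwork.net/solarquery/api/v1/pub/datum/mostRecent?nodeId=12345&sourceIds=s1/b2/f3/inv1'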
Although SolarNetwork does not dictate what property names are used, there are some standard names used by convention, along with associated standard measurement units.
The SolarQuery API and SolarQuery Stream API pages cover the majority of data access API methods. These assume some basic knowledge of using web-based APIs.
The API Explorer is the easiest way to test out API queries. You can use this to download JSON or CSV data. Some useful APIs are:
API | Description |
---|---|
/datum/mostRecent | View recent datum for a node 💡 Given just a node ID, this is a handy way to see what datum streams are available |
/datum/stream/meta/node | See what properties are available for datum streams |
/datum/list | List datum over time, with time range limits |
/datum/stream/datum | List datum over time, no time range limits 👈 This is the best API for large sets of CSV output |
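For instance, to check which properties a node's datum streams contain, a stream metadata query might look like this sketch (placeholder node ID, public node assumed):
# list datum stream metadata (property names) for node 12345
curl 'https://data.solarnetwork.net/solarquery/api/v1/pub/datum/stream/meta/node?nodeId=12345'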
API methods that accept date range query parameters generally work with ISO 8601 timestamps, interpreted in the UTC time zone, via startDate and endDate parameters. The endDate is exclusive, and the time component is optional. For example:
# one day (UTC)
?startDate=2024-10-01&endDate=2024-10-02
# one hour (UTC)
?startDate=2024-10-01T12:00&endDate=2024-10-01T13:00
Instead of UTC you can work in the "node local" time zone with localStartDate and localEndDate parameters, which is often more convenient. For example:
# one day (node local zone)
?localStartDate=2024-10-01&localEndDate=2024-10-02
# one hour (node local zone)
?localStartDate=2024-10-01T12:00&localEndDate=2024-10-01T13:00
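Putting the pieces together, a query for one local-time day of datum from the /datum/list API might look like the following sketch (placeholder node and source IDs, public node assumed):
# one day of datum in the node's local time zone, for node 12345 / source s1/b2/f3/inv1
curl 'https://data.solarnetwork.net/solarquery/api/v1/pub/datum/list?nodeId=12345&sourceIds=s1/b2/f3/inv1&localStartDate=2024-10-01&localEndDate=2024-10-02'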
In API Explorer you can generate curl commands to download data from your command line. This can be especially handy for downloading large data sets with the /datum/stream/datum API using CSV output.
Click on the Curl command header in API Explorer to view a curl command for the most recent API method you've invoked. You can also check the No request checkbox to skip executing the request in the browser, which for large result sets can overwhelm it and/or cause it to crash. For example:
curl -H 'Accept: text/csv' \
-H 'X-SN-Date: Mon, 04 Nov 2024 17:51:04 GMT' \
-H 'Authorization: SNWS2 Credential=11111111111111111111,SignedHeaders=host;x-sn-date,Signature=e5fdea8c1f305a9b3e537b3f2e9ada997e0402e0be0bf18e0a065e21c8149b61' \
'https://data.solarnetwork.net/solarquery/api/v1/sec/datum/stream/reading?nodeId=123&sourceIds=meter1&localStartDate=2024-01-01&localEndDate=2024-11-01&aggregation=Hour&readingType=Difference' \
>meter1-data-2024-hourly-reading.csv
The sqc tool created by our friends at Ecosuite is another handy way of downloading sets of data. Run it with a Docker volume sqc-data mounted on /var/tmp in the container, to save data to:
# get sqc image
docker pull ecosuite/solarquant:latest
# run sqc image, with `sqc-data` data volume
docker run --rm -it --entrypoint bash --mount src=sqc-data,dst=/var/tmp ecosuite/solarquant:latest
Then use the sqc command to download data:
# provide token credentials
sqc config authenticate sn
# view source available properties
sqc projects source meter1
┌────────┬──────────────────┬───────────────┬──────────────┬────────┐
│ source │ field │ instantaneous │ accumulating │ status │
├────────┼──────────────────┼───────────────┼──────────────┼────────┤
│ meter1 │ watts │ Y │ │ │
│ meter1 │ current │ Y │ │ │
│ meter1 │ voltage │ Y │ │ │
│ meter1 │ frequency │ Y │ │ │
│ meter1 │ wattHours │ │ Y │ │
│ meter1 │ wattHoursReverse │ │ Y │ │
└────────┴──────────────────┴───────────────┴──────────────┴────────┘
# download month of raw data
sqc datums stream \
--source meter1 \
--format timestamp,watts,current,voltage,frequency,wattHours \
--start 2024-10-01 --end 2024-11-01 \
--output /var/tmp/meter1-data-202410.csv
# download month of hourly data, with some min/max aggregates
sqc datums stream \
--source meter1 \
--format timestamp,watts,watts\$minimum,watts\$maximum,current,voltage,frequency,wattHours \
--start 2024-10-01 --end 2024-11-01 \
--aggregation Hour \
--output /var/tmp/meter1-data-202410-hourly.csv
Because the file is saved into /var/tmp, which is the sqc-data Docker data volume, you can easily copy the file to your workstation using Docker tools.
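For example, one way to copy a file out of a named volume is to mount it in a throwaway container alongside a host directory (a sketch; adjust the file name and paths to suit):
# copy the downloaded CSV from the sqc-data volume into the current host directory
docker run --rm \
  --mount src=sqc-data,dst=/var/tmp \
  -v "$PWD":/host \
  alpine cp /var/tmp/meter1-data-202410.csv /host/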
The timestamp output from sqc datums stream will be in the form of a millisecond-resolution Unix epoch. Excel does not treat these values as dates, but you can create a new column and use a formula like =DATE(1970,1,1)+C2/86400000, where C2 is a timestamp cell, to convert the value into an Excel date, which you can then format as needed.
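Alternatively, if you would rather convert the timestamps before opening the file, here is a sketch using GNU awk (it relies on the gawk-specific strftime() function, and assumes timestamp is the first column, with a header row):
# rewrite column 1 from millisecond Unix epoch to ISO 8601 UTC timestamps
gawk 'BEGIN { FS = OFS = "," }
  NR == 1 { print; next }
  { $1 = strftime("%Y-%m-%dT%H:%M:%SZ", $1 / 1000, 1); print }' \
  meter1-data-202410.csv > meter1-data-202410-iso.csv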