Discussion: FROST-Server and TimescaleDB #538
-
Timescale came up recently here as well, but we've not had time to look into it.
-
My feeling is that this only makes sense if FROST could expose some aggregate queries based on time series (see #26), which is outside of the standard. In particular, a function like time_bucket_gapfill() would be of high value to anyone who wants to implement a decent, scalable web interface based on SensorThings.
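For context, a hedged sketch of what such a gap-filled aggregate looks like in TimescaleDB is shown below. The table and column names ("OBSERVATIONS", "RESULT_TIME", "RESULT_NUMBER", "DATASTREAM_ID") are assumptions loosely based on FROST's default schema, not a confirmed query:

```sql
-- Sketch: 15-minute averages, with gaps filled by carrying the last
-- observed value forward (locf). Table/column names are assumed.
SELECT time_bucket_gapfill('15 minutes', "RESULT_TIME") AS bucket,
       locf(avg("RESULT_NUMBER")) AS avg_result
FROM "OBSERVATIONS"
WHERE "DATASTREAM_ID" = 1
  AND "RESULT_TIME" >= '2021-01-01' AND "RESULT_TIME" < '2021-01-02'
GROUP BY bucket
ORDER BY bucket;
```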
-
In the HERACLES project we store and visualise 100 Hz data. By now we have nearly two years of data for three 100 Hz accelerometers and four 40 Hz sensors. You are right that aggregates are required to make this work, but there is no need to have this in the API. In almost all our use cases we calculate aggregates using an external tool (https://github.com/hylkevds/SensorThingsProcessor) that pushes the aggregate data back into FROST as a MultiDatastream with Average, Min, Max and Std-dev. Using this, it is already trivial to build a decent, scalable web interface based on SensorThings.

The problem with making aggregate functions part of the standard is that there are many ways to calculate aggregates, depending on the exact sampling regime of the sensor. Simply taking the mathematical average of the numbers may work for sensors with a fixed sampling interval, but will not give sensible values in many other cases.
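To illustrate the point about sampling regimes, here is a rough sketch (not the tool's actual logic) comparing a plain average with a time-weighted one. The time_weight()/average() functions come from the separate timescaledb_toolkit extension, and the table/column names are again assumptions based on FROST's default schema:

```sql
-- For irregularly sampled data, avg() weights every reading equally,
-- while a time-weighted average accounts for the gaps between readings.
SELECT time_bucket('1 hour', "RESULT_TIME") AS bucket,
       avg("RESULT_NUMBER") AS naive_avg,
       average(time_weight('Linear', "RESULT_TIME", "RESULT_NUMBER")) AS weighted_avg
FROM "OBSERVATIONS"
GROUP BY bucket
ORDER BY bucket;
```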
-
Hi, I have been working on a database project for my laboratory. In our lab we build our own sensors and distribute them across campus, and I realise the database will benefit greatly from adopting the SensorThings API. As mentioned above, I am also interested in integrating TimescaleDB with the SensorThings data model (especially the observations table). I have tried to convert the observations table into a hypertable (TimescaleDB), but it failed, as TimescaleDB requires the primary key to be a timestamp column. I was planning to use the result_time column; however, the observations table uses the id column as the primary key.

As mentioned in hylkevds's post, it is not required to integrate aggregate functions into the API, which are also not in the standard. However, I thought converting the observations table to a hypertable would benefit the backend in maintaining the database and in exploring the data locally; it might also help with performance when ingesting and updating data. It is possible to export the data and perform explorations externally (e.g. copy the observations table, migrate it to a hypertable, and explore there), but that is an extra step, so I thought it might be good if FROST-Server could provide the option to convert the observations table into a hypertable. Just some thoughts on the subject.

Thus, one of my questions is: instead of the id as primary key, is it possible to use a timestamp column as the key, and how would that affect the performance of FROST-Server?
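For readers trying to reproduce this, the failing conversion described above would look roughly as follows (table and column names are assumptions from FROST's default schema, and the error message is paraphrased):

```sql
-- Attempt to convert the observations table into a hypertable,
-- partitioned on the result time. This fails while "ID" alone is the
-- primary key, because TimescaleDB requires the partitioning column
-- to be part of every unique index (including the PK).
SELECT create_hypertable('"OBSERVATIONS"', 'RESULT_TIME');
-- ERROR: cannot create a unique index without the column "RESULT_TIME"
```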
-
@limond successfully adopted TimescaleDB beneath FROST for us, with a considerable speed-up. We should document our index setup (@johannes-riesterer).
-
That TimescaleDB forces the primary key to be the time column surprises me greatly. That would make it useless for any application that gathers data from multiple sources, as a time collision is guaranteed to happen eventually... unless the restriction is that the time column has to be part of the primary key, so that you can make it a combined PK of the time column plus another column. That said, FROST doesn't really care what the DB thinks the primary key is, as long as the ID column is unique. So you are free to change the PK to whatever you see fit, and FROST won't complain until there is a PK violation and an insert fails.
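As a sketch of that workaround, under the assumption that the existing PK constraint is called "OBSERVATIONS_pkey" and that the time column of interest is "PHENOMENON_TIME_START" (both hypothetical names to verify against your own schema):

```sql
-- Replace the single-column PK with a composite one that includes the
-- time column, then convert the existing table into a hypertable.
ALTER TABLE "OBSERVATIONS" DROP CONSTRAINT "OBSERVATIONS_pkey";
ALTER TABLE "OBSERVATIONS"
  ADD PRIMARY KEY ("ID", "PHENOMENON_TIME_START");
SELECT create_hypertable('"OBSERVATIONS"', 'PHENOMENON_TIME_START',
                         migrate_data => true);
```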
-
Thanks for the reply, @hylkevds. I think it is indeed the case that the time column has to be part of the primary key, and that you can combine it with another column (https://docs.timescale.com/latest/using-timescaledb/schema-management#indexing-best-practices). But I am not sure; I am new to these software stacks and still exploring their implementation. @riedel, it would be useful to see how you integrated TimescaleDB. Do let me know where I can find a reference for your integration. Thanks.
-
@chenkianwee you can find our deployment here: https://github.com/SmartAQnet/smartaqnet-infrastructure/tree/cluster-dev
-
@riedel @chenkianwee I was mentioned above, so I would like to add a few words, even though I no longer work for TECO.
-
Thanks @limond, good to hear from you (I was just about to post the SQL file after looking for it myself).
I just asked @johannes-riesterer to merge the documentation back, but I think there is not much more on the index/hypertable setup part except for this quote.
-
Thanks @limond and @riedel, this is very helpful! I had a quick look at the information; it looks like it should be enough for my setup. Thanks again.
-
Dear all,
If I fetch data using the API including the Feature of Interest ($expand=...), it seems that the API slices the request into separate queries:
The issue with this approach is that it looks into all chunks for each feature! Here is a small sample of the query planning (I have around 1500 one-week chunks covering 40 years of data).
The url looks like:
Without the TimescaleDB hypertable, the query is much faster (30x), as there is only one table to look into. Any suggestions on possible improvements? A new index, an improved URL, or a different chunk size?
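As a general illustration of the chunk-exclusion behaviour described above (assumed names again, not the actual SQL that FROST generates):

```sql
-- Without a predicate on the partitioning column, the planner has to
-- visit every chunk of the hypertable:
EXPLAIN SELECT * FROM "OBSERVATIONS" WHERE "FEATURE_ID" = 42;

-- With a time predicate, TimescaleDB can exclude non-matching chunks:
EXPLAIN SELECT * FROM "OBSERVATIONS"
 WHERE "FEATURE_ID" = 42
   AND "PHENOMENON_TIME_START" >= now() - interval '1 week';
```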
-
These two indices are the wrong way around:
They first separate the observations based on ID, and then group them by PhenomenonTimeStart / FeatureID. But the ID field is already unique, so the second column never takes effect.
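In other words, the column order should be swapped along these lines (index and column names are placeholders, since the original definitions aren't quoted here):

```sql
-- Put the grouping column first and the already-unique ID last, so the
-- leading column can actually narrow the scan.
CREATE INDEX obs_phentime_id ON "OBSERVATIONS" ("PHENOMENON_TIME_START", "ID");
CREATE INDEX obs_feature_id  ON "OBSERVATIONS" ("FEATURE_ID", "ID");
```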
-
I changed the indexes as you suggested and added a new one on "ID". The scan of each chunk is 4x faster, but the API answer is still slow, because each feature is selected separately.
-
Ok, I understand.
-
Continuing the work on the TimescaleDB and SensorThings optimization. But this is slow with TimescaleDB. So I tried to include the time information in the URL, to exclude the unnecessary chunks, with:
Any suggestion on how to solve this issue?
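For readers following along, a time-constrained request of the kind described would look roughly like this (the datastream ID and dates are placeholders, not the URL actually used above):

```
/v1.1/Datastreams(1)/Observations
  ?$filter=phenomenonTime ge 2020-01-01T00:00:00Z and phenomenonTime lt 2020-01-08T00:00:00Z
  &$expand=FeatureOfInterest
```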
-
A wonderful good morning to everyone! I am also attempting to integrate the TimescaleDB Postgres extension into the OBSERVATIONS table. For example, I sent a POST request to /v1.1/CreateObservations with this body:

```json
[
  {
    "Datastream": {
      "@iot.id": 1
    },
    "components": [
      "phenomenonTime",
      "result",
      "FeatureOfInterest/id"
    ],
    "dataArray": [
      ["2017-06-29T09:46:51.754Z", 2.0953409217499532e-36, 1]
    ]
  }
]
```

In Postgres I enabled both log_statement and auto_explain.log_analyze, so both the executed statement and an 'explain analyze' of it are logged. This is the result of the query (truncated Postgres log):
This raises three questions:
Is this not redundant? Should data not be forwarded internally instead of writing it to the database and then querying it again right after?
Admittedly, I know very little about the implementation of the FROST server, so it would be much appreciated if somebody could shed some light on this matter. Many thanks!
-
Wondering if anyone has had any thoughts on the feasibility of adding TimescaleDB to the FROST-Server schema, the observations tables being the obvious candidate.
I've trialled TimescaleDB on a custom Postgres schema and found the performance to be dramatically better than plain Postgres (80M-row dataset).
I think there would be benefits in converting the observations table into a hypertable; however, a code review would be required to get the most out of TimescaleDB and its additional time-based functions.