Create financial-focused OHLC aggregation #445
@ryanbooz @davidkohn88 One obvious UX design that I can think of looks like this. Create a 1-minute CAGG from a raw trading data table called "raw_trades":

```sql
CREATE MATERIALIZED VIEW ohlcv_1m
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', ts) AS time,
       symbol,
       ts_ohlcv_agg(ts, price, volume) AS ohlcv
FROM raw_trades
GROUP BY time, symbol;
```

Then roll up the 1-minute OHLCV aggregates to the desired time interval at query time, and get open, high, low, close, and volume data through accessor functions such as ts_open, ts_high, etc.:

```sql
SELECT time_bucket('1 hour', time) AS hour,
       ts_open(rollup(ohlcv)) AS open,
       ts_high(rollup(ohlcv)) AS high,
       ts_low(rollup(ohlcv)) AS low,
       ts_close(rollup(ohlcv)) AS close,
       ts_volume(rollup(ohlcv)) AS volume
FROM ohlcv_1m
WHERE time > $time_from AND
      symbol = $symbol
GROUP BY hour;
```

This design follows our established two-step aggregation pattern. Do you feel that's a sufficient UX improvement worth pursuing? |
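The two-step pattern above can be sketched in Python. This is a minimal illustration of the semantics only, not the toolkit's actual implementation; the `Ohlcv` type and `combine` function are hypothetical names standing in for the aggregate's partial state and `rollup()`:

```python
from dataclasses import dataclass

# Hypothetical partial-aggregate state, like what ts_ohlcv_agg() would store.
@dataclass
class Ohlcv:
    open_ts: int
    open: float
    high: float
    low: float
    close_ts: int
    close: float
    volume: float

def combine(a: Ohlcv, b: Ohlcv) -> Ohlcv:
    """Merge two partials, as rollup() would at query time.
    Assumes the partials cover non-overlapping time ranges."""
    first, last = (a, b) if a.open_ts <= b.open_ts else (b, a)
    return Ohlcv(first.open_ts, first.open,          # open comes from the earlier partial
                 max(a.high, b.high), min(a.low, b.low),
                 last.close_ts, last.close,          # close comes from the later partial
                 a.volume + b.volume)
```

The key property is that `combine` is associative, which is what makes re-aggregating 1-minute buckets into any wider interval safe.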
I think we can remove the … I think also that making this specific to the financial use case and using its terminology would be great here, and it would act as an entry point for folks where we might get more feedback on other things we could be doing with something like this. |
Yep. Overall I think this could be a huge win. The one challenge will be volume: some APIs report a running total, while other APIs provide the volume of each trade and the user has to determine the total (with a SUM or similar). So, knowing the basic UX of how most APIs tend to provide data might be helpful here. If it's only two or three ways, maybe we have different accessors depending on what's being stored in volume? |
Leaving out the volume part might also be easier here, though creating a volume-weighted average price could be a useful metric to add on top of the typical OHLC values if we had volume. I think we'd need an argument to the agg about how the volume works, though we might be able to just store the results for both interpretations, i.e., store the sum and the min/max of volume, and depending on the accessor output either the sum or the max − min, depending on the type... hmm... that could end up overflowing in some cases, so we would need to account for that... but not crazy... |
I like the weighted average price idea! The argument to the agg/accessor is an interesting idea. I suppose one of the benefits of storing both could be that we future-proof the data: an API might change how it presents the data over time, but users wouldn't (necessarily) have to change the CAGG to accommodate that... just the query. 🤔 |
I also like the idea of a volume-weighted average. Investopedia defines it, roughly, as the total dollar value traded divided by the total volume over the period.
It appears to me that we may want to think about this problem in a broader stock/security trading context. For example, once a CAGG is defined, do we need a way to deal with stock splits? |
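As a rough sketch of why VWAP fits the two-step design: both the price-volume product and the volume are plain sums, so partial results re-aggregate cleanly across buckets. A minimal Python illustration (the `vwap` helper is hypothetical, not a toolkit function):

```python
def vwap(trades):
    """Volume-weighted average price: sum(price * volume) / sum(volume).
    `trades` is an iterable of (price, volume) pairs."""
    pv = sum(p * v for p, v in trades)
    vol = sum(v for _, v in trades)
    return pv / vol
```

Because both running sums overflow-prone only in extreme cases, the overflow concern raised above would mainly apply to the stored accumulators, not the final division.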
I think this feature would greatly help users who deal with financial data. Additional values that we could calculate within the aggregation are technical indicators. AFAIK, most if not all technical indicators can be calculated from the raw tick data. On the Twelve Data docs you can find a list of technical indicators that we could consider. You can also see which are in … For example:

```sql
SELECT time_bucket('1 hour', time) AS hour,
       open(rollup(ohlcv)) AS open,
       high(rollup(ohlcv)) AS high,
       low(rollup(ohlcv)) AS low,
       close(rollup(ohlcv)) AS close,
       rsi(rollup(ohlcv)) AS rsi_indicator,   -- calculates the magnitude of a price change
       macd(rollup(ohlcv)) AS macd_indicator, -- Moving Average Convergence Divergence
       ema(rollup(ohlcv)) AS ema              -- Exponential Moving Average; places greater importance on recent data points
FROM ohlcv_1m
WHERE time > $time_from AND symbol = $symbol
GROUP BY hour;
```
|
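As a sketch of one such indicator, here is a minimal Python EMA using the common smoothing factor alpha = 2 / (period + 1), seeded with the first price. Conventions for seeding vary; this is illustrative only, not the proposed accessor:

```python
def ema(prices, period):
    """Exponential moving average: each step blends the new price with the
    prior EMA, so recent data points carry more weight."""
    alpha = 2 / (period + 1)
    value = prices[0]                     # seed with the first observation
    for p in prices[1:]:
        value = alpha * p + (1 - alpha) * value
    return value
```

Note that, unlike OHLC, an EMA is order-dependent and not trivially re-aggregable, which is part of why indicators are a harder design problem than the base aggregate.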
As a related question to the technical-indicator functionality: is there a lower-level API that one can use to define a custom function (such as a volume-weighted average price calculation)? |
@zhenyakovalyov we're planning to release an OHLC aggregate (without volume taken into account) in the very near future. This was a more straightforward implementation and gives us some more time to think about how we'd like to treat volume data and calculations. We're aware of volume being reported either as a per-trade/per-period amount or as a running total.
Do you have an expectation of one of these being the default case instead of the other? For example, we're considering:

```sql
CREATE MATERIALIZED VIEW ohlcv
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', ts) AS time
     , symbol
     , finance_agg(ts, price, volume)
FROM raw_trades
GROUP BY time, symbol;
```

or

```sql
CREATE MATERIALIZED VIEW ohlcv
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', ts) AS time
     , symbol
     , finance_agg(ts, price, volume, incremental_volume => true)
FROM raw_trades
GROUP BY time, symbol;
```

We would love feedback on this kind of API while we're still in the early stages of designing it! For example, is what you had in mind more like another argument, e.g. … |
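To make the cumulative-vs-incremental distinction concrete, here is a Python sketch of the normalization such a flag would select between: converting a running-total volume series into per-period increments. The `to_incremental` helper and its reset rule are hypothetical, not toolkit code:

```python
def to_incremental(cumulative):
    """Convert a running-total volume series into per-period increments,
    so both reporting styles can then be summed the same way.
    Assumes the running total only decreases at a session boundary (reset)."""
    out, prev = [], 0
    for v in cumulative:
        if v < prev:          # reset detected, e.g. a new trading session
            prev = 0
        out.append(v - prev)
        prev = v
    return out
```

Whatever the default ends up being, storing increments internally would let a single accessor produce either view (increments directly, or a running sum).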
I like this API so far! 💯 |
496: Create OHLC aggregate and accompanying accessors r=rtwalker a=rtwalker

Begins to address #445. Introduces an OHLC aggregate called as `toolkit_experimental.ohlc(timestamp, price)`. The following accessors are provided for use with the `OpenHighLowClose` aggregate:

* `open`, `high`, `low`, `close`

along with the timestamp-returning versions:

* `open_at`, `high_at`, `low_at`, `close_at`

Additionally, there is a new `toolkit_experimental.rollup()` aggregate that takes `OpenHighLowClose` aggregates as input. In the event that multiple rows contain the highest or lowest value, only the earliest occurrence is retained. Similarly, should multiple rows contain the opening or closing timestamp, this implementation only updates the open/close points with a new value when the new timestamp is strictly earlier/later than the current timestamp.

Additional work on #445 remains to address handling volume (and possibly creating technical indicators).

Co-authored-by: Ryan Walker <rwalker@timescale.com>
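The tie-breaking rule described above (when several rows share the highest or lowest value, the earliest occurrence wins) can be sketched in Python. This is an illustration of the stated rule, not the PR's actual implementation:

```python
def merge_extreme(cur_ts, cur_val, new_ts, new_val, is_high):
    """Fold one candidate point into the current high (or low) point.
    Ties on value are broken in favor of the earlier timestamp."""
    better = new_val > cur_val if is_high else new_val < cur_val
    if better or (new_val == cur_val and new_ts < cur_ts):
        return new_ts, new_val
    return cur_ts, cur_val
```

This rule makes `rollup()` deterministic regardless of the order in which partials are combined.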
If there were two columns, one for period volume and one for aggregated volume as many data companies deliver it, then it's less of an issue. If there is only one, I would look to have just the period volume, as it's a lot easier (and maybe less costly?) to sum the volume column than to use a window function where you subtract the previous from the current volume. Could that be a switch, so that with only one column it would deliver the difference between the previous and current values as an option? A kind of built-in window function for volume? |
closed by #596 |
Is there scope to add another function to calculate price change as (current close − previous close) using lag(), accounting for resets as well? There could also be an extension to include another value called "open interest", which is generally a very important indicator in derivatives markets. |
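A minimal Python sketch of the close-to-close change described here, as one would compute in SQL with lag() (the `close_changes` helper is hypothetical):

```python
def close_changes(closes):
    """Per-bucket price change: current close minus previous close.
    The first bucket has no previous close, so it yields None,
    mirroring lag() returning NULL for the first row."""
    return [None if i == 0 else closes[i] - closes[i - 1]
            for i in range(len(closes))]
```

Handling resets (e.g. contract rollovers) would need extra session-boundary logic on top of this.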
Is your feature request related to a problem? Please describe.
Many users of TimescaleDB utilize financial time-series data. Many of our examples and tutorials show how to do basic OHLC bucketing, saving at least four columns using MAX/MIN/FIRST/LAST as the functions. It's a repetitive process, and some functions don't work well with re-aggregation; even when they do (MAX/MIN), most of our documentation/context warns users against using functions to re-aggregate unless they were designed to do so.
Describe the solution you'd like
Saving an aggregate that stores the necessary data, which can then be accessed at query time to return the appropriate value for financial charts. At a minimum, users need to be able to easily identify the open/high/low/closing value within a bucket. Allowing re-aggregation over smaller buckets would be powerful: users could then satisfy multiple bucket widths from one CAGG, rather than having to create various granularities (1-min, 5-min, 10-min, 60-min).
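As an illustration of serving multiple bucket widths from one fine-grained CAGG, a small Python sketch of re-bucketing 1-minute bars at query time (hypothetical helper; the real feature would do this via rollup() in SQL):

```python
def rebucket(minute_bars, width_minutes):
    """Re-aggregate 1-minute (minute, open, high, low, close) bars into
    width_minutes-wide buckets keyed by bucket start minute."""
    buckets = {}
    for minute, o, h, l, c in sorted(minute_bars):
        key = minute - minute % width_minutes   # bucket start
        if key not in buckets:
            buckets[key] = [o, h, l, c]         # first bar sets the open
        else:
            b = buckets[key]
            b[1] = max(b[1], h)                 # widen the high
            b[2] = min(b[2], l)                 # widen the low
            b[3] = c                            # latest bar sets the close
    return buckets
```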
Describe alternatives you've considered
Currently, the only solution we really have to offer people is to follow our schema examples for OHLC data using separate aggregate functions and creating multiple CAGGs. The big challenge here is that developing an application to view multiple "zoom levels" isn't easy without more complicated UNION examples that pull from different CAGGs depending on the "zoom level" (overall query length). Being able to use the same SQL query and simply re-aggregate to a different time bucket would be 🔥

Additional context
@davidkohn88 suggested I jump-start the conversation by creating the issue 😉. We have so many users that do this kind of aggregation. If we could simplify their storage (one CAGG) and their query (only the time-bucket interval value has to change), this could simplify how users query data significantly.