I ran into an issue where applying `Frame.mapColValues` to a frame with tens of thousands of rows and a handful (<10) of columns is very slow. On the other hand, when first mapping the series "manually" (see below) and then joining with `Frame.ofColumns`, the speed difference is orders of magnitude.
What I'm looking to do is a naive hourly averaging of time series data. I implemented it with essentially the following:
```fsharp
// A simple series of 20 000 observations at one-minute intervals
let startFrom = DateTimeOffset.Parse "2021-10-27T00:00:00Z"
let series =
    Seq.init 20000 (fun idx -> startFrom.AddMinutes (float idx), float idx)
    |> Series.ofObservations

// Two columns with the same series from above
let columns = seq { "one", series; "two", series }
let frame = Frame.ofColumns columns

// Comparison of two timestamps to check if the hour is the same
let isSameHour (d1: DateTimeOffset) (d2: DateTimeOffset) =
    d1.Hour = d2.Hour && d1.Day = d2.Day && d1.Month = d2.Month && d1.Year = d2.Year

// Three methods to convert to a new frame with hourly averages

// 1. Using Frame.mapColValues, this takes over a second
frame |> Frame.mapColValues (Series.chunkWhileInto isSameHour Stats.mean)

// 2. An approximation of the internals of Frame.mapColValues,
//    takes the same time (over a second)
frame.Columns
|> Series.mapValues (Series.chunkWhileInto isSameHour Stats.mean)
|> Frame.ofColumns

// 3. Sidestepping the initial frame, this takes 10-20 *milli*seconds
columns
|> Seq.map (fun (k, s) -> k, s |> Series.chunkWhileInto isSameHour Stats.mean)
|> Frame.ofColumns
```
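As an aside, since each observation's hourly bucket depends only on its own timestamp (not on its neighbour), the same averaging could also be expressed as a grouping rather than a chunking. This is just an illustrative sketch using Deedle's `Series.groupInto`, not something I have benchmarked:

```fsharp
open System
open Deedle

// Hypothetical alternative (untested for performance): key each observation
// by the start of its hour, then average each group.
let hourlyMeans (s: Series<DateTimeOffset, float>) =
    s
    |> Series.groupInto
        (fun k _ -> DateTimeOffset(k.Year, k.Month, k.Day, k.Hour, 0, 0, k.Offset))
        (fun _ group -> Stats.mean group)
```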
It may well be that I'm overlooking something here; I'm not super confident with either the Deedle codebase or performance diagnosis in F#. I do have a setup with BenchmarkDotNet, which I could extract and share if that would be helpful.
Is this kind of performance expected? I believe I can avoid the issue in my use case by using method 3 from above, but I'm struggling to understand what could cause this kind of performance difference in this case.
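For reproducibility, a minimal BenchmarkDotNet harness comparing method 1 against method 3 might look like the following. This is a sketch of my setup, not the exact benchmark I ran; the type and member names are illustrative:

```fsharp
open System
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open Deedle

[<MemoryDiagnoser>]
type HourlyMeanBenchmarks() =
    // Same setup as in the snippet above
    let startFrom = DateTimeOffset.Parse "2021-10-27T00:00:00Z"
    let series =
        Seq.init 20000 (fun idx -> startFrom.AddMinutes (float idx), float idx)
        |> Series.ofObservations
    let columns = seq { "one", series; "two", series }
    let frame = Frame.ofColumns columns
    let isSameHour (d1: DateTimeOffset) (d2: DateTimeOffset) =
        d1.Hour = d2.Hour && d1.Day = d2.Day && d1.Month = d2.Month && d1.Year = d2.Year

    // Method 1: map over the frame's columns
    [<Benchmark>]
    member _.MapColValues() =
        frame |> Frame.mapColValues (Series.chunkWhileInto isSameHour Stats.mean)

    // Method 3: map the raw series, then build the frame
    [<Benchmark>]
    member _.MapSeriesDirectly() =
        columns
        |> Seq.map (fun (k, s) -> k, s |> Series.chunkWhileInto isSameHour Stats.mean)
        |> Frame.ofColumns

// To run: BenchmarkRunner.Run<HourlyMeanBenchmarks>() |> ignore
```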
For what it's worth, I did some more testing and found that the bad performance can also be avoided by using `Frame.getNumericCols` or even simply `Frame.getCols`.
```fsharp
// About the same performance as approach number 3 from above
frame
|> Frame.getNumericCols
|> Series.mapValues (Series.chunkWhileInto isSameHour Stats.mean)
|> Frame.ofColumns

// Slightly worse behavior, but still around 50 milliseconds, versus
// ~10+ milliseconds for the above or ~1+ seconds for Frame.mapColValues
frame
|> Frame.getCols
|> Series.mapValues (Series.chunkWhileInto isSameHour Stats.mean)
|> Frame.ofColumns
```