Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide generalized aggregation #77

Merged
merged 1 commit into from
Jan 25, 2018

Conversation

pluehne
Copy link
Contributor

@pluehne pluehne commented Jan 10, 2018

This implements a generalized method for aggregating time-series data. Data can be aggregated over week or month intervals with a variety of aggregation methods to choose from.

This will be useful for providing chart views at different levels (such as two-year periods vs. just showing the last month). Additionally, the generalized form of aggregation can be used to smooth out graphs where the sampling frequency changed with an update to Hubble Enterprise.

The aggregation is done by splitting the time data into subsequent, gapless periods of time (weeks starting with Mondays or months), for each of which the aggregated values are then computed and returned.

Aggregation methods define how to aggregate the values within individual time periods. The following aggregation methods are supported:

  • sum
  • mean
  • min
  • max
  • first (the chronologically first available value for that period)
  • last
  • median

Periods with incomplete data at the beginning or the end of the time series are excluded from the aggregation.

Finally, the pull request usage chart is changed to make use of the new aggregation facilities to reduce the granularity from daily to monthly data for now. This might be changed when we implement detail views.

I also added several unit tests to check the aggregation methods (for off-by-one errors in particular) as well as a short piece of documentation on the new configuration options.

@@ -27,7 +27,7 @@
<script src="{{ site.baseurl }}/assets/js/vendor/moment-with-locales.min.js"></script>
<script src="{{ site.baseurl }}/assets/js/vendor/Chart-2.7.1.min.js"></script>
<script src="{{ site.baseurl }}/assets/js/vendor/spin-2.3.2.min.js"></script>
<script src="{{ site.baseurl }}/assets/js/charts.js?version=1ff0187"></script>
<script src="{{ site.baseurl }}/assets/js/charts.js?version=e7e9c5a"></script>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to generate a random hash or a hash over a file with Jekyll (via GitHub pages)?
/cc @parkr 😄

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

site.github.build_revision would work on Pages. https://help.github.com/articles/repository-metadata-on-github-pages/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parkr: Thanks for the pointer! This would generate new ?version= hashes for each commit, right? Do you know of a way to, for example, get a hash based on the actual file content? In this way, we wouldn’t force client or server cache flushes with every single commit. (I’m just asking, your solution would solve our problem pretty well already!)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No hashes on a per-file basis are generated, so that’s the best solution that will achieve cache misses based on what’s in Pages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then this is the way to go. Thanks, @parkr!

@pluehne pluehne force-pushed the patrick/improved-aggregation branch 2 times, most recently from 194f329 to 7dfc075 Compare January 10, 2018 14:03
@pluehne
Copy link
Contributor Author

pluehne commented Jan 15, 2018

In case someone is interested, I’ll come back to this soon. I wanted to have #81 in place before finishing this.

Copy link
Contributor

@filmaj filmaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Nice job on the tests - this PR bumps up the JS code coverage from 11% to 26%.

high-fives

createList,
createTable,
createSpinner,
d3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Globals can be defined in a variety of ways in eslint. Other than inlining the list into particular .js files (like we do here), you can also specify them via eslint config files, similarly to how we do in the browser-context specific .eslintrc.json file (both env and globals config properties, in their own different ways, define what globals are expected to be available). I think we should move any globals defined by external libraries that our scripts depend on, like d3, jquery and moment.js, for example, to the configuration files, rather than as per-js-file running lists. In this way, if our webapp frontend ends up adding a new .js file, we won't have to duplicate the inline comment allowing for d3 globals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that’s a good pointer. I wasn’t happy with inlining these globals either, and I second that it would be best to configure this in the .eslintrc.json.

I’ll return to this pull request in a couple of days. I’ll have to rebase these changes on top of the other new stuff, and then I’ll also adjust the global directives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filmaj: I tried to change the globals declaration for d3 to true without success. I’d say let’s tidy up the global declaration in a separate pull request 🙂.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, I can take that on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice, thanks 😄!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pluehne I added another commit to my fork's branch of this PR that should fix that, feel free to pull in to this PR: https://github.com/filmaj/hubble/tree/patrick/improved-aggregation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filmaj: Thanks a lot! What I didn’t understand was that there are multiple .eslintrc.json files with different purposes—one for linting the assets themselves and one for linting the unit tests.

I squashed your changes into my commit above (to not have one commit do it improperly and another one fixing it right after), and I hope that this is fine with you 🙂.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pluehne yep perfectly fine 👍

@pluehne pluehne force-pushed the patrick/improved-aggregation branch 3 times, most recently from b5944e2 to 618bf5f Compare January 22, 2018 16:34
@pluehne pluehne force-pushed the patrick/improved-aggregation branch from 618bf5f to 01b488f Compare January 23, 2018 13:47
@filmaj
Copy link
Contributor

filmaj commented Jan 23, 2018

Huh, this code coverage comment uses the compact "header" layout. 🤔

@pluehne pluehne force-pushed the patrick/improved-aggregation branch 3 times, most recently from f50edb7 to 379562a Compare January 24, 2018 11:29
data.sort((row1, row2) => row1['date'] - row2['date']);

const dateStart = data[0]['date'];
// Ranges are exclusive, so add one more day to include the last date
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/last date/last day/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refering to the “last date” was intentional, because this actually is about dates. However, “day” isn’t wrong either … not sure whether that’s worth rephrasing though.

// Note that this assumes complete data in the period
// Should data points be missing, aggregation methods such as the sum will lead to results that can't be
// compared to periods with complete data
// Hence, the maintainers of the data need to ensure that the input is well-formed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro-nit: It might be more readable if you add a . at the end of the sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t add periods, because we don’t have them in comments in other places. Should we do this everywhere then? Or put periods just between sentences in multisentence comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then I’d say that we put periods between sentences systematically except for the end of comments (as is practiced by many print magazines, for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this as I suggested above.

function(keyID, key)
{
if (key == 'date')
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro-nit: that might be overly cautious as we use the value already in line 155 above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is to exclude the dates themselves from being aggregated.

The data points all have the structure:

{
    "date": "yyyy-mm-dd",
    "key1": "value1",
    "key2": "value2",
    ...
}

We only want to aggregate the values of key1 and key2 but not date.

@pluehne pluehne force-pushed the patrick/improved-aggregation branch 2 times, most recently from ce65f17 to 7f0abc0 Compare January 24, 2018 12:54
This implements a generalized method for aggregating time-series data.
Data can be aggregated over week or month intervals with a variety of
aggregation methods to choose from.

This will be useful for providing chart views at different levels (such
as two-year periods vs. just showing the last month). Additionally, the
generalized form of aggregation can be used to smooth out graphs where
the sampling frequency changed with an update to Hubble Enterprise.

The aggregation is done by splitting the time data into subsequent,
gapless periods of time (weeks starting with Mondays or months), for
each of which the aggregated values are then computed and returned.

Aggregation methods define how to aggregate the values within individual
time periods. The following aggregation methods are supported:

- sum
- mean
- min
- max
- first (the chronologically first available value for that period)
- last
- median

Periods with incomplete data at the beginning or the end of the time
series are excluded from the aggregation.

Finally, the pull request usage chart is changed to make use of the new
aggregation facilities to reduce the granularity from daily to monthly
data for now. This might be changed when we implement detail views.

I also added several unit tests to check the aggregation methods (for
off-by-one errors in particular) as well as a short piece of
documentation on the new configuration options.
@pluehne pluehne force-pushed the patrick/improved-aggregation branch from 7f0abc0 to 01e714c Compare January 24, 2018 20:29
@codecov-io
Copy link

Codecov Report

Merging #77 into master will increase coverage by 10.52%.
The diff coverage is 70.37%.

@larsxschneider larsxschneider merged commit 74fce70 into master Jan 25, 2018
@larsxschneider larsxschneider deleted the patrick/improved-aggregation branch January 25, 2018 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants