@tveasey tveasey commented Apr 13, 2018

Profiling anomaly detection on a large population (cardinality 2m) showed that accessing weights, which communicate things like sample importance, seasonal heteroskedasticity, and so on, consumes about 8.5% of the total end-of-bucket processing time. We have a small number of possible weights. Therefore, switching to encoding the weight style by offset in a fixed-size weights array means we can avoid nearly all of this overhead. I get a concomitant performance improvement on highly partitioned analysis, where end-of-bucket processing is the bottleneck. This should have no impact on any results. A step towards #53.
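A minimal sketch of the idea described above, not the code from this PR: the enumerator names, the type alias, and both helper functions are hypothetical, and the unit default for a missing weight is an assumption. It contrasts looking a weight up by searching a list of styles with reading it directly from a fixed-size array whose offsets are the style values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical weight styles. Each enumerator's value doubles as its
// offset in the fixed-size weights array.
enum EWeightStyle {
    E_SampleImportance = 0,
    E_SeasonalVarianceScale,
    NUMBER_WEIGHT_STYLES // must stay last: it sizes the array
};

// Before (sketch): styles and weights travel as parallel vectors, so
// every access searches for the requested style.
double weightBySearch(const std::vector<EWeightStyle>& styles,
                      const std::vector<double>& weights,
                      EWeightStyle style) {
    for (std::size_t i = 0; i < styles.size(); ++i) {
        if (styles[i] == style) {
            return weights[i];
        }
    }
    return 1.0; // assumption: an absent style means a unit weight
}

// After (sketch): one slot per style, so a lookup is a single index.
using TWeightsAry = std::array<double, NUMBER_WEIGHT_STYLES>;

// All weights start at the assumed neutral value of one.
TWeightsAry unitWeights() {
    TWeightsAry weights;
    weights.fill(1.0);
    return weights;
}

double weightByOffset(const TWeightsAry& weights, EWeightStyle style) {
    return weights[style];
}
```

With this layout a caller would start from `unitWeights()` and overwrite only the slots it needs, for example `weights[E_SeasonalVarianceScale] = 2.5`. The trade-off is that every weights object carries a slot for every style, which is only attractive when the number of styles is small, as the description says it is here.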

@droberts195 droberts195 left a comment

I left a couple of minor comments.

It would also be nice to add a comment to the PR saying which dataset and config you saw the speedup on, with timings before and after the change.

 void propagateLastThreadAssert() {
 if (m_LastException) {
-throw *m_LastException;
+throw * m_LastException;

Why did this change?

Contributor Author

I think this must be something to do with the version of clang-format I'm using. (Although I thought I was using the same version as Ed.)

Contributor

I used clang-format 5.0.1 (for the record)

I've also got 5.0.1, having downloaded the pre-built binaries from http://releases.llvm.org/download.html

On running dev-tools/clang-format.sh on this PR branch it reverted this change and also made one other change in CMultivariateNormalConjugate.h.

It seems that we're going to have to mandate an exact version of clang-format for each branch rather than just the major version before we can start failing builds due to formatting.

Contributor Author

OK, I'm using 5.0.0, so I think this must be the reason. I'll correct the formatting here.

 return {x1, x2};
 } catch (const std::exception& e) {
-LOG_ERROR(<< "Failed to compute confidence interval: " << e.what());
+LOG_ERROR("Failed to compute confidence interval: " << e.what());

I think we should stick to the pattern of beginning all log statements with << even though it's not necessary when the first item is a string literal. Otherwise the rules about when a leading << is required will be very complicated.
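The project's actual LOG_ERROR definition is not shown in this thread. As a hedged illustration of how a logging macro could accept both forms, the sketch below assumes the macro pastes its argument directly after an empty string literal. `SKETCH_FORMAT` and the three helper functions are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical macro for illustration only; not the project's LOG_ERROR.
// The argument is pasted straight after an empty string literal, so:
//   SKETCH_FORMAT(<< "a" << b)  expands to  stream << "" << "a" << b;
//   SKETCH_FORMAT("a" << b)     expands to  stream << "" "a" << b;
// In the second case the adjacent literals "" and "a" are concatenated by
// the compiler, which is why the leading << is optional for a literal.
#define SKETCH_FORMAT(message)                                                 \
    [&]() {                                                                    \
        std::ostringstream stream;                                             \
        stream << "" message;                                                  \
        return stream.str();                                                   \
    }()

// Leading << with a string literal first: always valid.
std::string withLeadingInsertion(int code) {
    return SKETCH_FORMAT(<< "Failed with code " << code);
}

// No leading <<: valid only because the first item is a string literal.
std::string withoutLeadingInsertion(int code) {
    return SKETCH_FORMAT("Failed with code " << code);
}

// When the first item is not a string literal the leading << is required;
// SKETCH_FORMAT(code << " failures") would fail to compile.
std::string nonLiteralFirst(int code) {
    return SKETCH_FORMAT(<< code << " failures");
}
```

Under this assumption, the rule for when the leading `<<` may be dropped depends on the type of the first item, which is the kind of complication the comment above wants to avoid by always writing it.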

Contributor Author

Thanks, this was a merge error.

double shift = (r - t) / 2.0;
logSamplesMoments.add(std::log(x) - shift, n / scale);
}
} catch (const std::exception& e) {

There is some quite deeply nested and complex logic in the block above. Are you certain none of it uses Boost or Eigen functionality that might throw an exception? If it does then such an exception will now be fatal.

Contributor Author

Yes, this is now safe to remove. The nested code is basic statistics, Gauss-Legendre integration, and the local class CLogSampleSquareDeviation, none of which can throw.

I actually thought a bit about whether or not to remove this try catch. (For example, there are other try catch blocks I could down scope as a result of this change, which I haven't touched.) In the end I decided that, since I could remove it altogether, I would go ahead and make the change on understandability grounds.

Contributor Author

tveasey commented Apr 16, 2018

The original profile which showed this up was a large population analysis attached to issue #53. The probability calculation cache change significantly altered the breakdown of runtime for population analysis, so this is no longer as useful a test case.

I verified, by profiling a custom standalone executable I built, that this change kills the contribution from weight lookup to end-of-bucket processing. I was planning to extract the delta in runtime on the full QA regression suite when this is committed and update #53. The problem is that we need cases where the runtime bottleneck is end-of-bucket processing in the autodetect process, which isn't always the case.

The clearest results will be for high cardinality partition analyses running the autodetect process standalone. I'll add runtimes before and after this change for these cases to issue #53.

@droberts195 droberts195 left a comment

LGTM

@tveasey tveasey changed the title from "[ML] Encoding distribution model weight style by offset in a fixed size weight array" to "[ML] Encode distribution model weight style by offset in a fixed size weight array" Apr 17, 2018
@tveasey tveasey merged commit fc48c7c into elastic:master Apr 17, 2018
@tveasey tveasey mentioned this pull request Apr 17, 2018
5 tasks
droberts195 pushed a commit that referenced this pull request Apr 18, 2018
This should have been done in #54 but slipped through the net as we
compile out trace logging in optimised builds.
@droberts195

When this is backported to 6.4 please also backport 1a750aa.

droberts195 pushed a commit that referenced this pull request Apr 23, 2018
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
This should have been done in #54 but slipped through the net as we
compile out trace logging in optimised builds.
tveasey added a commit that referenced this pull request Apr 26, 2018