[ML] add version information in case of crash of native ML process #30674
Pinging @elastic/ml-core
// add version information, so it's conveniently next to the crash log
upstreamMessage += ", version: ";
try {
    upstreamMessage += getCppCopyright(Duration.ofMillis(10));
It might look a bit weird to see the copyright as part of the "version" part of the message. The bit we actually want comes before the copyright. It could be got rid of quite simply like this:
upstreamMessage += getCppCopyright(Duration.ofMillis(10)).replaceFirst(" Copyright.*", "");
For example, from:
autodetect (64 bit): Version 7.0.0-alpha1-SNAPSHOT (Build 799d6b9e8bdff9) Copyright (c) 2018 Elasticsearch BV
we'd add:
, version: autodetect (64 bit): Version 7.0.0-alpha1-SNAPSHOT (Build 799d6b9e8bdff9)
If we can already easily distinguish `normalize` from `autodetect` then we could add another `replaceFirst` to strip up to "Version ", so we'd get:
, version: 7.0.0-alpha1-SNAPSHOT (Build 799d6b9e8bdff9)
But it's useful to know the program name and I can't remember how easy it is to find that out from what's in the error message already.
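To make the two options above concrete, here is a minimal, self-contained sketch of both `replaceFirst` calls applied to the sample string from this comment. The class and method names (`CrashVersionSuffix`, `withoutCopyright`, `versionOnly`) are made up for illustration and are not part of the PR; the second regex is only one possible way to "strip up to Version".

```java
// Hypothetical helper, for illustration only; not the code in this PR.
final class CrashVersionSuffix {

    // Drops the trailing copyright notice, keeping program name, version and build.
    static String withoutCopyright(String buildInfo) {
        return buildInfo.replaceFirst(" Copyright.*", "");
    }

    // Additionally drops everything up to and including the first "Version ".
    static String versionOnly(String buildInfo) {
        return withoutCopyright(buildInfo).replaceFirst("^.*?Version ", "");
    }

    public static void main(String[] args) {
        String info = "autodetect (64 bit): Version 7.0.0-alpha1-SNAPSHOT "
            + "(Build 799d6b9e8bdff9) Copyright (c) 2018 Elasticsearch BV";
        // , version: autodetect (64 bit): Version 7.0.0-alpha1-SNAPSHOT (Build 799d6b9e8bdff9)
        System.out.println(", version: " + withoutCopyright(info));
        // , version: 7.0.0-alpha1-SNAPSHOT (Build 799d6b9e8bdff9)
        System.out.println(", version: " + versionOnly(info));
    }
}
```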
The process name is part of the log line, e.g.
[2018-05-17T10:31:56,396][ERROR][o.e.x.m.j.p.a.NativeAutodetectProcess] [g6] autodetect process stopped unexpectedly: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0x7f5b8d4160b8, library: /home/hendrik/work/git-elastic/elasticsearch/test-install/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f5b8d0de000, normalized address: 0x3380b8', version: autodetect (64 bit): Version based on 7.0.0-alpha1-SNAPSHOT (Build DEVELOPMENT BUILD by hendrik) Copyright (c) 2018 Elasticsearch BV
Therefore, yes, we can potentially replace it.
It might be useful then to have a specific `getCppVersion` method that pulls out just the version data as @droberts195 describes.
It might also be a good opportunity to update the comment on the `getCppCopyright` method, as it currently refers to "... the process ID of the C++ process".
It might be useful then to have a specific getCppVersion method that pulls out just the version data
If we do that, it would be good to use it instead of `getCppCopyright()` in `NativeController.getNativeCodeInfo()`.
👍 I am refactoring it, so there is just 1 place that parses the string.
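As a rough sketch of what "one place that parses the string" could look like, the class below keeps a single regular expression for the build-info format and derives the version from it. Everything here (`CppBuildInfoSketch`, the pattern, the exception on mismatch) is an assumption made for illustration, based only on the two sample strings quoted in this thread; the actual refactoring in the PR may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single parsing point, for illustration only.
final class CppBuildInfoSketch {

    // Assumed format: "<program> (<N> bit): Version <version> (Build <hash>) Copyright ..."
    private static final Pattern BUILD_INFO = Pattern.compile(
        "^(\\S+) \\((\\d+) bit\\): Version (.+?) \\(Build ([^)]+)\\) Copyright");

    // Returns "<version> (Build <hash>)" without program name or copyright.
    static String version(String buildInfo) {
        Matcher m = BUILD_INFO.matcher(buildInfo);
        if (m.find()) {
            return m.group(3) + " (Build " + m.group(4) + ")";
        }
        // A mismatch most likely means the C++ side changed its format.
        throw new IllegalArgumentException("unexpected build info format: " + buildInfo);
    }

    // Returns the program name, e.g. "autodetect".
    static String program(String buildInfo) {
        Matcher m = BUILD_INFO.matcher(buildInfo);
        if (m.find()) {
            return m.group(1);
        }
        throw new IllegalArgumentException("unexpected build info format: " + buildInfo);
    }
}
```

Keeping the pattern in one class means callers such as the crash-message code and any info endpoint would share the same parsing rules, so a format change only needs one fix.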
@@ -283,6 +283,15 @@ private void parseMessage(XContent xContent, BytesReference bytesRef) {
if (upstreamMessage.contains("bad_alloc")) {
    upstreamMessage += ", process ran out of memory.";
The `.` at the end of this is no longer appropriate, as the line below will add a comma.
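A small sketch of the punctuation point: if the out-of-memory hint is appended without a trailing period, the version suffix can follow it cleanly. The class and method names and the sample inputs are hypothetical and only mirror the string fragments visible in the diff.

```java
// Hypothetical message assembly, for illustration only.
final class CrashMessageSketch {

    static String describe(String upstreamMessage, String cppVersion) {
        StringBuilder message = new StringBuilder(upstreamMessage);
        if (upstreamMessage.contains("bad_alloc")) {
            // No trailing period, because another clause follows.
            message.append(", process ran out of memory");
        }
        message.append(", version: ").append(cppVersion);
        return message.toString();
    }
}
```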
I pushed changes to omit anything but the essential version information. It looks like this now:
On official builds you will see a proper version and build hash.
LGTM
I just noticed one pre-existing nit that you moved from one file to another.
    return info;
} else {
    // If this happens it probably means someone has changed the format in lib/ver/CBuildInfo.cc
    // in the machine-learning-cpp repo without changing the pattern above to match
nit: `machine-learning-cpp` -> `ml-cpp`
LGTM
This change adds version information to the error message when a native ML process crashes. The version is important for choosing the right symbol files when analyzing the crash, and adding it puts all the necessary information on one line.
relates elastic/ml-cpp#94