[ML] Add logging for failing PyTorch test #81044

davidkyle · 2021-11-25T12:17:41Z

droberts195 · 2021-11-25T12:49:11Z

...-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/PyTorchModelIT.java

@@ -244,10 +244,11 @@ public void testDeploymentStats() throws IOException {
            assertThat(byteSize, equalTo((int) RAW_MODEL_SIZE));

            Response humanResponse = client().performRequest(new Request("GET", "/_ml/trained_models/" + modelId + "/_stats?human"));
-            stats = (List<Map<String, Object>>) entityAsMap(humanResponse).get("trained_model_stats");
+            var responseMap = entityAsMap(humanResponse);
+            stats = (List<Map<String, Object>>) responseMap.get("trained_model_stats");


Given the way the test is failing, it might be worth assigning this to a separate variable rather than overwriting `stats`.
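A minimal sketch of that idea, in plain Java rather than the real test code: the second call's result goes into its own variable, so the first result is still available if a later assertion fails. `fakeStatsResponse`, `statsOf`, `humanStats`, and the sample values are hypothetical stand-ins, not names or data from `PyTorchModelIT`.

```java
import java.util.List;
import java.util.Map;

final class SeparateVariableSketch {
    // Hypothetical stand-in for the parsed body of a stats request; not the real test helper.
    static Map<String, Object> fakeStatsResponse(Object modelSize) {
        return Map.of("trained_model_stats", List.of(Map.of("model_size", modelSize)));
    }

    // Pulls the stats list out of a parsed response map.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> statsOf(Map<String, Object> response) {
        return (List<Map<String, Object>>) response.get("trained_model_stats");
    }

    public static void main(String[] args) {
        // Result of the first call (made-up value).
        List<Map<String, Object>> stats = statsOf(fakeStatsResponse(1630));
        // The second call gets its own variable instead of overwriting `stats`.
        List<Map<String, Object>> humanStats = statsOf(fakeStatsResponse("1.5kb"));
        // Both results remain in scope for any later failure message.
        System.out.println("first: " + stats + ", second: " + humanStats);
    }
}
```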

droberts195 · 2021-11-25T12:51:56Z

...-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/PyTorchModelIT.java

            assertThat(stats, hasSize(1));
            String stringBytes = (String) XContentMapValues.extractValue("deployment_stats.model_size", stats.get(0));
-            assertThat(stringBytes, is(not(nullValue())));
+            assertThat("stats response: " + responseMap, stringBytes, is(not(nullValue())));


Then include `stats` from line 237 here as well as `responseMap`.

That way the failure message will give more clues about what changed between the two calls to the same API that returned inconsistent results.

For example, it could be that for some reason you get a completely different response depending on which coordinating node you hit (which would obviously be a worse bug).
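Roughly what I have in mind, as a self-contained sketch and not the actual test: the assertion reason carries both the stats from the first call and the full second response. `extractValue` here is a simplified stand-in for `XContentMapValues.extractValue` (not the Elasticsearch implementation), and `failureReason`, `assertNotNull`, and the sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class FailureMessageSketch {
    // Simplified stand-in: walks a dotted path through nested maps, returning null if any step is missing.
    static Object extractValue(String path, Map<String, Object> map) {
        Object current = map;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Builds the assertion reason from both calls so a failure shows the two side by side.
    static String failureReason(Object firstStats, Object secondResponse) {
        return "first stats: " + firstStats + ", second response: " + secondResponse;
    }

    // Plain-Java substitute for a matcher-based not-null assertion with a reason string.
    static void assertNotNull(String reason, Object value) {
        if (value == null) {
            throw new AssertionError(reason);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: the first call reports a size, the second call does not.
        Map<String, Object> firstEntry = Map.of("deployment_stats", Map.of("model_size_bytes", 1630));
        Map<String, Object> secondEntry = Map.of("deployment_stats", Map.of());
        Map<String, Object> secondResponse = Map.of("trained_model_stats", List.of(secondEntry));

        Object stringBytes = extractValue("deployment_stats.model_size", secondEntry);
        try {
            assertNotNull(failureReason(List.of(firstEntry), secondResponse), stringBytes);
        } catch (AssertionError e) {
            // The message now contains both responses, which is the point of the change.
            System.out.println(e.getMessage());
        }
    }
}
```

If the two responses really do come from different coordinating nodes, having both in the same message should make that visible without rerunning the test.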

droberts195

LGTM

davidkyle · 2021-11-25T14:41:33Z

run elasticsearch-ci/bwc

davidkyle · 2021-11-25T14:41:53Z

run elasticsearch-ci/packaging-tests-unix-sample

davidkyle · 2021-11-29T10:23:31Z

run elasticsearch-ci/eql-correctness

davidkyle · 2021-11-29T11:05:04Z

run elasticsearch-ci/rest-compatibility

davidkyle · 2021-11-29T14:04:10Z

run elasticsearch-ci/rest-compatibility

* upstream/master: (150 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...
  # Conflicts:
  # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

* upstream/master: (55 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...

davidkyle added the >test and v8.1.0 labels Nov 25, 2021

droberts195 reviewed Nov 25, 2021

davidkyle force-pushed the pytorch-test-debug branch from 4988783 to a379d9b Compare November 25, 2021 13:40

droberts195 approved these changes Nov 25, 2021

davidkyle added 2 commits November 29, 2021 09:20

Add debug logging for failing test

e4dd42d

record first response

f38a141

davidkyle force-pushed the pytorch-test-debug branch from a379d9b to f38a141 Compare November 29, 2021 09:20

davidkyle merged commit 1abbf4b into elastic:master Nov 29, 2021

davidkyle deleted the pytorch-test-debug branch November 29, 2021 14:33
