[ML] Allow NLP truncate option to be updated when span is set #91224

davidkyle · 2022-11-01T08:59:58Z

Models preconfigured with the tokenization options truncate: NONE and span: X, where X > 0, return an error when the truncate option is updated. For example:

POST _ml/trained_models/model/_infer
{
  "docs": [..],
  "inference_config": {
    "question_answering": {
      "question": "Who moved my cheese?",
      "tokenization" : {
        "bert": {
          "truncate": "second"         <-- override the existing truncate option
        }
      }
    }
  }
}

This returns the error:

[span] must not be provided when [truncate] is not [none]

Because the model is configured with a span option, the validation check fails. The workaround is to set span: -1, where the value -1 unsets span.

This change wipes out any preexisting span option when truncate is set to first or second. That in itself is a small change; the rest is testing and a refactoring of the Roberta and BERT tokenizers to share common code.
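The update rule described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `TruncateUpdateSketch`, the nested `Tokenization` type, the `applyUpdate` method, and the `UNSET_SPAN` constant are all hypothetical names chosen for illustration, and the sketch assumes that -1 is the only value meaning "span is not set".

```java
import java.util.Objects;

// Hypothetical, simplified model of the behavior described above.
// None of these names are taken from the Elasticsearch code base.
public class TruncateUpdateSketch {

    enum Truncate { NONE, FIRST, SECOND }

    // Assumption: -1 means "span is not set", as in the workaround above.
    static final int UNSET_SPAN = -1;

    static final class Tokenization {
        final Truncate truncate;
        final int span;

        Tokenization(Truncate truncate, int span) {
            this.truncate = Objects.requireNonNull(truncate);
            this.span = span;
            // Mirrors the validation error quoted in the description.
            if (truncate != Truncate.NONE && span != UNSET_SPAN) {
                throw new IllegalArgumentException(
                    "[span] must not be provided when [truncate] is not [none]");
            }
        }
    }

    // Applies an update to a preconfigured tokenization.
    // A null argument means "not specified in the update request".
    static Tokenization applyUpdate(Tokenization original, Truncate newTruncate, Integer newSpan) {
        Truncate truncate = newTruncate != null ? newTruncate : original.truncate;
        int span;
        if (newSpan != null) {
            // An explicit span in the update always wins and is validated as usual.
            span = newSpan;
        } else if (newTruncate != null && newTruncate != Truncate.NONE) {
            // The rule from this change: switching to first or second
            // drops any preconfigured span instead of failing validation.
            span = UNSET_SPAN;
        } else {
            span = original.span;
        }
        return new Tokenization(truncate, span);
    }
}
```

With this sketch, updating a configuration of truncate NONE and span 128 to truncate SECOND yields a configuration with span unset, whereas an update that supplies both truncate SECOND and a positive span still fails validation.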

elasticsearchmachine · 2022-11-01T09:00:59Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2022-11-01T09:01:00Z

Hi @davidkyle, I've created a changelog YAML for you.

dimitris-athanasiou

Looks good, just a comment about clarifying constructors for RobertaTokenization.

dimitris-athanasiou · 2022-11-01T10:43:44Z

...rc/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/RobertaTokenization.java

@@ -51,7 +51,7 @@ public static RobertaTokenization fromXContent(XContentParser parser, boolean le

    private final boolean addPrefixSpace;

-    private RobertaTokenization(
+    public RobertaTokenization(


Why is there a constructor that doesn't expect doLowerCase for RobertaTokenization? The only public constructor before this change forces lower case to false. This seems like something worth clarifying as we're refactoring the area.

I reverted this change as it was only a convenience for a test. The value of doLowerCase should always be false.

dimitris-athanasiou

LGTM

elasticsearchmachine · 2022-11-02T09:27:53Z

💚 Backport successful

Status	Branch	Result
✅	8.5

* main: (1300 commits)
  update c2id/c2id-server-demo docker image to support ARM (elastic#91144)
  Allow legacy index settings on legacy indices (elastic#90264)
  Skip prevoting if single-node discovery (elastic#91255)
  Chunked encoding for snapshot status API (elastic#90801)
  Allow different decay values depending on the score function (elastic#91195)
  Fix handling indexed envelopes crossing the dateline in mvt API (elastic#91105)
  Ensure cleanups succeed in JoinValidationService (elastic#90601)
  Add overflow behaviour test for RecyclerBytesStreamOutput (elastic#90638)
  More actionable error for ancient indices (elastic#91243)
  Fix APM configuration file delete (elastic#91058)
  Clean up handshake test class (elastic#90966)
  Improve H3#hexRing logic and add H3#areNeighborCells method (elastic#91140)
  Restrict direct use of `ApplicationPrivilege` constructor (elastic#91176)
  [ML] Allow NLP truncate option to be updated when span is set (elastic#91224)
  Support multi-intersection for FieldPermissions (elastic#91169)
  Support intersecting multi-sets of queries with DocumentPermissions (elastic#91151)
  Ensure TermsEnum action works correctly with API keys (elastic#91170)
  Fix NPE in auditing authenticationSuccess for non-existing run-as user (elastic#91171)
  Ensure PKI's delegated_by_realm metadata respect run-as (elastic#91173)
  [ML] Update API documentation for anomaly score explanation (elastic#91177)
  ...

# Conflicts:
#   x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/XPackClientPlugin.java
#   x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java
#   x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/TransportRollupIndexerAction.java
#   x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/v2/RollupActionSingleNodeTests.java

davidkyle added 2 commits October 31, 2022 20:25

Allow changing from Truncate.NONE

8c59c64

Roberta update

ce0eb4b

elasticsearchmachine added the needs:triage and v8.6.0 labels Nov 1, 2022

davidkyle added the >bug, :ml, and v8.5.1 labels and removed the needs:triage label Nov 1, 2022

elasticsearchmachine added the Team:ML label Nov 1, 2022

Update docs/changelog/91224.yaml

960ff3c

davidkyle added the auto-backport-and-merge label Nov 1, 2022

dimitris-athanasiou reviewed Nov 1, 2022

davidkyle added 2 commits November 1, 2022 15:41

Do for MPNet

adad352

revert public ctor

7cee1f8

dimitris-athanasiou approved these changes Nov 1, 2022

davidkyle merged commit defa765 into elastic:main Nov 2, 2022

davidkyle deleted the truncate-update branch November 2, 2022 09:26

davidkyle mentioned this pull request Nov 2, 2022

[8.5] [ML] Allow NLP truncate option to be updated when span is set (#91224) #91244

Merged

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Nov 2, 2022

[ML] Allow NLP truncate option to be updated when span is set (elastic#91224)

b2b3c9b

elasticsearchmachine pushed a commit that referenced this pull request Nov 2, 2022

[ML] Allow NLP truncate option to be updated when span is set (#91224) (#91244)

25b7586
