[text analytics] Updates to Healthcare design #16247

abhahn · 2021-01-20T03:08:07Z

This PR contains design updates for the Healthcare SDK based on direct feedback on the original Python design and also aligning with the updated design in other SDK languages (Java and JS).

check-enforcer · 2021-02-02T00:53:00Z

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaulation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run python - [service] - ci

abhahn · 2021-02-02T23:06:36Z

/check-enforcer reset

abhahn · 2021-02-03T17:40:51Z

/check-enforcer override

…enamed properties of that class; updated samples

…into abhahn/ta-v3.1-preview.4-healthcare-updates

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_text_analytics_client.py

maririos

Are you planning on having a changelog added in another PR? or should it be included here too

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_generated/_operations_mixin.py

maririos · 2021-02-03T22:46:33Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_text_analytics_client.py

+        analyze_healthcare_entities_poller._polling_method.update_status()
+
+        if analyze_healthcare_entities_poller._polling_method.status() in terminal_states:
+            print("Operation with ID '%s' is already in a terminal state and cannot be cancelled." \


should this be raised as an error? or does Python knows how to mask this print statement into something the user can see

Maybe an error would be better. It's a little bit difficult to test every code path here since most of the time the jobs succeed quickly enough that this always happens. I could try adding larger documents to the request to see if i can get the job to run long enough to test actual cancellation.

I decided to raise the exception type Warning in the new cancellation method on the poller. I thought this type made the most sense of all the built-in exceptions, but I can change it to something else if anyone has any suggestions.

hmm idk what the behavior is in Python for this scenarios. @iscai-msft for context

for python we can log warnings, usually they look like this, not Warning raises. I think if other languages are raising an error when trying to cancel a finished operation, we might as well just raise an error too though

I got the inspiration for this from here. If there is a better exception type to use, could you make a suggestion?

maririos · 2021-02-03T22:48:21Z

...nalytics/samples/async_samples/sample_analyze_healthcare_entities_with_cancellation_async.py

-            poller = await text_analytics_client.begin_analyze_healthcare(documents)
-            poller = await text_analytics_client.begin_cancel_analyze_healthcare(poller)
+            poller = await text_analytics_client.begin_analyze_healthcare_entities(documents)
+            poller = await text_analytics_client.begin_cancel_analyze_healthcare_entities(poller)


curious if the option of having the cancellation be part of the poller was contemplated. This is what other languages are doing. Not sure if can be done/make sense in Python

I think this would be a cleaner design, and would be easy to move the cancellation logic from the client method to a method on the poller. Let me try it so I can get feedback on the update.

@maririos , @iscai-msft , I moved cancellation code to the poller to try it out. Let me know what you think.

looks like the sample needs to be updated?

begin_cancel_analyze_healthcare_entities a public API that user can see?

@mssfang , I still need to update the sample to reflect a recent change, but originally this was how it was designed. I moved the logic to the poller in a recent commit.

The sample has been updated with new cancellation code.

maririos · 2021-02-03T22:49:58Z

sdk/textanalytics/azure-ai-textanalytics/tests/test_analyze_healthcare.py

@@ -42,7 +42,8 @@ def test_all_successful_passing_dict(self, client):
        docs = [{"id": "1", "language": "en", "text": "Patient does not suffer from high blood pressure."},
                {"id": "2", "language": "en", "text": "Prescribed 100mg ibuprofen, taken twice daily."}]

-        response = client.begin_analyze_healthcare(docs, show_stats=True, polling_interval=self._interval()).result()
+        poller = client.begin_analyze_healthcare_entities(docs, show_stats=True, polling_interval=self._interval())
+        response = poller.result()

        self.assertIsNotNone(response.statistics)



looking at

for doc in response: self.assertIsNotNone(doc.id) self.assertIsNotNone(doc.statistics) self.assertIsNotNone(doc.entities) self.assertIsNotNone(doc.relations)

Does that mean that the relations are not inside an entity? but globally per document?

Looking at HealthcareEntity here it seems like Length is also missing. If the idea is to add length in other PR, then ignore my comment

Are we not just using the len function (or whatever is natural for each language) for this? I hadn't actually planned for this work, so some extra context on why we are doing this would be helpful. Is it just for Healthcare or for everything?

because of the exposure of StringIndexType the user can specify other type that is not the default for Python. if Python can determine the len() based on the specific encoding then it is not needed.
This is not true for .NET so we do have to expose Length everywhere

Ok, thanks for the clarification. I might need to open a new PR for this one then, as it sounds like we might need to write a helper method to determine this. I will have to do some research to see what needs to be done.

As for your first question above, I haven't changed anything about how the relations are being represented, so right now it's just a list of relations with references to the entities objects in the entities array. I expect there to be a lot of change coming in the next API version so I haven't done the same thing that Java/JS are doing with graph building yet.

I think the work should be the same to what you do today to expose Offset. @iscai-msft for guidance on Python :)

relations

I see. if Python is ok with that, then we are all good. I wanted to call it out just in case

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_async_lro.py

iscai-msft · 2021-02-04T23:06:25Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py

@@ -385,8 +385,8 @@ class HealthcareEntity(DictMixin):
        :ivar int offset: The Healthcare entity text offset from the start of the document.
        :ivar float confidence_score: Confidence score between 0 and 1 of the extracted
            entity.
-        :ivar links: A collection of entity references in known data sources.
-        :vartype links: list[~azure.ai.textanalytics.HealthcareEntityLink]
+        :ivar data_sources: A collection of entity references in known data sources.


are you adding the related_entities attribute in this pr?

I hadn't planned to implement the same graph structure here because I'm not sure how it will be used, and we are also expecting big changes to the relations structure in the next release. I'd like to defer it for later until we have a chance to review in the context of the upcoming changes in the next API version.

For now I still have relations represented as they were previously.

ok, cc @mssfang @deyaaeldeen @maririos just for confirmation

Java, JS, and .NET are all resolving the relationships and adding them into the HealthcareEntity class like shown here.

If Python wants to wait I guess that is ok. For the UX study though, we need to have this decided and implemented.

@abhahn I think this is a fairly big divergence from other languages which I think should have been brought up in the scrum at least. I would suggest to reach out to @johanste and Mayuri to get approval first if you did not already.

If it will be a major issue I will do my best to have it implemented tomorrow, but it may not be finished until the end of the day. I am mainly concerned about making it in time for code complete, especially since there are timezone differences for reviewers.

No worries, If you need one person to review it on the weekend, I can review it.

@iscai-msft , @maririos , @mssfang , @deyaaeldeen , I've added related_entities to HealthcareEntity and removed relations from AnalyzeHealthcareEntitiesResultItem. It took less time than I thought it would. I still need to add tests, but have a look at the samples for now and let me know what you think.

Thanks @abhahn! Looks good!

…ashable; updated tests

iscai-msft · 2021-02-05T18:46:21Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py

        if healthcare_result.relations:
            for r in healthcare_result.relations:
                _, source_idx = _get_indices(r.source)
                _, target_idx = _get_indices(r.target)
-                relations.append(HealthcareRelation._from_generated(r, entities[source_idx], entities[target_idx])) # pylint: disable=protected-access
+                if entities[source_idx] not in relation_map.keys():


can you move this into a separate function, it's a little hard to see what part is the specific related_entities building.

from what I can tell, you don't have a separate check for bidirectional, where you add both ways into the dict. Here's js's code that I'm looking at

Ok, I moved this to a classmethod and added the bidirectional case.

iscai-msft · 2021-02-05T18:46:41Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py

+                if entities[source_idx] not in relation_map.keys():
+                    relation_map[entities[source_idx]] = {}
+
+                relation_map[entities[source_idx]][entities[target_idx]] = r.relation_type


nit: we call them dicts in python, not map

iscai-msft · 2021-02-05T19:38:07Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py

        self.warnings = kwargs.get("warnings", [])
        self.statistics = kwargs.get("statistics", None)
        self.is_error = False

+    @classmethod
+    def _update_related_entities(cls, entities, relations_result):


nit, you can make this a static method, doesn't look like it needs cls or self. anywhere. feels weird having this as a class method

Oops, yeah, you're right. I updated this.

iscai-msft · 2021-02-05T20:21:38Z

sdk/textanalytics/azure-ai-textanalytics/README.md

@@ -224,7 +224,7 @@ The following section provides several code snippets covering some of the most c
 - [Recognize PII Entities](#recognize-pii-entities "Recognize pii entities")
 - [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
 - [Detect Language](#detect-language "Detect language")
- [Healthcare Analysis](#healthcare-analysis "Healthcare analysis")
+- [Healthcare Entities Analysis](#healthcare-analysis "Healthcare analysis")


I think this needs to be [Healthcare Entities Analysis](#healthcare-entities-analysis "Healthcare Entities Analysis") or the link won't work

iscai-msft · 2021-02-05T20:23:43Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_async_lro.py

+        await self._polling_method.update_status()
+
+        if self._polling_method.status() in terminal_states:
+            raise Warning("Operation with ID '%s' is already in a terminal state and cannot be cancelled." \


@abhahn the more I think about it, the more I think we should not check the status, and just make the network call. We try to do as little client side validation as possible

iscai-msft · 2021-02-05T20:25:39Z

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py

-    :ivar links: A collection of entity references in known data sources.
-    :vartype links: list[~azure.ai.textanalytics.HealthcareEntityLink]
+        :ivar data_sources: A collection of entity references in known data sources.
+        :vartype data_sources: list[~azure.ai.textanalytics.HealthcareEntityDataSource]


there seems to be incorrect indents in the docstirng

iscai-msft · 2021-02-05T20:26:55Z

...ytics/azure-ai-textanalytics/samples/sample_analyze_healthcare_entities_with_cancellation.py

+            cancellation_poller = poller.cancel()
+            cancellation_poller.wait()
+
+        except Warning as e:


i think to go with removing our client side validation, you can just remove this part of the sample

deyaaeldeen · 2021-02-05T21:32:09Z

sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md

+the service client to the poller object returned from `begin_analyze_healthcare_entities`.
+- Exposed Analyze Healthcare Entities operation metadata on the poller object returned from `begin_analyze_healthcare_entities`.
+- No longer need to specify `api_version=TextAnalyticsApiVersion.V3_1_PREVIEW_3` when calling `begin_analyze` and `begin_analyze_healthcare_entities`. `begin_analyze_healthcare_entities` is still in gated preview though.
+- Added a new parameter `string_index_type` to the service client methods `begin_analyze_healthcare_entities`, `analyze_sentiment`, `recognize_entities`, `recognize_pii_entities`, and `recognize_linked_entities`.


please explain the role of that new parameter.

Also please talk briefly about the new graph design.

Can you share a link to your changelog? I got up really early this morning to work on this PR and am close to the time where I will be logging off for the day. If you can share an example I will add it here, otherwise it will have to wait.

Sure, JS Changelog: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/textanalytics/ai-text-analytics/CHANGELOG.md#510-beta4-unreleased
here is the snippet: [Breaking] The healthcare entities returned by beginAnalyzeHealthcare are now organized as a directed graph where the edges represent a certain type of healthcare relationship between the source and target entities. Edges are stored in the relatedEntities property.

[Hub Generated] Review request for Microsoft.ContainerInstance to add version stable/2021-09-01 (Azure#16247) * Adds base for updating Microsoft.ContainerInstance from version stable/2021-07-01 to version 2021-09-01 * Updates readme * Updates API version in new specs and examples * Adding zones property to resource * Add zones property to relavent examples * Running prettier on modified examples * Adding x-ms-secret to password in responses * Changing parameters to use common types * Reverting changes since they introduced validation errors * Prettier check

* CodeGen from PR 16247 in Azure/azure-rest-api-specs [Hub Generated] Review request for Microsoft.ContainerInstance to add version stable/2021-09-01 (#16247) * Adds base for updating Microsoft.ContainerInstance from version stable/2021-07-01 to version 2021-09-01 * Updates readme * Updates API version in new specs and examples * Adding zones property to resource * Add zones property to relavent examples * Running prettier on modified examples * Adding x-ms-secret to password in responses * Changing parameters to use common types * Reverting changes since they introduced validation errors * Prettier check * version,CHANGELOG Co-authored-by: SDKAuto <sdkautomation@microsoft.com> Co-authored-by: PythonSdkPipelines <PythonSdkPipelines>

Abigail Hartman added 2 commits January 14, 2021 12:13

Renamed healthcare methods

c00140c

Exposing job metadata on the healthcare polling operation

e69151c

abhahn requested review from iscai-msft and kristapratico as code owners January 20, 2021 03:08

ghost added the Cognitive - Text Analytics label Jan 20, 2021

Aligning with other similar work for Analyze

d005b70

Abby Hartman added 2 commits February 1, 2021 16:57

Fixed merge conflicts

ac9faf0

Regenerated with custom pollers; updated tests and a few other files

078e875

abhahn requested review from maririos, deyaaeldeen and mssfang February 2, 2021 23:01

abhahn linked an issue Feb 2, 2021 that may be closed by this pull request

[text analytics] healthcare endpoint changes #16133

Closed

4 tasks

Abby Hartman added 6 commits February 3, 2021 13:29

Renamed HealthcareEntityLink -> HealthcareEntityDataSource and also r…

c2a09ff

…enamed properties of that class; updated samples

Fixed missed renames in __init__.py

fb37651

Updated links and other things in the readme

f827c6b

Updated samples readme

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23
Expired

Verified
Learn about vigilant mode

864b1b2

Merge branch 'master' of https://github.com/Azure/azure-sdk-for-python …

2649001

…into abhahn/ta-v3.1-preview.4-healthcare-updates

Reverted an unintentional change in analyze_sentiment

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23
Expired

Verified
Learn about vigilant mode

eed2d38

abhahn commented Feb 3, 2021

View reviewed changes

sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_text_analytics_client.py Show resolved Hide resolved

maririos reviewed Feb 3, 2021

View reviewed changes

Abby Hartman added 2 commits February 3, 2021 16:18

Moved cancellation to the poller; fixed some linting issues

429c4ea

Removed an unused model

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23
Expired

Verified
Learn about vigilant mode

ef6c210

iscai-msft reviewed Feb 4, 2021

View reviewed changes

Abby Hartman added 4 commits February 4, 2021 15:17

PR comments and updated changelog

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

91ba6a1

Added related_entities; Fixed merge conflicts

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

a78decd

Fixed a few issues caused by merge conflicts; made HealthcareEntity h…

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

7da7797

…ashable; updated tests

Fixed a problem with cancellation; updated samples

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

0a14704

iscai-msft reviewed Feb 5, 2021

View reviewed changes

Abby Hartman added 3 commits February 5, 2021 10:50

Merge conflicts

d952ded

PR comments

82bc4e4

PR comments

03d68a7

iscai-msft approved these changes Feb 5, 2021

View reviewed changes

Abby Hartman added 4 commits February 5, 2021 11:49

Fixed bidirectional check; updated/added tests

00c00a5

added a test for async

f2dedef

PR comments and fixing test issues

bae8200

Fixed a linting issue

4563059

iscai-msft approved these changes Feb 5, 2021

View reviewed changes

Abby Hartman added 2 commits February 5, 2021 12:44

PR comments and fixed a test issue

f9e5776

Fixed cancellation issue

3745666

deyaaeldeen reviewed Feb 5, 2021

View reviewed changes

Abby Hartman added 3 commits February 5, 2021 13:33

Another cancellation test fix

0971671

PR comments

7771a70

Async syntax issue

acc58ed

iscai-msft merged commit 827bc5f into Azure:master Feb 5, 2021

[text analytics] Updates to Healthcare design #16247

[text analytics] Updates to Healthcare design #16247

Conversation

abhahn commented Jan 20, 2021

check-enforcer bot commented Feb 2, 2021

What is Check Enforcer?

Why am I getting this message?

What should I do now?

What if I am onboarding a new service?

abhahn commented Feb 2, 2021

abhahn commented Feb 3, 2021

maririos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment