Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[text analytics] Updates to Healthcare design #16247

Merged

Conversation

abhahn
Copy link
Member

@abhahn abhahn commented Jan 20, 2021

This PR contains design updates for the Healthcare SDK based on direct feedback on the original Python design and also aligning with the updated design in other SDK languages (Java and JS).

@check-enforcer
Copy link

check-enforcer bot commented Feb 2, 2021

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaulation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run python - [service] - ci

@abhahn
Copy link
Member Author

abhahn commented Feb 2, 2021

/check-enforcer reset

@abhahn abhahn linked an issue Feb 2, 2021 that may be closed by this pull request
4 tasks
@abhahn
Copy link
Member Author

abhahn commented Feb 3, 2021

/check-enforcer override

Abby Hartman added 6 commits February 3, 2021 13:29
…enamed properties of that class; updated samples

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Copy link
Member

@maririos maririos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on having a changelog added in another PR? or should it be included here too

analyze_healthcare_entities_poller._polling_method.update_status()

if analyze_healthcare_entities_poller._polling_method.status() in terminal_states:
print("Operation with ID '%s' is already in a terminal state and cannot be cancelled." \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be raised as an error? or does Python knows how to mask this print statement into something the user can see

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an error would be better. It's a little bit difficult to test every code path here since most of the time the jobs succeed quickly enough that this always happens. I could try adding larger documents to the request to see if i can get the job to run long enough to test actual cancellation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to raise the exception type Warning in the new cancellation method on the poller. I thought this type made the most sense of all the built-in exceptions, but I can change it to something else if anyone has any suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm idk what the behavior is in Python for this scenarios. @iscai-msft for context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for python we can log warnings, usually they look like this, not Warning raises. I think if other languages are raising an error when trying to cancel a finished operation, we might as well just raise an error too though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the inspiration for this from here. If there is a better exception type to use, could you make a suggestion?

poller = await text_analytics_client.begin_analyze_healthcare(documents)
poller = await text_analytics_client.begin_cancel_analyze_healthcare(poller)
poller = await text_analytics_client.begin_analyze_healthcare_entities(documents)
poller = await text_analytics_client.begin_cancel_analyze_healthcare_entities(poller)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if the option of having the cancellation be part of the poller was contemplated. This is what other languages are doing. Not sure if can be done/make sense in Python

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a cleaner design, and would be easy to move the cancellation logic from the client method to a method on the poller. Let me try it so I can get feedback on the update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maririos , @iscai-msft , I moved cancellation code to the poller to try it out. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the sample needs to be updated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

begin_cancel_analyze_healthcare_entities a public API that user can see?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mssfang , I still need to update the sample to reflect a recent change, but originally this was how it was designed. I moved the logic to the poller in a recent commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sample has been updated with new cancellation code.

@@ -42,7 +42,8 @@ def test_all_successful_passing_dict(self, client):
docs = [{"id": "1", "language": "en", "text": "Patient does not suffer from high blood pressure."},
{"id": "2", "language": "en", "text": "Prescribed 100mg ibuprofen, taken twice daily."}]

response = client.begin_analyze_healthcare(docs, show_stats=True, polling_interval=self._interval()).result()
poller = client.begin_analyze_healthcare_entities(docs, show_stats=True, polling_interval=self._interval())
response = poller.result()

self.assertIsNotNone(response.statistics)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at

for doc in response:
            self.assertIsNotNone(doc.id)
            self.assertIsNotNone(doc.statistics)
            self.assertIsNotNone(doc.entities)
            self.assertIsNotNone(doc.relations)

Does that mean that the relations are not inside an entity? but globally per document?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at HealthcareEntity here it seems like Length is also missing. If the idea is to add length in other PR, then ignore my comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not just using the len function (or whatever is natural for each language) for this? I hadn't actually planned for this work, so some extra context on why we are doing this would be helpful. Is it just for Healthcare or for everything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because of the exposure of StringIndexType the user can specify other type that is not the default for Python. if Python can determine the len() based on the specific encoding then it is not needed.
This is not true for .NET so we do have to expose Length everywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the clarification. I might need to open a new PR for this one then, as it sounds like we might need to write a helper method to determine this. I will have to do some research to see what needs to be done.

As for your first question above, I haven't changed anything about how the relations are being represented, so right now it's just a list of relations with references to the entities objects in the entities array. I expect there to be a lot of change coming in the next API version so I haven't done the same thing that Java/JS are doing with graph building yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the work should be the same to what you do today to expose Offset. @iscai-msft for guidance on Python :)

relations

I see. if Python is ok with that, then we are all good. I wanted to call it out just in case

Abby Hartman added 2 commits February 3, 2021 16:18

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@@ -385,8 +385,8 @@ class HealthcareEntity(DictMixin):
:ivar int offset: The Healthcare entity text offset from the start of the document.
:ivar float confidence_score: Confidence score between 0 and 1 of the extracted
entity.
:ivar links: A collection of entity references in known data sources.
:vartype links: list[~azure.ai.textanalytics.HealthcareEntityLink]
:ivar data_sources: A collection of entity references in known data sources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you adding the related_entities attribute in this pr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't planned to implement the same graph structure here because I'm not sure how it will be used, and we are also expecting big changes to the relations structure in the next release. I'd like to defer it for later until we have a chance to review in the context of the upcoming changes in the next API version.

For now I still have relations represented as they were previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, cc @mssfang @deyaaeldeen @maririos just for confirmation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java, JS, and .NET are all resolving the relationships and adding them into the HealthcareEntity class like shown here.

If Python wants to wait I guess that is ok. For the UX study though, we need to have this decided and implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhahn I think this is a fairly big divergence from other languages which I think should have been brought up in the scrum at least. I would suggest to reach out to @johanste and Mayuri to get approval first if you did not already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it will be a major issue I will do my best to have it implemented tomorrow, but it may not be finished until the end of the day. I am mainly concerned about making it in time for code complete, especially since there are timezone differences for reviewers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, If you need one person to review it on the weekend, I can review it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iscai-msft , @maririos , @mssfang , @deyaaeldeen , I've added related_entities to HealthcareEntity and removed relations from AnalyzeHealthcareEntitiesResultItem. It took less time than I thought it would. I still need to add tests, but have a look at the samples for now and let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abhahn! Looks good!

Abby Hartman added 4 commits February 4, 2021 15:17

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
…ashable; updated tests

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
if healthcare_result.relations:
for r in healthcare_result.relations:
_, source_idx = _get_indices(r.source)
_, target_idx = _get_indices(r.target)
relations.append(HealthcareRelation._from_generated(r, entities[source_idx], entities[target_idx])) # pylint: disable=protected-access
if entities[source_idx] not in relation_map.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this into a separate function, it's a little hard to see what part is the specific related_entities building.

from what I can tell, you don't have a separate check for bidirectional, where you add both ways into the dict. Here's js's code that I'm looking at

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I moved this to a classmethod and added the bidirectional case.

if entities[source_idx] not in relation_map.keys():
relation_map[entities[source_idx]] = {}

relation_map[entities[source_idx]][entities[target_idx]] = r.relation_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we call them dicts in python, not map

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

self.warnings = kwargs.get("warnings", [])
self.statistics = kwargs.get("statistics", None)
self.is_error = False

@classmethod
def _update_related_entities(cls, entities, relations_result):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, you can make this a static method, doesn't look like it needs cls or self. anywhere. feels weird having this as a class method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yeah, you're right. I updated this.

@@ -224,7 +224,7 @@ The following section provides several code snippets covering some of the most c
- [Recognize PII Entities](#recognize-pii-entities "Recognize pii entities")
- [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
- [Detect Language](#detect-language "Detect language")
- [Healthcare Analysis](#healthcare-analysis "Healthcare analysis")
- [Healthcare Entities Analysis](#healthcare-analysis "Healthcare analysis")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be [Healthcare Entities Analysis](#healthcare-entities-analysis "Healthcare Entities Analysis") or the link won't work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

await self._polling_method.update_status()

if self._polling_method.status() in terminal_states:
raise Warning("Operation with ID '%s' is already in a terminal state and cannot be cancelled." \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhahn the more I think about it, the more I think we should not check the status, and just make the network call. We try to do as little client side validation as possible

:ivar links: A collection of entity references in known data sources.
:vartype links: list[~azure.ai.textanalytics.HealthcareEntityLink]
:ivar data_sources: A collection of entity references in known data sources.
:vartype data_sources: list[~azure.ai.textanalytics.HealthcareEntityDataSource]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seems to be incorrect indents in the docstirng

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

cancellation_poller = poller.cancel()
cancellation_poller.wait()

except Warning as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think to go with removing our client side validation, you can just remove this part of the sample

the service client to the poller object returned from `begin_analyze_healthcare_entities`.
- Exposed Analyze Healthcare Entities operation metadata on the poller object returned from `begin_analyze_healthcare_entities`.
- No longer need to specify `api_version=TextAnalyticsApiVersion.V3_1_PREVIEW_3` when calling `begin_analyze` and `begin_analyze_healthcare_entities`. `begin_analyze_healthcare_entities` is still in gated preview though.
- Added a new parameter `string_index_type` to the service client methods `begin_analyze_healthcare_entities`, `analyze_sentiment`, `recognize_entities`, `recognize_pii_entities`, and `recognize_linked_entities`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain the role of that new parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please talk briefly about the new graph design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share a link to your changelog? I got up really early this morning to work on this PR and am close to the time where I will be logging off for the day. If you can share an example I will add it here, otherwise it will have to wait.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, JS Changelog: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/textanalytics/ai-text-analytics/CHANGELOG.md#510-beta4-unreleased
here is the snippet: [Breaking] The healthcare entities returned by beginAnalyzeHealthcare are now organized as a directed graph where the edges represent a certain type of healthcare relationship between the source and target entities. Edges are stored in the relatedEntities property.

@iscai-msft iscai-msft merged commit 827bc5f into Azure:master Feb 5, 2021
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-python that referenced this pull request Oct 12, 2021
[Hub Generated] Review request for Microsoft.ContainerInstance to add version stable/2021-09-01 (Azure#16247)

* Adds base for updating Microsoft.ContainerInstance from version stable/2021-07-01 to version 2021-09-01

* Updates readme

* Updates API version in new specs and examples

* Adding zones property to resource

* Add zones property to relavent examples

* Running prettier on modified examples

* Adding x-ms-secret to password in responses

* Changing parameters to use common types

* Reverting changes since they introduced validation errors

* Prettier check
msyyc pushed a commit that referenced this pull request Oct 14, 2021
* CodeGen from PR 16247 in Azure/azure-rest-api-specs
[Hub Generated] Review request for Microsoft.ContainerInstance to add version stable/2021-09-01 (#16247)

* Adds base for updating Microsoft.ContainerInstance from version stable/2021-07-01 to version 2021-09-01

* Updates readme

* Updates API version in new specs and examples

* Adding zones property to resource

* Add zones property to relavent examples

* Running prettier on modified examples

* Adding x-ms-secret to password in responses

* Changing parameters to use common types

* Reverting changes since they introduced validation errors

* Prettier check

* version,CHANGELOG

Co-authored-by: SDKAuto <sdkautomation@microsoft.com>
Co-authored-by: PythonSdkPipelines <PythonSdkPipelines>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[text analytics] healthcare endpoint changes
5 participants