-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[text analytics] Updates to Healthcare design #16247
[text analytics] Updates to Healthcare design #16247
Conversation
This pull request is protected by Check Enforcer. What is Check Enforcer?Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass. Why am I getting this message?You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged. What should I do now?If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows: What if I am onboarding a new service?Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment: |
/check-enforcer reset |
/check-enforcer override |
…enamed properties of that class; updated samples
…into abhahn/ta-v3.1-preview.4-healthcare-updates
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_text_analytics_client.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning on having a changelog added in another PR? or should it be included here too
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_generated/_operations_mixin.py
Show resolved
Hide resolved
analyze_healthcare_entities_poller._polling_method.update_status() | ||
|
||
if analyze_healthcare_entities_poller._polling_method.status() in terminal_states: | ||
print("Operation with ID '%s' is already in a terminal state and cannot be cancelled." \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be raised as an error? or does Python knows how to mask this print statement into something the user can see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an error would be better. It's a little bit difficult to test every code path here since most of the time the jobs succeed quickly enough that this always happens. I could try adding larger documents to the request to see if i can get the job to run long enough to test actual cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to raise the exception type Warning
in the new cancellation method on the poller. I thought this type made the most sense of all the built-in exceptions, but I can change it to something else if anyone has any suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm idk what the behavior is in Python for this scenarios. @iscai-msft for context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for python we can log warnings, usually they look like this, not Warning raises. I think if other languages are raising an error when trying to cancel a finished operation, we might as well just raise an error too though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got the inspiration for this from here. If there is a better exception type to use, could you make a suggestion?
poller = await text_analytics_client.begin_analyze_healthcare(documents) | ||
poller = await text_analytics_client.begin_cancel_analyze_healthcare(poller) | ||
poller = await text_analytics_client.begin_analyze_healthcare_entities(documents) | ||
poller = await text_analytics_client.begin_cancel_analyze_healthcare_entities(poller) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious if the option of having the cancellation be part of the poller was contemplated. This is what other languages are doing. Not sure if can be done/make sense in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a cleaner design, and would be easy to move the cancellation logic from the client method to a method on the poller. Let me try it so I can get feedback on the update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maririos , @iscai-msft , I moved cancellation code to the poller to try it out. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the sample needs to be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
begin_cancel_analyze_healthcare_entities
a public API that user can see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mssfang , I still need to update the sample to reflect a recent change, but originally this was how it was designed. I moved the logic to the poller in a recent commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample has been updated with new cancellation code.
@@ -42,7 +42,8 @@ def test_all_successful_passing_dict(self, client): | |||
docs = [{"id": "1", "language": "en", "text": "Patient does not suffer from high blood pressure."}, | |||
{"id": "2", "language": "en", "text": "Prescribed 100mg ibuprofen, taken twice daily."}] | |||
|
|||
response = client.begin_analyze_healthcare(docs, show_stats=True, polling_interval=self._interval()).result() | |||
poller = client.begin_analyze_healthcare_entities(docs, show_stats=True, polling_interval=self._interval()) | |||
response = poller.result() | |||
|
|||
self.assertIsNotNone(response.statistics) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at
for doc in response:
self.assertIsNotNone(doc.id)
self.assertIsNotNone(doc.statistics)
self.assertIsNotNone(doc.entities)
self.assertIsNotNone(doc.relations)
Does that mean that the relations are not inside an entity? but globally per document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at HealthcareEntity
here it seems like Length
is also missing. If the idea is to add length
in other PR, then ignore my comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not just using the len
function (or whatever is natural for each language) for this? I hadn't actually planned for this work, so some extra context on why we are doing this would be helpful. Is it just for Healthcare or for everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of the exposure of StringIndexType the user can specify other type that is not the default for Python. if Python can determine the len() based on the specific encoding then it is not needed.
This is not true for .NET so we do have to expose Length
everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the clarification. I might need to open a new PR for this one then, as it sounds like we might need to write a helper method to determine this. I will have to do some research to see what needs to be done.
As for your first question above, I haven't changed anything about how the relations are being represented, so right now it's just a list of relations with references to the entities objects in the entities
array. I expect there to be a lot of change coming in the next API version so I haven't done the same thing that Java/JS are doing with graph building yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the work should be the same to what you do today to expose Offset. @iscai-msft for guidance on Python :)
relations
I see. if Python is ok with that, then we are all good. I wanted to call it out just in case
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py
Outdated
Show resolved
Hide resolved
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py
Outdated
Show resolved
Hide resolved
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py
Outdated
Show resolved
Hide resolved
sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_async_lro.py
Outdated
Show resolved
Hide resolved
@@ -385,8 +385,8 @@ class HealthcareEntity(DictMixin): | |||
:ivar int offset: The Healthcare entity text offset from the start of the document. | |||
:ivar float confidence_score: Confidence score between 0 and 1 of the extracted | |||
entity. | |||
:ivar links: A collection of entity references in known data sources. | |||
:vartype links: list[~azure.ai.textanalytics.HealthcareEntityLink] | |||
:ivar data_sources: A collection of entity references in known data sources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you adding the related_entities
attribute in this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't planned to implement the same graph structure here because I'm not sure how it will be used, and we are also expecting big changes to the relations structure in the next release. I'd like to defer it for later until we have a chance to review in the context of the upcoming changes in the next API version.
For now I still have relations
represented as they were previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, cc @mssfang @deyaaeldeen @maririos just for confirmation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java, JS, and .NET are all resolving the relationships and adding them into the HealthcareEntity class like shown here.
If Python wants to wait I guess that is ok. For the UX study though, we need to have this decided and implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it will be a major issue I will do my best to have it implemented tomorrow, but it may not be finished until the end of the day. I am mainly concerned about making it in time for code complete, especially since there are timezone differences for reviewers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, If you need one person to review it on the weekend, I can review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iscai-msft , @maririos , @mssfang , @deyaaeldeen , I've added related_entities
to HealthcareEntity
and removed relations
from AnalyzeHealthcareEntitiesResultItem
. It took less time than I thought it would. I still need to add tests, but have a look at the samples for now and let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @abhahn! Looks good!
if healthcare_result.relations: | ||
for r in healthcare_result.relations: | ||
_, source_idx = _get_indices(r.source) | ||
_, target_idx = _get_indices(r.target) | ||
relations.append(HealthcareRelation._from_generated(r, entities[source_idx], entities[target_idx])) # pylint: disable=protected-access | ||
if entities[source_idx] not in relation_map.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this into a separate function, it's a little hard to see what part is the specific related_entities
building.
from what I can tell, you don't have a separate check for bidirectional, where you add both ways into the dict. Here's js's code that I'm looking at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I moved this to a classmethod and added the bidirectional case.
if entities[source_idx] not in relation_map.keys(): | ||
relation_map[entities[source_idx]] = {} | ||
|
||
relation_map[entities[source_idx]][entities[target_idx]] = r.relation_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we call them dict
s in python, not map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
self.warnings = kwargs.get("warnings", []) | ||
self.statistics = kwargs.get("statistics", None) | ||
self.is_error = False | ||
|
||
@classmethod | ||
def _update_related_entities(cls, entities, relations_result): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, you can make this a static method, doesn't look like it needs cls or self. anywhere. feels weird having this as a class method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yeah, you're right. I updated this.
@@ -224,7 +224,7 @@ The following section provides several code snippets covering some of the most c | |||
- [Recognize PII Entities](#recognize-pii-entities "Recognize pii entities") | |||
- [Extract Key Phrases](#extract-key-phrases "Extract key phrases") | |||
- [Detect Language](#detect-language "Detect language") | |||
- [Healthcare Analysis](#healthcare-analysis "Healthcare analysis") | |||
- [Healthcare Entities Analysis](#healthcare-analysis "Healthcare analysis") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be [Healthcare Entities Analysis](#healthcare-entities-analysis "Healthcare Entities Analysis")
or the link won't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
await self._polling_method.update_status() | ||
|
||
if self._polling_method.status() in terminal_states: | ||
raise Warning("Operation with ID '%s' is already in a terminal state and cannot be cancelled." \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abhahn the more I think about it, the more I think we should not check the status, and just make the network call. We try to do as little client side validation as possible
:ivar links: A collection of entity references in known data sources. | ||
:vartype links: list[~azure.ai.textanalytics.HealthcareEntityLink] | ||
:ivar data_sources: A collection of entity references in known data sources. | ||
:vartype data_sources: list[~azure.ai.textanalytics.HealthcareEntityDataSource] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there seems to be incorrect indents in the docstirng
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
cancellation_poller = poller.cancel() | ||
cancellation_poller.wait() | ||
|
||
except Warning as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think to go with removing our client side validation, you can just remove this part of the sample
the service client to the poller object returned from `begin_analyze_healthcare_entities`. | ||
- Exposed Analyze Healthcare Entities operation metadata on the poller object returned from `begin_analyze_healthcare_entities`. | ||
- No longer need to specify `api_version=TextAnalyticsApiVersion.V3_1_PREVIEW_3` when calling `begin_analyze` and `begin_analyze_healthcare_entities`. `begin_analyze_healthcare_entities` is still in gated preview though. | ||
- Added a new parameter `string_index_type` to the service client methods `begin_analyze_healthcare_entities`, `analyze_sentiment`, `recognize_entities`, `recognize_pii_entities`, and `recognize_linked_entities`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explain the role of that new parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please talk briefly about the new graph design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share a link to your changelog? I got up really early this morning to work on this PR and am close to the time where I will be logging off for the day. If you can share an example I will add it here, otherwise it will have to wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, JS Changelog: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/textanalytics/ai-text-analytics/CHANGELOG.md#510-beta4-unreleased
here is the snippet: [Breaking] The healthcare entities returned by beginAnalyzeHealthcare are now organized as a directed graph where the edges represent a certain type of healthcare relationship between the source and target entities. Edges are stored in the relatedEntities property.
[Hub Generated] Review request for Microsoft.ContainerInstance to add version stable/2021-09-01 (Azure#16247) * Adds base for updating Microsoft.ContainerInstance from version stable/2021-07-01 to version 2021-09-01 * Updates readme * Updates API version in new specs and examples * Adding zones property to resource * Add zones property to relavent examples * Running prettier on modified examples * Adding x-ms-secret to password in responses * Changing parameters to use common types * Reverting changes since they introduced validation errors * Prettier check
* CodeGen from PR 16247 in Azure/azure-rest-api-specs [Hub Generated] Review request for Microsoft.ContainerInstance to add version stable/2021-09-01 (#16247) * Adds base for updating Microsoft.ContainerInstance from version stable/2021-07-01 to version 2021-09-01 * Updates readme * Updates API version in new specs and examples * Adding zones property to resource * Add zones property to relavent examples * Running prettier on modified examples * Adding x-ms-secret to password in responses * Changing parameters to use common types * Reverting changes since they introduced validation errors * Prettier check * version,CHANGELOG Co-authored-by: SDKAuto <sdkautomation@microsoft.com> Co-authored-by: PythonSdkPipelines <PythonSdkPipelines>
This PR contains design updates for the Healthcare SDK based on direct feedback on the original Python design and also aligning with the updated design in other SDK languages (Java and JS).