Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[text analytics] add string-index-type support #13378

Merged
merged 8 commits into from
Aug 28, 2020

Conversation

iscai-msft
Copy link
Contributor

@iscai-msft iscai-msft commented Aug 27, 2020

Defaults to Uniode code points for Python

Following talks with @johanste python is going to at least start out not exposing the string-index-type param to users

fixes #12032

deyaaeldeen
deyaaeldeen previously approved these changes Aug 27, 2020
Copy link
Member

@deyaaeldeen deyaaeldeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!, left a few minor comments about formatting. Are we sure we can not come with a better name for the parameter?

kristapratico
kristapratico previously approved these changes Aug 27, 2020
Copy link
Member

@kristapratico kristapratico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iscai-msft
Copy link
Contributor Author

@deyaaeldeen in Python it doesn't really matter if we leave a trailing comma during imports, and for arrays taht span multiple lines it's actualy enouraged to leave a trailign comma. Which parameter are you saying we should come up with a better name for?

@deyaaeldeen
Copy link
Member

@iscai-msft string-index-type. I think what it really specifies is which unit should we use to measure the length of the text with so type of index does not feel right to me. Perhaps string-code-unit sounds better?

@iscai-msft
Copy link
Contributor Author

@deyaaeldeen this parameter isn't exposed to users at all for python, so naming doesn't matter since it's just for us. You should sync with @willmtemple to see if JS wants to expose string-index-type to users (at least initially)

@maririos
Copy link
Member

@deyaaeldeen this parameter isn't exposed to users at all for python, so naming doesn't matter since it's just for us.

I would say it matters for whoever is maintaining the library, so even if we don't expose it, we should consider a name that indicates what the variable is storing in place.
I agree with @deyaaeldeen that string-code-unit is better than string-index-type

@deyaaeldeen
Copy link
Member

@iscai-msft I think code readability is also important and coming up with good names is generally a good thing. It is a nitpick though so I am happy with whatever call you make on this.

result = client.recognize_pii_entities(["👩 SSN: 859-98-0987"])
self.assertEqual(result[0].entities[0].offset, 7)
self.assertEqual(result[0].entities[0].length, 11)

@GlobalTextAnalyticsAccountPreparer()
@TextAnalyticsClientPreparer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this tests are not specifically for PII. Looks like Python has files per group of tests, so consider moving this to a specific file and test it for different endpoints too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move this into another test file for offset and length tests

maririos
maririos previously approved these changes Aug 27, 2020
@maririos
Copy link
Member

With this change, I think it is worth adding in the readme, and in the description of Length and Offset that the encoding used is UnicodeCodePoint

@maririos maririos self-requested a review August 27, 2020 23:54
mssfang
mssfang previously approved these changes Aug 28, 2020
@iscai-msft
Copy link
Contributor Author

iscai-msft commented Aug 28, 2020

@maririos @deyaaeldeen I really don't have strong feelings, so I'll just change the internal name to string_code_unit. I don't think the encoding is worth mentioning in the readme, but I'll add it to the docstrings

@iscai-msft iscai-msft merged commit 3891c08 into Azure:master Aug 28, 2020
@iscai-msft iscai-msft deleted the string_index_type branch August 28, 2020 17:33
iscai-msft added a commit to iscai-msft/azure-sdk-for-python that referenced this pull request Aug 28, 2020
…into return_none_for_offset_length_v3

* 'master' of https://github.com/Azure/azure-sdk-for-python:
  [text analytics] add string-index-type support (Azure#13378)
  [text analytics] fix error response if pii entities is called from v3.0 client (Azure#13383)
  Send spec (Azure#13143)
  Anomaly Detector 3.0.0b2 release (Azure#13351)
  azure-keyvault-administration generated code (Azure#12098)
  fixed bug in querying by page using continuation token (Azure#13298)
iscai-msft added a commit to iscai-msft/azure-sdk-for-python that referenced this pull request Sep 1, 2020
…into expose_parse_vault_id

* 'master' of https://github.com/Azure/azure-sdk-for-python: (37 commits)
  [text analytics] add versionadded sphinx documentation (Azure#13450)
  [text analytics] add bing_id property to LinkedEntity class (Azure#13446)
  fix typing for paging methods (Azure#13410)
  [text analytics] add domain_filter param (Azure#13451)
  fix issue Azure#11658 for is_valid_resource_id (Azure#11709)
  added create_table_if_not_exists method to table service client (Azure#13385)
  [ServiceBus] Test and failure improvements (Azure#13345)
  Proper encoding and decoding of source URLs - Fixes special characters in source URL issue (Azure#13275)
  Switch retry (Azure#13264)
  [ServiceBus] ServiceBusClient close spawned children (Azure#13077)
  fixing version issue by not overwriting the version with the semantic… (Azure#13411)
  clean up reference and tests (Azure#13412)
  Consistent returns (Azure#13245)
  [text analytics] return None for offset and length for v3.0 (Azure#13382)
  edit all authentication files and add a test (Azure#13355)
  Add managed_identity_client_id argument to DefaultAzureCredential (Azure#13218)
  [text analytics] add string-index-type support (Azure#13378)
  [text analytics] fix error response if pii entities is called from v3.0 client (Azure#13383)
  Send spec (Azure#13143)
  Anomaly Detector 3.0.0b2 release (Azure#13351)
  ...
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-python that referenced this pull request Mar 12, 2021
[Hub Generated] Review request for Microsoft.Consumption to add version stable/2019-10-01 (Azure#13378)

* Added swagger updates for the optional forecastSpend JSON field in Budget API

* Review feedback

* Internal review update
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-python that referenced this pull request Mar 12, 2021
[Hub Generated] Review request for Microsoft.Consumption to add version stable/2019-10-01 (Azure#13378)

* Added swagger updates for the optional forecastSpend JSON field in Budget API

* Review feedback

* Internal review update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[text analytics] add support for optional string-index-type query param
5 participants