
Typing errors in blob storage documentation code #24661

Closed
TobiRoby opened this issue Jun 1, 2022 · 18 comments
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)

@TobiRoby

TobiRoby commented Jun 1, 2022

Hi,

I am receiving typing errors from mypy when following the blob storage examples in the official documentation.

It would be awesome if the documentation could provide an example with no type errors :)

This is the code I am type checking (copy & pasted from the linked documentation):

import os
import uuid

from azure.storage.blob import BlobServiceClient

# Create the BlobServiceClient object which will be used to create a container client
blob_service_client: BlobServiceClient = BlobServiceClient.from_connection_string(
    "connect_str"
)
# Create a unique name for the container
container_name = str(uuid.uuid4())

# Create a local directory to hold blob data
local_path = "./data"
os.mkdir(local_path)

# Create a file in the local data directory to upload and download
local_file_name = str(uuid.uuid4()) + ".txt"
upload_file_path = os.path.join(local_path, local_file_name)

# Write text to the file
file = open(upload_file_path, "w")
file.write("Hello, World!")
file.close()

# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(
    container=container_name, blob=local_file_name
)

print("\nUploading to Azure Storage as blob:\n\t" + local_file_name)

# Upload the created file
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data)

download_file_path = os.path.join(
    local_path, str.replace(local_file_name, ".txt", "DOWNLOAD.txt")
)
print("\nDownloading blob to \n\t" + download_file_path)

with open(download_file_path, "wb") as download_file:
    download_file.write(blob_client.download_blob().readall())

I receive 2 errors when applying mypy:

test.py:35: error: Argument 1 to "upload_blob" of "BlobClient" has incompatible type "BufferedReader"; expected "Iterable[str]"
test.py:35: note: Following member(s) of "BufferedReader" have conflicts:
test.py:35: note:     Expected:
test.py:35: note:         def __iter__(self) -> Iterator[str]
test.py:35: note:     Got:
test.py:35: note:         def __iter__(self) -> Iterator[bytes]
test.py:43: error: Argument 1 to "write" of "BufferedWriter" has incompatible type "Union[bytes, str]"; expected "Union[bytes, Union[bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]]"
Found 2 errors in 1 file (checked 1 source file)

environment
python: 3.10.4
azure-storage-blob: 12.12.0
mypy: 0.960

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jun 1, 2022
@azure-sdk azure-sdk added Client This issue points to a problem in the data-plane of the library. needs-team-triage Workflow: This issue needs the team to triage. Storage Storage Service (Queues, Blobs, Files) labels Jun 1, 2022
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jun 1, 2022
@TobiRoby TobiRoby changed the title Typing errors in documentation code Typing errors in blob storage documentation code Jun 1, 2022
@catalinaperalta catalinaperalta added bug This issue requires a change to an existing behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that needs-team-triage Workflow: This issue needs the team to triage. labels Jun 1, 2022
@catalinaperalta
Member

Thanks for reaching out @TobiRoby! We'll investigate and get back to you asap.

@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Jun 1, 2022
@ghost

ghost commented Jun 1, 2022

Thank you for your feedback. This has been routed to the support team for assistance.

@catalinaperalta catalinaperalta added this to the [2022] July milestone Jun 1, 2022
@vincenttran-msft vincenttran-msft self-assigned this Jun 1, 2022
@dimbleby
Member

I think this is a mypy bug, specifically python/mypy#3644

I reckon it can be worked around by changing the type so that instead of

# type: Union[AnyStr, Iterable[AnyStr], IO[AnyStr]]

it goes

# type: Union[bytes, str, Iterable[AnyStr], IO[AnyStr]]
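A minimal standalone sketch of why the split helps (hypothetical function names, not the real SDK signatures): with `Union[AnyStr, ...]`, mypy has to solve a single `AnyStr` binding across the whole union, and python/mypy#3644 can make it reject a `BufferedReader` against the `Iterable[str]` member. Listing `bytes` and `str` explicitly sidesteps that for the non-generic members.

```python
import io
from typing import AnyStr, IO, Iterable, Union

# AnyStr form: mypy must unify AnyStr across the whole union, which can
# trip python/mypy#3644 when passed a binary stream (an IO[bytes] that
# iterates bytes, like open(path, "rb")).
def upload_anystr(data: Union[AnyStr, Iterable[AnyStr], IO[AnyStr]]) -> int:
    if hasattr(data, "read"):
        return len(data.read())          # file-like object
    if isinstance(data, (bytes, str)):
        return len(data)                 # whole payload
    return sum(len(chunk) for chunk in data)  # iterable of chunks

# Split form (the suggested workaround): bytes and str are spelled out,
# so mypy only has to infer AnyStr for the generic members.
def upload_split(data: Union[bytes, str, Iterable[AnyStr], IO[AnyStr]]) -> int:
    if hasattr(data, "read"):
        return len(data.read())
    if isinstance(data, (bytes, str)):
        return len(data)
    return sum(len(chunk) for chunk in data)

print(upload_split(io.BytesIO(b"Hello, World!")))  # a binary stream, as in the sample
```

At runtime both functions behave identically; the difference only shows up when mypy checks a call site that passes a binary file object.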

@jalauzon-msft
Member

Hi @TobiRoby, for your second error Argument 1 to "write" of "BufferedWriter" has incompatible type "Union[bytes, str]": this error comes from the built-in write method because of the return type of our readall. readall can return a str if you specify an encoding in your call to download_blob, so that is expected. You may be able to tell mypy that you expect bytes back by using a type hint. Something like this may work:

with open(download_file_path, "wb") as download_file:
    content: bytes = blob_client.download_blob().readall()
    download_file.write(content)
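If the temporary variable feels heavy, `typing.cast` is an equivalent trick. A standalone sketch with a stand-in `readall` (not the real client; `cast` only affects the type checker and performs no runtime conversion or check):

```python
from typing import Union, cast

def readall() -> Union[bytes, str]:
    # Stand-in for StorageStreamDownloader.readall, which is typed as
    # returning bytes or str depending on the encoding argument.
    return b"Hello, World!"

# cast(bytes, ...) tells mypy to treat the result as bytes; at runtime
# it returns its argument unchanged.
content = cast(bytes, readall())
print(len(content))
```

Either spelling is purely a type-checker hint: if `readall` actually returned a `str` (because an encoding was passed), `cast` would silently hide the mismatch.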

For the first error Argument 1 to "upload_blob" of "BlobClient" has incompatible type "BufferedReader", it looks like @dimbleby may be correct and we could consider an adjustment there.

@annatisch Anna, curious what you think about adjusting the type for upload_blob similar to the above suggestion from David?

@dimbleby
Member

dimbleby commented Jun 13, 2022

@jalauzon-msft the advice you are offering should be applied to your own code: this isn't the user's code, it is

copy & pasted from the linked documentation

Edit: or, perhaps you could get more sophisticated with the annotations. I think you should be able to use @overload so that readall() returns different types according to whether encoding is None or a str. Compare the tricks that typeshed plays with open() on paths, for instance: it can return a FileIO, or a BufferedReader, or ...

Then users wouldn't have to jump through such hoops.
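One way that could look, as a self-contained sketch with dummy names (not the actual azure-storage-blob code): make the downloader generic in its element type and overload the download function on the encoding parameter, so the checker threads the type from `download_blob(...)` through to `readall()`.

```python
from typing import Generic, Optional, TypeVar, overload

T = TypeVar("T", bytes, str)

class Downloader(Generic[T]):
    """Toy stand-in for StorageStreamDownloader."""

    def __init__(self, raw: bytes, encoding: Optional[str]) -> None:
        self._raw = raw
        self._encoding = encoding

    def readall(self) -> T:
        # The ignores paper over the fact that T is decided by the
        # overloads below, not by this runtime branch.
        if self._encoding is None:
            return self._raw  # type: ignore[return-value]
        return self._raw.decode(self._encoding)  # type: ignore[return-value]

@overload
def download_blob(raw: bytes, encoding: None = ...) -> Downloader[bytes]: ...
@overload
def download_blob(raw: bytes, encoding: str) -> Downloader[str]: ...
def download_blob(raw: bytes, encoding: Optional[str] = None):
    return Downloader(raw, encoding)

data = download_blob(b"hi").readall()           # checker sees bytes
text = download_blob(b"hi", "utf-8").readall()  # checker sees str
print(type(data).__name__, type(text).__name__)
```

With overloads like these, the sample code's `download_file.write(blob_client.download_blob().readall())` would type-check without any extra annotation from the user.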

@dimbleby
Member

One more thought, perhaps the most important one really: wouldn't it be nice if the workflows for this repository checked the type annotations, so that such things were caught before they were shipped?

@TobiRoby
Author

Thanks for your feedback.

  1. Second error Argument 1 to "write" of "BufferedWriter" has incompatible type "Union[bytes, str]":
    I can confirm that your suggestion resolves this typing error:
with open(download_file_path, "wb") as download_file:
    content: bytes = blob_client.download_blob().readall()
    download_file.write(content)

@jalauzon-msft can you add it to the official documentation, please?

  2. First error Argument 1 to "upload_blob" of "BlobClient" has incompatible type "BufferedReader":
    To be 100% sure: to which function did you apply the typing change, @dimbleby?
    And if that works, is it a mypy bug as you initially suggested, or a wrong type declaration in the azure-sdk-for-python project?

@TobiRoby
Author

One more thought, perhaps the most important one really: wouldn't it be nice if the workflows for this repository checked the type annotations, so that such things were caught before they were shipped?

I really like your suggestion.
I try to use pre-commit in combination with mypy/flake8/... for consistency purposes in every project.
@jalauzon-msft, if you are interested in these topics, let us know.

@jalauzon-msft
Member

Hi @dimbleby and @TobiRoby, thanks for the follow-ups! Sorry, I totally missed that this was coming from our documented sample, so that's "egg on my face". We can consider updating the sample, but I think it would be better to fix the typing as David mentioned. It's just a bit tricky, and @overload likely will not work in this case because the encoding parameter is actually passed to download_blob and not to readall itself. readall is a method on StorageStreamDownloader, which is returned by download_blob (in all cases). We will look more into how to address this.

For the other issue, seeing how that mypy bug has existed for some time, we can make the change David suggested (or something similar), assuming that will work.

And a note on a type-checking workflow: we will be working on that sometime in the near future. This repository already supports mypy type checking in our internal pipelines; we just need to get Storage included in the checks.

@PramodValavala-MSFT PramodValavala-MSFT added Service Attention Workflow: This issue is responsible by Azure service team. and removed CXP Attention labels Jun 24, 2022
@ghost

ghost commented Jun 24, 2022

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Issue Details

Author: TobiRoby
Assignees: jalauzon-msft, vincenttran-msft
Labels:

bug, Storage, Service Attention, Client, customer-reported, needs-team-attention

Milestone: 2022-07

@jalauzon-msft
Member

Both of the type issues have been fixed in main and will be present in our next release. There is no need to adjust the sample code now.

@eladkal

eladkal commented Aug 3, 2022

Is there an estimated release date for 12.14.0?
Apache Airflow uses azure-storage-blob for the Azure provider, and currently we are unable to update the package version due to the typing problem. It seems like you already fixed it but have not yet released it.

apache/airflow#25426

@jalauzon-msft
Member

Hi @eladkal, we are planning a preview release for 12.14.0b1 early this month (ideally next week) and a full release of 12.14.0 in early September.

If you need a solution faster, please share an example of your typing error and possibly we can come up with a workaround.

@eladkal

eladkal commented Aug 4, 2022

@jalauzon-msft failure is:

Run mypy for providers.................................................................Failed
- hook id: run-mypy
- exit code: 1

airflow/providers/microsoft/azure/hooks/wasb.py:302: error: Argument 1 to
"write" of "BufferedWriter" has incompatible type "Union[bytes, str]"; expected
"Union[bytes, Union[bytearray, memoryview, array[Any], mmap, _CData]]" 
[arg-type]
                fileblob.write(stream.readall())
                               ^
Found 1 error in 1 file (checked 982 source files)

On:
https://github.com/apache/airflow/blob/cda40836ca55702c543391367d0829ed231c1b35/airflow/providers/microsoft/azure/hooks/wasb.py#L302

@jalauzon-msft
Member

@eladkal, thanks. I can confirm this should be addressed in main already and should be part of the upcoming releases, but if you want a workaround in the meantime, you could try adjusting your code to something like what I have below. The trick is to explicitly annotate the result of stream.readall() with the type bytes.

with open(file_path, "wb") as fileblob:
    stream = self.download(container_name=container_name, blob_name=blob_name, **kwargs)
    content: bytes = stream.readall()
    fileblob.write(content)

Hopefully this can resolve this issue for you prior to us releasing the fix. Thanks.

@eladkal

eladkal commented Aug 6, 2022

Tried that... it didn't work.
We managed to find another workaround to make mypy happy, but then it caused our tests to fail. You can see it in apache/airflow#25426.

We will release the next Azure provider for Airflow with the old version. Once 12.14.0 is released we will adjust accordingly.

@eladkal

eladkal commented Oct 5, 2022

@jalauzon-msft Is there an estimated date for the release?

@jalauzon-msft
Member

Hi @eladkal, the plan is for this to release as part of 12.14.0 that is tentatively scheduled for some time next week. We were hoping to have this released sooner but ran into some issues that pushed our release schedule by about a month.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023