Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Blobs Partial list deserialization #19814

Closed
wants to merge 15 commits into from

Conversation

annatisch
Copy link
Member

No description provided.

@annatisch annatisch added Storage Storage Service (Queues, Blobs, Files) Do Not Merge labels Jul 15, 2021
@annatisch
Copy link
Member Author

/azp run python - storage - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@annatisch
Copy link
Member Author

/azp run python - storage - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@annatisch
Copy link
Member Author

/azp run python - storage - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

RedirectPolicy(**kwargs),
StorageHosts(hosts=self._hosts, **kwargs),
config.retry_policy,
config.headers_policy,
Copy link
Contributor

@xiafu-msft xiafu-msft Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a note: headers_policy cannot be put before retry, since we need to regenerate the timestamp in header for the retry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xiafu-msft - I wondered about this! Why was it different between the sync and async pipelines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Anna @annatisch if async is having different order then it's a bug...

Comment on lines +461 to +479
def failsafe_deserialize(self, target_obj, data, content_type=None):
"""Ignores any errors encountered in deserialization,
and falls back to not deserializing the object. Recommended
for use in error deserialization, as we want to return the
HttpResponseError to users, and not have them deal with
a deserialization error.

:param str target_obj: The target object type to deserialize to.
:param str/dict data: The response data to deseralize.
:param str content_type: Swagger "produces" if available.
"""
try:
return self(target_obj, data, content_type=content_type)
except: # pylint: disable=bare-except
_LOGGER.warning(
"Ran into a deserialization error. Ignoring since this is failsafe deserialization",
exc_info=True
)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this not decoding the response body? ContentDecodingPolicy was turning body which describes the error into string, so probably we want to do the same thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also some response body on error is in json format, not sure if it will be a problem

Copy link
Member Author

@annatisch annatisch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some explanatory comments for @jalauzon-msft and @vincenttran-msft

return blob


class BlobPropertiesPaged(PageIterator): # pylint: disable=too-many-instance-attributes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I updated the existing BlobPropertiesPaged - which means the perf of this model would be improved, however if we wanted to leave the original list_blobs API completely untouched, we could revert the changes here and have the new list_blob_names API use it's own custom Paged object.

@@ -74,6 +75,7 @@ def __init__(
# type: (...) -> None
self._location_mode = kwargs.get("_location_mode", LocationMode.PRIMARY)
self._hosts = kwargs.get("_hosts")
self._msrest_xml = kwargs.get('msrest_xml', False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd want to remove these for now most likely.
Though we could in future look at doing some kind of 'opt-in' flag to opt out of msrest.
Might be best to leave that decision for now until the long-term deprecation story for msrest starts developing. Then we can start planning more realistically.

@@ -237,21 +253,22 @@ def _create_pipeline(self, credential, **kwargs):
config.transport = RequestsTransport(**kwargs)
policies = [
QueueMessagePolicy(),
config.headers_policy,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what these changes in the pipeline policy order were all about. We should probably ignore this for now.
I remove the ContentDecode policy below because this was prematurely decoding the XML, however I don't think messing around with the pipeline will be necessary for doing just a list_blob_names API.

@@ -1452,12 +1452,12 @@ async def list_blob_flat_segment(
response_headers['x-ms-request-id']=self._deserialize('str', response.headers.get('x-ms-request-id'))
response_headers['x-ms-version']=self._deserialize('str', response.headers.get('x-ms-version'))
response_headers['Date']=self._deserialize('rfc-1123', response.headers.get('Date'))
deserialized = self._deserialize('ListBlobsFlatSegmentResponse', pipeline_response)
#deserialized = self._deserialize('ListBlobsFlatSegmentResponse', pipeline_response)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the bigger challenges to figure out. Currently the deserialization process is out of our hands. We would need to add directives to the autorest code gen for the list_blobs_flat_segment API to not deserialize the response payload. This should be possible by simply overwriting the output model to have no output. We then use the cls hook and do the deserialization ourselves.
In the case of the existing list_blobs API, this probably means manually using the existing msrest deserializer if we don't want to deal with the testing burden of validating the new deserializer for the old API.

@@ -175,6 +176,8 @@ def __init__(
self._query_str, credential = self._format_query_string(sas_token, credential, snapshot=self.snapshot)
super(BlobClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs)
self._client = AzureBlobStorage(self.url, pipeline=self._pipeline)
if not self._msrest_xml:
self._custom_xml_deserializer(generated_models)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removable. Same does for the all the clients.

generated = deserializer.deserialize_data(element, 'BlobItemInternal')
return get_blob_properties_from_generated_code(generated)
blob = BlobProperties()
if 'name' in select:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this select logic in case we wanted to return more from the payload than just the name. However that seems unlikely - so we could refactor this out and simplify the logic a big here.


def blob_properties_from_xml(element, select, deserializer):
if not select:
generated = deserializer.deserialize_data(element, 'BlobItemInternal')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using the old msrest deserializer - so once we've altered the generated layer to not deserialize for us - keeping this should mean that the existing list_blobs doesn't change.

from ._shared.response_handlers import return_context_and_deserialized, process_storage_error


class BlobPropertiesPaged(PageIterator):
def deserialize_list_result(pipeline_response, *_):
payload = unpack_xml_content(pipeline_response.http_response)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this line here is replacing the ContentDecodePolicy that I removed from the pipeline. So this would already be unpacked if we put that policy back in.

@@ -73,30 +133,29 @@ def _get_next_cb(self, continuation_token):
prefix=self.prefix,
marker=continuation_token or None,
maxresults=self.results_per_page,
cls=return_context_and_deserialized,
cls=deserialize_list_result,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cls parameter I mentioned that we would use to hook into the deserialization.

# IN THE SOFTWARE.
#
# --------------------------------------------------------------------------

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove this for now. If we just want to add the list_blob_names API, then I think we can keep the existing ContentDecodePolicy, and the list_blob_helper.py file already has the extraction of data from the XML payload, so I think we could just store this away for a distant future if and when we want to decouple from msrest.

@ghost ghost added the no-recent-activity There has been no recent activity on this issue. label Jul 1, 2022
@ghost
Copy link

ghost commented Jul 1, 2022

Hi @annatisch. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@ghost ghost closed this Jul 8, 2022
@ghost
Copy link

ghost commented Jul 8, 2022

Hi @annatisch. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing "/reopen" if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the "no-recent-activity" label; otherwise, this is likely to be closed again with the next cleanup pass.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Do Not Merge no-recent-activity There has been no recent activity on this issue. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants