-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[POC] Blobs Partial list deserialization #19814
Conversation
/azp run python - storage - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run python - storage - ci |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run python - storage - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
RedirectPolicy(**kwargs), | ||
StorageHosts(hosts=self._hosts, **kwargs), | ||
config.retry_policy, | ||
config.headers_policy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note: headers_policy cannot be put before retry, since we need to regenerate the timestamp in header for the retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xiafu-msft - I wondered about this! Why was it different between the sync and async pipelines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Anna @annatisch if async is having different order then it's a bug...
def failsafe_deserialize(self, target_obj, data, content_type=None): | ||
"""Ignores any errors encountered in deserialization, | ||
and falls back to not deserializing the object. Recommended | ||
for use in error deserialization, as we want to return the | ||
HttpResponseError to users, and not have them deal with | ||
a deserialization error. | ||
|
||
:param str target_obj: The target object type to deserialize to. | ||
:param str/dict data: The response data to deseralize. | ||
:param str content_type: Swagger "produces" if available. | ||
""" | ||
try: | ||
return self(target_obj, data, content_type=content_type) | ||
except: # pylint: disable=bare-except | ||
_LOGGER.warning( | ||
"Ran into a deserialization error. Ignoring since this is failsafe deserialization", | ||
exc_info=True | ||
) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this not decoding the response body? ContentDecodingPolicy was turning body which describes the error into string, so probably we want to do the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also some response body on error is in json format, not sure if it will be a problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some explanatory comments for @jalauzon-msft and @vincenttran-msft
return blob | ||
|
||
|
||
class BlobPropertiesPaged(PageIterator): # pylint: disable=too-many-instance-attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like I updated the existing BlobPropertiesPaged - which means the perf of this model would be improved, however if we wanted to leave the original list_blobs
API completely untouched, we could revert the changes here and have the new list_blob_names
API use it's own custom Paged object.
@@ -74,6 +75,7 @@ def __init__( | |||
# type: (...) -> None | |||
self._location_mode = kwargs.get("_location_mode", LocationMode.PRIMARY) | |||
self._hosts = kwargs.get("_hosts") | |||
self._msrest_xml = kwargs.get('msrest_xml', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd want to remove these for now most likely.
Though we could in future look at doing some kind of 'opt-in' flag to opt out of msrest.
Might be best to leave that decision for now until the long-term deprecation story for msrest starts developing. Then we can start planning more realistically.
@@ -237,21 +253,22 @@ def _create_pipeline(self, credential, **kwargs): | |||
config.transport = RequestsTransport(**kwargs) | |||
policies = [ | |||
QueueMessagePolicy(), | |||
config.headers_policy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what these changes in the pipeline policy order were all about. We should probably ignore this for now.
I remove the ContentDecode policy below because this was prematurely decoding the XML, however I don't think messing around with the pipeline will be necessary for doing just a list_blob_names
API.
@@ -1452,12 +1452,12 @@ async def list_blob_flat_segment( | |||
response_headers['x-ms-request-id']=self._deserialize('str', response.headers.get('x-ms-request-id')) | |||
response_headers['x-ms-version']=self._deserialize('str', response.headers.get('x-ms-version')) | |||
response_headers['Date']=self._deserialize('rfc-1123', response.headers.get('Date')) | |||
deserialized = self._deserialize('ListBlobsFlatSegmentResponse', pipeline_response) | |||
#deserialized = self._deserialize('ListBlobsFlatSegmentResponse', pipeline_response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the bigger challenges to figure out. Currently the deserialization process is out of our hands. We would need to add directives to the autorest code gen for the list_blobs_flat_segment API to not deserialize the response payload. This should be possible by simply overwriting the output model to have no output. We then use the cls
hook and do the deserialization ourselves.
In the case of the existing list_blobs
API, this probably means manually using the existing msrest deserializer if we don't want to deal with the testing burden of validating the new deserializer for the old API.
@@ -175,6 +176,8 @@ def __init__( | |||
self._query_str, credential = self._format_query_string(sas_token, credential, snapshot=self.snapshot) | |||
super(BlobClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs) | |||
self._client = AzureBlobStorage(self.url, pipeline=self._pipeline) | |||
if not self._msrest_xml: | |||
self._custom_xml_deserializer(generated_models) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removable. Same does for the all the clients.
generated = deserializer.deserialize_data(element, 'BlobItemInternal') | ||
return get_blob_properties_from_generated_code(generated) | ||
blob = BlobProperties() | ||
if 'name' in select: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this select
logic in case we wanted to return more from the payload than just the name. However that seems unlikely - so we could refactor this out and simplify the logic a big here.
|
||
def blob_properties_from_xml(element, select, deserializer): | ||
if not select: | ||
generated = deserializer.deserialize_data(element, 'BlobItemInternal') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is using the old msrest deserializer - so once we've altered the generated layer to not deserialize for us - keeping this should mean that the existing list_blobs
doesn't change.
from ._shared.response_handlers import return_context_and_deserialized, process_storage_error | ||
|
||
|
||
class BlobPropertiesPaged(PageIterator): | ||
def deserialize_list_result(pipeline_response, *_): | ||
payload = unpack_xml_content(pipeline_response.http_response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this line here is replacing the ContentDecodePolicy that I removed from the pipeline. So this would already be unpacked if we put that policy back in.
@@ -73,30 +133,29 @@ def _get_next_cb(self, continuation_token): | |||
prefix=self.prefix, | |||
marker=continuation_token or None, | |||
maxresults=self.results_per_page, | |||
cls=return_context_and_deserialized, | |||
cls=deserialize_list_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the cls
parameter I mentioned that we would use to hook into the deserialization.
# IN THE SOFTWARE. | ||
# | ||
# -------------------------------------------------------------------------- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably remove this for now. If we just want to add the list_blob_names
API, then I think we can keep the existing ContentDecodePolicy
, and the list_blob_helper.py
file already has the extraction of data from the XML payload, so I think we could just store this away for a distant future if and when we want to decouple from msrest.
Hi @annatisch. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days. |
Hi @annatisch. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing "/reopen" if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the "no-recent-activity" label; otherwise, this is likely to be closed again with the next cleanup pass. |
No description provided.