[POC] Blobs Partial list deserialization #19814

annatisch · 2021-07-15T02:41:09Z

No description provided.

annatisch · 2021-07-22T01:30:08Z

/azp run python - storage - tests

azure-pipelines · 2021-07-22T01:30:20Z

Azure Pipelines successfully started running 1 pipeline(s).

annatisch · 2021-07-22T21:49:38Z

/azp run python - storage - ci

azure-pipelines · 2021-07-22T21:49:57Z

Azure Pipelines successfully started running 1 pipeline(s).

annatisch · 2021-07-22T22:59:40Z

/azp run python - storage - tests

azure-pipelines · 2021-07-22T23:00:09Z

Azure Pipelines successfully started running 1 pipeline(s).

xiafu-msft · 2021-08-04T20:56:50Z

sdk/storage/azure-storage-blob/azure/storage/blob/_shared/base_client.py

            RedirectPolicy(**kwargs),
            StorageHosts(hosts=self._hosts, **kwargs),
            config.retry_policy,
-            config.headers_policy,


Just a note: headers_policy cannot be placed before the retry policy, since we need to regenerate the timestamp in the header on each retry.

Thanks @xiafu-msft - I wondered about this! Why was it different between the sync and async pipelines?

Thanks Anna (@annatisch). If the async pipeline has a different order, then it's a bug...
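To illustrate the ordering point in this thread, here is a minimal sketch using toy policies rather than the real azure-core classes. `HeadersPolicy`, `RetryPolicy`, `run_pipeline`, and the counter that stands in for a timestamp are all hypothetical names invented for this example. It assumes that a retry policy re-invokes every policy that sits after it in the list.

```python
import itertools


class HeadersPolicy:
    """Toy policy: stamps the request with a fresh value (stands in for a date header)."""

    def __init__(self, clock):
        self._clock = clock

    def send(self, request, next_send):
        request["x-ms-date"] = next(self._clock)
        return next_send(request)


class RetryPolicy:
    """Toy policy: re-sends through all downstream policies on a 5xx response."""

    def __init__(self, attempts=3):
        self._attempts = attempts

    def send(self, request, next_send):
        for attempt in range(self._attempts):
            response = next_send(request)
            # Return on success, or when the last attempt has been used up.
            if response["status"] < 500 or attempt == self._attempts - 1:
                return response


def run_pipeline(policies, transport, request):
    """Chain the policies in list order, ending at the transport."""

    def make_sender(index):
        if index == len(policies):
            return transport
        return lambda req: policies[index].send(req, make_sender(index + 1))

    return make_sender(0)(request)


def timestamps_sent(policy_order):
    """Return the header value seen by the transport on each attempt."""
    clock = itertools.count(1)
    policies = [
        HeadersPolicy(clock) if name == "headers" else RetryPolicy()
        for name in policy_order
    ]
    seen = []

    def flaky_transport(request):
        seen.append(request["x-ms-date"])
        # Fail the first two attempts, succeed on the third.
        return {"status": 503 if len(seen) < 3 else 200}

    run_pipeline(policies, flaky_transport, {})
    return seen
```

With the headers policy after the retry policy, each attempt carries a new value; with it before, every retry reuses the value from the first attempt, which is the problem described above.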

xiafu-msft · 2021-08-04T22:06:16Z

sdk/storage/azure-storage-blob/azure/storage/blob/_shared/xml_deserialization.py

+    def failsafe_deserialize(self, target_obj, data, content_type=None):
+        """Ignores any errors encountered in deserialization,
+        and falls back to not deserializing the object. Recommended
+        for use in error deserialization, as we want to return the
+        HttpResponseError to users, and not have them deal with
+        a deserialization error.
+
+        :param str target_obj: The target object type to deserialize to.
+        :param str/dict data: The response data to deserialize.
+        :param str content_type: Swagger "produces" if available.
+        """
+        try:
+            return self(target_obj, data, content_type=content_type)
+        except:  # pylint: disable=bare-except
+            _LOGGER.warning(
+                "Ran into a deserialization error. Ignoring since this is failsafe deserialization",
+                exc_info=True
+            )
+            return None


It seems this is not decoding the response body? ContentDecodePolicy was turning the body, which describes the error, into a string, so we probably want to do the same thing here?

Also, some error response bodies are in JSON format; not sure whether that will be a problem.
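One way to picture the concern raised here is a failsafe helper that decodes the raw error body itself and copes with either XML or JSON. This is only a sketch using the standard library; `failsafe_decode_error_body` is a hypothetical name, and it makes no claim about how ContentDecodePolicy or the PR's deserializer actually behaves.

```python
import json
import logging
import xml.etree.ElementTree as ET

_LOGGER = logging.getLogger(__name__)


def failsafe_decode_error_body(body, content_type=""):
    """Decode an error body (bytes) into a dict, or return None on any failure.

    Hypothetical helper: JSON bodies are parsed as-is, anything else is
    treated as XML and flattened into {child tag: child text}.
    """
    try:
        text = body.decode("utf-8-sig")  # also strips a UTF-8 BOM if present
        if "json" in content_type.lower():
            return json.loads(text)
        root = ET.fromstring(text)
        return {child.tag: (child.text or "") for child in root}
    except Exception:  # deliberately broad: this path must never raise
        _LOGGER.warning("Could not decode error body; ignoring.", exc_info=True)
        return None
```

The caller can then fall back to raising the HttpResponseError without details whenever the helper returns None.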

annatisch

Leaving some explanatory comments for @jalauzon-msft and @vincenttran-msft

annatisch · 2022-04-27T02:26:54Z

sdk/storage/azure-storage-blob/azure/storage/blob/_list_blobs_helper.py

+    return blob
+
+
+class BlobPropertiesPaged(PageIterator):  # pylint: disable=too-many-instance-attributes


It looks like I updated the existing BlobPropertiesPaged, which means the perf of this model would be improved. However, if we wanted to leave the original list_blobs API completely untouched, we could revert the changes here and have the new list_blob_names API use its own custom Paged object.

annatisch · 2022-04-27T02:29:32Z

sdk/storage/azure-storage-blob/azure/storage/blob/_shared/base_client.py

@@ -74,6 +75,7 @@ def __init__(
        # type: (...) -> None
        self._location_mode = kwargs.get("_location_mode", LocationMode.PRIMARY)
        self._hosts = kwargs.get("_hosts")
+        self._msrest_xml = kwargs.get('msrest_xml', False)


We'd most likely want to remove these for now.
In future we could look at some kind of opt-in flag for opting out of msrest.
It might be best to defer that decision until the long-term deprecation story for msrest starts developing. Then we can start planning more realistically.

annatisch · 2022-04-27T02:31:28Z

sdk/storage/azure-storage-blob/azure/storage/blob/_shared/base_client.py

@@ -237,21 +253,22 @@ def _create_pipeline(self, credential, **kwargs):
            config.transport = RequestsTransport(**kwargs)
        policies = [
            QueueMessagePolicy(),
+            config.headers_policy,


Not sure what these changes in the pipeline policy order were all about. We should probably ignore this for now.
I removed the ContentDecodePolicy below because it was prematurely decoding the XML; however, I don't think messing around with the pipeline will be necessary for just adding a list_blob_names API.

annatisch · 2022-04-27T02:35:39Z

...age/azure-storage-blob/azure/storage/blob/_generated/aio/operations/_container_operations.py

@@ -1452,12 +1452,12 @@ async def list_blob_flat_segment(
        response_headers['x-ms-request-id']=self._deserialize('str', response.headers.get('x-ms-request-id'))
        response_headers['x-ms-version']=self._deserialize('str', response.headers.get('x-ms-version'))
        response_headers['Date']=self._deserialize('rfc-1123', response.headers.get('Date'))
-        deserialized = self._deserialize('ListBlobsFlatSegmentResponse', pipeline_response)
+        #deserialized = self._deserialize('ListBlobsFlatSegmentResponse', pipeline_response)


This is one of the bigger challenges to figure out. Currently the deserialization process is out of our hands. We would need to add directives to the autorest code gen so that the list_blob_flat_segment API does not deserialize the response payload. This should be possible by simply overwriting the output model to have no output. We then use the cls hook and do the deserialization ourselves.
In the case of the existing list_blobs API, this probably means manually calling the existing msrest deserializer if we don't want to take on the testing burden of validating the new deserializer for the old API.
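The cls hook idea can be sketched roughly as follows. The generated operation is replaced by a hypothetical stand-in (`fake_generated_list_operation`) that assumes the output model has already been removed, so it passes `None` as the deserialized value. The callback signature mirrors the `deserialize_list_result(pipeline_response, *_)` shape from this PR, but the body here is an illustration with the standard library, not the PR's implementation.

```python
from types import SimpleNamespace
import xml.etree.ElementTree as ET


def fake_generated_list_operation(raw_xml, cls=None):
    """Hypothetical stand-in for a generated operation with no output model."""
    http_response = SimpleNamespace(body=lambda: raw_xml)
    pipeline_response = SimpleNamespace(http_response=http_response)
    response_headers = {"x-ms-request-id": "hypothetical-id"}
    deserialized = None  # the generated layer no longer touches the payload
    if cls:
        # The callback receives the raw pipeline response and decides what to return.
        return cls(pipeline_response, deserialized, response_headers)
    return deserialized


def deserialize_list_result(pipeline_response, *_):
    """Parse the raw XML ourselves and return (continuation token, parsed root)."""
    root = ET.fromstring(pipeline_response.http_response.body())
    return root.findtext("NextMarker") or None, root
```

The point of the design is that whatever the callback returns becomes the operation's return value, so the convenience layer can choose between the msrest deserializer and a lighter custom one per API.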

annatisch · 2022-04-27T02:36:34Z

sdk/storage/azure-storage-blob/azure/storage/blob/_blob_client.py

@@ -175,6 +176,8 @@ def __init__(
        self._query_str, credential = self._format_query_string(sas_token, credential, snapshot=self.snapshot)
        super(BlobClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs)
        self._client = AzureBlobStorage(self.url, pipeline=self._pipeline)
+        if not self._msrest_xml:
+            self._custom_xml_deserializer(generated_models)


Removable. Same goes for all the clients.

annatisch · 2022-04-27T02:38:21Z

sdk/storage/azure-storage-blob/azure/storage/blob/_list_blobs_helper.py

+        generated = deserializer.deserialize_data(element, 'BlobItemInternal')
+        return get_blob_properties_from_generated_code(generated)
+    blob = BlobProperties()
+    if 'name' in select:


I implemented this select logic in case we wanted to return more from the payload than just the name. However, that seems unlikely, so we could refactor this out and simplify the logic a bit here.
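If only names are ever needed, the simplification suggested here might look something like the sketch below. `iter_blob_names` is a hypothetical function, not code from this PR, and it assumes the list payload has the usual `<Blobs><Blob><Name>...</Name>...</Blob></Blobs>` layout. It streams the payload and discards each `<Blob>` element after reading its name, so no properties are ever deserialized.

```python
import io
import xml.etree.ElementTree as ET


def iter_blob_names(payload):
    """Yield blob names from a list-blobs XML payload (bytes) without building models."""
    for _event, element in ET.iterparse(io.BytesIO(payload), events=("end",)):
        if element.tag == "Blob":
            name = element.findtext("Name")
            if name is not None:
                yield name
            # Drop the element's children (properties, metadata) to keep memory flat.
            element.clear()
```

For example, `list(iter_blob_names(payload))` on a page containing two blobs returns just their two names, regardless of how many properties each blob carries.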

annatisch · 2022-04-27T02:39:45Z

sdk/storage/azure-storage-blob/azure/storage/blob/_list_blobs_helper.py

+
+def blob_properties_from_xml(element, select, deserializer):
+    if not select:
+        generated = deserializer.deserialize_data(element, 'BlobItemInternal')


This is using the old msrest deserializer - so once we've altered the generated layer to not deserialize for us - keeping this should mean that the existing list_blobs doesn't change.

annatisch · 2022-04-27T02:43:10Z

sdk/storage/azure-storage-blob/azure/storage/blob/_list_blobs_helper.py

 from ._shared.response_handlers import return_context_and_deserialized, process_storage_error


-class BlobPropertiesPaged(PageIterator):
+def deserialize_list_result(pipeline_response, *_):
+    payload = unpack_xml_content(pipeline_response.http_response)


I believe this line here is replacing the ContentDecodePolicy that I removed from the pipeline. So this would already be unpacked if we put that policy back in.

annatisch · 2022-04-27T02:44:19Z

sdk/storage/azure-storage-blob/azure/storage/blob/_list_blobs_helper.py

@@ -73,30 +133,29 @@ def _get_next_cb(self, continuation_token):
                prefix=self.prefix,
                marker=continuation_token or None,
                maxresults=self.results_per_page,
-                cls=return_context_and_deserialized,
+                cls=deserialize_list_result,


This is the cls parameter I mentioned that we would use to hook into the deserialization.

annatisch · 2022-04-27T02:47:05Z

sdk/storage/azure-storage-blob/azure/storage/blob/_shared/xml_deserialization.py

+# IN THE SOFTWARE.
+#
+# --------------------------------------------------------------------------
+


We can probably remove this for now. If we just want to add the list_blob_names API, then I think we can keep the existing ContentDecodePolicy, and the _list_blobs_helper.py file already has the extraction of data from the XML payload. So I think we could just store this away for a distant future, if and when we want to decouple from msrest.

ghost · 2022-07-01T10:05:52Z

Hi @annatisch. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

ghost · 2022-07-08T11:04:43Z

Hi @annatisch. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing "/reopen" if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the "no-recent-activity" label; otherwise, this is likely to be closed again with the next cleanup pass.

Partial list deserialization

ab1fe0a

annatisch added Storage Storage Service (Queues, Blobs, Files) Do Not Merge labels Jul 15, 2021

annatisch added 9 commits July 16, 2021 14:17

XML POC

5f9d0ac

Plug in deserializer

2bb536c

Remove content decide policy

7b78ee7

Some code cleanup

12a96ab

Refactor part 1

85dd180

Refactor part 2

f984f61

Refactor part 3

5ccd5ea

Refactor part 4

5d868e2

Refactor part 5

9372dc1

annatisch added 4 commits July 22, 2021 07:44

Make xml pipeline opt-in

406647f

Some code cleanup

e4e3450

Don't decode payload

24dbf61

Fix stats test

95557f7

Surfaced as separate API

efb9c56

annatisch mentioned this pull request Jul 26, 2021

Listing blobs names is very slow #19755

Closed

xiafu-msft reviewed Aug 4, 2021

tasherif-msft mentioned this pull request Dec 1, 2021

Customer is asserting that V12 Storage SDK has slower performance that V2.1 sdk #9596

Closed

annatisch commented Apr 27, 2022

ghost added the no-recent-activity There has been no recent activity on this issue. label Jul 1, 2022

ghost closed this Jul 8, 2022

This pull request was closed.
