[BUG] Infinite loop in BlobContainerClient::listBlobsByHierarchy and BlobContainerClient::listBlobs #26064

reta · 2021-12-16T13:56:48Z

Describe the bug
It turns out that Azure SDK v12 is very sensitive to the XMLStreamReader implementation (obtained through JacksonAdapter) and relies heavily on empty XML elements and attributes being deserialized as null.

However, whether that happens depends on which XMLStreamReader implementation is picked up at runtime: Woodstox nullifies them, whereas the JDK default, com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl, does not. This leads to an infinite loop in BlobContainerClient::listBlobsByHierarchy and BlobContainerClient::listBlobs, because the page iterables (ContinuablePagedByXxx) only understand null as the termination condition.

The XMLStreamReader instance is created by Jackson's XmlFactory and is used by FromXmlParser to parse XML payloads.

Exception or Stack Trace
There is no stack trace; BlobContainerClient::listBlobsByHierarchy and BlobContainerClient::listBlobs never return because they keep trying to fetch the next page with an empty continuation token.

To Reproduce
It is very easy to reproduce. Here is a code snippet with an example ListBlobsFlatSegmentResponse payload:

package io.aven.security.server;

import java.io.IOException;

import com.azure.core.util.serializer.JacksonAdapter;
import com.azure.core.util.serializer.SerializerAdapter;
import com.azure.core.util.serializer.SerializerEncoding;
import com.azure.storage.blob.implementation.models.ListBlobsFlatSegmentResponse;

public class MarkerIssueRunner {
    public static void main(String[] args) throws IOException {
        var response = """
        <?xml version="1.0" encoding="utf-8"?>
        <EnumerationResults ServiceEndpoint="https://aiventestandriyredko.blob.core.windows.net/" ContainerName="opensearch-snapshots">
            <Prefix>tests-v57W2zP6QMu-feMw6GvVYA/</Prefix>
            <Blobs />
            <NextMarker />
        </EnumerationResults>
        """;
        
        final SerializerAdapter adapter =  JacksonAdapter.createDefaultSerializerAdapter();
        ListBlobsFlatSegmentResponse obj = adapter.deserialize(response.getBytes(), ListBlobsFlatSegmentResponse.class, SerializerEncoding.XML);
        System.out.println("Next Marker Is: '" + obj.getNextMarker() + "'");
    }
}

It uses JDK 17 syntax but is reproducible on any modern JDK. When run with -Djavax.xml.stream.XMLInputFactory=com.sun.xml.internal.stream.XMLInputFactoryImpl, the output of the program is:

Next Marker Is: ''

When run without -Djavax.xml.stream.XMLInputFactory (or, equivalently, with -Djavax.xml.stream.XMLInputFactory=com.ctc.wstx.stax.WstxInputFactory), the output of the program is:

Next Marker Is: 'null'
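
To see which StAX implementation a given JVM and classpath resolve by default, a minimal check along the following lines may help. This is only a sketch: the class name StaxFactoryCheck and the method resolvedFactoryName are made up for illustration, and Jackson's XmlFactory may perform its own lookup that differs slightly from the plain XMLInputFactory.newFactory() call used here.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, not part of the Azure SDK or Jackson.
public class StaxFactoryCheck {

    /** Returns the class name of the StAX input factory the JVM resolves by default. */
    static String resolvedFactoryName() {
        // newFactory() consults the javax.xml.stream.XMLInputFactory system property
        // and any providers on the classpath before falling back to the JDK built-in.
        return XMLInputFactory.newFactory().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("XMLInputFactory in use: " + resolvedFactoryName());
    }
}
```

Running this with and without the -Djavax.xml.stream.XMLInputFactory flag shown above should make it visible which factory a given setup ends up with.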

Code Snippet

for (final BlobItem blobItem : blobContainer.listBlobs(listBlobsOptions, timeout())) {
   ....
}

If no timeout is specified, listBlobs never returns.

Expected behavior
The function should return normally.

Setup (please complete the following information):

OS: Any
IDE: Any
Library/Libraries: com.azure:azure-core:1.20.0, com.azure:azure-storage-blob:12.13.0, Jackson 2.12.3
Java version: [e.g. 8]
App Server/Environment: Any
Frameworks: Any

Additional context
This issue only happens when a non-Woodstox XMLStreamReader is picked up. There are several options for addressing it:
a) Enhance the page iterables (ContinuablePagedByXxx) to treat empty and null tokens as equivalent
b) Allow a custom XmlFactory instance to be supplied to JacksonAdapter through the XmlMapper.builder(XmlFactory) method (which covers both the XMLStreamReader and the XMLStreamWriter)
c) Use Woodstox explicitly in the JacksonAdapter while configuring XmlMapper

I believe option a) is the most appropriate.

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

Bug Description Added
Repro Steps Added
Setup information Added

reta · 2021-12-16T14:10:04Z

@alzimmermsft @rickle-msft guys, if it makes sense to you, I would be happy to submit the pull request to enhance page iterables (ContinuablePagedByXxx) to treat empty and null token as equivalent

rickle-msft · 2021-12-16T14:13:14Z

@reta Thank you for opening this issue and for the thorough description and suggestions. I think @alzimmermsft will be most equipped to respond to this when he gets back from vacation in a few days

alzimmermsft · 2021-12-20T19:30:43Z

Thank you for reporting this @reta. I'm taking a look into the root issue, I'll have an update soon.

alzimmermsft · 2021-12-21T17:35:27Z

@reta I've completed my preliminary troubleshooting of this issue, and this was an amazing find on your part!

What you've found is a difference in paging termination between PagedFlux and PagedIterable (more specifically in their superclasses, but these are the types exposed in the Storage SDKs). PagedIterable has a code path that diverges from PagedFlux because of the way Reactor handled, and possibly still handles, turning a reactive stream into an Iterable or Stream: retrieving the next element would eagerly populate the element after it, resulting in errant page requests.

A few months ago, logic was added to the PagedFlux class hierarchy that allows a Predicate to be passed to determine when paging should terminate. The default changed from continuation token == null to continuation token == "" || continuation token == null when a String-based continuation token is used (continuation token == null is still the default for non-String continuation tokens). Unfortunately, because PagedIterable and PagedFlux use divergent code paths, PagedIterable was overlooked and does not use the paging termination Predicate. I've filed PR #26139 to resolve this difference.
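
As a rough illustration of predicate-driven termination, here is a self-contained sketch of a synchronous paging loop. It is not the azure-core implementation: Page, fakeService, listAll and CONTINUE_ON_NON_EMPTY are hypothetical names, the service is faked, and the maxPages cap exists only so that the null-only check fails fast instead of looping forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch, not the azure-core paging classes.
public class PagingTerminationSketch {

    /** Hypothetical page type: some items plus the token for the next page. */
    static final class Page {
        final List<String> items;
        final String nextToken;

        Page(List<String> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    /** Keep paging only while the token is neither null nor empty. */
    static final Predicate<String> CONTINUE_ON_NON_EMPTY =
        token -> token != null && !token.isEmpty();

    /** Fake service: two pages of results, then an empty (not null) marker. */
    static Page fakeService(String token) {
        if (token == null) {
            return new Page(List.of("blob-1", "blob-2"), "page-2");
        } else if ("page-2".equals(token)) {
            return new Page(List.of("blob-3"), "");
        }
        // An empty token keeps yielding empty pages, which mimics the reported hang.
        return new Page(List.of(), "");
    }

    /** Fetches pages until the predicate rejects the next token or maxPages is hit. */
    static List<String> listAll(Function<String, Page> fetchPage,
                                Predicate<String> shouldContinue,
                                int maxPages) {
        List<String> all = new ArrayList<>();
        String token = null; // null requests the first page
        for (int i = 0; i < maxPages; i++) {
            Page page = fetchPage.apply(token);
            all.addAll(page.items);
            token = page.nextToken;
            if (!shouldContinue.test(token)) {
                return all;
            }
        }
        throw new IllegalStateException("paging did not terminate within " + maxPages + " pages");
    }

    public static void main(String[] args) {
        // Treating "" like null ends paging after the second page.
        System.out.println(listAll(PagingTerminationSketch::fakeService, CONTINUE_ON_NON_EMPTY, 10));
    }
}
```

Passing token -> token != null as the predicate instead reproduces the old behaviour against the fake service: the loop keeps requesting pages with an empty token until the cap is reached.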

One quick ask from my side: using the same setup, could you try async paging to double-check my statements above? It should be as simple as:

BlobContainerAsyncClient asyncContainerClient = null; // builder logic here

List<BlobItem> blobItems = asyncContainerClient.listBlobs().collectList().block();

If the above is true, this should terminate rather than run forever.

reta · 2021-12-21T18:52:21Z

Thanks a lot for looking into it @alzimmermsft , I could try async paging, but would it actually page? In the snippet you have provided it looks like all blobs are going to be collected all at once.

alzimmermsft · 2021-12-21T19:32:20Z

Yes, it is correct that all blobs are going to be collected at once, but underneath it is just consuming paged responses until paging terminates. A small change to bring it closer to what you posted in the original issue would be:

asyncContainerClient.listBlobs(listBlobsOptions).timeout(timeout())
    .map(blobItem -> ....)
    .blockLast();

The timeout will cause the reactive stream to throw an error if a page isn't received before the duration elapses, and it is reset each time a page is received. blockLast() makes the calling thread wait until paging completes, so the application won't continue running while paging is going on.

reta mentioned this issue Dec 16, 2021

[plugin] repository-azure is not working properly hangs on basic operations opensearch-project/OpenSearch#1740

Merged

5 tasks

reta mentioned this issue Dec 16, 2021

[plugin] repository-azure: revert the fix for https://github.com/opensearch-project/OpenSearch/issues/1734 once upstream solution is available opensearch-project/OpenSearch#1748

Closed

chenrujun added the Storage Storage Service (Queues, Blobs, Files) label Dec 16, 2021

ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Dec 16, 2021

chenrujun assigned alzimmermsft Dec 16, 2021

alzimmermsft mentioned this issue Dec 21, 2021

Share Paging Termination Between PagedFlux and PagedIterable #26139

Merged

6 tasks

alzimmermsft closed this as completed in #26139 Dec 21, 2021

github-actions bot locked and limited conversation to collaborators Apr 11, 2023
