Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] repository-azure plugin hangs in OpenSearch >= 1.2.0 #1734

Closed
uncycler opened this issue Dec 15, 2021 · 24 comments · Fixed by #1749
Closed

[BUG] repository-azure plugin hangs in OpenSearch >= 1.2.0 #1734

uncycler opened this issue Dec 15, 2021 · 24 comments · Fixed by #1749
Labels
bug Something isn't working v1.2.3

Comments

@uncycler
Copy link

uncycler commented Dec 15, 2021

Describe the bug
Since 1.2.0, the repository-azure plugin stop working correctly. The PUT command to create the new repository hanging forever and the thread pool queue is filling up with 120 generic tasks and the master node is eating all the cpu resources it got:

"CNLPL4MfQ1aOeA1io2LXKw:44940" : {
    "node" : "CNLPL4MfQ1aOeA1io2LXKw",
    "id" : 44940,
    "type" : "transport",
    "action" : "cluster:admin/snapshot/get",
    "start_time_in_millis" : 1639552118629,
    "running_time_in_nanos" : 205051397113,
    "cancellable" : false,
    "parent_task_id" : "uY6TEyVlSQCxiJxkMJq6Sg:10583",
    "headers" : { }
  },

Nothing is logged. Is there anyway to enable debug logging on plugins?

Also, if you look at transactions/sec metrics in the azure storage account, there is thousands of them:
image

To Reproduce
Steps to reproduce the behavior:

  1. Add Azure Storage Account info (name and sas token) in keystore
    azure.client.default.account
    azure.client.default.sas_token

  2. Create the snapshot repository.

PUT _snapshot/azure
{
  "type": "azure",
  "settings": {
    "client": "default",
    "container": "opensearch"
    "base_path": "subfolder"
  }
}

This should hangs forever.
3. See the thread pool or running tasks

GET /_cat/thread_pool
GET _tasks

Expected behavior

{
  "acknowledged" : true
}

Plugins

  • repository-azure

Host/Environment (please complete the following information):

  • opensearch 1.2.1 docker image running in Kubernetes
@uncycler uncycler added bug Something isn't working untriaged labels Dec 15, 2021
@juntezhang
Copy link

This bug is serious and prevents from upgrading to OpenSearch 1.2.1 to fix the log4j2 REC issue. I also created a thread on the forums: https://discuss.opendistrocommunity.dev/t/snapshot-stuck-in-in-progress-in-all-but-shows-as-succeeded-in-status/8014/2

@reta
Copy link
Collaborator

reta commented Dec 15, 2021

@juntezhang looking into it

@reta
Copy link
Collaborator

reta commented Dec 15, 2021

@juntezhang any exceptions on the server side (logs/ folder) you could share?

@juntezhang
Copy link

@reta there are no exceptions logged by OpenSearch. It just hangs.

@reta
Copy link
Collaborator

reta commented Dec 15, 2021

Could you please capture the thread dumps for the Opensearch server process? I think the issue is figured out

@reta
Copy link
Collaborator

reta commented Dec 15, 2021

@PaulLesur @juntezhang so the issue is closely related to FasterXML/jackson-databind#3322 and in the nutshell, Azure Blob APIs V12 heavily relies on the fact that empty XML elements / attributes are going to be nullified.

However, sadly, it highly depends on XMLInputReader instance being picked up at runtime: the Woodstox does that, whereas the default one from JDK com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl does not. It leads to infinite loop within listBlobsByHierarchy or listBlobs - the page navigation only understands null as termination condition.

Working on the fix now.

@dblock
Copy link
Member

dblock commented Dec 16, 2021

Let's talk about a release for this in opensearch-project/opensearch-build#1365? Will a release of just the plugin with version 1.2.2.1 work?

@reta
Copy link
Collaborator

reta commented Dec 16, 2021

Let's talk about a release for this in opensearch-project/opensearch-build#1365? Will a release of just the plugin with version 1.2.2.1 work?

@dblock I think technically it will work, but from the code perspective, it will go to 1.2 branch, could we track the release of the plugin to particular commit? (wondering how we could match binary and source artifacts since it is the same repository).

@dblock
Copy link
Member

dblock commented Dec 16, 2021

Let's talk about a release for this in opensearch-project/opensearch-build#1365? Will a release of just the plugin with version 1.2.2.1 work?

@dblock I think technically it will work, but from the code perspective, it will go to 1.2 branch, could we track the release of the plugin to particular commit? (wondering how we could match binary and source artifacts since it is the same repository).

We'll increment the version and make a tag like we always do. 1.2 is just the line for all the 1.2.x releases.

@dblock
Copy link
Member

dblock commented Dec 16, 2021

Scratch my idea of incrementing a version to 1.2.2.1, there are waaay too many assumptions in OpenSearch about a 3-digit version to fix in a patch release. I tried for 30 minutes and got dozens of changes ;)

Can I have everybody's opinion on this thread, @reta @uncycler @juntezhang - which one do you think we should do:

  1. Build another 1.2.2, pluck repository-azure-1.2.2.zip out of it, sign it, and replace only the zip in our existing 1.2.2 distribution. We cannot overwrite published maven artifacts I believe.
  2. Make a complete 1.2.3 release of OpenSearch with all the plugins.

@dblock dblock changed the title [BUG] repository-azure is not working properly with OS >=1.2.0 [BUG] repository-azure plugin hangs in OpenSearch >= 1.2.0 Dec 16, 2021
@peternied
Copy link
Member

peternied commented Dec 16, 2021

I vote for option 2 follows our semver principles. 4 hours ago when we didn't yet release OpenSearch 1.2.2 to maven we would have had other options.

@peterzhuamazon
Copy link
Member

Since we also release on maven I vote for 2 as we cannot override now:
https://aws.oss.sonatype.org/content/repositories/releases/org/opensearch/plugin/repository-azure/1.2.2/

@reta
Copy link
Collaborator

reta commented Dec 16, 2021

I vote for option 2 follows our semver principles. 4 hours ago when we didn't yet release OpenSearch 1.2.2 to maven we would have had other options.

+1 for 2nd option (sorry @dblock), just to clarify, the release of 1.2.2 without this bugfix was by intention

@stockholmux
Copy link
Member

I both love and hate a 1.2.3. Having a shadow release is madness and against semver. On the other hand, I dislike releasing so often, wolf crying comes to mind.

@reta
Copy link
Collaborator

reta commented Dec 16, 2021

I both love and hate a 1.2.3. Having a shadow release is madness and against semver. On the other hand, I dislike releasing so often, wolf crying comes to mind.

Sorry about that, the timing with log4j is not good

@juntezhang
Copy link

juntezhang commented Dec 16, 2021 via email

@reta
Copy link
Collaborator

reta commented Dec 16, 2021

@nknize @dblock do you think guys it is worth extracting the repository plugins off the main repo? (it looks doable in general)

@dblock
Copy link
Member

dblock commented Dec 16, 2021

I think option 2 is probably cleanest and most transparent, but I can image you would like to combine it with more features or bug fixes and not do 3 releases within 1 week. I think those who want to have this fix can fork it and install the plugin from file. Or switch to snapshots to FS instead. Anyway, thanks for the quick fix and looking forward to the release (at some point).

I would be ok with this, but we are telling everyone to upgrade to 1.2.2 because of log4j, and there’s nothing to upgrade to for users of this plug-in

@dblock
Copy link
Member

dblock commented Dec 16, 2021

@nknize @dblock do you think guys it is worth extracting the repository plugins off the main repo? (it looks doable in general)

I was going to suggest that. There’s no reason for these plugins to be tied to OpenSearch IMO. Appreciate if you could open an issue either way.

@reta
Copy link
Collaborator

reta commented Dec 16, 2021

Done, #1754

@dblock
Copy link
Member

dblock commented Dec 17, 2021

@reta @uncycler @juntezhang care to confirm that @reta's fix works in https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.2.2/102/linux/x64/builds/opensearch/core-plugins/repository-azure-1.2.2.zip, this does not have a version increment, we're going to do this and go to 1.2.3.

If you need an OpenSearch-min, https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.2.2/102/linux/x64/builds/opensearch/dist/opensearch-min-1.2.2-linux-x64.tar.gz

@uncycler
Copy link
Author

@reta @uncycler @juntezhang care to confirm that @reta's fix works in https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.2.2/102/linux/x64/builds/opensearch/core-plugins/repository-azure-1.2.2.zip, this does not have a version increment, we're going to do this and go to 1.2.3.

I confirm that the updated plugin is working as expected.

@jcgraybill
Copy link

jcgraybill commented Dec 22, 2021

Confirmed that 1.2.3 produces the correct behavior for me:

./bin/opensearch-plugin install repository-azure
./bin/opensearch-keystore add azure.client.default.account
./bin/opensearch-keystore add azure.client.default.key
./bin/opensearch -d -p opensearch.pid
/usr/bin/curl http://localhost:9200/songs/_doc -X POST -H 'Content-Type: application/json' -d '{"title": "Inside Out", "artist": "Eve 6"}'
/usr/bin/curl http://localhost:9200/songs/_doc -X POST -H 'Content-Type: application/json' -d '{"title": "Semi-Charmed Life", "artist": "Third Eye Blind"}'
/usr/bin/curl http://localhost:9200/_snapshot/testbackup -X PUT -H 'Content-Type: application/json' -d '{"type": "azure", "settings": {"container": "testbackup"}}'
{"acknowledged":true}
/usr/bin/curl http://localhost:9200/_snapshot/testbackup/1 -X PUT  
{"accepted":true}

azure

Transactions / sec is back to an expected range:
tps

reta added a commit to reta/OpenSearch that referenced this issue Mar 11, 2022
…tream solution is available

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
dblock pushed a commit that referenced this issue Mar 15, 2022
…available (#2446)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
opensearch-trigger-bot bot pushed a commit that referenced this issue Mar 15, 2022
…available (#2446)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit 12dd5d7)
dblock pushed a commit that referenced this issue Mar 16, 2022
…available (#2446) (#2475)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit 12dd5d7)

Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working v1.2.3
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants