Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an intermittent deadlock when DocService is loaded for a Thrift service #4688

Merged
merged 20 commits into from
Mar 27, 2023

Conversation

jrhee17
Copy link
Contributor

@jrhee17 jrhee17 commented Feb 21, 2023

Motivation:

We use thrift/grpc internal APIs when constructing DocService.
Following #4491, these internal APIs are called concurrently with service logic.
Due to a thrift bug, a deadlock can occur when FieldMetaData.getStructMetaDataMap is called for a thrift class while the class is initialized.
https://issues.apache.org/jira/browse/THRIFT-5430
This PR attempts to circumvent this issue by pre-loading the class for affected versions (<= armeria-thrift 0.14)

I've also double checked that other *MetaData implementations don't have a similar implementation.

Modifications:

  • Add a ThriftMetadataAccess utility class
  • When ThriftMetadataAccess is instantiated, check if (<= armeria-thrift 0.14) is in the classpath
  • If the vulnerable artifact is loaded, then pre-instantiate thrift classes before FieldMetaData.getStructMetaDataMap is called from DocService

Result:

  • A deadlock is no longer observed when loading DocService for thrift services where <= armeria-thrift 0.14

@jrhee17 jrhee17 added the defect label Feb 21, 2023
@jrhee17 jrhee17 added this to the 1.23.0 milestone Feb 21, 2023
@jrhee17 jrhee17 marked this pull request as ready for review February 22, 2023 06:47
@jrhee17 jrhee17 changed the title Fix an intermittent deadlock when DocService for thrift is loaded Fix an intermittent deadlock when DocService is loaded for a Thrift service Feb 22, 2023
Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left only minor nits.

}
ThriftMetadataAccess.preInitializeThriftClass = preInitializeThriftClass;
} catch (Exception e) {
logger.trace("Unexpected exception while determining the 'armeria-thrift' version: ", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.linecorp.armeria.versions.properties can be removed while creating a uber jar or shading. A debug level may be useful for that users.

Suggested change
logger.trace("Unexpected exception while determining the 'armeria-thrift' version: ", e);
logger.debug("Unexpected exception while determining the 'armeria-thrift' version: ", e);

Should we also leave a debug message if versionPropertiesUrls is empty?

}

@VisibleForTesting
static boolean needPreInitialize(Properties props) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thrid-person verb is commonly used for boolean methods.
https://softwareengineering.stackexchange.com/questions/404650/what-form-of-verb-to-use-imperative-verb-or-third-person-verb-in-programming

Suggested change
static boolean needPreInitialize(Properties props) {
static boolean needsPreInitialization(Properties props) {


import com.google.common.annotations.VisibleForTesting;

public final class ThriftMetadataAccess {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to keep a non-public scope if possible. How about moving ThriftMetadataAccess to com.linecorp.armeria.internal.server.thrift and making it package-private? We may move ThriftMetadataAccess to the internal package when clients need to use it.

Suggested change
public final class ThriftMetadataAccess {
final class ThriftMetadataAccess {

}

@Override
public Class<?> loadClass(String name) throws ClassNotFoundException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e1 and e2 may access this method simultaneously.

Suggested change
public Class<?> loadClass(String name) throws ClassNotFoundException {
public synchronized Class<?> loadClass(String name) throws ClassNotFoundException {

Comment on lines 46 to 48
final Enumeration<URL> versionPropertiesUrls =
ThriftMetadataAccess.class.getClassLoader().getResources(
"META-INF/com.linecorp.armeria.versions.properties");
Copy link
Member

@trustin trustin Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can't we just do Version.getAll(...).keySet() to get the list of the artifact IDs?
  • Can't we just always pre-initialize the classes?
  • What do you think about adding an empty resource file like src/main/resources/com/linecorp/armeria/internal/common/thrift/requires_struct_preinit to all thrift modules older than 0.15? We could enable the workaround only if that resource file exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the cost of failure to look for a resource file might be a little bit higher, we could always add a properties file:

If pre-init is required:

structPreinitRequired=true

If pre-init is not required:

<empty file>

Copy link
Contributor Author

@jrhee17 jrhee17 Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just do Version.getAll(...).keySet() to get the list of the artifact IDs?

Didn't know this class existed 😅 It's probably better to use this.

Can't we just always pre-initialize the classes?

I think I prefer making an explicit branch statement for these type of changes:

  1. If we decide to deprecate thrift <= 0.14 modules, we can remove the related logic safely
  2. The reason for pre-initialization is embedded in the code

I do agree that the logic doesn't really affect performance and am open to just preinitializing always if others feel this is simpler.

What do you think about adding an empty resource file
we could always add a properties file

I don't even think this has to be a resource, but it can also be a class file that has a different static value for each thrift module. <- On second thought, this doesn't work if multiple armeria-thrift artifacts are added.
I didn't want to further complicate the thrift file copying logic, but I think this is also an option.

I think I prefer just using Versions.all() like you mentioned, but let me know if you feel differently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't work if multiple armeria-thrift artifacts are added.

Doesn't a user have a bigger problem if their classpath has multiple armeria-thrift artifacts? We could leave a warning and fall back to safe mode (always preinit)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even think this has to be a resource, but it can also be a class file that has a different static value for each thrift module. <- On second thought, this doesn't work if multiple armeria-thrift artifacts are added.

We could leave a warning and fall back to safe mode (always preinit)?

I understood what you meant to be leave a warning if multiple classes exist, am I understanding correctly?
I didn't think it was possible to detect if there are multiple classes with the same package/name/module.

This won't be a problem if we define a separate resource instead of class, but I don't see much benefit in using a dedicated resource over just Versions.all().

(always preinit)?

or did you mean to just always preinit? 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think it was possible to detect if there are multiple classes with the same package/name/module.

You can actually:

Enumeration<URL> urls = getClass().getClassLoader().getResources("com/foo/bar.txt"); // or .class
while (urls.hasMoreElements()) {
    URL url = urls.nextElement();
    InputStream inputStream = url.openStream();
    // do something with the input stream
}

Otherwise, how could Versions.all() fetch the com.linecorp.armeria.versions.properties files from all artifacts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I meant putting some .properties file rather than looking for .class files. I think having a separate resource is still better than extracting version number from artifact ID because it doesn't require any string matching, which could be broken at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a separate resource is still better than extracting version number from artifact ID because it doesn't require any string matching, which could be broken at some point.

I see 😄 Understood.


private static boolean preInitializeThriftClass;

private static final String filename = "../../common/thrift/thrift-options.properties";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Use an absolute path from the root resources?
  • Give a more meaningful name?
Suggested change
private static final String filename = "../../common/thrift/thrift-options.properties";
private static final String THRIFT_OPTION_FILE = "../../common/thrift/thrift-options.properties";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an absolute path from the root resources?

The path was changed to a relative path from this comment. #4688 (comment)

I think it makes sense that if anyone chooses to shade armeria, the option will still work as long as the directory structure doesn't change.

Give a more meaningful name?

Done 👍

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 💯 👍 @jrhee17

@ikhoon
Copy link
Contributor

ikhoon commented Mar 24, 2023

The CI builds failed.

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks a lot!
I also learned a lot from the discussion.

Copy link
Member

@trustin trustin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience, @jrhee17! 🙇

@trustin trustin merged commit be306ab into line:main Mar 27, 2023
ikhoon added a commit to ikhoon/armeria that referenced this pull request Mar 13, 2024
Motivation:

In old Thrift versions (< 0.9.3), the multi-thread environment was not
considered during the initialization process for the Thrift class.
A workarournd for that was committed at line#4688, but it only applied to
DocService.

This problem can occur not only in `DocService` but also when creating
Thrift clients and other parts, so it would be desirable to use
`ThriftMetadataAccess.getStructMetaDataMap()` for places where
`FieldMetaData.getStructMetaDataMap()` is used.

```java
// When creating Thrift clients
java.lang.IllegalArgumentException: failed to retrieve function metadata: ...
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.registerFunction(ThriftServiceMetadata.java:239)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.lambda$init$2(ThriftServiceMetadata.java:117)
	at java.base/java.util.HashMap.forEach(HashMap.java:1337)
	at java.base/java.util.Collections$UnmodifiableMap.forEach(Collections.java:1505)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.init(ThriftServiceMetadata.java:116)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.<init>(ThriftServiceMetadata.java:85)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.metadata(THttpClientDelegate.java:216)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.execute(THttpClientDelegate.java:122)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.execute(THttpClientDelegate.java:78)
Caused by: java.lang.NullPointerException: null
	at com.linecorp.armeria.internal.common.thrift.ThriftFunction.<init>(ThriftFunction.java:104)
	at com.linecorp.armeria.internal.common.thrift.ThriftFunction.<init>(ThriftFunction.java:66)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.registerFunction(ThriftServiceMetadata.java:229)
	... 33 common frames omitted

// When creating DocService
Caused by: java.lang.NullPointerException: null
	at com.linecorp.armeria.internal.server.thrift.ThriftDescriptiveTypeInfoProvider.newStructInfo(ThriftDescriptiveTypeInfoProvider.java:322)
	at com.linecorp.armeria.internal.server.thrift.ThriftDescriptiveTypeInfoProvider.newDescriptiveTypeInfo(ThriftDescriptiveTypeInfoProvider.java:101)
	at com.linecorp.armeria.server.docs.DocService$SpecificationLoader.lambda$composeDescriptiveTypeInfoProvider$9(DocService.java:386)
```

Referece:
- https://issues.apache.org/jira/browse/THRIFT-1618
- apache/thrift@4a78c6e

Motifications:

- Replace `FieldMetaData.getStructMetaDataMap()` with
  `ThriftMetadataAccess.getStructMetaDataMap()` to thread-safely
  initialize Thrift classes

Result:

You no longer see `NullPointerException` when creating Thrift clients in
a multi-threaded environment.
minwoox pushed a commit that referenced this pull request Mar 18, 2024
…5497)

Motivation:

In old Thrift versions (< 0.9.3), the multi-threaded environment was not considered during the initialization process for the Thrift class. A workaround was committed at #4688 but only applied to DocService.

This problem can occur not only in `DocService` but also when creating Thrift clients and other parts, so it would be desirable to use `ThriftMetadataAccess.getStructMetaDataMap()` for places where `FieldMetaData.getStructMetaDataMap()` is used.

```java
// When creating Thrift clients
java.lang.IllegalArgumentException: failed to retrieve function metadata: ...
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.registerFunction(ThriftServiceMetadata.java:239)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.lambda$init$2(ThriftServiceMetadata.java:117)
	at java.base/java.util.HashMap.forEach(HashMap.java:1337)
	at java.base/java.util.Collections$UnmodifiableMap.forEach(Collections.java:1505)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.init(ThriftServiceMetadata.java:116)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.<init>(ThriftServiceMetadata.java:85)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.metadata(THttpClientDelegate.java:216)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.execute(THttpClientDelegate.java:122)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.execute(THttpClientDelegate.java:78)
Caused by: java.lang.NullPointerException: null
	at com.linecorp.armeria.internal.common.thrift.ThriftFunction.<init>(ThriftFunction.java:104)
	at com.linecorp.armeria.internal.common.thrift.ThriftFunction.<init>(ThriftFunction.java:66)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.registerFunction(ThriftServiceMetadata.java:229)
	... 33 common frames omitted

// When creating DocService
Caused by: java.lang.NullPointerException: null
	at com.linecorp.armeria.internal.server.thrift.ThriftDescriptiveTypeInfoProvider.newStructInfo(ThriftDescriptiveTypeInfoProvider.java:322)
	at com.linecorp.armeria.internal.server.thrift.ThriftDescriptiveTypeInfoProvider.newDescriptiveTypeInfo(ThriftDescriptiveTypeInfoProvider.java:101)
	at com.linecorp.armeria.server.docs.DocService$SpecificationLoader.lambda$composeDescriptiveTypeInfoProvider$9(DocService.java:386)
```

Reference:
- https://issues.apache.org/jira/browse/THRIFT-1618
- apache/thrift@4a78c6e

Modifications:

- Replace `FieldMetaData.getStructMetaDataMap()` with `ThriftMetadataAccess.getStructMetaDataMap()` to thread-safely initialize Thrift classes.

Result:

You no longer see `NullPointerException` when creating Thrift clients in a multi-threaded environment.
ikhoon added a commit that referenced this pull request Mar 26, 2024
…5497)

Motivation:

In old Thrift versions (< 0.9.3), the multi-threaded environment was not considered during the initialization process for the Thrift class. A workaround was committed at #4688 but only applied to DocService.

This problem can occur not only in `DocService` but also when creating Thrift clients and other parts, so it would be desirable to use `ThriftMetadataAccess.getStructMetaDataMap()` for places where `FieldMetaData.getStructMetaDataMap()` is used.

```java
// When creating Thrift clients
java.lang.IllegalArgumentException: failed to retrieve function metadata: ...
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.registerFunction(ThriftServiceMetadata.java:239)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.lambda$init$2(ThriftServiceMetadata.java:117)
	at java.base/java.util.HashMap.forEach(HashMap.java:1337)
	at java.base/java.util.Collections$UnmodifiableMap.forEach(Collections.java:1505)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.init(ThriftServiceMetadata.java:116)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.<init>(ThriftServiceMetadata.java:85)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.metadata(THttpClientDelegate.java:216)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.execute(THttpClientDelegate.java:122)
	at com.linecorp.armeria.internal.client.thrift.THttpClientDelegate.execute(THttpClientDelegate.java:78)
Caused by: java.lang.NullPointerException: null
	at com.linecorp.armeria.internal.common.thrift.ThriftFunction.<init>(ThriftFunction.java:104)
	at com.linecorp.armeria.internal.common.thrift.ThriftFunction.<init>(ThriftFunction.java:66)
	at com.linecorp.armeria.internal.common.thrift.ThriftServiceMetadata.registerFunction(ThriftServiceMetadata.java:229)
	... 33 common frames omitted

// When creating DocService
Caused by: java.lang.NullPointerException: null
	at com.linecorp.armeria.internal.server.thrift.ThriftDescriptiveTypeInfoProvider.newStructInfo(ThriftDescriptiveTypeInfoProvider.java:322)
	at com.linecorp.armeria.internal.server.thrift.ThriftDescriptiveTypeInfoProvider.newDescriptiveTypeInfo(ThriftDescriptiveTypeInfoProvider.java:101)
	at com.linecorp.armeria.server.docs.DocService$SpecificationLoader.lambda$composeDescriptiveTypeInfoProvider$9(DocService.java:386)
```

Reference:
- https://issues.apache.org/jira/browse/THRIFT-1618
- apache/thrift@4a78c6e

Modifications:

- Replace `FieldMetaData.getStructMetaDataMap()` with `ThriftMetadataAccess.getStructMetaDataMap()` to thread-safely initialize Thrift classes.

Result:

You no longer see `NullPointerException` when creating Thrift clients in a multi-threaded environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants