Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] Application runtime exception in IotHubTransport. #1718

Closed
semeneganov-tomtom opened this issue May 25, 2023 · 4 comments
Closed
Assignees
Labels
bug fix checked in Fix checked into main or preview, but not yet released. IoTSDK

Comments

@semeneganov-tomtom
Copy link

Context

  • OS and version used: Android 11 (API 30)
  • Java runtime used: 1.8
  • SDK version used: 2.1.5

Description of the issue

While working throws runtime exception:

java.lang.NullPointerException: Attempt to invoke virtual method 'long java.lang.Long.longValue()' on a null object reference
	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.checkForOldMessages(IotHubTransport.java:851)
	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.lambda$sendMessages$0$com-microsoft-azure-sdk-iot-device-transport-IotHubTransport(IotHubTransport.java:734)
	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport$$ExternalSyntheticLambda1.run(Unknown Source:2)
	at java.lang.Thread.run(Thread.java:923)

Happens frequently.
MQTT connection is used.

Code sample exhibiting the issue

Just as sample, nothing wrong, I belive, in here.

public void test() throws IotHubClientException, InterruptedException {
    ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
    DeviceClient device = new DeviceClient("<<ConnectionString>>", IotHubClientProtocol.MQTT);
    device.open(false);
    Runnable task = () -> {
        try {
            device.sendEvent(new Message("poison"));
        } catch (IotHubClientException|InterruptedException e) {
            System.out.println("Send exception: " + e.getMessage());
        }

        try {
            device.open(false);
        } catch (Exception e) {
            System.out.println("Ignore exception");
        }
    };
    executor.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS);

    Thread.sleep(100000);

	// Then turn off/on the connection and wait till it happen
}

Console log of the issue

Remove partly because of legal and privacy.
Obviously they are not going to show you the whole run.

05-03 18:30:42.663  4474 17314 I IotSdkDeviceClient: CONNECTION STATUS UPDATE: CONNECTED
05-03 18:30:42.663  4474 17314 I IotSdkDeviceClient: CONNECTION STATUS REASON: CONNECTION_OK
05-03 18:30:42.663  4474 17314 I IotSdkDeviceClient: CONNECTION STATUS ERROR MESSAGE: null
05-03 18:30:42.663  4474 17314 I ConnectionChangeCallback: The connection was successfully established. Can send messages.
05-03 18:30:42.664  4474 17314 I DeviceClient: Device client opened successfully
05-03 18:30:42.664  4474 17314 I DeviceProperties: DeviceProperties: Setup device properties
--------- beginning of crash
05-03 18:30:42.701  4474 17346 E AndroidRuntime: FATAL EXCEPTION: Thread-251
05-03 18:30:42.701  4474 17346 E AndroidRuntime: Process: ______, PID: 4474
05-03 18:30:42.701  4474 17346 E AndroidRuntime: java.lang.NullPointerException: Attempt to invoke virtual method 'long java.lang.Long.longValue()' on a null object reference
05-03 18:30:42.701  4474 17346 E AndroidRuntime: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.checkForOldMessages(IotHubTransport.java:851)
05-03 18:30:42.701  4474 17346 E AndroidRuntime: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.lambda$sendMessages$0$com-microsoft-azure-sdk-iot-device-transport-IotHubTransport(IotHubTransport.java:734)
05-03 18:30:42.701  4474 17346 E AndroidRuntime: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport$$ExternalSyntheticLambda1.run(Unknown Source:2)
05-03 18:30:42.701  4474 17346 E AndroidRuntime: 	at java.lang.Thread.run(Thread.java:923)
05-03 18:30:42.701  4474 17346 D MonitoringInstr: Handling an uncaught exception thrown on the thread Thread-251.
05-03 18:30:42.701  4474 17346 D MonitoringInstr: java.lang.NullPointerException: Attempt to invoke virtual method 'long java.lang.Long.longValue()' on a null object reference
05-03 18:30:42.701  4474 17346 D MonitoringInstr: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.checkForOldMessages(IotHubTransport.java:851)
05-03 18:30:42.701  4474 17346 D MonitoringInstr: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.lambda$sendMessages$0$com-microsoft-azure-sdk-iot-device-transport-IotHubTransport(IotHubTransport.java:734)
05-03 18:30:42.701  4474 17346 D MonitoringInstr: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport$$ExternalSyntheticLambda1.run(Unknown Source:2)
05-03 18:30:42.701  4474 17346 D MonitoringInstr: 	at java.lang.Thread.run(Thread.java:923)
05-03 18:30:42.702  4474 17346 W AndroidJUnitRunner: An unhandled exception was thrown by the app.
05-03 18:30:42.702  4474 17346 W AndroidJUnitRunner: java.lang.NullPointerException: Attempt to invoke virtual method 'long java.lang.Long.longValue()' on a null object reference
05-03 18:30:42.702  4474 17346 W AndroidJUnitRunner: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.checkForOldMessages(IotHubTransport.java:851)
05-03 18:30:42.702  4474 17346 W AndroidJUnitRunner: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.lambda$sendMessages$0$com-microsoft-azure-sdk-iot-device-transport-IotHubTransport(IotHubTransport.java:734)
05-03 18:30:42.702  4474 17346 W AndroidJUnitRunner: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport$$ExternalSyntheticLambda1.run(Unknown Source:2)
05-03 18:30:42.702  4474 17346 W AndroidJUnitRunner: 	at java.lang.Thread.run(Thread.java:923)
05-03 18:30:42.709  4474 17346 I AndroidJUnitRunner: Bringing down the entire Instrumentation process.
05-03 18:30:42.710  4474 17346 E MonitoringInstr: Exception encountered by: Thread[Thread-251,5,main]. Dumping thread state to outputs and pining for the fjords.
05-03 18:30:42.710  4474 17346 E MonitoringInstr: java.lang.NullPointerException: Attempt to invoke virtual method 'long java.lang.Long.longValue()' on a null object reference
05-03 18:30:42.710  4474 17346 E MonitoringInstr: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.checkForOldMessages(IotHubTransport.java:851)
05-03 18:30:42.710  4474 17346 E MonitoringInstr: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.lambda$sendMessages$0$com-microsoft-azure-sdk-iot-device-transport-IotHubTransport(IotHubTransport.java:734)
05-03 18:30:42.710  4474 17346 E MonitoringInstr: 	at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport$$ExternalSyntheticLambda1.run(Unknown Source:2)
05-03 18:30:42.710  4474 17346 E MonitoringInstr: 	at java.lang.Thread.run(Thread.java:923)

The investigations

Extremely hard to reproduce. Manually I didn't have luck to do it.
But it seems that is because of the concurrency.

What happened in the rows of IotHubTransport.java:851:

if (System.currentTimeMillis() - correlationStartTimeMillis.get(correlationId) >= DEFAULT_CORRELATION_ID_LIVE_TIME)

The only call here can bring null is correlationStartTimeMillis.get(correlationId) which can cause the NPE in above code.

And the concurrently running thread that will remove the entry can lead on this rows:

new Thread(() ->
	{
		correlationCallbacks.remove(correlationId);
		correlationCallbackContexts.remove(correlationId);
		correlationStartTimeMillis.remove(correlationId);
	}).start();
@timtay-microsoft
Copy link
Member

This does look like a bug. We'll try to get this fixed soon.

timtay-microsoft added a commit that referenced this issue May 26, 2023
We don't want cases where one thread is calling callbacks.get("some id") while another just removed that entry.

#1718
@timtay-microsoft timtay-microsoft self-assigned this May 26, 2023
timtay-microsoft added a commit that referenced this issue May 31, 2023
- Make correlation callback cleanup job run less frequently
  - Previously ran several times per second. Now runs once per hour
- Make correlation callback operation blocks atomic
  - We don't want cases where one thread is calling callbacks.get("some id") while another just removed that entry.

#1718
@timtay-microsoft timtay-microsoft added the fix checked in Fix checked into main or preview, but not yet released. label May 31, 2023
@semeneganov-tomtom
Copy link
Author

Thank you @timtay-microsoft for the quick response to the issue.
But I've got couple of question about it, can you help with them?

  1. When approximately I can expect this bug fix to be released?
  2. When to close the issue, after release will be done?

@timtay-microsoft
Copy link
Member

  1. I'm hoping to get a release out with this fix within this week
  2. I'll close this issue thread once the fix has been released

@timtay-microsoft
Copy link
Member

The release is now complete. This bug is fixed as of DeviceClient version 2.2.0. See release notes for additional details.

I'll go ahead and close this thread, but feel free to open a new thread if you encounter any further issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix checked in Fix checked into main or preview, but not yet released. IoTSDK
Projects
None yet
Development

No branches or pull requests

2 participants