addUserRecord call throws DaemonException #39
I received the same error in production yesterday on a service that has been running smoothly for months... 2016-01-27T00:10:10,299 ERROR [com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask] ShardId shardId-000000000000: Application processRecords() threw an exception when processing shard
I'm running a Lambda function that uses the KPL, and it appears to crash with this exception when the Lambda runs out of memory.
We keep running into similar issues when using the KPL library on AWS Lambda.
Thanks for reporting this. We are investigating, but could use some additional information. We will look into adding memory usage tracking for the KPL native process to help determine how much memory it's consuming. Could everyone who is affected by this please respond or add a reaction to help us prioritize this issue.
We upgraded to 0.12.3 in our production environment two days ago, and since then this issue has occurred much more frequently than with the previous 0.10.2 version. (Please note that as part of the upgrade, we also upgraded the AWS SDK from 1.10.49 to 1.11.45.) We have 10 m4.2xlarge instances in eu-west-1 that produce a high volume of traffic to our Kinesis stream in us-west-2, and they are the ones that frequently run into this issue. Producer instances in other regions seem to be fine, but they carry less traffic. Our motive for upgrading was the hope that it would solve the memory leak issue, but it looks like stability has degraded instead. I will open a separate support ticket with more information.
Facing a similar issue on our side: we are using amazon-kinesis-producer v0.10.2, amazon-kinesis-client v1.7.0, and AWS Java SDK v1.11.77. The issue is intermittent and I'm unable to figure out the root cause.
Well, we moved away from the AWS KPL and are now using the AWS SDK for Java to stream data to Kinesis. It didn't work out for us in the end, as we saw those errors quite frequently.
On a few occasions when we ran into the crashed KPL, we also observed that our JVM could not create more threads, even though the thread count inside the JVM was very stable: Exception in thread "qtp213045289-24084" java.lang.OutOfMemoryError: unable to create new native thread. I wonder whether that's because the KPL process has somehow prevented its parent JVM process from creating more native threads.
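If it helps others correlate this, here is a minimal diagnostic sketch that periodically logs JVM thread counts via the standard JMX thread bean; the interval and output format are illustrative. If these numbers stay flat while "unable to create new native thread" is thrown, the pressure is likely coming from outside the JVM (e.g. the KPL native process):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThreadCountMonitor {
    public static void start() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Periodically log live and peak JVM thread counts for correlation
        // with the OutOfMemoryError timestamps.
        scheduler.scheduleAtFixedRate(() -> System.out.printf(
                "live threads=%d, peak=%d%n",
                threads.getThreadCount(), threads.getPeakThreadCount()),
                0, 30, TimeUnit.SECONDS);
    }
}
```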
We see this when calling the flushSync method as well. We initially thought the host did not have enough memory for the KPL to do its job, but even after increasing the memory, we continue to see this.
Want to point out one observation, not sure how helpful this will be... The following log points to the file/pipe \.\pipe\amz-aws-kpl-in-pipe- not being found:
We plan to use this library for high workloads, but it looks like it doesn't prevent the native process from crashing under them. When the process is dead, our application stops working, so it would be great if you could focus on this issue.
@sshrivastava-incontact What you're seeing is related to running the KPL 0.12.x on Windows, which isn't currently supported. For those running on Linux and Mac OS X: the newest version of the KPL includes some additional logging about how busy the sending process is. See the Release Notes for 0.12.4 for the meaning of the log messages. Under certain circumstances the native component can actually run itself out of threads, which will trigger a failure of the native process.
When will the KPL work on Windows? I am OK with version 0.12.* not working on Windows. However, version 0.10.* also does not work, since I always get this exception: And it is not only me; this person got the same problem: Looking forward to your update, and thank you so much! Sincerely
I am using Kinesis Producer 0.10.2. It was running fine on Windows 7, but when I try to set it up on CentOS 6.5, I get the error below: Error in child process Can anybody help me with this? Thanks.
Getting the same exception while trying https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/src/com/amazonaws/services/kinesis/producer/sample/SampleProducer.java
It seems that switching to root on Linux and running the example SampleProducer.java showed some puts going through before it fails. Environment: KPL v0.12.5 on Linux (Linux els-d93322 2.6.34.7-66.fc13.i686.PAE #1 SMP Wed Dec 15 07:21:49 UTC 2010 i686 i686 i386 GNU/Linux). Looking at this error line suggests that it is trying to extract binaries to /tmp: So I switched to root, and now it at least does 2 puts before failing again with the same exception. But at least it did 2 puts:
Regarding the exception above, it was my mistake. I was attempting to write to a Kinesis Stream while the actual endpoint was a Kinesis Firehose. After making that correction, I was able to successfully write to Kinesis Firehose using this example: https://github.com/awslabs/aws-big-data-blog/blob/master/aws-blog-firehose-lambda/kinesisFirehose/src/main/java/com/amazonaws/proserv/lambda/KinesisToFirehose.java
Any chance anyone has any workarounds? The issue looks similar: Is the only known workaround to use the native Java producer? It might be a good idea to update the README, as it is a pretty nasty surprise given the claim that 0.10.2 works on Windows.
From what we have observed, it's not a bug but by-design behaviour. The thing is, a message to Kinesis is stored (a put operation) on a blocking queue with bounded capacity, so the put can block. This operation is made on the calling thread in the KPL code, so if there is any possibility of the calling thread being interrupted, the catch logic is invoked:

```java
/**
 * Enqueue a message to be sent to the child process.
 *
 * @param m
 */
public void add(Message m) {
    if (shutdown.get()) {
        throw new DaemonException(
                "The child process has been shutdown and can no longer accept messages.");
    }
    try {
        outgoingMessages.put(m); // <-- HERE: blocks, and may be interrupted
    } catch (InterruptedException e) {
        fatalError("Unexpected error", e);
    }
}
```

And fatalError destroys the child process and shuts down the executor:

```java
private synchronized void fatalError(String message, Throwable t, boolean retryable) {
    if (!shutdown.getAndSet(true)) {
        if (process != null) {
            process.destroy(); // <-- HERE THE PROCESS IS DESTROYED
        }
        try {
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) { }
        executor.shutdownNow();
        // other code
    }
}
```

So, to work around this issue, make sure you are not invoking addUserRecord from a thread that may be interrupted. This piece of code did the trick for us; make sure something similar is happening in your code:

```java
private void sendToKinesis(ByteBuffer buffer) {
    CompletableFuture.runAsync(() -> {
        try {
            streamProducer.addUserRecord(...)
        } catch (Throwable e) {
            log.error("Failed to send to Kinesis: {}", e.getMessage());
        }
    }); // run on a separate thread pool
}
```
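Fleshed out, that workaround might look like the sketch below. This is only an illustration, assuming the standard KinesisProducer.addUserRecord(stream, partitionKey, data) signature; the pool size, stream name, and partition key are made up:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class KinesisSender {
    private final KinesisProducer producer =
            new KinesisProducer(new KinesisProducerConfiguration());
    // Dedicated pool: these threads are never interrupted by application
    // code, so the blocking put() inside the Daemon cannot observe an
    // InterruptedException and escalate it to fatalError().
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public void send(ByteBuffer data) {
        CompletableFuture.runAsync(() -> {
            try {
                producer.addUserRecord("my-stream", "my-partition-key", data);
            } catch (Throwable t) {
                System.err.println("Failed to send to Kinesis: " + t.getMessage());
            }
        }, pool);
    }
}
```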
We are seeing this same issue with version 0.10.2 on Linux. I do not really have a root cause. Is there any workaround for this issue? Has anyone tried the workaround proposed above? Thanks.
@spjegan 0.10.x is no longer supported; an upgrade to 0.12.x is required. In 0.12.x, automatic restarts of a failed daemon process were added.
@head-thrash In 0.12.x a higher-level automatic restart was added that restarts the native process if it exits. You're correct that you will receive the exception while the process is exiting, but after the process is restarted, publishing should be available again.
@ppearcy Windows support was added in Release 0.12.6.
@udayravuri During startup the library attempts to extract the native component to the temp directory. The user your application is running as must have write access to that directory. You can use the TempDirectory setting on the producer configuration to change where the binaries are extracted.
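For example, a minimal sketch using KinesisProducerConfiguration.setTempDirectory; the path and region here are illustrative:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class ProducerFactory {
    public static KinesisProducer create() {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                // Extract the native binaries to a directory the application
                // user can write to, instead of the default system temp dir.
                .setTempDirectory("/var/app/kpl-tmp")
                .setRegion("us-west-2");
        return new KinesisProducer(config);
    }
}
```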
@pfifer thanks
@pfifer Still, the method returns a ListenableFuture, which leads to the false assumption that all exceptions will be surfaced through the future. Developers build their error handling around the future, not around the call itself.
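For completeness, here is a sketch of how per-record failures can be surfaced from the returned future using Guava's Futures.addCallback; the logging is illustrative:

```java
import com.amazonaws.services.kinesis.producer.UserRecordFailedException;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public final class RecordCallbacks {
    public static void watch(ListenableFuture<UserRecordResult> f) {
        Futures.addCallback(f, new FutureCallback<UserRecordResult>() {
            @Override
            public void onSuccess(UserRecordResult result) {
                System.out.println("put to shard " + result.getShardId());
            }

            @Override
            public void onFailure(Throwable t) {
                // Per-record failures are delivered here, not thrown
                // synchronously from addUserRecord.
                if (t instanceof UserRecordFailedException) {
                    UserRecordResult r = ((UserRecordFailedException) t).getResult();
                    r.getAttempts().forEach(a ->
                            System.err.println("attempt failed: " + a.getErrorMessage()));
                } else {
                    System.err.println("record failed: " + t);
                }
            }
        }, MoreExecutors.directExecutor());
    }
}
```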
I was trying to run the KPL on an Alpine Linux based Docker image.
Unfortunately the KPL is built against glibc, while Alpine Linux uses musl libc. This causes the native component to fail runtime linking and crash. There appear to be some Docker images that include glibc, but I can't vouch for whether they would work.
Hi, our code puts records and saves the Futures (`// put records and save the Futures`). This code works fine most of the time, but I just ran a load with millions of records and saw that hundreds of them failed with the error below at the line f.get(). Can you please suggest something as a workaround? I am using all default configurations for the KPL other than the below:
Also, in addition to the above, I am seeing the errors below: java.lang.RuntimeException: EOF reached during read
Also: com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
I have been able to cause this problem by calling the KPL from a catch block for an InterruptedException in a child thread. I was using the KPL for logging and attempted to log from the catch block when a child thread was interrupted. Unfortunately the KPL has a shutdown hook that runs when the thread it was called from exits, not the thread where the KPL was instantiated. The hook kills the daemon process that writes events to the shards. I would say this is a KPL bug: the daemon should only be killed when the parent thread of the KPL object is killed.
Getting these frequent failures. 0.12.9 jar, Windows 2012 R2 Standard, VMware, Intel 2 GHz (4 processors), 6 GB RAM. Note: all Kinesis default config used.
It worked fine locally, but I'm getting heap size issues running in a Linux Docker container. How much heap does this need?
Any updates or workarounds for this issue? Still facing this with 0.12.9.
@pfifer I get the following error: Which version of the amazon-kinesis-producer jar (aka the KPL jar) should I be using for Windows 10?
I also tried version 0.10.2, with the same program and configuration as above: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". I don't encounter this "certificate verify failed" error with higher versions of the producer (e.g. 0.12.x); why is that?
Same thing here. 0.12.11; no errors are thrown, but no logs are created on my S3 bucket either:
But if I instead wait for callbacks like below, I get the following errors:
I have tried older versions such as 0.10.x, but they cannot connect at all; I get connection errors.
I can reliably reproduce the error with a very short code snippet. Two points to note: the first request throws the exception; subsequent requests work. This happens in a Docker container running various OpenJDK images (amazoncorretto-8, alpine-openjdk-8, etc.). It does not happen outside of Docker on my laptop running Java HotSpot.
I'm trying a really simple example and hitting this issue with Scala 2.13.0, Java 1.8, and KPL 0.12.11:
Could anyone find a fix for this issue? It seems to be entirely reproducible, and many people seem to have faced it. Is there an official roadmap for this tool to be fixed?
We are facing a similar issue on KPL 0.12.11. I am following this up in case anyone has ideas on how to get around it; we have tried setting up a DLQ for it. Thanks!
I faced this issue and Amazon tech support helped me debug it. I don't know Scala, but the Java version of the API returns a future from the addUserRecord method. By inspecting that returned object, you can learn why the add failed. In my case it was because I forgot to add credentials to my container, which prevented the KPL daemon from connecting to the queue. That can cause this exception to be thrown when you attempt to add a record.
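A sketch of that kind of inspection, assuming the usual pattern of blocking on the ListenableFuture returned by addUserRecord; the stream name and partition key are illustrative:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordFailedException;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;

public class InspectPutResult {
    public static void main(String[] args) throws InterruptedException {
        KinesisProducer producer = new KinesisProducer();
        ListenableFuture<UserRecordResult> f = producer.addUserRecord(
                "my-stream", "my-key",
                ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
        try {
            UserRecordResult result = f.get(); // blocks until the record settles
            System.out.println("sequence number: " + result.getSequenceNumber());
        } catch (ExecutionException e) {
            // The real cause (credentials, permissions, missing stream, ...)
            // is attached to the failed future, not thrown by addUserRecord.
            if (e.getCause() instanceof UserRecordFailedException) {
                UserRecordResult r = ((UserRecordFailedException) e.getCause()).getResult();
                r.getAttempts().forEach(a -> System.err.println(
                        a.getErrorCode() + ": " + a.getErrorMessage()));
            } else {
                e.getCause().printStackTrace();
            }
        } finally {
            producer.flushSync();
            producer.destroy();
        }
    }
}
```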
Hello, thank you everyone for sharing your experiences and learnings with the community. For an example of how to inspect the failure behind this, see the KPL sample application in this repository, specifically this line. (This is a test application; in this case it simply shuts down the sample application after displaying the underlying failure.) This is a general failure condition that occurs when there is any unresolvable configuration problem with the KPL. Usually when this happens it is for one of the following reasons:

If you are experiencing this problem and can confirm that it is not due to a configuration/access issue, please re-open the issue and provide more details about your configuration and any consistently reproducible steps, including stream creation, IAM users/roles/permissions, and the container/EC2 instance. For additional assistance, you may also open a customer support ticket through the AWS console to receive more specific support.
I followed the "Barebones Producer Code" and got this exception when calling addUserRecord. The same setup worked using the KCL (1.x). I will be trying KCL 2.x now.
@namedgraph Try removing the loop from the sample code and writing a single record with the callback above. If this doesn't work, please respond with more details (full code, IAM permissions used, etc.). I'm not sure why you are concerned about the KCL version here; there aren't any version dependencies between the KPL and the KCL.
Any update on this? I've just run into similar issues... Is the official recommendation to move off the KPL and adopt the SDK directly?
Repeating from above:

@wikier Additionally, this can happen if the KPL process gets overwhelmed because the customer has not successfully implemented backpressure. I highly recommend reading this blog post to understand some of the considerations for how to configure and use the KPL:
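As one illustration of such backpressure, here is a minimal sketch that caps in-flight records using the producer's getOutstandingRecordsCount; the threshold is illustrative:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import java.nio.ByteBuffer;

public class BackpressurePutter {
    private static final int MAX_OUTSTANDING = 10_000; // illustrative cap

    public static void put(KinesisProducer producer, String stream,
                           String key, ByteBuffer data) throws InterruptedException {
        // Block the producing thread while the KPL's internal buffer is full,
        // instead of letting the native process get overwhelmed.
        while (producer.getOutstandingRecordsCount() > MAX_OUTSTANDING) {
            Thread.sleep(1);
        }
        producer.addUserRecord(stream, key, data);
    }
}
```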
We are facing this issue when we write a high volume of data to the stream. I don't think it is related to configuration or access issues ("Credentials could not be found"). We have also implemented backpressure, specifically flushing the records when outstanding records reach the maximum threshold configured by the application. Whenever this error occurs, CPU usage goes high. There is already an AWS support ticket open to address this issue.
@srinihacks The high CPU problem is also known; see #187. It's the reason why we eventually moved away from the KPL to the Kinesis Aggregation Library + Kinesis Client. This simplified a lot of things for us.
I am using an M1 MacBook and Java 8 for M1. Do you have any advice?
Sometimes calling addUserRecord starts to throw:

The KPL does not seem to recover from this. All further calls to addUserRecord also fail. Restarting the KPL Java process fixes the situation. This seems to happen when the Kinesis stream is throttling requests, so my guess is that the native process can't write to the stream quickly enough and runs out of memory. If that's the case, my expectation would be that the native process should start to discard older data, and of course that if the native process dies, the KPL should recover to a working state.
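Until the underlying issue is fixed, one defensive pattern is to treat DaemonException as fatal for the current producer instance and rebuild it. This is only a sketch, not an official recommendation; the drop-on-failure policy is illustrative:

```java
import com.amazonaws.services.kinesis.producer.DaemonException;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import java.nio.ByteBuffer;

public class RecoveringProducer {
    private final KinesisProducerConfiguration config;
    private volatile KinesisProducer producer;

    public RecoveringProducer(KinesisProducerConfiguration config) {
        this.config = config;
        this.producer = new KinesisProducer(config);
    }

    public void put(String stream, String key, ByteBuffer data) {
        try {
            producer.addUserRecord(stream, key, data);
        } catch (DaemonException e) {
            // The child process is gone; discard this producer and start a
            // fresh one so subsequent puts can succeed. The failed record is
            // dropped here; requeue it if that is unacceptable.
            synchronized (this) {
                producer.destroy();
                producer = new KinesisProducer(config);
            }
        }
    }
}
```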