Is the 'old heap' growing because of a memory leak? #7228
Comments
@rdelangh, as usual we need a heap dump; ideally, take it after each full GC. I do not remember whether it is possible for you to send us a heap dump; if not, could you send us screenshots of the heap dumps? |
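A minimal sketch of how such a dump could be captured with jmap, assuming <odb_pid> stands for the PID of the OrientDB server JVM:

```sh
# Hypothetical sketch: dump only live objects; the :live option forces a
# full GC first, so the dump reflects the heap right after a full GC.
jmap -dump:live,format=b,file=/tmp/odb-heap-$(date +%s).hprof <odb_pid>
```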
hi @Laa, "to take it after each full GC" -> how can I control when a FGC is happening? Because you suggest that I would then initiate (manually) a heap dump. And indeed, the heap dump will be around Xmx12G size, so 12GB, and impossible to send it to you. Can you suggest which tool I shall use to open such a big heap dump, and command options that will allow to open the big (>= 12GB) file, and which screenshots you desire? |
Is this useful?
|
Attached is the output of "jmap -histo"; I hope it helps. |
To see how objects are growing, take a look at a sequence of ASCII outputs from jmap -histo:live (this will force a full GC and then collect the histogram): https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jmap.html |
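For example, a sketch of collecting a few such histograms over time (the PID and interval are placeholders):

```sh
# Collect three class histograms, ten minutes apart; :live forces a full GC
# before each snapshot, so only reachable objects are counted.
for i in 1 2 3; do
  jmap -histo:live <odb_pid> > histo_$i.txt
  sleep 600
done
```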
@cmassi , @robfrank , @Laa
from repeated 'jstat' outputs, I saw a continuous increase of the OC at each full GC, until finally an increase after full GC 11:
the same after full GC 12:
Will keep an eye on it... |
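For reference, a sketch of the kind of jstat loop that produces such output (PID and interval are placeholders):

```sh
# Print GC statistics (including the OC and OU columns) every 5 seconds,
# with a timestamp column, for the OrientDB server JVM.
jstat -gc -t <odb_pid> 5s
```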
We see from the latest jstat output that with full GC number 12 the old usage decreased, together with the old+Eden capacity, but Eden was nearly full. |
@cmassi Now, with Xms=4g and Xmx=12g, I again hit an OOM under heavy load, and the ODB engine has become inaccessible:
Shall I set the parameters that you mentioned? I.e.
|
The documentation of the -XX flags for the HotSpot JVM is available on the Oracle web site, not in the OrientDB documentation. |
Did that; the server is running quite fine with a reasonable (but by no means the maximum required) load. Going to stress the server a bit more now. |
The server is running with the following command-line args:
"jstat" outputs are:
The server itself still has plenty of unused RAM (installed RAM = 128 GB):
However, sometimes I cannot start a new ODB client program, "console.sh":
-> Which of the JVM settings should be adjusted further? Is this connection from the client program trying to consume additional 'new'-generation space? |
Hi @rdelangh, what is the output of ulimit -a? |
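For completeness, a sketch of the limits that are usually relevant here, run as the user that owns the OrientDB process:

```sh
ulimit -a    # all per-user limits
ulimit -u    # max user processes; this also caps the number of threads (LWPs)
ulimit -n    # max open file descriptors
```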
Could you also execute |
|
@rdelangh did you take these values right after you saw the problem with the last OOM? |
Trying again to launch an extra JVM program ("jstat") fails with
At that very moment, only 2 Java processes are active on this machine:
Only when I "kill" that Java client program can the "jstat" command be launched again:
but from the output of "jstat", it looks like there is plenty of heap space available, no? |
Hi @rdelangh, the OS limits the number of threads, not the number of processes. Also, threads are created outside of heap memory. The way your jstat instance failed to start suggests to me that you hit this limit. Let's check the total number of threads currently running on your system. Could you execute
@Laa
|
Hm, @rdelangh, very strange. Could you print the output of the command above? Could you also do it once you have hit the error mentioned above? |
hi @Laa
The only additional program that I sometimes run on this server host is a "jstat" command, or "console.sh" (which in turn launches "java"). |
@rdelangh I have already written to you before: it is called a number of processes, but in Linux terminology these are lightweight processes, a.k.a. threads. If you noticed, in the command which I provided there is a flag
As you can see, we have 31 threads in the process; each line corresponds to a single thread of the process. Could you execute the script which I provided when you next hit this issue? |
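In the meantime, a sketch of how the thread counts can be checked directly (the PID is a placeholder):

```sh
# Threads (lightweight processes) of the OrientDB server JVM:
ps -L -p <odb_pid> | wc -l
# Total threads on the host, across all processes:
ps -eLf | wc -l
# Kernel-wide ceiling on the number of threads:
cat /proc/sys/kernel/threads-max
```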
hi @Laa, sorry for my ignorance; I know very well what LWPs are, but I was reading your comment too quickly.
|
hello,
The only possible solution is a "shutdown", then a restart. I am now raising the heap setting further to "-Xmx24G" -> are you sure that this is not a case of a memory leak?? |
Your JVM has executed 38 full GCs and the old generation is nearly full again. The last 3 lines do not show any increase in the GC (238564) or full-GC (38) counts, so nothing is being cleaned (OU is stable because those are not full GCs), but Eden usage increases by about 100m each time. If you enable a detailed GC log and see something like "java.lang.OutOfMemoryError: GC overhead limit exceeded", it could be that you do not have the hardware resources to handle such a huge heap cleanup (you need more CPU), or that there is nothing left to clean (so you need more heap, or there is a leak). From jstat I can only see the average full-GC time, which seems fine (111.985 secs / 38 = 2.94 secs). You can check and try to keep the promotion of objects to a small value by tuning the New generation, with the best sizes for the Survivor spaces and Eden (add PrintGCDetails and PrintTenuringDistribution). The OU growth per GC shows promoted objects (I see only 500k promoted in the first line, but you should watch the jstat log for a long time). If your JVM does not survive with 24 GB, it could be a leak, and you can collect histograms (jmap -histo:live) to see which objects continuously increase (note: the live collection of a histogram requires an additional full GC). |
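A sketch of the flags mentioned above, assuming they are appended to the server's startup options (the sizes are placeholders, not recommendations):

```sh
# Make promotion visible and experiment with New-generation sizing.
-XX:+PrintGCDetails -XX:+PrintTenuringDistribution \
-XX:NewSize=2g -XX:MaxNewSize=2g -XX:SurvivorRatio=8
```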
@taburet : yes, extensively using Lucene indexes
|
AFAIU you create 6 different single-field Lucene indexes for each class: http://orientdb.com/docs/last/Full-Text-Index.html#lucene-writer-fine-tuning-expert I don't know whether the latter would increase performance or decrease RAM occupation, but a single index per class should certainly be better. The only drawback is that you may need to rewrite some queries to qualify the field name(s) if you are using different analyzers. |
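As an illustration of this suggestion, a sketch of replacing several single-field indexes with one multi-field Lucene index per class (the class name, field names, and credentials below are hypothetical placeholders):

```sh
$ ./console.sh
orientdb> CONNECT remote:localhost/mobile admin admin
orientdb> CREATE INDEX Call.search ON Call (caller, callee, cell) FULLTEXT ENGINE LUCENE
orientdb> SELECT FROM Call WHERE [caller, callee, cell] LUCENE "123*"
```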
Moreover, apart from reducing the memory usage by using a few multi-field Lucene indexes instead of several single-field Lucene indexes, how will this prevent the old heap from growing, as we see, until an OOM? |
What I have noticed in the past days is the following:
I could try to raise the memory limits of the ODB server a little higher, from the current maximum of 24 GB to, say, 30 GB, but maybe this will cause very long garbage collections of a few to many seconds. On top of all this, it is still not possible to spread the load over more than a single hardware server; see issue #6666 (open for months!). |
I again have the situation where the server is freezing, although I have no clue what the ODB server engine is doing now regarding this database "mobile":
-> What else can I do to find out why the ODB server is stuck inserting records into one database ("mobile"), while it happily allows inserting records into another database? |
After some 20 minutes of total freeze, the data-loading program into database "mobile" finally started inserting its first records. Its speed is abnormally slow, however. |
@rdelangh so right now it is not fully frozen, but the speed is very slow at this moment? |
Looking at your last jstat -gc report, you have had 6 full GCs with an FGCT of 14.630 secs. |
Meanwhile I did not see any reasonably fast solution apart from migrating this database "mobile" onto another, separate hardware server named "orient1". The "jstat" output at this moment:
The commands you gave only partially ran successfully:
Where would I get these additional GC logging details? The output of "jstat" still shows the same columns:
|
Sorry, |
There is also the opposite option: -XX:+PrintGCApplicationConcurrentTime |
hi @cmassi, thanks a lot for your time! Highly appreciated, because we are really struggling with ODB under high load.
It seems that these options can only be set on the command line when starting up the ODB server. |
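A sketch of what such startup options could look like, assuming they are appended to the JAVA_OPTS picked up by server.sh (the log path is a placeholder):

```sh
JAVA_OPTS="$JAVA_OPTS -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime -Xloggc:/path/to/orientdb-gc.log"
```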
I've also added to the above list the flag that can be activated via jinfo (so it does not need to be set at startup): jinfo -flag +PrintGC pid_jvm |
I have not restarted the server process yet (I need to wait for a timeslot later today), but in the meantime I get the following GC log messages in the output, every 5 secs or so:
|
Please remember to add: jinfo -flag +PrintGC pid_jvm |
|
The numbers are total_allocation_before->total_allocation_after (total capacity) |
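That is, each line has roughly the shape below (the values are purely illustrative, not taken from this server):

```sh
# heap before -> heap after (total capacity), followed by the pause time
[GC (Allocation Failure)  8123456K->4123456K(12582912K), 0.2345678 secs]
```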
May I close this issue? Has the "laziness" of the Lucene indexes improved the situation? |
hello Frank, the "laziness" of the Lucene indexes had an absolutely positive impact, many thanks for that! We still encounter hanging server processes when, for example, some queries have been launched which try to access too many records. I guess that causes an OOM situation, but the server logfiles do not mention anything about the fact that they are stuck: no more client processes (console, or REST API) can connect, a clean "shutdown" fails to get a connection, and a gentle "kill" signal is not trapped... |
OrientDB Version: 2.2.18
Java Version: 1.8.0_92
OS: Ubuntu-16.04
Expected behavior
When we have 'not-so-heavy' data loading active on our standalone server, I notice that the Old-generation size is growing with each Full Garbage Collection that is initiated:
Our server is started with heap size parameters "java -server -Xms4G -Xmx12G ..."
However, when we start an additional, heavy data-loading program, that old-generation capacity very quickly becomes exhausted, until the server dies/halts with an Out-Of-Memory error.
-> Is this growing old-generation capacity ("OC" column) figure normal?
-> A full GC is not reducing the old-generation usage ("OU" column), so the "OC" increases further at each FGC.
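For reference, a sketch of the monitoring commands behind these observations (PID and interval are placeholders):

```sh
# Watch old-generation capacity (OC) and usage (OU) every 10 seconds:
jstat -gc <odb_pid> 10s
# Capacities of all generations, including how far OC can still grow:
jstat -gccapacity <odb_pid> 10s
```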