Optimize memory usage by Timeline chart with very large cases #1598
Wow, that's a ton of memory... I just tested this feature with medium-sized cases of 2-3 million items, my fault... IPED should be able to handle cases with up to ~135 million items. Beyond that, the main table's internal height exceeds the maximum value of an Integer (although that is another issue; we can simply warn the user to apply some filter, as they probably don't want to analyze 135 million items in the table at the same time).
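For reference, here is the back-of-the-envelope arithmetic behind that ~135 million figure, assuming a table row height of 16 px (an assumption for illustration, not necessarily IPED's actual row height):

```java
public class TableHeightLimit {
    public static void main(String[] args) {
        long rowHeight = 16;                                  // assumed pixels per table row
        long items = 135_000_000L;
        long tableHeight = items * rowHeight;                 // 2,160,000,000 px
        System.out.println(tableHeight > Integer.MAX_VALUE);  // true: max int is 2,147,483,647
        System.out.println(Integer.MAX_VALUE / rowHeight);    // 134,217,727 rows fit at most
    }
}
```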
Not sure if cache building could be faster, but consuming all available memory is certainly not desired/expected. Could you take a heap dump to help identify the memory-hungry class?
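For reference, besides `jmap -dump:live,format=b,file=heap.hprof <pid>`, a dump can also be triggered from code through the JDK's HotSpotDiagnostic MXBean; this is a generic JDK sketch, not something wired into IPED:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live=true dumps only reachable objects, which is what matters when
        // looking for the class that keeps the heap full.
        bean.dumpHeap("heap.hprof", true);
    }
}
```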
Out of curiosity, which version has this heap usage? And what about the other? I expect 4.0.x to use less memory than 3.18.x because of the Lucene 5.x to 9.x upgrade.
I think this is high priority; version 4.1 should be as stable as previous versions. @patrickdalla could you take a look at this?
The 8 GB I mentioned is with 4.2-snapshot.
The first call seems to be synchronous. I have no idea if this is mandatory, but it seems a long wait compared to other initialization steps.
No, but I can take one tomorrow.
It was just done to try to speed up the Day cache, so it doesn't share CPU resources with the other caches being built, as the Day scale is the default and rendered first. Maybe that is not needed...
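A minimal sketch of the scheduling described there, with hypothetical class names (not IPED's actual ones): build the default Day cache alone first so it gets all CPU resources, then build the remaining scales in the background.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TimelineCacheScheduler {
    void buildCaches(TimelineCacheBuilder builder) {
        // Day is the default scale and is rendered first, so build it by itself.
        builder.build("Day");

        // The other scales can then be built concurrently in the background.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (String scale : List.of("Hour", "Week", "Month", "Year")) {
            pool.submit(() -> builder.build(scale));
        }
        pool.shutdown();
    }
}

interface TimelineCacheBuilder {
    void build(String scale);
}
```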
Right.
@tc-wleite The first time a case is opened, this cache is built, but on subsequent openings it is loaded from disk. As the log says, it was successfully built. So, does this excessive memory consumption occur only the first time, or on every opening?
In fact, what is being created is a time period index. I don't remember well why this was not made to run at processing time. I think there was some constraint related to timezone changes.
I am not sure about the next executions after the first cache is built. As it was taking a while to open the case and I had to access it just to export some files, I disabled the cache creation. I will test and report back here.
AFAIK it was to keep backwards compatibility with 4.0.x cases and also to work transparently with multicases.
Usually when there isn't enough heap memory and there is high GC pressure, the UI starts to freeze for a while before an OutOfMemoryError is thrown.
If all of them are changed to run sequentially, do they finish? Could you check what the final heap usage is?
I am going to verify.
I changed the code to make all cache building steps run sequentially, but they didn't finish after a bit more than 2 hours.
Thank you for testing @tc-wleite!
This seems bad; maybe the final (not just the temporary) memory needed is simply excessive...
I'm working on this.
The use of primitives could help, but the problem is that all the time period counting data is loaded and kept in memory, which is what should be solved.
To use bitsets in this case, we would need one bitset per (TimePeriod, event) tuple. So, lots of bitsets (most of them small in size).
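A minimal sketch of that layout, using plain java.util.BitSet and invented key types (IPED's real period/event representations certainly differ):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// One bitset of item ids per (time period, event type) pair.
class PeriodEventIndex {
    record Key(String period, String eventType) {}

    private final Map<Key, BitSet> bits = new HashMap<>();

    void add(String period, String eventType, int itemId) {
        bits.computeIfAbsent(new Key(period, eventType), k -> new BitSet()).set(itemId);
    }

    int count(String period, String eventType) {
        BitSet b = bits.get(new Key(period, eventType));
        return b == null ? 0 : b.cardinality();
    }
}
```

Note that java.util.BitSet grows with the highest id it holds, so a sparse/compressed bitmap (e.g. RoaringBitmap) may keep each per-tuple set small even when item ids are large.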
Possibly you have already thought about this, but wouldn't keeping just the currently used TimePeriod scale (Day, Hour...) in memory be enough? Or would it be too slow to load on demand? I just ask you to be careful with the changes, since I plan to include them in 4.1.2, thank you!
One thing that could help is taking a heap dump and checking what is really memory-hungry, to avoid wasting time trying to optimize unrelated things.
After some tests, I noticed that the ItemIds lists of every shown item were being kept in memory, to be used when the user right-clicks on a bar to select or mark its respective items.
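In hypothetical form (these are not IPED's actual classes), the difference is between a structure that pins every bar's id list for the whole session and one that resolves the ids only when the user actually right-clicks:

```java
import java.util.List;
import java.util.Map;

// Eager: one id list per rendered bar, kept referenced as long as the chart lives,
// so retained memory grows with the number of shown items.
class BarSelectionEager {
    Map<String, List<Integer>> itemIdsPerBar;
}

// Lazy: nothing is retained between interactions; the ids are re-queried
// from the index only when a bar is right-clicked.
interface BarSelectionLazy {
    List<Integer> resolveItemIds(String barKey);
}
```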
Thank you @patrickdalla for investigating this! @tc-wleite, if you could take a heap dump of your huge case and send it to me privately, I can take a look to try to confirm what is consuming all the heap in your case. I usually use the Eclipse Memory Analyzer plugin for this; it can handle huge dumps that JVisualVM cannot, and it has very nice features to investigate issues like this.
I pushed all the commits to the TimelineInteractionFixes branch. I think some optimizations can still be made, but at least what could be classified as a "bug" was solved.
Thank you very much @patrickdalla!
Have you tried this? Was it bad for UI responsiveness? If @tc-wleite's case still triggers an OOME with a 32 GB heap, maybe we can expose a simple option in conf/AnalysisConfig.txt to disable the feature for huge cases like that.
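For illustration only, such a switch could be a boolean property in conf/AnalysisConfig.txt checked before the cache build starts; the property name below is invented, not an existing IPED setting:

```java
import java.io.FileInputStream;
import java.util.Properties;

class TimelineCacheConfig {
    // "enableTimelineCache" is a hypothetical key, used here just to sketch the idea.
    static boolean isTimelineCacheEnabled(String analysisConfigPath) throws Exception {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(analysisConfigPath)) {
            props.load(in);
        }
        return Boolean.parseBoolean(props.getProperty("enableTimelineCache", "true"));
    }
}
```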
I have a very large case here, processed with the master branch, with ~125 million items (many evidence files processed with the "fastmode" profile).
I had similar cases processed with 3.18.x and 4.0.x and they opened relatively quickly (~2 minutes).
With the latest version, it is taking ~15 minutes, with a long wait in "Starting to load/build time cache of [[class org.jfree.data.time.Day]]", as can be seen in the log below.
After that, the analysis UI becomes visible, but unusable (any mouse/keyboard action makes the application "Not responding" in Windows). It seems all the JVM memory is consumed and the garbage collector is running continuously (screenshot of VisualVM below).
Disabling the load/build of the timeline cache (in the source code) makes the loading process fast again, and it takes a bit less than 8 GB to load everything.
Is there anything that can be done about this?
Although such large cases are not common, new IPED versions may create more items (depending on the processing options).
Maybe put a limit on the maximum number of items for which this cache building process will run. Or maybe add an option to disable it, to be used when the user needs to open a very large case.