Throws an error when I try to _read a large file #255

GreyCat · 2017-09-13T17:45:42Z

Moved from #1196, original by @bulbum.

Good evening!
I use Kaitai Struct to handle pcap files. When I try to load a 114 MB log:
Pcap p = Pcap.fromFile("material\\log114MB.pcap");
it throws:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
	at com.company.EthernetFrame._read(EthernetFrame.java:65)
	at com.company.EthernetFrame.<init>(EthernetFrame.java:41)
	at com.company.Pcap$Packet._read(Pcap.java:283)
	at com.company.Pcap$Packet.<init>(Pcap.java:266)
	at com.company.Pcap._read(Pcap.java:164)
	at com.company.Pcap.<init>(Pcap.java:144)
	at com.company.Pcap.fromFile(Pcap.java:21)
	at com.company.View.toParseFromFile(View.java:180)
	at com.company.View.access$100(View.java:27)
	at com.company.View$1.mouseReleased(View.java:88)
However, when I load an 8 KB file, everything is fine.

I tried increasing the maximum heap size with a JVM command-line option (-Xmx), but raising it to any finite value does not make the issue go away for good.


GreyCat · 2017-09-13T17:57:16Z

It's a somewhat complicated issue. Dealing with it should start with kaitai-io/kaitai_struct_compiler#133, but there's still a long way to go.

For now, I can recommend the following workaround. The pcap format literally consists of a header followed by a long list of packets. If you don't need all of them in memory at once, you can do it like this:

Modify the seq of the top-level type in pcap.ksy like this (i.e. remove packets):

seq:
  - id: hdr
    type: header

Recompile the .ksy to get a new Pcap.java.
Use it like this:

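import io.kaitai.struct.KaitaiStream;

// fromFile throws IOException; only the header is parsed now,
// so io is left positioned at the first packet
Pcap p = Pcap.fromFile("material/log114MB.pcap");
KaitaiStream io = p._io();

// iterate over all the packets until the end of file is reached
while (!io.isEof()) {
    // reads exactly one packet record and advances io past it
    Pcap.Packet packet = new Pcap.Packet(io, p, p);
    // do something with `packet` here, for example, print out timestamp
    System.out.println(packet.tsSec());
}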
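The overall loop is language-independent: parse the header, then build one record at a time until EOF. As a rough illustration only, here is a stdlib-only Java sketch that does the same thing for classic pcap without Kaitai Struct at all. PcapStreamSketch, forEachPacket and Packet are hypothetical names, and the sketch assumes the classic microsecond-resolution pcap layout (24-byte global header, 16-byte record headers), ignoring the nanosecond and pcapng variants:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Consumer;

/** Hypothetical sketch: stream a classic pcap file one packet at a time (no Kaitai Struct). */
final class PcapStreamSketch {

    /** One packet record: timestamp plus the captured bytes. */
    static final class Packet {
        final long tsSec;
        final long tsUsec;
        final byte[] body;

        Packet(long tsSec, long tsUsec, byte[] body) {
            this.tsSec = tsSec;
            this.tsUsec = tsUsec;
            this.body = body;
        }
    }

    /** Parses the global header, then hands each packet to the handler; returns the packet count. */
    static int forEachPacket(InputStream in, Consumer<Packet> handler) throws IOException {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));

        // 24-byte global header; the magic number reveals the file's byte order.
        byte[] hdr = new byte[24];
        data.readFully(hdr);
        int magic = ByteBuffer.wrap(hdr).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        ByteOrder order;
        if (magic == 0xa1b2c3d4) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else if (magic == 0xd4c3b2a1) {
            order = ByteOrder.BIG_ENDIAN;
        } else {
            throw new IOException("not a classic microsecond-resolution pcap file");
        }

        byte[] rec = new byte[16];
        int count = 0;
        // Same idea as `while (!io.isEof())`: stop when no further record header starts.
        for (int first = data.read(); first != -1; first = data.read()) {
            rec[0] = (byte) first;
            data.readFully(rec, 1, 15); // rest of the 16-byte record header
            ByteBuffer rb = ByteBuffer.wrap(rec).order(order);
            long tsSec = Integer.toUnsignedLong(rb.getInt(0));
            long tsUsec = Integer.toUnsignedLong(rb.getInt(4));
            int inclLen = rb.getInt(8); // captured length; orig_len at offset 12 is unused here
            if (inclLen < 0) {
                throw new IOException("implausible packet length");
            }
            byte[] body = new byte[inclLen];
            data.readFully(body);
            // Only this one packet is live; the handler decides what to keep.
            handler.accept(new Packet(tsSec, tsUsec, body));
            count++;
        }
        return count;
    }
}
```

Called with something like forEachPacket(new FileInputStream("material/log114MB.pcap"), pkt -> System.out.println(pkt.tsSec)), it holds only one packet body (plus the stream buffer) in memory at a time, so heap usage does not grow with file size unless the handler itself accumulates packets.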

AleksandrovichK · 2017-09-13T18:15:24Z

Thank you very much for your answer!

This is a very good solution. However, the purpose of my work is to get the contents of the packets. To be precise, I need to process each one in turn.

Could you suggest another solution that doesn't lose the data?

GreyCat · 2017-09-13T18:16:56Z

Um, this solution exactly allows you to access each packet, one-by-one. What exactly are you losing here?

AleksandrovichK · 2017-09-13T19:36:25Z

Oh, I misunderstood you, I'm sorry. Your solution is great and it works.

The only thing I wanted to point out: you probably meant this method:
while(!io.isEof())
instead of:
while(!io.eof())

But this is a very minor point.
Infinitely grateful to you.

GreyCat · 2017-09-13T20:24:32Z

Yeah, my bad, I forgot that it's isEof in Java :) Glad that it helped :)

cgi · 2017-12-12T15:39:02Z

Great example!
I suggest adding it to the page on working with Java.

GreyCat · 2017-12-12T15:47:50Z

The overall algorithm is the same for all languages, so maybe it actually warrants a FAQ entry...

webbnh · 2017-12-12T16:16:29Z

This was the solution that I landed on in my case (another file with a header followed by an unbounded sequence of records). However, when I removed the equivalent of packets from my file description, I ran into problems with KS's type deduction. So, I use the following instead:

seq:
  - id: file_header
    type: file_header_v1
  - id: file_header_v3
    if: v3tag == "LOG_V3"
    type: file_header_v3
  - id: records
    type: record
    if: false
    doc: Having this reference satisfies type deduction requirements; the conditional allows us to defer reading it.

That structure also makes it clearer what's actually going on, so if you create a FAQ entry or example, you might want to consider including that idiom.
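On the Java side, deferred records like these can be consumed with an ordinary for-each loop by wrapping the "read one record until EOF" loop in an Iterable. Below is a minimal stdlib-only sketch; LazyRecords and RecordReader are hypothetical names, and DataInputStream stands in for the KaitaiStream that Kaitai-generated code would use (with generated code, the reader would construct one record object from the stream, much like the Pcap.Packet loop earlier in this thread):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: expose "records until EOF" as a lazily read, single-pass Iterable. */
final class LazyRecords<R> implements Iterable<R> {

    /** Reads exactly one record from the stream (stand-in for constructing one parsed record). */
    @FunctionalInterface
    interface RecordReader<T> {
        T read(DataInputStream in) throws IOException;
    }

    private final PushbackInputStream peekable;
    private final DataInputStream data;
    private final RecordReader<R> reader;

    /** afterHeader must already be positioned just past the file header. */
    LazyRecords(InputStream afterHeader, RecordReader<R> reader) {
        this.peekable = new PushbackInputStream(afterHeader, 1);
        this.data = new DataInputStream(peekable);
        this.reader = reader;
    }

    @Override
    public Iterator<R> iterator() {
        return new Iterator<R>() {
            @Override
            public boolean hasNext() {
                try {
                    // Peek one byte to detect EOF without consuming it.
                    int b = peekable.read();
                    if (b == -1) {
                        return false;
                    }
                    peekable.unread(b);
                    return true;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public R next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    // Nothing is read until the caller asks for the next record.
                    return reader.read(data);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

Because a record is read only when next() is called, memory use stays bounded by one record at a time. The trade-off is that the Iterable is single-pass: every iterator shares the same underlying stream.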

GreyCat mentioned this issue Sep 13, 2017

Throws an error when I try to _read a large file #1196

Closed

GreyCat added the question label Sep 13, 2017

GreyCat closed this as completed Sep 13, 2017

generalmimon mentioned this issue Apr 23, 2021

Memory consumption problem in large file parsing. #866

Closed
