Throws an error when I try to _read a large file #255

Closed
GreyCat opened this issue Sep 13, 2017 · 8 comments
@GreyCat
Member

GreyCat commented Sep 13, 2017

Moved from #1196, original by @bulbum.

Good evening!
I use Kaitai to handle pcap files. When I try to load a 114 MB log:
Pcap p = Pcap.fromFile("material\\log114MB.pcap");
it throws

Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
	at com.company.EthernetFrame._read(EthernetFrame.java:65)
	at com.company.EthernetFrame.<init>(EthernetFrame.java:41)
	at com.company.Pcap$Packet._read(Pcap.java:283)
	at com.company.Pcap$Packet.<init>(Pcap.java:266)
	at com.company.Pcap._read(Pcap.java:164)
	at com.company.Pcap.<init>(Pcap.java:144)
	at com.company.Pcap.fromFile(Pcap.java:21)
	at com.company.View.toParseFromFile(View.java:180)
	at com.company.View.access$100(View.java:27)
	at com.company.View$1.mouseReleased(View.java:88)

However, when I load an 8 KB file, everything is fine.

I tried to increase the max heap size using a JVM command-line option, but raising it to some finite maximum does not ultimately get rid of the issue.

@GreyCat
Member Author

GreyCat commented Sep 13, 2017

It's a somewhat complicated issue. Work on it should start with kaitai-io/kaitai_struct_compiler#133, but there is still a good way to go.

For the time being, I can recommend the following workaround. The pcap format literally consists of a header and a long list of packets. If you don't need them all in memory at once, you can do it like this:

  • Modify the seq of the top-level type in pcap.ksy to look like this (i.e. remove packets):
seq:
  - id: hdr
    type: header
  • Recompile the ksy to get a new Pcap.java
  • Use it like this:
Pcap p = Pcap.fromFile("material/log114MB.pcap");
KaitaiStream io = p._io();

// iterate over all the packets until reached end of file
while (!io.isEof()) {
    Pcap.Packet packet = new Pcap.Packet(io, p, p);
    // do something with `packet` here, for example, print out timestamp
    System.out.println(packet.tsSec());
}
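
If this loop is needed in more than one place, it could be wrapped into an Iterable that parses one record per step. The following is only a sketch: `LazyRecords` is a hypothetical helper name, not part of the Kaitai Struct runtime, and it knows nothing about Kaitai itself. It only needs an "are we at EOF?" check and a "read the next record" function.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical helper (not part of the Kaitai Struct runtime): exposes
// "read records until EOF" as an Iterable that parses one record per next() call.
final class LazyRecords<T> implements Iterable<T> {
    private final BooleanSupplier atEof; // e.g. io::isEof
    private final Supplier<T> readNext;  // e.g. () -> new Pcap.Packet(io, p, p)

    LazyRecords(BooleanSupplier atEof, Supplier<T> readNext) {
        this.atEof = atEof;
        this.readNext = readNext;
    }

    // Every iterator shares the same underlying stream position,
    // so the records can effectively be walked only once.
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !atEof.getAsBoolean();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Only the record returned here needs to stay in memory.
                return readNext.get();
            }
        };
    }
}
```

With the generated classes from above, usage would presumably look like `for (Pcap.Packet packet : new LazyRecords<Pcap.Packet>(io::isEof, () -> new Pcap.Packet(io, p, p))) { ... }`. Memory use stays flat as long as the loop body does not keep references to old packets.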

@AleksandrovichK

AleksandrovichK commented Sep 13, 2017

Thank you very much for your answer!

This is a very good solution. However, the purpose of my work is to get the contents of the packets. To be precise, I need to handle each one in turn.

Could you advise another solution without losing the data?

@GreyCat
Member Author

GreyCat commented Sep 13, 2017

Um, this solution allows exactly that: accessing each packet, one by one. What exactly are you losing here?

@AleksandrovichK

AleksandrovichK commented Sep 13, 2017

Oh, I misunderstood you, I'm sorry. Your solution is great and it works.

The only thing I wanted to point out: you probably meant this method:
while(!io.isEof())
Instead of:
> while(!io.eof())

But this is of very minor importance.
I'm infinitely grateful to you.

@GreyCat
Member Author

GreyCat commented Sep 13, 2017

Yeah, my bad, I forgot that it's isEof in Java :) Glad that it helped :)

@GreyCat GreyCat closed this as completed Sep 13, 2017
@cgi

cgi commented Dec 12, 2017

Great example!
I suggest adding it to the page on working with Java.

@GreyCat
Member Author

GreyCat commented Dec 12, 2017

The overall algorithm is the same for all languages, so maybe it actually warrants a FAQ entry...
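
To show the language-independent part (parse the header once, then read and handle one record at a time until EOF), here is a rough sketch that uses no Kaitai-generated code at all. It assumes the classic pcap layout: a 24-byte file header whose first four bytes are the magic number 0xa1b2c3d4 in the writer's byte order, followed by records that each have a 16-byte header (ts_sec, ts_usec, incl_len, orig_len) and incl_len bytes of data. `PcapStreamSketch` and `forEachPacket` are made-up names for illustration only.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.BiConsumer;

// Illustrative sketch only: streams a classic pcap file record by record,
// so that at most one packet body is held in memory at any time.
final class PcapStreamSketch {
    // Calls handler(tsSec, packetBytes) for each record; returns the record count.
    static int forEachPacket(InputStream in, BiConsumer<Long, byte[]> handler)
            throws IOException {
        DataInputStream data = new DataInputStream(in);

        // Step 1: read the fixed-size file header once and detect byte order.
        byte[] fileHeader = new byte[24];
        data.readFully(fileHeader);
        int magicAsBigEndian = ByteBuffer.wrap(fileHeader).getInt();
        ByteOrder order;
        if (magicAsBigEndian == 0xa1b2c3d4) {
            order = ByteOrder.BIG_ENDIAN;
        } else if (magicAsBigEndian == 0xd4c3b2a1) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else {
            throw new IOException("unexpected magic number, not a classic pcap file");
        }

        // Step 2: read records one by one until the stream ends.
        byte[] recordHeader = new byte[16];
        int count = 0;
        while (true) {
            int first = data.read();
            if (first < 0) {
                break; // clean EOF exactly at a record boundary
            }
            recordHeader[0] = (byte) first;
            data.readFully(recordHeader, 1, recordHeader.length - 1);

            ByteBuffer rec = ByteBuffer.wrap(recordHeader).order(order);
            long tsSec = rec.getInt(0) & 0xffffffffL; // unsigned 32-bit seconds
            int inclLen = rec.getInt(8);              // bytes stored for this packet
            if (inclLen < 0) {
                throw new IOException("packet length does not fit into an int");
            }

            byte[] body = new byte[inclLen];
            data.readFully(body);

            // Step 3: hand the record over; it can be garbage-collected afterwards.
            handler.accept(tsSec, body);
            count++;
        }
        return count;
    }
}
```

The Kaitai-based loop does the same thing, except that the generated `Packet` class takes over the parsing in step 2.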

@webbnh

webbnh commented Dec 12, 2017

This was the solution that I landed on in my case (another file with a header followed by an unbounded sequence of records). However, when I removed the equivalent of packets from my file description, I ran into problems with KS's type deduction. So, I use the following instead:

seq:
  - id: file_header
    type: file_header_v1
  - id: file_header_v3
    if: v3tag == "LOG_V3"
    type: file_header_v3
  - id: records
    type: record
    if: false
    doc: Having this reference satisfies type deduction requirements; the conditional allows us to defer reading it.

That structure also makes it clearer what's actually going on...so, if you create a FAQ/example, you might want to consider providing that idiom.
