Question: Instruct kaitai to read the stream but not to store byte array #867
Comments
I guess it can be done with
meta:
  id: format
seq:
  - type: t_event
    repeat: eos
types:
  t_event:
    seq:
      - id: type
        type: u1
      - id: body_length
        type: u2
      - id: body
        type:
          switch-on: type
          ...
        if: is_supported
    instances:
      is_supported:
        value: type != type::unsupported
      next:
        pos: _io.pos + body_length
        type: format
        if: not is_supported
So for the last item in the seq you would have to check if it
BTW, what is your format?
@KOLANICH we have several formats related to the telecommunications industry. These are proprietary formats, and the files are fairly large, around 100 MB; the main issue is the velocity of the incoming data that we have to process. Usually a given application cares only about certain types of records in these files, so most of the time a process needs only about 10% of the data; the other 90% is not required (it may be relevant to a different process). We could keep the memory footprint very low and process these files efficiently if there were a way to ignore the unused bytes during parsing. Note that files are flowing in at around 10K-50K or more per second.
I think this would be very simple to implement, if kaitai can support it in a future release:
t_event_unsupported_body:
  params:
    - id: body_length
      type: u2
  seq:
    - id: data
      size: body_length
      ignore: true # introduce something like this
Then simply, in the generated code:
// instead of the following
// this.data = this._io.readBytes((bodyLength()));
// we simply use this
this._io.readBytes((bodyLength()));
Even now we can modify the generated code to do this, but it is not the best option.
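To illustrate the idea of advancing the stream without keeping the bytes, here is a minimal Python sketch that is independent of Kaitai Struct. It assumes the record layout from the example in this thread (a `u1` type, a `u2` body length, then the body). The little-endian byte order, the set of wanted type codes, and the function name are assumptions made only for illustration. Unwanted bodies are skipped with a relative `seek`, so they are never read into memory.

```python
import io
import struct

# Hypothetical type codes; the real formats discussed here are proprietary.
WANTED_TYPES = {1, 2}

def read_wanted_events(stream, wanted=WANTED_TYPES):
    """Yield (type, body) for wanted records; seek past all other bodies."""
    header = struct.Struct("<BH")  # u1 type, u2 body_length (little-endian assumed)
    while True:
        raw = stream.read(header.size)
        if not raw:
            return  # clean end of stream
        if len(raw) < header.size:
            raise EOFError("truncated record header")
        event_type, body_length = header.unpack(raw)
        if event_type in wanted:
            body = stream.read(body_length)
            if len(body) < body_length:
                raise EOFError("truncated record body")
            yield event_type, body
        else:
            # Advance past the body without reading or storing it.
            stream.seek(body_length, io.SEEK_CUR)

# Three records: two wanted, one of an unwanted type (9) in between.
sample = (
    struct.pack("<BH", 1, 3) + b"abc"
    + struct.pack("<BH", 9, 4) + b"\x00" * 4
    + struct.pack("<BH", 2, 2) + b"hi"
)
events = list(read_wanted_events(io.BytesIO(sample)))
```

This only works on seekable streams such as regular files; on a pipe or socket the bytes would still have to be read and discarded.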
Is it DPI?
I don't think KS is suitable here currently. I tried to parse a 2 GiB file with KS (a Qt Installer Framework installer, just an index pointing to 7z archives within the file (which were not parsed; I only needed their offsets and sizes to be able to generate
This has been requested for a long time: see #88 and #525. You may also find #65 useful. But all of these are long-stalled; the leading devs have too little time, and I don't want to touch Scala. Please consider implementing them yourself.
Thanks for the information. So far kaitai is performing well in our scenario; I just wanted to check whether there was something I had missed. Worst case, we could post-process the generated code so that it does not store the unused byte arrays. I will certainly check the issues you mentioned. Thanks a lot for taking the time to answer the questions; I really appreciate it, and also the effort that went into making such a great framework for parsing binary files.
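As a rough sketch of the post-processing idea, the following Python snippet rewrites a generated Java assignment of the shape quoted in this thread into a relative seek. It assumes the generated line looks exactly like `this.<field> = this._io.readBytes(<expr>);` on a single line, and that the Java runtime's `KaitaiStream` provides `pos()` and `seek(...)`. The helper name and the field name `data` are illustrative only. After the rewrite the field is never assigned, so its getter would return `null`.

```python
import re

def skip_instead_of_store(java_source, field_names):
    """Replace `this.<field> = this._io.readBytes(<expr>);` with a seek
    for the listed fields, leaving all other lines untouched."""
    names = "|".join(re.escape(name) for name in field_names)
    pattern = re.compile(
        r"this\.(?:%s) = this\._io\.readBytes\((.*)\);" % names
    )
    # \1 is the original length expression, reused as the seek offset.
    return pattern.sub(r"this._io.seek(this._io.pos() + \1);", java_source)

generated = "this.data = this._io.readBytes((bodyLength()));"
patched = skip_instead_of_store(generated, ["data"])
```

A text-level rewrite like this is fragile and would have to be re-applied after every compiler run, which is why the thread calls it not the best option.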
We have a situation where a large number of records (and hence a large size too) come inside a single file. We only need to process two types of events, and we need to ignore the other types. We can simply provide an unsupported kaitai struct type with the size, but it still stores those byte arrays in memory. This is huge in our case. Is there a way to tell kaitai struct that it needs to advance the stream based on the size given, but not store those bytes in an instance variable?
ex: let's say we have the following 2 events and one we don't care about