Reduce memory usage when reading json file #906
@fukusuket Thanks for your Pull Request.
With the approach you suggested, the input is assumed to be a JSON file with one record per line.
Codecov Report
Base: 74.81% // Head: 74.91% // Increases project coverage.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #906      +/-   ##
==========================================
+ Coverage   74.81%   74.91%   +0.09%
==========================================
  Files          24       24
  Lines       15663    15722      +59
==========================================
+ Hits        11719    11778      +59
  Misses       3944     3944
@hitenkoku @YamatoSecurity
Supported JSON format
I changed it to support the following 3 format patterns.
Benchmark
OTRF/Security-Datasets apt29/day2 json data.
The rate of improvement has decreased...
I tested csv-timeline and json-timeline against the two APT29 files (2.1GB).
Processing time and detections are the same.
Memory usage went from 12.7GB to 5.4GB.
LGTM!
Thank you so much!
Thanks. LGTM
Thank you so much for the review and benchmark :)
I found a place where memory usage can be reduced when reading a json file, so I changed it a little :)
What Changed
Process line by line instead of reading the whole json at once.
(Sorry if I misunderstood the spec and can't make this change ... )
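For readers unfamiliar with the approach, the line-by-line idea can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: it assumes the input holds one JSON record per line, and the name `iter_records` and the sample events are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line.

    Assumption: the input is in a one-record-per-line layout, so only a
    single line needs to be held in memory at any time.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)


# Hypothetical two-record sample standing in for a large log file.
sample = io.StringIO('{"EventID": 4624}\n\n{"EventID": 4688}\n')
event_ids = [record["EventID"] for record in iter_records(sample)]
```

Because the function is a generator, each record can be processed and dropped before the next line is read, which is where the memory saving would come from compared with parsing the whole file in one call.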
Evidence
Environment
Test1
OTRF/Security-Datasets apt29/day2 json data.
The number of detections is the same, but the file sizes are different for the following reason:
On the main branch, when a "}" was included in the log, an invalid "," was added to the output. This PR removes that invalid ",".
Console output
before
This PR
Test2
OTRF/Security-Datasets apt29/day1 json data.
I would appreciate it if you could review🙏