Avoid loading large files from S3 into memory #2028
Comments
Dominion - no bottlenecks
Passing off to @eventualbuddha to investigate the ES&S issue further
I investigated resource usage for processing ES&S CVR & ballot files. I used @carolinemodic's generated files that had about 500K CVRs:
It is indeed slow and memory intensive. Using a manually created Jurisdiction and set of uploads, I ran the following script to measure the time and memory usage:

#!/usr/bin/env python
import os
import time

os.environ["FLASK_ENV"] = "development"

from server.api.cvrs import parse_ess_cvrs
from server.models import Jurisdiction

jurisdiction_id = "9eae837e-37b2-4ef7-ac4a-5977c403608e"
jurisdiction = Jurisdiction.query.get(jurisdiction_id)

start_time = time.time()
(metadata, cvrs) = parse_ess_cvrs(jurisdiction, "/tmp/arlo-perf-test")

# Iterate over every CVR without keeping references to them, counting as we go.
count = 0
try:
    for cvr in cvrs:
        count += 1
finally:
    end_time = time.time()
    print("Parsed %d CVRs in %s seconds" % (count, end_time - start_time))

# Print peak RSS memory usage.
# ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
import resource
print("Memory usage: %s (kb)" % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

I also watched the process with

This isn't great, but should be within what a worker dyno can deal with as we've configured them. In practice, the largest set of ES&S CVRs we're likely to see is about 180K records, or about 1/3 of this "big" test fixture. With a RAM floor of about 130MB for our Python process, that means a RAM utilization of somewhere around 1.3GB.

I also did a spike of using Rust to process these files. It's about 20x faster (1.2s) and 13x less RAM (250MB), and I wasn't even trying to make it efficient. There are huge gains to be had by moving away from Python for anything CPU-bound.
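For anyone wanting to redo the 180K-record estimate with different numbers, here is a rough sketch of the arithmetic. It assumes memory grows linearly with record count above the fixed process floor, which is an assumption rather than something measured here. The 3250MB peak for the 500K fixture is not a figure reported directly in this thread; it is inferred from the "13x less RAM (250MB)" Rust comparison. The function name `estimate_peak_rss_mb` is hypothetical.

```python
def estimate_peak_rss_mb(records, fixture_records, fixture_peak_mb, floor_mb):
    """Linearly scale peak memory from a measured fixture to another record count.

    Assumes a fixed per-process floor plus a constant per-record cost.
    """
    per_record_mb = (fixture_peak_mb - floor_mb) / fixture_records
    return floor_mb + per_record_mb * records


# Inferred Python peak for the ~500K fixture: 13 x 250MB (assumption, see above).
fixture_peak_mb = 13 * 250

estimate = estimate_peak_rss_mb(
    records=180_000,          # largest ES&S CVR set expected in practice
    fixture_records=500_000,  # size of the "big" test fixture
    fixture_peak_mb=fixture_peak_mb,
    floor_mb=130,             # RAM floor of the Python process
)
print("Estimated peak RSS: %.0f MB" % estimate)
```

Under these assumptions the result lands a little under the "somewhere around 1.3GB" figure, so the two are in the same ballpark. If the real per-record cost is not constant (for example, if some structures grow faster than linearly), the estimate would need to be re-measured rather than scaled.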
Unassigned myself so @arsalansufi can re-prioritize.
https://votingworks.slack.com/archives/CKCVA0F9S/p1730309143498049