Significant performance degradation with GoJa JS tracing debug_traceBlockByNumber #25125
Comments
@s1na Could you please take a peek? Thx
Hey @Gilad-Eisenberger. Thanks for reporting this. I want to help get to the bottom of this. Unfortunately I don't have access to a machine with 56 cores. I just tried on one with 4 cores and both the duktape tracer and goja tracers consume all 4 cores to do I'd like to know if there are inefficiencies in the Goja tracers that are highlighted by your custom tracer. Is it something you can share? Have you tried other JS tracers' resource consumption (e.g.
Much appreciated. I was able to take some time to run some benchmarking on different configurations, trying out versions 1.10.17 & 1.10.19, with different core configurations (8 & 32, it's the same machine just changing affinity settings to limit the process to specific cores). For each configuration, I ran increasing parallel requests to trace the same block, at the same time, from 1 to 8 concurrent requests.
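The concurrency sweep described above (1 to 8 simultaneous traces of the same block) could be driven by something like the following. This is a minimal sketch, not the harness actually used: `runConcurrent` and `fakeTrace` are hypothetical names, and `fakeTrace` is a stand-in for a real HTTP call to the node's `debug_traceBlockByNumber` endpoint.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runConcurrent starts n copies of trace at the same time and returns the
// wall-clock time until all of them finish, plus the number that failed.
func runConcurrent(n int, trace func() error) (time.Duration, int) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		failures int
	)
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := trace(); err != nil {
				mu.Lock()
				failures++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return time.Since(start), failures
}

func main() {
	// Assumption: replace fakeTrace with a function that POSTs a
	// debug_traceBlockByNumber request to the node and waits for the reply.
	fakeTrace := func() error {
		time.Sleep(10 * time.Millisecond)
		return nil
	}
	for n := 1; n <= 8; n++ {
		elapsed, failed := runConcurrent(n, fakeTrace)
		fmt.Printf("concurrency=%d elapsed=%v failed=%d\n", n, elapsed, failed)
	}
}
```

Pinning the node process to a subset of cores (as done here via affinity settings) happens outside this program; the sketch only measures how total latency grows with the number of parallel requests.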
It looks like with callTracer everything is good in both versions; 1.10.19 provides a slight increase in performance. With the prestate tracer, 1.10.19 is much slower and scales much worse as concurrency increases. This is also the only configuration I've seen fail. Also, while all configurations seem to use multiple cores, 1.10.19 with prestateTracer seems to max out all cores much sooner. I've tried modifying the tracer in an attempt to improve performance, but have not made any significant progress. Is this the type of performance difference you're expecting with this tracer on Goja?
Additionally, there seems to be a discrepancy between total CPU used and performance. This applies to all configurations: having and using more cores doesn't produce a faster result. I'm not sure how to investigate this part. I would naively expect the tracing of a block to be single-threaded in nature, but it seems to utilize many cores. Any thoughts would be appreciated.
Interjecting just to respond to this observation. When running JavaScript tracers, a trace run will usually take a lot more time than just executing the same transaction from Go. So if you're tracing a block (or a chain segment), we run ahead with Go's tx execution, creating starting points for the .js tracers, and then on many threads pick up these starting points and run the slow tracers on top. Sina made some optimizations that, AFAIK, allow us to only look at certain events (or maybe even only certain opcodes for tracers), which might bring them closer to Go's speed. But as for why the prestate tracer might be slower, that's odd, as it would still need to run mostly the same code, just perhaps stash away the prestate data somewhere. Perhaps Goja's internal data storage/maps are not performing well and thus the many prestate inserts are causing a slowdown?
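The "run ahead, then trace on many threads" scheme described above can be illustrated with a small producer/worker sketch. This is not geth's actual code: `startingPoint`, `traceBlock`, and the integer "state" are hypothetical stand-ins, with the sequential loop playing the role of fast native execution and `slowTrace` playing the role of a JS tracer.

```go
package main

import (
	"fmt"
	"sync"
)

// startingPoint stands in for the state snapshot taken just before
// transaction Index is executed (hypothetical type for illustration).
type startingPoint struct {
	Index int
	State int
}

// traceBlock executes txs sequentially on a fast path and hands each
// pre-transaction snapshot to a pool of workers that run the slow tracer.
func traceBlock(txs []int, workers int, slowTrace func(startingPoint) string) []string {
	results := make([]string, len(txs))
	jobs := make(chan startingPoint, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for sp := range jobs {
				// Each index is written by exactly one worker.
				results[sp.Index] = slowTrace(sp)
			}
		}()
	}

	state := 0
	for i, tx := range txs {
		jobs <- startingPoint{Index: i, State: state}
		state += tx // stand-in for fast native execution of the transaction
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	txs := []int{5, 7, 11}
	out := traceBlock(txs, 2, func(sp startingPoint) string {
		return fmt.Sprintf("tx %d traced from state %d", sp.Index, sp.State)
	})
	for _, line := range out {
		fmt.Println(line)
	}
}
```

In this sketch the `workers` argument is what determines how many cores a single block trace can occupy, which is why one request can load many cores even though the block itself executes sequentially.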
That's helpful information. That explains the multi-core use I was not expecting. Removing the Our main concern currently is the upcoming Gray Glacier, which will force us to upgrade all nodes. Thanks for all the help so far.
Pointing to Goja exposes no explicit API to do this, so we've had to make do with hacky approaches. I modified our conversion functions slightly and it now seems to perform on par with duktape on a limited dataset. See #25156. I have to test it more, both for correctness and speed, but would appreciate it if you could try it with your own tracer and see how it fares.
Thanks for the effort. I was able to verify the behavior on a cloud-drive based node, and it looks good, definitely better than before. Thanks again for the assistance and giving this focus.
We're experiencing a significant performance degradation with the latest version, v1.10.19.
Our flows using debug_traceBlockByNumber with custom JS tracers are putting significantly higher loads on the nodes when executing a single trace. Each trace now uses all cores on the machine while producing the trace. Execution times remain as they were in previous versions. This means traces now consume significantly more resources to produce, and parallelize much worse than before.
For example, running a single debug_traceBlockByNumber using the legacy prestateTracer (.js version), the trace takes around 8000-9000 ms to complete. During that time, it consumes all 56 cores on the node's machine (>80% CPU used). The built-in tracer on the same block completes in 3000 ms (expected to be faster), but does not consume more than 1 core during processing.
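For readers reproducing this, the request body for such a call might be built as below. This is a sketch under assumptions: the block number is an arbitrary placeholder, `rpcRequest` and `traceBlockRequest` are hypothetical helper names, and the tracer is selected by name through the `tracer` field of the trace options.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a minimal JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// traceBlockRequest builds the JSON body for a debug_traceBlockByNumber
// call: a hex-encoded block number followed by the trace options.
func traceBlockRequest(block uint64, tracer string) ([]byte, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "debug_traceBlockByNumber",
		Params: []interface{}{
			fmt.Sprintf("0x%x", block),
			map[string]string{"tracer": tracer},
		},
	}
	return json.Marshal(req)
}

func main() {
	// The block number here is a placeholder, not the block from the report.
	body, err := traceBlockRequest(15000000, "prestateTracer")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body would then be POSTed to the node's HTTP RPC endpoint with the debug namespace enabled; a custom tracer would pass its JavaScript source in the `tracer` field instead of a built-in name.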
Earlier versions complete the .js trace in the same amount of time, but do not consume more than one core during processing.
This is a regression from previous versions. The timing suggests it is related to the Goja tracer changes, but we cannot confirm the root cause.
Any help in investigating the root cause would be greatly appreciated. Additionally, any way to limit the concurrency of a single request would be great as well.