This issue documents the performance challenges observed during the Montage experiments. The high server-side overhead encountered when processing small I/O requests needs to be mitigated to make PDC beneficial for Montage or AI applications.
What does this feature solve or improve?
The Montage results suggest that the server may not be operating at its peak efficiency. In scenarios with numerous concurrent requests, the server could potentially become a bottleneck. We may observe similar I/O patterns from AI applications as well.
Describe the solution you'd like
The server-side algorithm for processing I/O requests can be improved.
Server-side multi-threading should also improve efficiency.
Montage results on Perlmutter
The Montage components execute a large number of small reads and writes. Within the tested workflow, each I/O operation amounts to approximately 3000 bytes.
The performance of PDC is similar with and without the cache, which indicates that most of the time is spent in server-side processing.
I investigated one component, mProjExecMPI, in more detail. This component executes N small writes, followed by one read, and then another M writes. I implemented optimizations, including session consistency and combining all writes into one batched call. However, the performance remains suboptimal. In particular, the single read operation takes 2 seconds, which suggests that it was waiting to be processed on the server side.
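To make the access pattern concrete, here is a minimal sketch of the batching idea, not the PDC implementation. `CountingBackend`, `WriteBatcher`, and `run_pattern` are hypothetical names that do not come from PDC. The backend is an in-memory stand-in that only counts requests, and the 3000-byte payload mirrors the approximate I/O size reported above. The sketch assumes that a read must flush pending writes first so that it observes them.

```python
class CountingBackend:
    """Hypothetical stand-in for a storage server; counts requests received."""

    def __init__(self):
        self.store = {}
        self.requests = 0

    def write_batch(self, writes):
        # One request carries any number of (key, data) pairs.
        self.requests += 1
        for key, data in writes:
            self.store[key] = data

    def read(self, key):
        self.requests += 1
        return self.store[key]


class WriteBatcher:
    """Buffers small writes on the client and sends them in one call."""

    def __init__(self, backend):
        self.backend = backend
        self.pending = []

    def write(self, key, data):
        # No server request yet; the write is only queued locally.
        self.pending.append((key, data))

    def flush(self):
        if self.pending:
            self.backend.write_batch(self.pending)
            self.pending = []

    def read(self, key):
        # Flush first so the read sees every earlier write.
        self.flush()
        return self.backend.read(key)


def run_pattern(n, m, payload=b"x" * 3000):
    """N small writes, one read, then M more writes, as in the pattern above."""
    backend = CountingBackend()
    io = WriteBatcher(backend)
    for i in range(n):
        io.write(("pre", i), payload)
    value = io.read(("pre", 0))
    for j in range(m):
        io.write(("post", j), payload)
    io.flush()
    return backend.requests, value
```

In this model, `run_pattern(100, 50)` issues three requests (one batch, one read, one batch) instead of 151. The read still sits behind the first batch, which is consistent with the observation that the single read can remain slow if the server takes a long time to process the preceding writes.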