Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NC | NSFS | Add Log Printing Measuring Time Duration (Buffer Allocation) #8603

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

shirady
Copy link
Contributor

@shirady shirady commented Dec 17, 2024

Explain the changes

  1. Add logging printing (higher than the default level) measuring tome duration inside get_buffer and inside dio_buffer_alloc (In fs_napi.cpp the code lines were copied from other commands in this file).

Issues:

  1. Related to issue Error: Warning stuck buffer_pool buffer with IO workload  #8521 when the buffer gets timeout (after 2 minutes by current configuration) it prints an error in logs, although the operation is not failing, we want to attach time measurements to better understand the duration of the functions (allocation as an example).

Testing Instructions:

First, since we change a native file, you need to run: npm run build (npm run build:native)
I followed the instructions I did in the issue (see comment).

  • Doc added/updated
  • Tests added

@shirady shirady force-pushed the nsfs-nc-add-time-printings-in-logs branch from ec36375 to 8b1a50d Compare December 22, 2024 12:13
@shirady
Copy link
Contributor Author

shirady commented Dec 22, 2024

I still don't fully understand how the 30 milliseconds are split, I saw:

Dec-22 14:07:58.016 [nsfs/7058]    [L1] core.util.buffer_utils:: BufferPool.get_buffer: sem value 16777216 waiting_value 0 buffers length 0
Dec-22 14:07:58.016 [nsfs/7058]   [LOG] CONSOLE:: SDSD in buffer allocation
2024-12-22 14:07:58.017448 [PID-7058/TID-259] [L2] dio_buffer_alloc took: 0.828 ms
Dec-22 14:07:58.017 [nsfs/7058]    [L2] core.util.buffer_utils:: get_buffer: Time taken to execute buffer_alloc function with buf_size 8388608 is 1.204083000018727 ms.
Dec-22 14:07:58.017 [nsfs/7058]    [L2] core.sdk.namespace_fs:: get_buffer: Time taken to execute callback is 0.00033300003269687295 ms.

Another case:

Dec-22 14:07:58.023 [nsfs/7058]   [LOG] CONSOLE:: SDSD in buffer allocation
2024-12-22 14:07:58.023701 [PID-7058/TID-259] [L2] dio_buffer_alloc took: 0.034458 ms
Dec-22 14:07:58.023 [nsfs/7058]    [L2] core.util.buffer_utils:: get_buffer: Time taken to execute buffer_alloc function with buf_size 8388608 is 0.0817500000121072 ms.
Dec-22 14:07:58.023 [nsfs/7058]    [L2] core.sdk.namespace_fs:: get_buffer: Time taken to execute callback is 0.0003339999821037054 ms

Full logs are attached here
noobaa_logs.log

@shirady shirady self-assigned this Dec 22, 2024
@shirady shirady marked this pull request as draft December 22, 2024 12:21
@shirady shirady force-pushed the nsfs-nc-add-time-printings-in-logs branch from 8b1a50d to 300f14c Compare December 22, 2024 14:07
Signed-off-by: shirady <57721533+shirady@users.noreply.github.com>
@shirady shirady force-pushed the nsfs-nc-add-time-printings-in-logs branch from 300f14c to 5cd2c36 Compare December 23, 2024 17:30
Signed-off-by: shirady <57721533+shirady@users.noreply.github.com>
@shirady shirady force-pushed the nsfs-nc-add-time-printings-in-logs branch from 5cd2c36 to 150c709 Compare December 23, 2024 17:31
@shirady
Copy link
Contributor Author

shirady commented Jan 5, 2025

I attached another run (the cases where we get the warning are inconsistent).
Here we can see the new printings:

Jan-5 14:50:06.694 [nsfs/10266]    [L1] core.util.buffer_utils:: BufferPool.get_buffer: sem value 8388608 waiting_value 0 buffers length 0
Jan-5 14:50:06.694 [nsfs/10266]   [LOG] CONSOLE:: SDSD in buffer allocation
Jan-5 14:50:06.695 [nsfs/10266]    [L2] core.util.buffer_utils:: get_buffer: Time taken to execute buffer_alloc function with buf_size 8388608 is 0.06683299999986048 ms.
2025-01-05 14:50:06.695242 [PID-10266/TID-259] [L1] FS::FSWorker::Begin: FileRead _wrap->_path=/Users/buckets/bucket-buf/256MB_file _wrap->_fd=22 _pos=8388608 _offset=0 _len=8388608
2025-01-05 14:50:06.695283 [PID-10266/TID-10499] [L1] FS::FSWorker::Execute: FileRead _wrap->_path=/Users/buckets/bucket-buf/256MB_file _wrap->_fd=22 _pos=8388608 _offset=0 _len=8388608 _uid=1001 _gid=1001 _backend=
2025-01-05 14:50:06.695302 [PID-10266/TID-10499] [L1] FS::FSWorker::Execute: FileRead _wrap->_path=/Users/buckets/bucket-buf/256MB_file _wrap->_fd=22 _pos=8388608 _offset=0 _len=8388608 _uid=1001 _gid=1001 geteuid()=1001 getegid()=1001 getuid()=1001 getgid()=1001
2025-01-05 14:50:06.700785 [PID-10266/TID-10499] [L1] FS::FSWorker::Execute: FileRead _wrap->_path=/Users/buckets/bucket-buf/256MB_file _wrap->_fd=22 _pos=8388608 _offset=0 _len=8388608  took: 5.4665 ms
2025-01-05 14:50:06.707524 [PID-10266/TID-259] [L1] FS::FSWorker::OnOK: FileRead _wrap->_path=/Users/buckets/bucket-buf/256MB_file
Jan-5 14:50:06.708 [nsfs/10266]    [L2] core.sdk.namespace_fs:: iniside condition: drain_promise
Jan-5 14:50:06.718 [nsfs/10266]  [WARN] CONSOLE:: Error: Warning stuck buffer_pool buffer
    at BuffersPool.get_buffer (/Users/shiradymnik/SourceCode/noobaa-core/src/util/buffer_utils.js:222:25)
    at async NamespaceFS.read_object_stream (/Users/shiradymnik/SourceCode/noobaa-core/src/sdk/namespace_fs.js:1081:46)
    at async NsfsObjectSDK._call_op_and_update_stats (/Users/shiradymnik/SourceCode/noobaa-core/src/sdk/object_sdk.js:582:27)
    at async Object.get_object [as handler] (/Users/shiradymnik/SourceCode/noobaa-core/src/endpoint/s3/ops/s3_get_object.js:116:25)
    at async handle_request (/Users/shiradymnik/SourceCode/noobaa-core/src/endpoint/s3/s3_rest.js:173:19)
    at async Object.s3_rest [as handler] (/Users/shiradymnik/SourceCode/noobaa-core/src/endpoint/s3/s3_rest.js:67:9)
Jan-5 14:50:06.721 [nsfs/10266]    [L2] core.sdk.namespace_fs:: drain_promise: Time taken to execute is 13.65625 ms.

Full logs are attached here
noobaa_logs.log

30 ms timeout (as mentioned on a local run, I decreased it from 2 minutes to 30 ms):
We can explain about 22 ms out of 30 ms in this example:

  • Time taken to execute buffer_alloc function with buf_size 8388608 is 0.06683299999986048 ms. -> ~1 ms
  • FS::FSWorker::Execute: FileRead _wrap->_path=/Users/buckets/bucket-buf/256MB_file _wrap took: 5.4665 ms -> ~6 ms
  • drain_promise: Time taken to execute is 13.65625 ms -> ~14
  • write to socket: Time taken to execute is 0.039584000000104425 ms. -> ~1

In the real system, the write to socket is based on the network and might probably explain more of the time that took this to run.

The main question in this PR is whether adding this printing would help us in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant