[Logs] Initial draft to handle batch export into background thread #2096

Status: Draft (wants to merge 15 commits into main)

Conversation

ThomsonTan (Contributor)

Fixes #2066

Changes

This is an early draft that implements the proposal mentioned in #2066; feedback is welcome.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

ThomsonTan changed the title [DRAFT][Logs] Initial draft to handle batch export into background thread [WIP][Logs] Initial draft to handle batch export into background thread Sep 10, 2024
cijothomas (Member)

See https://github.com/open-telemetry/opentelemetry-rust/compare/main...lalitb:thread-runtime?expand=1 as well for ideas on solving the same or similar problems.

ThomsonTan changed the title [WIP][Logs] Initial draft to handle batch export into background thread [Logs] Initial draft to handle batch export into background thread Oct 11, 2024
// Either::Left((export_res, _)) => export_res,
// Either::Right((_, _)) => ExportResult::Err(LogError::ExportTimedOut(time_out)),
// }
ExportResult::Ok(())
Review comment (Member):

So we are expecting the export method to complete within a definite time period, and we are not handling the timeout outside of the export method? Any reason why that is?

Review comment (Member):

Yes. Just chatted with @ThomsonTan about this. If a rogue exporter takes forever, then the BatchProcessor will get stuck behind it. For now, let's document this limitation and move on.
(Metrics have the same problem. It's slightly worse in Metrics, as observable callbacks can also go rogue and the entire Metrics SDK will halt.)
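
For context, one way a per-export timeout could be enforced without an async runtime (the alternative the commented-out `Either` code above hints at) is to run the export on a helper thread and stop waiting after a deadline. The sketch below is only an illustration of that alternative, not code from this PR; `export_with_timeout` and the `String` error type are hypothetical stand-ins, and note that the export itself is not cancelled on timeout, which is exactly the rogue-exporter limitation discussed here.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the SDK's ExportResult/LogError types.
type ExportResult = Result<(), String>;

/// Run a blocking export on a helper thread and give up waiting after
/// `timeout`. The export keeps running in the background; only the caller
/// stops blocking on it.
fn export_with_timeout<F>(export: F, timeout: Duration) -> ExportResult
where
    F: FnOnce() -> ExportResult + Send + 'static,
{
    let (tx, rx) = mpsc::sync_channel(1);
    thread::spawn(move || {
        // Ignore the send error if the caller has already timed out and dropped rx.
        let _ = tx.send(export());
    });
    match rx.recv_timeout(timeout) {
        Ok(res) => res,
        Err(_) => Err("export timed out".to_string()),
    }
}
```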

lalitb (Member) commented Dec 10, 2024

@ThomsonTan - can we move this into a separate processor (say BatchLogProcessorWithOwnThread) and keep the existing BatchLogProcessor, and then rename it back once we have validated all the scenarios - the same two-step process used for metrics in #2292 and #2403? That way we can merge it more quickly and keep making incremental changes on top of it before it becomes stable. The next release can bring it in as a separate processor, and the subsequent stable release for logs can make it the default.

lalitb (Member) commented Dec 10, 2024

Also, can you please fix the merge conflicts? It would be good to review with those resolved :)

lalitb (Member) commented Dec 10, 2024

Also @ThomsonTan - can you test it with all the OTLP scenarios - OTLP HTTP (hyper, reqwest, reqwest-blocking) and OTLP gRPC - and share the results? Thanks.

lalitb added this to the 0.28.0 milestone Dec 11, 2024
lalitb (Member) commented Dec 11, 2024

Also, adding this to the v0.28 milestone.

@@ -538,11 +585,13 @@ where
enum BatchMessage {
/// Export logs, usually called when the log is emitted.
ExportLog((LogRecord, InstrumentationScope)),
/// ForceFlush flush the current buffer to the backend
ForceFlush(mpsc::SyncSender<ExportResult>),
Review comment (Member):

There is another Flush variant in the enum below; is that meant to be removed?
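
For readers skimming the diff, a consolidated message set for the background thread might look like the sketch below, assuming the older Flush variant referred to here is indeed removed in favour of ForceFlush. The types are simplified stand-ins, not the SDK's definitions.

```rust
use std::sync::mpsc;

// Simplified stand-ins for the SDK types referenced in the diff.
struct LogRecord;
struct InstrumentationScope;
type ExportResult = Result<(), String>;

enum BatchMessage {
    /// Export a log record, usually sent when the log is emitted.
    ExportLog((LogRecord, InstrumentationScope)),
    /// Flush the current buffer to the backend and report the result.
    ForceFlush(mpsc::SyncSender<ExportResult>),
    /// Drain the buffer, stop the worker thread, and report the result.
    Shutdown(mpsc::SyncSender<ExportResult>),
}
```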

record.clone(),
instrumentation.clone(),
)));

if let Err(err) = result {
Review comment (Member):

Please remove this logging, as we are tracking drops via counts.
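
As a rough illustration of the "tracking drops via counts" approach (a sketch only; the name DROPPED_LOGS and the free function are hypothetical, not the PR's code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Count dropped records instead of logging each one.
static DROPPED_LOGS: AtomicUsize = AtomicUsize::new(0);

fn record_dropped_log() {
    // Relaxed ordering is enough for a monotonically increasing statistic.
    DROPPED_LOGS.fetch_add(1, Ordering::Relaxed);
}
```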

futures_executor::block_on(res_receiver)
.map_err(|err| LogError::Other(err.into()))
.and_then(std::convert::identity)
let (sender, receiver) = mpsc::sync_channel(1);
Review comment (Member):

There is a oneshot::channel, and another try_send just above this. I guess they are leftovers and should be removed.
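
If the oneshot channel and the extra try_send are indeed leftovers, a flush path that uses only the std sync channel might look roughly like this (a sketch with simplified stand-in types, not the PR's code):

```rust
use std::sync::mpsc;

type ExportResult = Result<(), String>;

enum BatchMessage {
    ForceFlush(mpsc::SyncSender<ExportResult>),
}

/// Send the reply channel to the worker thread, then block until it responds.
fn force_flush(sender: &mpsc::SyncSender<BatchMessage>) -> ExportResult {
    let (res_sender, res_receiver) = mpsc::sync_channel(1);
    sender
        .try_send(BatchMessage::ForceFlush(res_sender))
        .map_err(|err| err.to_string())?;
    res_receiver.recv().map_err(|err| err.to_string())?
}
```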

})??;

if let Some(handle) = self.handle.lock().unwrap().take() {
handle.join().unwrap();
Review comment (Member):

Emit an internal log when the thread is exiting?

}));
});

let forceflush_timeout = env::var(OTEL_LOGS_FORCEFLUSH_TIMEOUT_NAME)
Review comment (Member):

I don't think we need a timeout for flush.

.ok()
.and_then(|v| v.parse().map(Duration::from_millis).ok())
.unwrap_or(OTEL_LOGS_DEFAULT_FORCEFLUSH_TIMEOUT);
let shutdown_timeout = env::var(OTEL_LOGS_SHUTDOWN_TIMEOUT_NAME)
Review comment (Member):

Agree with using a timeout for shutdown, but this env variable is not required. Let's accept a timeout in the shutdown() method itself and, if none is provided, use 5 seconds as the default. (5 seconds is not from any spec, just my guess at a good default.)
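
A sketch of what that suggestion could look like (simplified stand-in types; the 5-second default is the guess mentioned above, not something from the spec):

```rust
use std::sync::mpsc;
use std::time::Duration;

type ExportResult = Result<(), String>;

enum BatchMessage {
    Shutdown(mpsc::SyncSender<ExportResult>),
}

const DEFAULT_SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(5);

/// Take the timeout as a parameter instead of reading an environment
/// variable; fall back to a 5-second default when none is provided.
fn shutdown(
    sender: &mpsc::SyncSender<BatchMessage>,
    timeout: Option<Duration>,
) -> ExportResult {
    let timeout = timeout.unwrap_or(DEFAULT_SHUTDOWN_TIMEOUT);
    let (res_sender, res_receiver) = mpsc::sync_channel(1);
    sender
        .try_send(BatchMessage::Shutdown(res_sender))
        .map_err(|err| err.to_string())?;
    res_receiver
        .recv_timeout(timeout)
        .map_err(|err| err.to_string())?
}
```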

self.sender.try_send(BatchMessage::Shutdown(sender))
.map_err(|err| LogError::Other(err.into()))?;

receiver.recv_timeout(self.shutdown_timeout).map_err(|err| {
Review comment (Member):

Nice way of enforcing the shutdown timeout! Love it.

error = format!("{}", err)
);
}
logs.reserve(config.max_export_batch_size);
Review comment (Member):

I think Vec has a constructor that takes a capacity (Vec::with_capacity), so you can use that instead of reserve.
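
A minimal example of the suggestion (LogRecord here is just a stand-in type):

```rust
struct LogRecord;

fn make_batch_buffer(max_export_batch_size: usize) -> Vec<LogRecord> {
    // One allocation up front, instead of `Vec::new()` followed by `reserve(...)`.
    Vec::with_capacity(max_export_batch_size)
}
```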

};

match receiver.recv_timeout(remaining_time) {
Ok(BatchMessage::ExportLog(data)) => {
Review comment (Member):

Trying to understand the amount of allocation/copying overall...
When emit() is called, we clone the record and store it on the heap, and a pointer to the heap is passed through the channel. In our thread, we receive the pointer as a message and add it to the vec, so it's just the cost of copying a pointer, not the entire LogRecord again. Is this the right understanding?
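
A self-contained toy that illustrates the semantics behind this question (the types are simplified stand-ins, not the SDK's): moving the record through the channel and into the Vec copies the struct itself, but its heap-allocated contents travel with it rather than being cloned again.

```rust
use std::sync::mpsc;

// Simplified stand-ins; the real LogRecord has more fields.
struct LogRecord {
    body: String, // heap-allocated contents
}
struct InstrumentationScope;

enum BatchMessage {
    ExportLog((LogRecord, InstrumentationScope)),
}

fn main() {
    let (tx, rx) = mpsc::sync_channel(8);

    // emit() side: the record is cloned/built once here; afterwards it is only moved.
    let record = LogRecord { body: "hello".into() };
    let heap_ptr = record.body.as_ptr();
    tx.send(BatchMessage::ExportLog((record, InstrumentationScope)))
        .unwrap();

    // Worker side: receiving and pushing move the struct (a shallow copy);
    // the String's heap buffer is never duplicated.
    let mut logs = Vec::with_capacity(8);
    if let Ok(BatchMessage::ExportLog(data)) = rx.recv() {
        logs.push(data);
    }
    assert_eq!(logs[0].0.body.as_ptr(), heap_ptr);
}
```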


match receiver.recv_timeout(remaining_time) {
Ok(BatchMessage::ExportLog(data)) => {
logs.push(data);
Review comment (Member):

I think I mentioned this in the GitHub issue too, but it is something we need to discuss further. The channel has a capacity of 500 (for example), and the vec can also hold 500, so it is possible for around 1000 items to be in memory (channel + vector) at a time?
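
Using the example numbers from this comment (both values of 500 are assumptions for illustration), the worst case is the channel and the batch buffer being full at the same time:

```rust
// Example figures from the comment above; the real values come from the
// channel capacity and the configured max export batch size.
const CHANNEL_CAPACITY: usize = 500;
const MAX_EXPORT_BATCH_SIZE: usize = 500;

fn main() {
    // Worst case: the batch Vec is full while the channel is also full.
    let worst_case_in_memory = CHANNEL_CAPACITY + MAX_EXPORT_BATCH_SIZE;
    assert_eq!(worst_case_in_memory, 1_000);
    println!("worst-case buffered records: {worst_case_in_memory}");
}
```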

Successfully merging this pull request may close these issues.

[Feature]: remove async runtime dependency in batch processors