Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1844569 - RLB: Gracefully handle uploader thread start failure #2545

Merged
merged 2 commits into from
Jul 26, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,10 @@ commands:
name: Run Rust RLB test
command: |
glean-core/rlb/tests/test-shutdown-blocking.sh
- run:
name: Run Rust RLB crash test
command: |
glean-core/rlb/tests/test-thread-crashing.sh
- run:
name: Upload coverage report
command: |
Expand Down
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

* Rust
* The Ping Rate Limit type is now accessible in the Rust Language Binding ([#2528](https://github.com/mozilla/glean/pull/2528))
* Gracefully handle a failure when starting the upload thread. Glean no longer crashes in that case. ([#2545](https://github.com/mozilla/glean/pull/2545))
* `locale` now exposed through the RLB so it can be set by consumers ([2527](https://github.com/mozilla/glean/pull/2527))
* Python
* Added the shutdown API for Python to ensure orderly shutdown and waiting for uploader processes ([#2538](https://github.com/mozilla/glean/pull/2538))
Expand Down
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions glean-core/rlb/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ env_logger = { version = "0.10.0", default-features = false, features = ["auto-c
tempfile = "3.1.0"
jsonschema-valid = "0.5.0"
flate2 = "1.0.19"
libc = "0.2"

[features]
preinit_million_queue = ["glean-core/preinit_million_queue"]
123 changes: 123 additions & 0 deletions glean-core/rlb/examples/crashing-threads.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! **THIS IS HIGHLY UNSAFE**
//!
//! We deliberately override pthread symbols here to control how threads are launched,
//! in order to simulate failures in this thread launching.
//! It works on Linux and macOS only.
//! It's only a test.
//!
//! Launching a thread can fail for various reasons, e.g. not enough memory resources available.
//! We saw that issue on low-powered machines running 32-bit versions of Windows.
//!
//! You want to run this example with `RUSTFLAGS="-C panic=abort"` to make sure it aborts.

use std::env;
use std::path::PathBuf;

use once_cell::sync::Lazy;
use tempfile::Builder;

use glean::{private::PingType, ClientInfoMetrics, ConfigurationBuilder};

#[cfg(unix)]
mod unix {
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicU32, Ordering};

/// Tracking how many threads have already been spawned.
static ALLOW_THREAD_SPAWNED: AtomicU32 = AtomicU32::new(0);

#[allow(non_camel_case_types)]
type pthread_ft = extern "C" fn(
native: *mut libc::pthread_t,
attr: *const libc::pthread_attr_t,
f: extern "C" fn(*mut c_void) -> *mut c_void,
value: *mut c_void,
) -> c_int;

#[no_mangle]
pub unsafe extern "C" fn pthread_create(
native: *mut libc::pthread_t,
attr: *const libc::pthread_attr_t,
f: extern "C" fn(*mut c_void) -> *mut c_void,
value: *mut c_void,
) -> c_int {
let name = b"pthread_create\0".as_ptr() as *const i8;
let symbol = libc::dlsym(libc::RTLD_NEXT, name);
if symbol.is_null() {
panic!("dlsym failed to load `pthread_create` name. Nothing we can do, we abort.");
}

let real_pthread_create = *(&symbol as *const *mut _ as *const pthread_ft);

// thread 1 = glean.initialize
// thread 2 = ping directory processor
// thread 3 = MPS
// thread 4 = uploader for first metrics ping <- this is the one we want to fail
// thread 5 = uploader for prototype ping <- this is the one we want to fail
// thread 6 = shutdown wait thread
let spawned = ALLOW_THREAD_SPAWNED.fetch_add(1, Ordering::SeqCst);
if spawned == 4 || spawned == 5 {
return -1;
}

real_pthread_create(native, attr, f, value)
}
}

pub mod glean_metrics {
use glean::{private::BooleanMetric, CommonMetricData, Lifetime};

#[allow(non_upper_case_globals)]
pub static sample_boolean: once_cell::sync::Lazy<BooleanMetric> =
once_cell::sync::Lazy::new(|| {
BooleanMetric::new(CommonMetricData {
name: "sample_boolean".into(),
category: "test.metrics".into(),
send_in_pings: vec!["prototype".into()],
disabled: false,
lifetime: Lifetime::Ping,
..Default::default()
})
});
}

#[allow(non_upper_case_globals)]
pub static PrototypePing: Lazy<PingType> =
Lazy::new(|| PingType::new("prototype", true, true, vec![]));

fn main() {
env_logger::init();

let mut args = env::args().skip(1);

let data_path = if let Some(path) = args.next() {
PathBuf::from(path)
} else {
let root = Builder::new().prefix("simple-db").tempdir().unwrap();
root.path().to_path_buf()
};

let cfg = ConfigurationBuilder::new(true, data_path, "org.mozilla.glean_core.example")
.with_server_endpoint("invalid-test-host")
.with_use_core_mps(true)
.build();

let client_info = ClientInfoMetrics {
app_build: env!("CARGO_PKG_VERSION").to_string(),
app_display_version: env!("CARGO_PKG_VERSION").to_string(),
channel: None,
locale: None,
};

glean::initialize(cfg, client_info);

glean_metrics::sample_boolean.set(true);

PrototypePing.submit(None);

glean::shutdown();
}
23 changes: 20 additions & 3 deletions glean-core/rlb/src/net/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -138,9 +138,26 @@ impl UploadManager {
Ordering::SeqCst,
Ordering::SeqCst,
);
})
.expect("Failed to spawn Glean's uploader thread");
*handle = Some(thread);
});

match thread {
Ok(thread) => *handle = Some(thread),
Err(err) => {
log::warn!("Failed to spawn Glean's uploader thread. This will be retried on next upload. Error: {err}");
// Swapping back the thread state. We're the ones setting it to `Running`, so
// should be able to set it back.
let state_change = self.inner.thread_running.compare_exchange(
State::Running,
State::Stopped,
Ordering::SeqCst,
Ordering::SeqCst,
);

if state_change.is_err() {
log::warn!("Failed to swap back thread state. Someone else jumped in and changed the state.");
}
}
};
}

pub(crate) fn shutdown(&self) {
Expand Down
32 changes: 32 additions & 0 deletions glean-core/rlb/tests/test-thread-crashing.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#!/bin/bash

# Test harness for testing the RLB processes from the outside.
#
# Some behavior can only be observed when properly exiting the process running Glean,
# e.g. when an uploader runs in another thread.
# On exit the threads will be killed, regardless of their state.

# Remove the temporary data path on all exit conditions
cleanup() {
if [ -n "$datapath" ]; then
rm -r "$datapath"
fi
}
trap cleanup INT ABRT TERM EXIT

tmp="${TMPDIR:-/tmp}"
datapath=$(mktemp -d "${tmp}/glean_long_running.XXXX")

RUSTFLAGS="-C panic=abort" \
RUST_LOG=debug \
cargo run --example crashing-threads -- "$datapath"
ret=$?
count=$(ls -1q "$datapath/pending_pings" | wc -l)

if [[ $ret -eq 0 ]] && [[ "$count" -eq 1 ]]; then
echo "test result: ok."
exit 0
else
echo "test result: FAILED."
exit 101
fi