Skip to content

Commit

Permalink
Staging perf (#2995)
Browse files Browse the repository at this point in the history
Batch of performance improvements:

* linux: replace default allocator with jemalloc (#2882) 
* Add a private part to the Context structure (#2802) 
* lighter path manipulation in response formatting (#2854) 
* prevent span attributes from being formatted to write logs (#2890) 
* Parse Accept headers once instead of three times (#2847) 
* use a parking-lot mutex in Context (#2885) 
* Filter spans before sending them to the opentelemetry layer (#2894)
  • Loading branch information
Geal committed May 15, 2023
2 parents 94e787c + 936555a commit 7e02a1f
Show file tree
Hide file tree
Showing 31 changed files with 644 additions and 322 deletions.
5 changes: 5 additions & 0 deletions .changesets/fix_geal_context_mutex.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### use a parking-lot mutex in Context ([Issue #2751](https://github.com/apollographql/router/issues/2751))

The context requires synchronized access to the busy timer, and precedently we used a futures aware mutex for that, but those are susceptible to contention. This replaces that mutex with a parking-lot synchronous mutex that is much faster.

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/2885
5 changes: 5 additions & 0 deletions .changesets/fix_geal_filter_spans.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### Filter spans before sending them to the opentelemetry layer

the sampling configuration in the opentelemetry layer only applies when the span closes, so in the meantime a lot of data is created just to be dropped. This adds a filter than can sample spans before the opentelemetry layer. The sampling decision is done at the root span, and then derived from the parent span in the rest of the trace.

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/2894
5 changes: 5 additions & 0 deletions .changesets/fix_geal_null_field_formatter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### prevent span attributes from being formatted to write logs

we do not show span attributes in our logs, but the log formatter still spends some time formatting them to a string, even when there will be no logs written for the trace. This adds the `NullFieldFormatter` that entirely avoids formatting the attributes

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/2890
9 changes: 9 additions & 0 deletions .changesets/maint_garypen_jemalloc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
### use jemalloc on linux

Detailed memory investigations of the router in use have revealed that there is a significant amount of memory fragmentation when using the default allocator, glibc, on linux. Performance testing and flamegraph analysis suggests that jemalloc on linux can yield significant performance improvements. In our tests, this figure shows performance to be about 35% faster than the default allocator. The improvement in performance being due to less time spent managing memory fragmentation.

Not everyone will see a 35% performance improvement in this release of the router. Depending on your usage pattern, you may see more or less than this. If you see a regression, please file an issue with details.

We have no reason to believe that there are allocation problems on other platforms, so this change is confined to linux.

By [@garypen](https://github.com/garypen) in https://github.com/apollographql/router/pull/2882
5 changes: 5 additions & 0 deletions .changesets/maint_geal_path_manipulation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### lighter path manipulation in response formatting

Response formatting generates a lot of temporary allocations to create response paths that end up unused. By making a reference based type to hold these paths, we can prevent those allocations and improve performance.

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/2854
7 changes: 7 additions & 0 deletions .changesets/maint_geal_private_context.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
### Add a private part to the Context structure ([Issue #2800](https://github.com/apollographql/router/issues/2800))

There's a cost in using the `Context` structure throughout a request's lifecycle, due to the JSON serialization and deserialization, so it should be reserved from inter plugin communication between rhai, coprocessor and Rust. But for internal router usage, we can have a more efficient structure that avoids serialization costs, and does not expose data that should not be modified by plugins.

That structure is based on a map indexed by type id, which means that if some part of the code can see that type, then it can access it in the map.

By [@Geal](https://github.com/Geal) in https://github.com/apollographql/router/pull/2802
22 changes: 22 additions & 0 deletions Cargo.lock
Original file line number Diff line number Diff line change
Expand Up @@ -345,6 +345,7 @@ dependencies = [
"opentelemetry-semantic-conventions",
"opentelemetry-zipkin",
"p256 0.12.0",
"parking_lot 0.12.1",
"paste",
"pin-project-lite",
"prometheus",
Expand Down Expand Up @@ -378,6 +379,7 @@ dependencies = [
"test-log",
"test-span",
"thiserror",
"tikv-jemallocator",
"tokio",
"tokio-rustls",
"tokio-stream",
Expand Down Expand Up @@ -5883,6 +5885,26 @@ dependencies = [
"tower",
]

[[package]]
name = "tikv-jemalloc-sys"
version = "0.5.3+5.3.0-patched"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a678df20055b43e57ef8cddde41cdfda9a3c1a060b67f4c5836dfb1d78543ba8"
dependencies = [
"cc",
"libc",
]

[[package]]
name = "tikv-jemallocator"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20612db8a13a6c06d57ec83953694185a367e16945f66565e8028d2c0bd76979"
dependencies = [
"libc",
"tikv-jemalloc-sys",
]

[[package]]
name = "time"
version = "0.3.20"
Expand Down
5 changes: 5 additions & 0 deletions apollo-router/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,7 @@ yaml-rust = "0.4.5"
wsl = "0.1.0"
tokio-rustls = "0.23.4"
http-serde = "1.1.2"
parking_lot = "0.12.1"
memchr = "2.5.0"
brotli = "3.3.4"
zstd = "0.12.3"
Expand All @@ -214,6 +215,9 @@ uname = "0.1.1"
[target.'cfg(unix)'.dependencies]
uname = "0.1.1"

[target.'cfg(target_os = "linux")'.dependencies]
tikv-jemallocator = "0.5"

[dev-dependencies]
ecdsa = { version = "0.15.1", features = ["signing", "pem", "pkcs8"] }
fred = "6.0.0-beta.2"
Expand Down Expand Up @@ -260,3 +264,4 @@ tonic-build = "0.8.4"
[[test]]
name = "integration_tests"
path = "tests/integration_tests.rs"

2 changes: 1 addition & 1 deletion apollo-router/src/axum_factory/axum_http_server_factory.rs
Original file line number Diff line number Diff line change
Expand Up @@ -436,7 +436,7 @@ async fn handle_graphql(
.cloned();

let res = service.oneshot(request).await;
let dur = context.busy_time().await;
let dur = context.busy_time();
let processing_seconds = dur.as_secs_f64();

tracing::info!(histogram.apollo_router_processing_time = processing_seconds,);
Expand Down
153 changes: 153 additions & 0 deletions apollo-router/src/context/extensions.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
// NOTE: this module is taken from tokio's tracing span's extensions
// which is taken from https://github.com/hyperium/http/blob/master/src/extensions.rs

use std::any::Any;
use std::any::TypeId;
use std::collections::HashMap;
use std::fmt;
use std::hash::BuildHasherDefault;
use std::hash::Hasher;

type AnyMap = HashMap<TypeId, Box<dyn Any + Send + Sync>, BuildHasherDefault<IdHasher>>;

// With TypeIds as keys, there's no need to hash them. They are already hashes
// themselves, coming from the compiler. The IdHasher just holds the u64 of
// the TypeId, and then returns it, instead of doing any bit fiddling.
#[derive(Default)]
struct IdHasher(u64);

impl Hasher for IdHasher {
fn write(&mut self, _: &[u8]) {
unreachable!("TypeId calls write_u64");
}

#[inline]
fn write_u64(&mut self, id: u64) {
self.0 = id;
}

#[inline]
fn finish(&self) -> u64 {
self.0
}
}

/// A type map of protocol extensions.
///
/// `Extensions` can be used by `Request` and `Response` to store
/// extra data derived from the underlying protocol.
#[derive(Default)]
pub(crate) struct Extensions {
// If extensions are never used, no need to carry around an empty HashMap.
// That's 3 words. Instead, this is only 1 word.
map: Option<Box<AnyMap>>,
}

#[allow(unused)]
impl Extensions {
/// Create an empty `Extensions`.
#[inline]
pub(crate) fn new() -> Extensions {
Extensions { map: None }
}

/// Insert a type into this `Extensions`.
///
/// If a extension of this type already existed, it will
/// be returned.
pub(crate) fn insert<T: Send + Sync + 'static>(&mut self, val: T) -> Option<T> {
self.map
.get_or_insert_with(Box::default)
.insert(TypeId::of::<T>(), Box::new(val))
.and_then(|boxed| {
(boxed as Box<dyn Any + 'static>)
.downcast()
.ok()
.map(|boxed| *boxed)
})
}

/// Get a reference to a type previously inserted on this `Extensions`.
pub(crate) fn get<T: Send + Sync + 'static>(&self) -> Option<&T> {
self.map
.as_ref()
.and_then(|map| map.get(&TypeId::of::<T>()))
.and_then(|boxed| (&**boxed as &(dyn Any + 'static)).downcast_ref())
}

/// Get a mutable reference to a type previously inserted on this `Extensions`.
pub(crate) fn get_mut<T: Send + Sync + 'static>(&mut self) -> Option<&mut T> {
self.map
.as_mut()
.and_then(|map| map.get_mut(&TypeId::of::<T>()))
.and_then(|boxed| (&mut **boxed as &mut (dyn Any + 'static)).downcast_mut())
}

pub(crate) fn contains_key<T: Send + Sync + 'static>(&self) -> bool {
self.map
.as_ref()
.map(|map| map.contains_key(&TypeId::of::<T>()))
.unwrap_or_default()
}

/// Remove a type from this `Extensions`.
///
/// If a extension of this type existed, it will be returned.
pub(crate) fn remove<T: Send + Sync + 'static>(&mut self) -> Option<T> {
self.map
.as_mut()
.and_then(|map| map.remove(&TypeId::of::<T>()))
.and_then(|boxed| {
(boxed as Box<dyn Any + 'static>)
.downcast()
.ok()
.map(|boxed| *boxed)
})
}

/// Clear the `Extensions` of all inserted extensions.
#[inline]
pub(crate) fn clear(&mut self) {
if let Some(ref mut map) = self.map {
map.clear();
}
}

/// Check whether the extension set is empty or not.
#[inline]
pub(crate) fn is_empty(&self) -> bool {
self.map.as_ref().map_or(true, |map| map.is_empty())
}

/// Get the numer of extensions available.
#[inline]
pub(crate) fn len(&self) -> usize {
self.map.as_ref().map_or(0, |map| map.len())
}
}

impl fmt::Debug for Extensions {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("Extensions").finish()
}
}

#[test]
fn test_extensions() {
#[derive(Debug, PartialEq)]
struct MyType(i32);

let mut extensions = Extensions::new();

extensions.insert(5i32);
extensions.insert(MyType(10));

assert_eq!(extensions.get(), Some(&5i32));
assert_eq!(extensions.get_mut(), Some(&mut 5i32));

assert_eq!(extensions.remove::<i32>(), Some(5i32));
assert!(extensions.get::<i32>().is_none());

assert_eq!(extensions.get::<bool>(), None);
assert_eq!(extensions.get(), Some(&MyType(10)));
}
26 changes: 18 additions & 8 deletions apollo-router/src/context.rs → apollo-router/src/context/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,17 @@ use std::time::Instant;
use dashmap::mapref::multiple::RefMulti;
use dashmap::mapref::multiple::RefMutMulti;
use dashmap::DashMap;
use futures::lock::Mutex;
use derivative::Derivative;
use parking_lot::Mutex;
use serde::Deserialize;
use serde::Serialize;
use tower::BoxError;

use self::extensions::Extensions;
use crate::json_ext::Value;

pub(crate) mod extensions;

/// Holds [`Context`] entries.
pub(crate) type Entries = Arc<DashMap<String, Value>>;

Expand All @@ -31,17 +35,22 @@ pub(crate) type Entries = Arc<DashMap<String, Value>>;
/// [`crate::services::SubgraphResponse`] processing. At such times,
/// plugins should restrict themselves to the [`Context::get`] and [`Context::upsert`]
/// functions to minimise the possibility of mis-sequenced updates.
#[derive(Clone, Debug, Deserialize, Serialize)]
#[derive(Clone, Deserialize, Serialize, Derivative)]
#[derivative(Debug)]
pub struct Context {
// Allows adding custom entries to the context.
entries: Entries,

#[serde(skip, default)]
pub(crate) private_entries: Arc<parking_lot::Mutex<Extensions>>,

/// Creation time
#[serde(skip)]
#[serde(default = "Instant::now")]
pub(crate) created_at: Instant,

#[serde(skip)]
#[derivative(Debug = "ignore")]
busy_timer: Arc<Mutex<BusyTimer>>,
}

Expand All @@ -50,6 +59,7 @@ impl Context {
pub fn new() -> Self {
Context {
entries: Default::default(),
private_entries: Arc::new(parking_lot::Mutex::new(Extensions::default())),
created_at: Instant::now(),
busy_timer: Arc::new(Mutex::new(BusyTimer::new())),
}
Expand Down Expand Up @@ -195,18 +205,18 @@ impl Context {
}

/// Notify the busy timer that we're waiting on a network request
pub(crate) async fn enter_active_request(&self) {
self.busy_timer.lock().await.increment_active_requests()
pub(crate) fn enter_active_request(&self) {
self.busy_timer.lock().increment_active_requests()
}

/// Notify the busy timer that we stopped waiting on a network request
pub(crate) async fn leave_active_request(&self) {
self.busy_timer.lock().await.decrement_active_requests()
pub(crate) fn leave_active_request(&self) {
self.busy_timer.lock().decrement_active_requests()
}

/// How much time was spent working on the request
pub(crate) async fn busy_time(&self) -> Duration {
self.busy_timer.lock().await.current()
pub(crate) fn busy_time(&self) -> Duration {
self.busy_timer.lock().current()
}
}

Expand Down
20 changes: 20 additions & 0 deletions apollo-router/src/json_ext.rs
Original file line number Diff line number Diff line change
Expand Up @@ -577,6 +577,15 @@ pub enum PathElement {
Key(String),
}

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum ResponsePathElement<'a> {
/// An index path element.
Index(usize),

/// A key path element.
Key(&'a str),
}

fn deserialize_flatten<'de, D>(deserializer: D) -> Result<(), D::Error>
where
D: serde::Deserializer<'de>,
Expand Down Expand Up @@ -676,6 +685,17 @@ impl Path {
)
}

pub fn from_response_slice(s: &[ResponsePathElement]) -> Self {
Self(
s.iter()
.map(|x| match x {
ResponsePathElement::Index(index) => PathElement::Index(*index),
ResponsePathElement::Key(s) => PathElement::Key(s.to_string()),
})
.collect(),
)
}

pub fn iter(&self) -> impl Iterator<Item = &PathElement> {
self.0.iter()
}
Expand Down
10 changes: 10 additions & 0 deletions apollo-router/src/main.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,14 @@
//! Main entry point for CLI command to start server.
// Note: We want to use jemalloc on linux, but we don't enable it if dhat-heap is in use because we
// can only have one global allocator
#[cfg(target_os = "linux")]
#[cfg(not(feature = "dhat-heap"))]
use tikv_jemallocator::Jemalloc;

#[cfg(target_os = "linux")]
#[cfg(not(feature = "dhat-heap"))]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
match apollo_router::main() {
Expand Down
Loading

0 comments on commit 7e02a1f

Please sign in to comment.