getHostStatus needs thread local storage #7375

@c-taylor

As the parents of a topology are marked down under load, 'HostStatus::getHostStatus' can cause excessive lock contention, resulting in high system time, reduced throughput, and gaps in stats.

During failure testing, overloading the configured parents causes lock contention on the stats storage.
It was possible to consume almost all ET_NET thread time with a few failing parents and fewer than 5,000 RPS.
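
To illustrate the kind of change the title asks for, here is a minimal C++ sketch of a thread-local cache in front of a lock-protected status table. It is not the ATS implementation: `SharedHostStatus`, `get_locked`, and `get_cached` are hypothetical names, and the generation-counter invalidation is an assumption about how stale entries might be detected. The idea is only that each thread answers repeated lookups from its own storage and takes the shared lock again only after a status changes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a shared host-status store (not the ATS HostStatus class).
enum class Status { Up, Down };

class SharedHostStatus {
public:
  // Update a host's status and bump the generation so per-thread caches revalidate.
  void set(const std::string &host, Status s) {
    std::lock_guard<std::mutex> guard(mutex_);
    table_[host] = s;
    generation_.fetch_add(1, std::memory_order_release);
  }

  // Slow path: every call takes the shared lock. Unknown hosts are treated as Up.
  Status get_locked(const std::string &host) {
    std::lock_guard<std::mutex> guard(mutex_);
    locked_reads_.fetch_add(1, std::memory_order_relaxed);
    auto it = table_.find(host);
    return it == table_.end() ? Status::Up : it->second;
  }

  uint64_t generation() const { return generation_.load(std::memory_order_acquire); }
  uint64_t locked_reads() const { return locked_reads_.load(std::memory_order_relaxed); }

private:
  std::mutex mutex_;
  std::unordered_map<std::string, Status> table_;
  std::atomic<uint64_t> generation_{0};
  std::atomic<uint64_t> locked_reads_{0};
};

// Fast path: each thread keeps a private cache and drops it whenever the
// shared generation counter (or the store being queried) has changed.
Status get_cached(SharedHostStatus &shared, const std::string &host) {
  thread_local std::unordered_map<std::string, Status> cache;
  thread_local uint64_t cached_generation = 0;
  thread_local const SharedHostStatus *cached_owner = nullptr;

  const uint64_t gen = shared.generation();
  if (cached_owner != &shared || gen != cached_generation) {
    cache.clear();
    cached_generation = gen;
    cached_owner = &shared;
  }

  auto it = cache.find(host);
  if (it != cache.end()) {
    return it->second; // no lock taken
  }

  // Cache miss: fall back to the locked lookup once, then remember the answer.
  Status s = shared.get_locked(host);
  cache.emplace(host, s);
  return s;
}
```

In this sketch a thread that repeatedly asks about the same failing parent takes the shared lock once per status change rather than once per request. The trade-off is extra memory per thread and one atomic load per lookup.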

Fault replication

Increase load through an edge -> parent configuration until the parents start to fail.
I used connection limits as the failure trigger because they trigger failure predictably.

Observations

As parents fail, contention in 'HostStatus::getHostStatus' increases, especially when the last parent fails.
This reduces all 'good' work and causes errors to clients, even for content already in cache.

  1. perf traces and flame graphs show near 100% system time spent on lock activity.

     [Image: getHostStaus_crop]

  2. traffic_server metrics stop updating.
  3. Response and data rates drop.
