Alias: http://resiliencepapers.club (thanks to John Allspaw).

This file contains notes about people active in resilience engineering, organized alphabetically. I'm using these notes to help get my head around the players and concepts.

You might also be interested in my notes on David Woods's Resilience Engineering short course.

For each person, I list concepts that they reference in their writings, along with some publications. The publications lists aren't comprehensive: they're ones I've read or have added to my to-read list.

John Allspaw
Lisanne Bainbridge
Andrea Baker
Johan Bergström
Todd Conklin
Richard I. Cook
Sidney Dekker
John C. Doyle
Bob Edwards
Anders Ericsson
Meir Finkel
Ivonne Andrade Herrera
Erik Hollnagel
Leila Johannesen
Gary Klein
Nancy Leveson
Anne-Sophie Nyssen
Elinor Ostrom
Jean Pariès
Emily Patterson
Charles Perrow
Shawna J. Perry
Jens Rasmussen
James Reason
Nadine Sarter
James C. Scott
Steven Shorrock
Diane Vaughan
Robert L. Wears
David Woods
John Wreathall

Some big ideas:

The adaptive universe (David Woods)
Dynamic safety model (Jens Rasmussen)
Safety-II (Erik Hollnagel)
Graceful extensibility (David Woods)
ETTO: Efficiency-thoroughness tradeoff principle (Erik Hollnagel)
Drift into failure (Sidney Dekker)
Robust yet fragile (John C. Doyle)
STAMP: Systems-Theoretic Accident Model and Processes (Nancy Leveson)
Polycentric governance (Elinor Ostrom)

John Allspaw

Allspaw is the former CTO of Etsy. He applies concepts from resilience engineering to the tech industry. He is one of the founders of Adaptive Capacity Labs, a resilience engineering consultancy.

Allspaw tweets as @allspaw.

Selected publications

Trade-Offs Under Pressure: Heuristics and Observations Of Teams Resolving Internet Service Outages
Etsy Debrief Facilitation Guide
Blameless PostMortems and a Just Culture (blog)
Resilience engineering: learning to embrace failure
Fault Injection in Production: Making the case for resiliency testing

Selected talks

Incidents as we Imagine Them Versus How They Actually Are
Problem detection (papers we love) (presentation of Problem detection paper)
Common Ground and Coordination in Joint Activity (papers we love) (presentation of Common Ground and Coordination in Joint Activity paper)

Lisanne Bainbridge

Bainbridge is (was?) a psychology researcher. (I have not been able to find any recent information about her.)

Contributions

Ironies of automation

Bainbridge is famous for her 1983 Ironies of automation paper, which continues to be frequently cited.

Concepts

automation
design errors
human factors / ergonomics
cognitive modelling
cognitive architecture
mental workload
situation awareness
cognitive error
skill and training
interface design

Selected publications

Ironies of automation

Andrea Baker

Baker is a practitioner who provides training services in human and organizational performance (HOP) and learning teams.

Baker tweets as @thehopmentor.

Concepts

Human and organizational performance (HOP)
Learning teams
Industrial empathy

Selected publications

A bit about HOP (editorial)
A short introduction to human and organizational performance (hop) and learning teams (blog post)

Johan Bergström

Bergström is a safety researcher and consultant. He runs the Master Program of Human Factors and Systems Safety at Lund University.

Bergström tweets as @bergstrom_johan.

Concepts

Analytical traps in accident investigation
- Counterfactual reasoning
- Normative language
- Mechanistic reasoning

Selected publications

Resilience engineering: Current status of the research and future challenges
Rule- and role retreat: An empirical study of procedures and resilience

Selected talks

Three analytical traps in accident investigation
Two Views on Human Error

Todd Conklin

Conklin's books are on my reading list, but I haven't read anything by him yet. I have listened to his great Preaccident investigation podcast.

Conklin tweets as @preaccident.

Selected publications

Pre-accident investigations: an introduction to organizational safety
Pre-accident investigations: better questions - an applied approach to operational learning

Richard I. Cook

Cook is a medical doctor who studies failures in complex systems. He is one of the founders of Adaptive Capacity Labs, a resilience engineering consultancy.

Cook tweets as @ri_cook.

Concepts

complex systems
degraded mode
sharp end (cf. Reason's blunt end)
Going solid
Cycle of error
"new look"

Selected publications

How complex systems fail
Where complex systems fail
Distancing through differencing: An obstacle to organizational learning following accidents
Being bumpable
Behind Human Error
Incidents - markers of resilience or brittleness?
“Going solid”: a model of system dynamics and consequences for patient safety
Operating at the Sharp End: The Complexity of Human Error
Patient boarding in the emergency department as a symptom of complexity-induced risks
Sensemaking, Safety, and Cooperative Work in the Intensive Care Unit
Medication Reconciliation Is a Window into “Ordinary” Work
Cognitive consequences of clumsy automation on high workload, high consequence human performance
Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)
The Messy Details: Insights From the Study of Technical Work in Healthcare
Nosocomial automation: technology-induced complexity and human performance
The New Look at Error, Safety, and Failure: A Primer for Health Care
Grounding explanations in evolving, diagnostic situations
A Tale of Two Stories: Contrasting Views of Patient Safety

Selected talks

How Complex Systems Fail

Sidney Dekker

Dekker is a human factors and safety researcher with a background in aviation. His books aimed at a lay audience (Drift Into Failure, The Field Guide to 'Human Error' Investigations) have been enormously influential. His PhD advisor was David Woods.

Dekker tweets as @sidneydekkercom.

Contributions

Drift into failure

Dekker developed the theory of drift, characterized by five concepts:

Scarcity and competition
Decrementalism, or small steps
Sensitive dependence on initial conditions
Unruly technology
Contribution of the protective structure

Concepts

Drift into failure
Safety differently
New view vs old view of human performance
Just culture
complexity
broken part
Newton-Descartes
diversity
systems theory
unruly technology
decrementalism

Selected publications

Drift into failure
Reconstructing human contributions to accidents: the new view on error and performance
The field guide to 'human error' investigations
Behind Human Error
Rule- and role retreat: An empirical study of procedures and resilience
Anticipating the effects of technological change: A new era of dynamics for human factors

John C. Doyle

Doyle is a control systems researcher. He is seeking to identify the universal laws that capture the behavior of resilient systems, and is concerned with the architecture of such systems.

Concepts

Robust yet fragile
layered architectures
constraints that deconstrain
protocol-based architectures
emergent constraints
Universal laws and architectures
conservation laws
universal architectures
Highly optimized tolerance

Selected publications

Universal Laws and Architectures (slides)
Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures
Architecture, constraints, and behavior
The “robust yet fragile” nature of the Internet
Highly Optimized Tolerance: Robustness and Design in Complex Systems
Robust efficiency and actuator saturation explain healthy heart rate control and variability

Bob Edwards

Edwards is a practitioner who provides training services in human and organizational performance (HOP).

Edwards tweets as @thehopcoach.

Anders Ericsson

Ericsson introduced the idea of deliberate practice as a mechanism for achieving a high level of expertise.

Ericsson isn't directly associated with the field of resilience engineering. However, Gary Klein's work is informed by his, and I have a particular interest in how people improve in expertise, so I'm including him here.

Concepts

Expertise
Deliberate practice
Protocol analysis

Selected publications

Peak: secrets from the new science of expertise
Protocol analysis: verbal reports as data

Meir Finkel

Finkel is a Colonel in the Israel Defense Forces (IDF) and the Director of the IDF's Ground Forces Concept Development and Doctrine Department.

Selected publications

On Flexibility: Recovery from Technological and Doctrinal Surprise on the Battlefield

Ivonne Andrade Herrera

Herrera is an associate professor in the department of industrial economics and technology management at NTNU and a senior research scientist at SINTEF. Her areas of expertise include safety management and resilience engineering in avionics and air traffic management.

List of publications

Erik Hollnagel

Contributions

ETTO principle

Hollnagel proposed that there is always a fundamental tradeoff between efficiency and thoroughness, which he called the ETTO principle.

Safety-I vs. Safety-II

Safety-I: avoiding things that go wrong

looking at what goes wrong
bimodal view of work and activities (acceptable vs unacceptable)
find-and-fix approach
prevent transition from 'normal' to 'abnormal'
causality credo: the belief that adverse outcomes happen because something goes wrong (they have causes that can be found and treated)
it either works or it doesn't
systems are decomposable
functioning is bimodal

Safety-II: performance variability rather than bimodality

the system’s ability to succeed under varying conditions, so that the number of intended and acceptable outcomes (in other words, everyday activities) is as high as possible
performance is always variable
performance variation is ubiquitous
things that go right
focus on frequent events
remain sensitive to possibility of failure
be thorough as well as efficient

FRAM

Hollnagel proposed the Functional Resonance Analysis Method (FRAM) for modeling complex socio-technical systems.

Concepts

ETTO (efficiency thoroughness tradeoff) principle
FRAM (functional resonance analysis method)
Safety-I and Safety-II
things that go wrong vs things that go right
causality credo
performance variability
bimodality
emergence
work-as-imagined vs. work-as-done
joint cognitive systems

Selected publications

The ETTO Principle: Efficiency-Thoroughness Trade-Off: Why Things That Go Right Sometimes Go Wrong
From Safety-I to Safety-II: A White Paper
Safety-II in Practice
Safety-I and Safety-II: The past and future of safety management
FRAM: The Functional Resonance Analysis Method: Modelling Complex Socio-technical Systems
Joint Cognitive Systems: Patterns in Cognitive Systems Engineering
Resilience Engineering: Concepts and Precepts
I want to believe: some myths about the management of industrial safety
Resilience engineering – Building a Culture of Resilience (slides)

Leila Johannesen

Johannesen is currently a UX researcher and community advocate at IBM. Her PhD dissertation work examined how humans cooperate, including studies of anesthesiologists.

Concepts

common ground

Selected publications

Grounding explanations in evolving, diagnostic situations
Maintaining common ground: an analysis of cooperative communication in the operating room

Gary Klein

Klein studies how experts are able to quickly make effective decisions in high-tempo situations.

Klein tweets as @KleInsight.

Concepts

naturalistic decision making (NDM)
intuitive expertise
cognitive task analysis
common ground
problem detection
automation as a "team player"

Selected publications

Sources of power: how people make decisions
Working minds: a practitioner's guide to cognitive task analysis
Patterns in Cooperative Cognition
Common Ground and Coordination in Joint Activity
Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches
Conditions for intuitive expertise: a failure to disagree
Problem detection
Ten challenges for making automation a team player

Nancy Leveson

Nancy Leveson is a computer science researcher with a focus on software safety.

Contributions

STAMP

Leveson developed the accident causality model known as STAMP: the Systems-Theoretic Accident Model and Processes.

See STAMP for some more detailed notes of mine.

Concepts

Software safety
STAMP (systems-theoretic accident model and processes)
STPA (system-theoretic process analysis) hazard analysis technique
CAST (causal analysis based on STAMP) accident analysis technique
Systems thinking
hazard
interactive complexity
system accident
dysfunctional interactions
safety constraints
control structure
dead time
time constants
feedback delays

Selected publications

A New Accident Model for Engineering Safer Systems
Engineering a safer world
STPA Handbook
Safeware
Resilience Engineering: Concepts and Precepts
High-pressure steam engines and computer software

Anne-Sophie Nyssen

Nyssen is a psychology professor at the University of Liège, who does research on human error in complex systems, in particular in medicine.

A list of publications can be found on her website linked above.

Elinor Ostrom

Ostrom was a Nobel Prize-winning economics and political science researcher.

Selected publications

Coping with tragedies of the commons
Governing the Commons: The Evolution of Institutions for Collective Action

Concepts

tragedy of the commons
polycentric governance
social-ecological system framework

Jean Pariès

Pariès is the president of Dédale, a safety and human factors consultancy.

Selected publications

Resilience engineering in practice: a guidebook

Emily Patterson

Patterson is a researcher who applies human factors engineering to improve patient safety in healthcare.

Selected publications

Patient boarding in the emergency department as a symptom of complexity-induced risks

Charles Perrow

Perrow is a sociologist who studied the Three Mile Island disaster.

Concepts

Normal accidents
Common-mode

Selected publications

Normal accidents: living with high-risk technologies

Shawna J. Perry

Perry is a medical researcher who studies emergency medicine.

Concepts

Underground adaptations
Articulated functions vs. important functions
Unintended effects
Apparent success vs real success
Exceptions
Dynamic environments

Selected publications

Underground adaptations: case studies from health care
Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches

Jens Rasmussen

Jens Rasmussen was a very influential researcher in human factors and safety systems.

Contributions

Skill-rule-knowledge (SRK) model

TBD

Dynamic safety model

Rasmussen proposed a state-based model of a socio-technical system as a system that moves within a region of a state space. The region is surrounded by different boundaries:

economic failure
unacceptable work load
functionally acceptable performance

Source: Risk management in a dynamic society: a modelling problem

Incentives push the system towards the boundary of acceptable performance: accidents happen when the boundary is exceeded.
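A toy sketch of this migration, in Python, under loud assumptions: the state space is collapsed to a single number (the remaining margin before the boundary of functionally acceptable performance), and the incentives are modeled as two constant pressures. The names `OperatingPoint`, `drift_until_crossing`, `cost_pressure` and `effort_pressure` are hypothetical labels for illustration only; they do not come from Rasmussen's paper, and the real model is not this simple.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperatingPoint:
    """Hypothetical 1-D stand-in for the system's position in the state space."""
    margin: float  # remaining distance to the boundary of acceptable performance


def drift_until_crossing(point: OperatingPoint,
                         cost_pressure: float,
                         effort_pressure: float,
                         steps: int) -> Optional[int]:
    """Apply both pressures once per step, for at most `steps` steps.

    Returns the 1-based step at which the margin first drops below zero
    (the boundary is exceeded), or None if the boundary is never crossed.
    """
    for step in range(1, steps + 1):
        # Each pressure moves the operating point closer to the boundary.
        point.margin -= cost_pressure + effort_pressure
        if point.margin < 0:
            return step
    return None


# Usage: the same pressures look harmless over a short horizon
# and cross the boundary over a longer one.
short_run = drift_until_crossing(OperatingPoint(margin=10.0), 0.5, 0.25, steps=10)
long_run = drift_until_crossing(OperatingPoint(margin=10.0), 0.5, 0.25, steps=100)
```

The only point of the sketch is that no single step looks alarming: each one removes a small, constant slice of margin, and the crossing shows up only when the steps accumulate.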

AcciMaps

TBD

Risk management framework

Rasmussen proposed a multi-layer view of socio-technical systems:

Source: Risk management in a dynamic society: a modelling problem

Concepts

Dynamic safety model
Migration toward accidents
Risk management framework
Boundaries:
- boundary of functionally acceptable performance
- boundary to economic failure
- boundary to unacceptable work load
Cognitive systems engineering
Skill-rule-knowledge (SRK) model
AcciMaps
Means-ends hierarchy
Ecological interface design
Systems approach
Control-theoretic
decisions, acts, and errors
hazard source
anatomy of accidents
energy
systems thinking
trial and error experiments
defence in depth (fallacy)
Role of managers
- Information
- Competency
- Awareness
- Commitment
Going solid

Selected publications

Reflecting on Jens Rasmussen’s legacy. A strong program for a hard problem (my notes)
Risk management in a dynamic society: a modelling problem
Coping with complexity
“Going solid”: a model of system dynamics and consequences for patient safety
Human error and the problem of causality in analysis of accidents

James Reason

Reason is a psychology researcher who did work on understanding and categorizing human error.

Contributions

Accident causation model (Swiss cheese model)

Reason developed an accident causation model that is sometimes known as the Swiss cheese model of accidents. In this model, Reason introduced the terms "sharp end" and "blunt end".

Human Error model: Slips, lapses and mistakes

Reason developed a model of the types of errors that humans make:

slips
lapses
mistakes

Concepts

Blunt end
Human error
Slips, lapses and mistakes
Swiss cheese model

Selected publications

Human error

Nadine Sarter

Sarter is a researcher in industrial and operations engineering. She is the director of the Center for Ergonomics at the University of Michigan.

Concepts

cognitive ergonomics
organization safety
human-automation/robot interaction
human error / error management
attention / interruption management
design of decision support systems

Selected publications

Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation
Behind Human Error
Design-Induced Error and Error-Informed Design: A Two-Way Street

Robert L. Wears

Wears was a medical researcher who studied emergency medicine.

Concepts

Underground adaptations
Articulated functions vs. important functions
Unintended effects
Apparent success vs real success
Exceptions
Dynamic environments
Systems of care are intrinsically hazardous

Selected publications

The error of counting "errors"
Underground adaptations: case studies from health care
Fundamental or Situational Surprise: A Case Study With Implications For Resilience
Replacing Hindsight With Insight: Toward Better Understanding of Diagnostic Failures
Seeing patient safety ‘Like a State’

James C. Scott

Scott is an anthropologist who also does research in political science. While Scott is not a member of the resilience engineering community, his book Seeing like a state has long been a staple of the cognitive systems engineering and resilience engineering communities.

Concepts

authoritarian high-modernism
legibility
mētis

Selected publications

Seeing like a state: how certain schemes to improve the human condition have failed

Steven Shorrock

Shorrock is a chartered psychologist and a chartered ergonomist and human factors specialist. He is the editor-in-chief of EUROCONTROL HindSight magazine. He runs the excellent Humanistic Systems blog.

Shorrock tweets as @StevenShorrock.

Selected publications

Human Factors and Ergonomics in Practice: Improving System Performance and Human Well-Being in the Real World (book)

Diane Vaughan

Vaughan is a sociology researcher who did a famous study of the NASA Challenger accident.

Concepts

normalization of deviance

Selected publications

The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA

David Woods

Woods has a research background in cognitive systems engineering and did work researching NASA accidents. He is one of the founders of Adaptive Capacity Labs, a resilience engineering consultancy.

Woods tweets as @ddwoods2.

Contributions

Woods has contributed an enormous number of concepts.

The adaptive universe

Woods uses the adaptive universe as a lens for understanding the behavior of all different kinds of systems.

All systems exist in a dynamic environment, and must adapt to change.

A successful system will need to adapt by virtue of its success.

Systems can be viewed as units of adaptive behavior (UAB) that interact. UABs exist at different scales (e.g., cell, organ, individual, group, organization).

All systems have competence envelopes, which are constrained by boundaries.

The resilience of a system is determined by how it behaves when it comes near to a boundary.

See Resilience Engineering Short Course for more details.

Charting adaptive cycles

Trigger
Units of adaptive behavior
Goals and goal conflicts
Pressure points
Subcycles

Graceful extensibility

From The theory of graceful extensibility: basic rules that govern adaptive systems:

(Longer wording)

Adaptive capacity is finite
Events will produce demands that challenge boundaries on the adaptive capacity of any UAB
Adaptive capacities are regulated to manage the risk of saturating capacity for maneuver (CfM)
No UAB can have sufficient ability to regulate CfM to manage the risk of saturation alone
Some UABs monitor and regulate the CfM of other UABs in response to changes in the risk of saturation
Adaptive capacity is the potential for adjusting patterns of action to handle future situations, events, opportunities and disruptions
Performance of a UAB as it approaches saturation is different from the performance of that UAB when it operates far from saturation
All UABs are local
There are bounds on the perspective of any UAB, but these limits are overcome by shifts and contrasts over multiple perspectives.
Reflective systems risk mis-calibration

(Shorter wording)

Boundaries are universal
Surprise occurs, continuously
Risk of saturation is monitored and regulated
Synchronization across multiple units of adaptive behavior in a network is necessary
Risk of saturation can be shared
Pressure changes what is sacrificed when
Pressure for optimality undermines graceful extensibility
All adaptive units are local
Perspective contrast overcomes bounds
Mis-calibration is the norm

Concepts

Many of these are mentioned in Woods's short course.

the adaptive universe
unit of adaptive behavior (UAB), adaptive unit
adaptive capacity
continuous adaptation
graceful extensibility
sustained adaptability
Tangled, layered networks (TLN)
competence envelope
adaptive cycles/histories
precarious present (unease)
resilient future
tradeoffs, five fundamental
florescence: the degree that changes in one area tend to recruit or open up beneficial changes in many other aspects of the network - which opens new opportunities across the network ...
reverberation
adaptive stalls
borderlands
anticipate
synchronize
proactive learning
initiative
reciprocity
SNAFUs
robustness
surprise
dynamic fault management
software systems as "team players"
multi-scale
brittleness
decompensation
working at cross-purposes
proactive learning vs getting stuck
oversimplification
fixation
fluency law, veil of fluency
capacity for maneuver (CfM)
crunches
sharp end, blunt end
adaptive landscapes
law of stretched systems: Every system is continuously stretched to operate at capacity.
cascades
adapt how to adapt
unit working hard to stay in control
you can monitor how hard you're working to stay in control (monitor risk of saturation)
reality trumps algorithms
stand down
time matters
Properties of resilient organizations
- Tangible experience with surprise
- uneasy about the precarious present
- push initiative down
- reciprocity
- align goals across multiple units
goal conflicts, goal interactions (follow them!)
to understand system, must study it under load
adaptive races are unstable
adaptive traps
roles, nesting of
hidden interdependencies
net adaptive value
matching tempos
tilt toward florescence
linear simplification
common ground
problem detection
joint cognitive systems
automation as a "team player"
"new look"

Selected publications

Resilience Engineering: Concepts and Precepts
Resilience is a verb
Four concepts for resilience and the implications for the future of resilience engineering
How adaptive systems fail
Resilience and the ability to anticipate
Distancing through differencing: An obstacle to organizational learning following accidents
Essential characteristics of resilience
Learning from Automation Surprises and "Going Sour" Accidents: Progress on Human-Centered Automation
Behind Human Error
Joint Cognitive Systems: Patterns in Cognitive Systems Engineering
Patterns in Cooperative Cognition
Origins of cognitive systems engineering
Incidents - markers of resilience or brittleness?
The alarm problem and directed attention in dynamic fault management
Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches
Operating at the Sharp End: The Complexity of Human Error
The theory of graceful extensibility: basic rules that govern adaptive systems
Simon's Slice: Five Fundamental Tradeoffs that Bound the Performance of Human Work Systems
Anticipating the effects of technological change: A new era of dynamics for human factors
Common Ground and Coordination in Joint Activity
Resilience as Graceful Extensibility to Overcome Brittleness
Resilience Engineering: Redefining the Culture of Safety and Risk Management
Problem detection
Cognitive consequences of clumsy automation on high workload, high consequence human performance
Implications of automation surprises in aviation for the future of total intravenous anesthesia (TIVA)
Ten challenges for making automation a team player
The Messy Details: Insights From the Study of Technical Work in Healthcare
Nosocomial automation: technology-induced complexity and human performance
Human-centered software agents: Lessons from clumsy automation
STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity
The New Look at Error, Safety, and Failure: A Primer for Health Care
Grounding explanations in evolving, diagnostic situations
Resilience Engineering: Concepts and Precepts
A Tale of Two Stories: Contrasting Views of Patient Safety

Selected talks

The Mystery of Sustained Adaptability

John Wreathall

Wreathall is an expert in human performance in safety. He works at the WreathWood Group, a risk and safety studies consultancy.

Wreathall tweets as @wreathall.

Selected publications

Resilience engineering in practice: a guidebook
