What it's for

Public health case definitions often take the form of predictive checklists. The WHO, for example, defines influenza-like illness (ILI) as an acute respiratory infection with fever, cough, and an onset in the past 10 days. The CDC defines a probable case of pertussis (whooping cough) as the presence of paroxysms of coughing, inspiratory whoop, post-coughing vomiting, or apnea for at least 2 weeks (or fewer than 2 weeks with exposure to a known case). Kudos is a Python package that helps you develop and test these kinds of case definitions using combinatorial optimization.

Who it's for

Kudos was written with epidemiologists, biostatisticians, data scientists, and other data-savvy public health practitioners in mind. That being said, the code is subject-matter-agnostic, and so it can be used by anyone looking to build high-performance predictive checklists.

How it works

Kudos uses three kinds of combinatorial optimization methods to develop case definitions: linear programming (1), nonlinear programming, and brute-force search (2, 3). The first two methods are good for quickly finding a near-optimal definition based on your data, and the third is good for exploring the full range of possible definitions. All of them figure out which combination of predictors (often symptoms) has the best classification performance relative to the reference standard you've specified (often a pathogen-specific test like PCR or viral culture).
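
To make the brute-force idea concrete, here is a minimal pure-Python sketch of an exhaustive search on toy data. It is not Kudos's implementation: the helper names (youden_j, brute_force_checklist), the toy symptoms, and the "at least m of the chosen predictors" rule format are all assumptions made for illustration.

```python
from itertools import combinations


def youden_j(y_true, y_pred):
    """Return sensitivity + specificity - 1 for two equal-length 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    positives = sum(1 for t in y_true if t)
    negatives = len(y_true) - positives
    return tp / positives + tn / negatives - 1


def brute_force_checklist(rows, y, names, max_n=2):
    """Score every 'at least m of these n predictors' rule with n <= max_n.

    Hypothetical helper: returns (best J, predictor names, threshold m).
    """
    best = (float("-inf"), None, None)
    for n in range(1, max_n + 1):
        for cols in combinations(range(len(names)), n):
            # Number of the chosen predictors present in each row.
            counts = [sum(row[c] for c in cols) for row in rows]
            for m in range(1, n + 1):
                pred = [count >= m for count in counts]
                j = youden_j(y, pred)
                if j > best[0]:
                    best = (j, [names[c] for c in cols], m)
    return best


# Toy data (made up): one row per person, columns follow `names`.
names = ["fever", "cough", "rash"]
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # reference standard, e.g. a PCR result

best_j, best_cols, best_m = brute_force_checklist(rows, y, names, max_n=2)
print(best_cols, best_m, best_j)  # ['fever', 'cough'] 2 0.75
```

On this toy data, "fever and cough" (both required) beats every single symptom and every other pair. The nested loops also show why the search grows quickly as predictors are added.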

Getting Started


The easiest way to install Kudos is with pip:

pip install kudos

The package is available on PyPI, so you can also use any standard package manager to fetch the code and handle the installation. If you'd like to contribute, fork the package first, clone it, and then install the dependencies manually.

git clone
cd kudos
pip install -r requirements.txt

Software requirements

The package was written in Python 3.8, and because of some recent-ish changes to the multiprocessing package, it will not run on anything lower. It requires a few standard dependencies, like numpy, scikit-learn, and seaborn, but it will check for those during installation and add them if they're missing.


Kudos is best run on a scientific workstation or cloud instance with a decent amount of RAM and lots of processors. If you're using something less substantial, the optimizers will still work, but you may need to whittle down your dataset first if it has a large number of predictors. Regardless of hardware, the FullEnumeration (i.e., brute-force search) can take a long time to run, so keep that in mind when setting up the optimization.

Using Kudos


Kudos is designed to be used interactively. Let's say you have a dataset named data with an outcome y and some number of predictors X. Finding a good case definition is as easy as fitting one of the models in the optimizers module.

import pandas as pd
from kudos import optimizers

data = pd.read_csv('data.csv')
X = data[X_columns]
y = data[y_column]

# Create the integer program and fit it to the predictors and outcome
ip = optimizers.IntegerProgram()
ip.fit(X, y)

Once the solver finishes, it saves the optimal definition in the results attribute.


Seeing who meets the case definition in a new batch of data is just as easy.

new_data = pd.read_csv('new_data.csv')
new_X = new_data[X_columns]
meets_definition = ip.predict(new_X)

The other optimizers, FullEnumeration and NonlinearApproximation, have the same functionality, and FullEnumeration also lets you do some visualizations with the candidate case definitions. For more info, see the demo notebook.


Coming soon.


Coming soon.

Frequently Asked Questions


  1. How do the linear programs decide which case definition is the best? The IntegerProgram needs a linear objective function to run, meaning it's limited to metrics that are linear combinations of the predictors and the candidate case definitions. Youden's J index (sensitivity + specificity - 1) meets that criterion, and it's a reasonable measure of overall classification performance, so that's what we use. Because it's a relaxed version of the integer program, the NonlinearApproximation uses this metric, as well.

  2. What if I care more about sensitivity than specificity, or vice versa? You can change how much weight each component of the J index receives by altering the alpha (sensitivity) and beta (specificity) parameters of the linear program you .fit().

  3. What about the full enumeration? If you want to optimize a different metric than the J index, you can use the FullEnumeration instead of the LP-based optimizers. It will accept F-score or Matthews correlation coefficient (MCC), in addition to J, as targets for sorting, pruning, and plotting.

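As a rough illustration of how weighting can change which definition wins, the sketch below scores two made-up candidates with a weighted J index. The formula alpha * sensitivity + beta * specificity - 1 is an assumption about how the weights enter the objective, and weighted_j, best_definition, and the candidate values are hypothetical; check the Kudos source for the exact form it uses.

```python
def weighted_j(sensitivity, specificity, alpha=1.0, beta=1.0):
    """Weighted Youden's J (assumed form); alpha = beta = 1 gives the usual J."""
    return alpha * sensitivity + beta * specificity - 1


# Two hypothetical candidate definitions as (sensitivity, specificity).
candidates = {"A": (0.90, 0.60), "B": (0.60, 0.95)}


def best_definition(alpha=1.0, beta=1.0):
    """Return the name of the candidate with the highest weighted J."""
    return max(
        candidates,
        key=lambda name: weighted_j(*candidates[name], alpha, beta),
    )


print(best_definition())           # B: the more specific definition wins
print(best_definition(alpha=2.0))  # A: doubling the sensitivity weight flips it
```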

  1. How can I make the brute-force search run faster?
    • Whittle down your feature space. The optimizers.FeaturePruner is one way to do that, but standard variable-selection procedures (e.g., forward or backward selection) will also work.
    • Try a lower value for max_n. The default is 5, which should work well in most cases.
    • Turn off use_reverse, if it's on. Using it doubles the size of the feature space.
    • Turn off compound, if it's on. Using it substantially increases the number of combinations to try.
  2. How can I make the brute-force search use less memory?
    • Set share_memory to True when you initialize the FullEnumeration object. This keeps multiprocessing from passing copies of the dataset to every process in the Pool.
    • Make sure prune is turned on. This limits the number of combinations saved at each step in the search.
    • Use a smaller number for batch_keep_n. This decides how many combos to save when prune is turned on.
    • Use fewer predictors. See the first answer to question #1 above.
  3. How can I make the linear program run faster?
    • Try a lower value for max_n. The default is None, which will take the longest.
    • Try a different solver. OR-Tools, which is what Kudos uses on the backend, has a few options available.

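To see why trimming predictors and lowering max_n help so much, it can be useful to count how many predictor subsets an exhaustive search would have to visit. The sketch below is a back-of-the-envelope count, not Kudos's internal bookkeeping: n_candidate_sets is a hypothetical helper, it counts only subsets of predictors (ignoring any thresholds or compound rules), and it treats use_reverse as simply doubling the number of predictors, which is an upper bound.

```python
from math import comb  # available in Python 3.8+


def n_candidate_sets(n_predictors, max_n, use_reverse=False):
    """Count the predictor subsets of size 1..max_n (hypothetical helper)."""
    n = 2 * n_predictors if use_reverse else n_predictors
    return sum(comb(n, k) for k in range(1, max_n + 1))


print(n_candidate_sets(20, 5))                    # 21699
print(n_candidate_sets(20, 3))                    # 1350
print(n_candidate_sets(20, 5, use_reverse=True))  # 760098
```

With 20 predictors, dropping max_n from 5 to 3 cuts the count by more than an order of magnitude, while doubling the feature space multiplies it roughly 35-fold.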

  1. Zhang H, Morris Q, Ustun B, Ghassemi M. Learning optimal predictive checklists. Advances in Neural Information Processing Systems. 2021 Dec 6;34:1215-29.
  2. Reses HE, Fajans M, Lee SH, Heilig CM, Chu VT, Thornburg NJ, Christensen K, Bhattacharyya S, Fry A, Hall AJ, Tate JE. Performance of existing and novel surveillance case definitions for COVID-19 in household contacts of PCR-confirmed COVID-19. BMC Public Health. 2021 Dec;21(1):1-5.
  3. Lee S, Almendares O, Prince-Guerra JL, Heilig CM, Tate JE, Kirking HL. Performance of existing and novel symptom- and antigen testing-based COVID-19 case definitions in a community setting. medRxiv. 2022 Jan 1.

General disclaimer

This repository was created for use by CDC programs to collaborate on public health related projects in support of the CDC mission. GitHub is not hosted by the CDC, but is a third-party website used by CDC and its partners to share information and collaborate on software.

