Outbreak Data Simulation

Outbreak Data Simulation for Syndromic Surveillance

Client: New York City (NYC) Department of Health and Mental Hygiene (DOHMH)


Heightened awareness of the risks of bioterrorism since 9/11, coupled with growing concern about naturally emerging and reemerging diseases such as West Nile virus, severe acute respiratory syndrome (SARS), and pandemic influenza, has led public health policymakers to recognize the need for early warning systems. “Syndromic surveillance” is a new public health tool intended to fill this need. The theory behind syndromic surveillance is that during an attack or a disease outbreak, people will first develop symptoms, then stay home from work or school, attempt to self-treat with over-the-counter products, and eventually see a physician with nonspecific symptoms, days before they are formally diagnosed and reported to the health department. To identify such behaviors, syndromic surveillance systems regularly monitor existing data for sudden changes or anomalies that might signal a disease outbreak.

As with all detection techniques, there is a trade-off between sensitivity (the ability to detect an attack when it occurs) and the false-positive rate (the probability of sounding an alarm when there is in fact no attack). The New York City Department of Health and Mental Hygiene was interested in comparing the performance of different syndromic surveillance methods using simulated data. The idea is to “spike” a data stream with a known signal, run detection algorithms as if the data were real, and record whether the signal was detected and, if so, when. This process is repeated multiple times to estimate how the sensitivity (the probability of detection), the false-positive rate, and the timeliness of detection depend on the size, nature, and timing of the signal and on other characteristics.
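The spike-and-detect loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the moving-window z-score detector is a hypothetical stand-in (the summary does not name the methods that were compared), and the function names, window length, and threshold are all invented for the example. Because real baseline data may contain genuine outbreaks, the false-positive rate computed on the unspiked series is an approximation.

```python
import random
import statistics


def zscore_alarms(counts, window=28, threshold=3.0):
    """Return the days whose count exceeds the prior-window mean by
    more than `threshold` standard deviations.

    Hypothetical stand-in detector, not one of the methods under study.
    """
    alarms = []
    for day in range(window, len(counts)):
        history = counts[day - window:day]
        mean = statistics.fmean(history)
        sd = statistics.pstdev(history) or 1.0  # guard against a flat history
        if (counts[day] - mean) / sd > threshold:
            alarms.append(day)
    return alarms


def evaluate(baseline, signal, n_trials=200, window=28, threshold=3.0, seed=0):
    """Spike `baseline` with `signal` at random start days and summarize
    how often, and how quickly, the detector raises an alarm."""
    rng = random.Random(seed)

    # False-positive rate: share of monitored days that alarm with no spike.
    monitored_days = len(baseline) - window
    clean_alarms = zscore_alarms(baseline, window, threshold)
    false_positive_rate = len(clean_alarms) / monitored_days

    delays = []  # days from outbreak start to first alarm, per detected trial
    for _ in range(n_trials):
        start = rng.randrange(window, len(baseline) - len(signal) + 1)
        spiked = list(baseline)
        for offset, extra in enumerate(signal):
            spiked[start + offset] += extra
        # Any alarm inside the outbreak window counts as a detection.
        hits = [d for d in zscore_alarms(spiked, window, threshold)
                if start <= d < start + len(signal)]
        if hits:
            delays.append(hits[0] - start)

    return {
        "sensitivity": len(delays) / n_trials,
        "false_positive_rate": false_positive_rate,
        "mean_delay_days": statistics.fmean(delays) if delays else None,
    }
```

Calling `evaluate(baseline, signal)` with a list of daily counts and a list of extra visits per outbreak day returns the three performance measures for that one signal; repeating the call over signals of different size, shape, and season would show how the measures depend on those characteristics.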


The objectives of this project were:

  1. To prepare a series of data sets for testing syndromic surveillance methods. Data sets had to be based on observed NYC emergency department data from 2004–2012 and include a combination of outbreak types, durations, seasons, and magnitudes. Simulated outbreaks covering the following five syndromes needed to be inserted into NYC emergency department data (January 1, 2010–December 31, 2011): diarrhea, vomiting, fever, respiratory, and influenza-like illness (ILI).
  2. To develop an interface that enables NYC DOHMH to generate new simulated outbreaks as needed.


An approach was developed and implemented in an executable called outbreak-simulation.exe to:

  • compute outbreak magnitudes based on a parameterization of the Serfling base model for five different syndromes, the four seasons, and three spatial distributions (single ZIP code, ZIP code cluster, and citywide),
  • generate outbreaks for three different epidemic curves (single-day spike, point-source exposure, and propagated transmission) and three durations (3, 5, and 15 days),
  • add the simulated syndrome counts to the actual daily counts recorded for the period January 1, 2010–December 31, 2011, accounting for the frequency of visits from ZIP codes to hospitals.
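The last two steps can be illustrated with a minimal Python sketch. It is not the code behind outbreak-simulation.exe: the curve shapes, the function names, and the visit-share dictionary are assumptions made for the example, and the outbreak magnitude (`total_cases`) is taken as given instead of being derived from the Serfling model. The sketch spreads a total number of outbreak visits over the outbreak days according to a curve shape, then adds each day's visits to the observed hospital counts in proportion to how often residents of the affected ZIP code visit each hospital.

```python
import math


def allocate(total, weights):
    """Split an integer total in proportion to weights, using
    largest-remainder rounding so the parts sum exactly to total."""
    scale = total / sum(weights)
    raw = [w * scale for w in weights]
    counts = [math.floor(r) for r in raw]
    leftovers = total - sum(counts)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftovers]:
        counts[i] += 1
    return counts


def epidemic_curve(shape, duration, total_cases):
    """Return daily outbreak visits for one curve shape.

    The shapes are crude illustrative stand-ins, not the report's curves.
    """
    if shape == "spike":
        # All visits on one day; duration is ignored in this sketch.
        return [total_cases]
    if shape == "point_source":
        # Fast rise, slower right-skewed decline.
        scale = max(duration / 4, 1.0)
        weights = [(d + 1) * math.exp(-(d + 1) / scale) for d in range(duration)]
    elif shape == "propagated":
        # Gradual rise and fall over the whole outbreak.
        weights = [math.sin(math.pi * (d + 0.5) / duration) for d in range(duration)]
    else:
        raise ValueError(f"unknown curve shape: {shape}")
    return allocate(total_cases, weights)


def add_outbreak(observed, daily_cases, start_day, hospital_shares):
    """Return a copy of observed[hospital][day] with outbreak visits added.

    hospital_shares maps each hospital to the (assumed) share of visits
    it receives from the affected ZIP code.
    """
    hospitals = sorted(hospital_shares)
    weights = [hospital_shares[h] for h in hospitals]
    spiked = {h: list(series) for h, series in observed.items()}
    for offset, cases in enumerate(daily_cases):
        for hospital, extra in zip(hospitals, allocate(cases, weights)):
            spiked[hospital][start_day + offset] += extra
    return spiked
```

For example, `add_outbreak(observed, epidemic_curve("point_source", 5, 40), start_day=2, hospital_shares={"A": 0.6, "B": 0.4})` adds a 40-visit, five-day outbreak beginning on day 2, with roughly 60% of the extra visits assigned to the hypothetical hospital "A". A ZIP code cluster or citywide outbreak could be built by repeating the call for each affected ZIP code.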

Figure: Daily number of visits for influenza-like illness to NYC emergency departments


Goovaerts, P. 2013. Outbreak data simulation for syndromic surveillance. Final report. March 8, 2013.