Legionnaires' disease outbreak investigation toolbox

1.2 Outbreak detection

This section describes some advanced statistical techniques that can be used in a surveillance setting to help detect Legionnaires' disease clusters in real time. It is not intended as an exhaustive 'how-to' for each technique, as this would require considerable statistical detail. Rather, we outline a range of potential techniques and provide references for further information. Several considerations should guide the choice of technique. For example, more accurate techniques can be more computationally demanding, which may make them impractical for real-time use. Each technique also makes different assumptions about the data, and these should be borne in mind.

Large amounts of data are routinely collected and analysed by population health surveillance systems with a view to the early identification of infectious disease outbreaks. Legionnaires' disease clusters present particular challenges to prospective cluster detection techniques because they are rare events that occur against a varying background count of sporadic cases. Methods for detecting Legionnaires' disease clusters therefore need to account for these features while searching for abnormal aggregations of cases in space and time.

Robertson et al. (2010) categorise methods for space-time cluster surveillance into three types: statistical tests, model-based approaches and emerging methods[1]. We discuss each of these in turn.

Statistical tests seek to determine whether disease incidence in a spatially and temporally defined interval is unusual compared with the incidence in the study region as a whole. They include tests for space-time interaction such as the Knox test[2] and the Besag and Newell test[3]. An extension of the Knox test was applied to Legionnaires' disease data by Bhopal et al. (1992)[4]. Kulldorff et al.'s (2005)[5] space-time permutation scan statistic is also the basis of a test that van den Wijngaard et al. (2010)[6] applied to prospective syndromic surveillance for infections including Legionnaires' disease. Traditional cumulative sum (CUSUM) methods have also been extended to incorporate spatial features[7] and might be used to identify Legionnaires' disease clusters.
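As a rough illustration of how a space-time interaction test works, the sketch below implements a basic Knox test: it counts pairs of cases that are close in both space and time, then judges significance by repeatedly shuffling onset times among the case locations (a Monte Carlo permutation approach, one of several ways to obtain a p-value). The function names, the 1 km / 7 day critical distances and the case data are hypothetical, and coordinates are assumed to be projected in kilometres.

```python
import math
import random
from typing import List, Tuple

Case = Tuple[float, float, float]  # (x_km, y_km, onset_day); hypothetical layout


def knox_statistic(cases: List[Case], space_km: float, time_days: float) -> int:
    """Count pairs of cases that are close in both space and time."""
    count = 0
    for i in range(len(cases)):
        xi, yi, ti = cases[i]
        for j in range(i + 1, len(cases)):
            xj, yj, tj = cases[j]
            close_in_space = math.hypot(xi - xj, yi - yj) <= space_km
            close_in_time = abs(ti - tj) <= time_days
            if close_in_space and close_in_time:
                count += 1
    return count


def knox_test(cases: List[Case], space_km: float, time_days: float,
              n_perm: int = 999, seed: int = 1) -> Tuple[int, float]:
    """Monte Carlo Knox test: permute onset times to break any space-time link."""
    rng = random.Random(seed)
    observed = knox_statistic(cases, space_km, time_days)
    coords = [(x, y) for x, y, _ in cases]
    times = [t for _, _, t in cases]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(times)  # reassign onset times at random to locations
        permuted = [(x, y, t) for (x, y), t in zip(coords, times)]
        if knox_statistic(permuted, space_km, time_days) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data: five cases close together in space and time plus five sporadic cases
cases = [(0.0, 0.0, 10), (0.5, 0.2, 11), (0.3, 0.4, 12), (0.1, 0.6, 12), (0.7, 0.1, 13),
         (20.0, 5.0, 40), (-15.0, 8.0, 75), (10.0, -12.0, 120), (-5.0, -20.0, 160),
         (25.0, 18.0, 200)]
observed, p_value = knox_test(cases, space_km=1.0, time_days=7)
print(observed, round(p_value, 3))
```

Permutation keeps the spatial pattern and the temporal pattern of cases fixed separately, so a small p-value points to interaction between the two rather than to clustering in space alone or time alone. The result is sensitive to the chosen critical distances, which in practice would need justification.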

Model-based approaches to space-time disease surveillance frequently employ generalised linear mixed models (GLMMs). These are an extension of the generalised linear model (GLM) and work within a regression framework. The response variable, such as a disease count, is assumed to follow a distribution from the exponential family (for counts, typically the Poisson), and a link function of its expected value is modelled as a linear function of explanatory variables. This allows adjustment for effects such as season and day of week. The mixed-model extension adds random effects, which can account for varying baseline risks in different geographic areas. Non-linear relationships between a predictor and the response variable can be incorporated by transforming the predictor (for example, with polynomial or spline terms). GLMMs are a flexible modelling tool and might therefore be useful in prospective space-time surveillance for Legionnaires' disease clusters, although to our knowledge this has not yet been attempted.
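To make the GLMM description more concrete, one hypothetical specification for daily Legionnaires' disease counts is written out below. The symbols are illustrative assumptions rather than a published model: $Y_{it}$ is the count in area $i$ on day $t$, $P_i$ is the area population (entering as an offset), the sine and cosine terms describe a smooth annual seasonal cycle, $\gamma_{d(t)}$ is a day-of-week effect (with one day as the reference level), and $u_i$ is an area-level random effect capturing varying baseline risk.

```latex
Y_{it} \sim \operatorname{Poisson}(\mu_{it}), \qquad
\log \mu_{it} = \log P_i + \beta_0
  + \beta_1 \sin\!\left(\frac{2\pi t}{365.25}\right)
  + \beta_2 \cos\!\left(\frac{2\pi t}{365.25}\right)
  + \gamma_{d(t)} + u_i, \qquad
u_i \sim \mathcal{N}\!\left(0, \sigma_u^2\right)
```

With the log link, $\exp(\beta_j)$ is interpreted as a rate ratio. Once such a model has been fitted to historical data, an observed count well above the predicted $\mu_{it}$ for a given area and day could be flagged for further investigation.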

Another extension to the GLM that might be useful in surveillance for Legionnaires' disease clusters is the generalised additive model (GAM). GAMs replace some or all of the linear terms with smooth functions of the explanatory variables, so they can capture complex relationships without the form of each relationship being specified in advance; they are commonly used to implement non-parametric smoothers in regression models. They are also sometimes used to elicit the shape of response curves that are then fitted with a GLM or GLMM[8].
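For illustration, a generic GAM can be written as below (a standard textbook form, not a model from the cited work). Here $g$ is the link function and each $f_j$ is a smooth function of covariate $x_j$, typically a penalised regression spline whose wiggliness is controlled by a smoothing parameter $\lambda_j$ in the penalised log-likelihood $\ell_p$.

```latex
g\bigl(\mathbb{E}[Y_t]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_{jt}),
\qquad
\ell_p = \ell(\beta_0, f_1, \dots, f_p) - \sum_{j=1}^{p} \lambda_j \int f_j''(x)^2 \, dx
```

As $\lambda_j$ grows, the penalty forces $f_j$ towards a straight line, recovering an ordinary GLM term. This is why the shape of a fitted smooth can suggest a parametric transformation to use in a subsequent GLM or GLMM.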

Other modelling approaches examine spatial patterns of disease by assuming that incidence counts in a given area and time period are a realisation of a spatial point process with a specified statistical distribution. Rudbeck et al. (2010) assume that the number of Legionnaires' disease cases in small areas in Denmark is described by a Poisson process[9]. Martínez-Beneito et al. (2006) also make a Poisson assumption in their study of source detection for an outbreak of Legionnaires' disease in Spain[10].
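Under a Poisson assumption, one simple way to judge whether an area's count is unusual is to compute the probability of observing at least that many cases given the expected count. The sketch below does this using only the standard library. The area names, expected counts and the 0.01 threshold are hypothetical, and the sketch ignores multiple testing across many areas, which a real surveillance system would need to address.

```python
import math
from typing import Dict, List


def poisson_upper_tail(k: int, mu: float) -> float:
    """Return P(Y >= k) for Y ~ Poisson(mu), computed as 1 - P(Y <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(Y = 0)
    cdf = term
    for y in range(1, k):
        term *= mu / y  # P(Y = y) obtained from P(Y = y - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_areas(observed: Dict[str, int], expected: Dict[str, float],
               alpha: float = 0.01) -> List[str]:
    """Return areas whose observed count is improbably high under Poisson(expected)."""
    return sorted(area for area, k in observed.items()
                  if poisson_upper_tail(k, expected[area]) < alpha)


# Hypothetical weekly counts and expected counts (e.g. baseline rate x population)
observed = {"area_A": 4, "area_B": 1, "area_C": 0}
expected = {"area_A": 0.5, "area_B": 0.8, "area_C": 0.3}
print(flag_areas(observed, expected))  # ['area_A']
```

Computing the tail as one minus the cumulative probability is adequate for illustration but loses relative precision for extremely small tail probabilities.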

More recently developed space-time surveillance methods include agent-based models[11], bootstrap models[12] and hidden Markov models (HMMs). HMMs lend themselves to modelling outbreak and non-outbreak states because they assign a different probability distribution for the case count to each state, so a given number of observed cases is more likely under one state than the other. Movement between the two states from one time period to the next is governed by transition probabilities. In a surveillance setting, the probability of being in the outbreak state can then be monitored and calibrated against historical outbreak data to set an exceedance threshold that gives an appropriate false alarm rate. Watkins et al. (2009) incorporate spatial structure into this framework by summing the reported cases for each postcode area with those of its nearest neighbours[13]. This effectively increases the weight given to cases that occur in neighbouring areas. They find that HMMs provide an effective method for the surveillance of sparse small-area incidence data at low false alarm rates.
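The sketch below illustrates the two-state idea with Poisson-distributed counts and the forward (filtering) recursion, which gives the probability of being in the outbreak state given the counts observed so far. It also includes a helper that adds each area's count to its neighbours' counts, in the spirit of the spatial step described for Watkins et al. (2009); it is not their implementation. All function names, parameter values and data are hypothetical, and the model parameters are assumed known, whereas in practice they would be estimated from historical data and the alarm threshold calibrated to a target false alarm rate.

```python
import math
from typing import Dict, List


def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for Y ~ Poisson(mu), computed on the log scale (mu > 0)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def neighbour_sum(counts: Dict[str, int], neighbours: Dict[str, List[str]]) -> Dict[str, int]:
    """Add each area's count to the counts of its listed neighbours."""
    return {a: counts[a] + sum(counts[n] for n in neighbours[a]) for a in counts}


def outbreak_probabilities(counts: List[int], mu_normal: float, mu_outbreak: float,
                           p_start: float, p_stop: float,
                           p0_outbreak: float = 0.01) -> List[float]:
    """Filtered P(outbreak state at t | counts up to t) for a two-state Poisson HMM."""
    # Rows: state at t-1, columns: state at t (0 = non-outbreak, 1 = outbreak)
    trans = [[1 - p_start, p_start],
             [p_stop, 1 - p_stop]]
    mus = [mu_normal, mu_outbreak]
    prior = [1 - p0_outbreak, p0_outbreak]
    probs = []
    for y in counts:
        # Combine predicted state probabilities with each state's count distribution
        joint = [prior[s] * poisson_pmf(y, mus[s]) for s in (0, 1)]
        total = joint[0] + joint[1]
        filtered = [joint[0] / total, joint[1] / total]
        probs.append(filtered[1])
        # Predict state probabilities for the next time period
        prior = [filtered[0] * trans[0][s] + filtered[1] * trans[1][s] for s in (0, 1)]
    return probs


# Hypothetical weekly counts for one area (already neighbour-summed)
weekly = [0, 1, 0, 0, 1, 0, 5, 6, 2, 0]
probs = outbreak_probabilities(weekly, mu_normal=0.5, mu_outbreak=4.0,
                               p_start=0.02, p_stop=0.2)
alarms = [p > 0.5 for p in probs]  # hypothetical exceedance threshold
print([round(p, 3) for p in probs])
print(alarms)
```

Because the filter carries information forward through the transition probabilities, a moderate count immediately after a run of high counts can still keep the outbreak probability above the threshold, whereas the same count after a quiet period would not.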



  1. ROBERTSON C., NELSON T. A., MACNAB Y. C. & LAWSON A. B. (2010) Review of methods for space-time disease surveillance. Spatial and Spatio-temporal Epidemiology 1, pp. 105-116.
  2. KNOX G. (1963) Detection of low intensity epidemicity. British Journal of Preventive and Social Medicine 17, pp. 121-127.
  3. BESAG J. & NEWELL J. (1991) The detection of clusters in rare diseases. Journal of the Royal Statistical Society Series A 154(1), pp. 143-155.
  4. BHOPAL R. S., DIGGLE P. & ROWLINGSON B. (1992) Pinpointing clusters of apparently sporadic cases of Legionnaires' disease. British Medical Journal 304(6833), pp. 1022-1027.
  5. KULLDORFF M., HEFFERNAN R., HARTMAN J., ASSUNÇÃO R. & MOSTASHARI F. (2005) A space-time permutation scan statistic for disease outbreak detection. PLoS Medicine 2(3), e59.
  6. VAN DEN WIJNGAARD C. C., VAN ASTEN L., VAN PELT W., DOORNBOS G., NAGELKERKE N. J. D., DONKER G. A., VAN DER HOEK W. & KOOPMANS M. P. G. (2010) Syndromic surveillance for local outbreaks of lower-respiratory infections: would it work? PLoS ONE 5(4), e10406.
  7. ROGERSON P. (2005) A set of associated statistical tests for spatial clustering. Environmental and Ecological Statistics 12(3), pp. 275-288.
  8. GUISAN A. & ZIMMERMANN N. E. (2000) Predictive habitat distribution models in ecology. Ecological Modelling 135, pp. 147-186.
  9. RUDBECK M., JEPSEN M. R., SONNE I. B., ULDUM S. A., VISKUM S. & MØLBAK K. (2010) Geographical variation of sporadic Legionnaires' disease analysed in a grid model. Epidemiology and Infection 138(1), pp. 9-14.
  10. MARTÍNEZ-BENEITO M. A., ABELLÁN J. J., LÓPEZ-QUÍLEZ A., VANACLOCHA H., ZURRIAGA O., JORQUES G. & FENOLLAR J. (2006) Source detection in an outbreak of Legionnaires' disease. Lecture Notes in Statistics 185(3), pp. 162-182.
  11. EUBANK S., GUCLU H., KUMAR V. S. A., MARATHE M. V., SRINIVASAN A., TOROCZKAI Z. & WANG N. (2004) Modelling disease outbreaks in realistic urban social networks. Nature 429(6988), pp. 180-184.
  12. KIM Y. & O'KELLY M. (2008) A bootstrap based space-time surveillance model with an application to crime occurrences. Journal of Geographical Systems 10(2), pp. 141-165.
  13. WATKINS R. E., EAGLESON S., VEENENDAAL B., WRIGHT G. & PLANT A. J. (2009) Disease surveillance using a hidden Markov model. BMC Medical Informatics and Decision Making 9(1), 39.