1.2 Outbreak detection
This section describes some advanced statistical techniques which may be used in a surveillance
setting in order to facilitate detection of Legionnaires' disease clusters in real time. The
purpose of this section is not to provide an exhaustive 'how-to' for each technique as this
would require considerable statistical detail. Rather, we indicate a range of potential
techniques and provide references for further information. There are a number of considerations
to bear in mind when selecting a technique to implement. For example, more accurate techniques
can be more computationally demanding and this may render them impractical for use in
real time. Each technique makes different assumptions about the data and these should also be
borne in mind.
Large amounts of data are routinely collected and analysed by population health surveillance
systems with a view to early identification of infectious disease outbreaks. Legionnaires'
disease clusters present challenges to prospective cluster detection techniques because they
are rare events which occur against a varying background count of sporadic cases. Analysis
methods which are applied to the detection of Legionnaires' disease clusters need to be
sensitive to these features at the same time as searching for the abnormal aggregation of cases
in space and time.
Robertson et al. (2010) categorise methods for space-time cluster surveillance into
three types: statistical tests, model-based approaches and emerging methods[1]. We
discuss each of these in turn.
Statistical tests seek to determine whether disease incidence in a spatially and temporally
defined interval is unusual compared to the incidence in the study region as a whole. They
include tests for space-time interaction such as the Knox test[2] and the Besag and Newell
test[3]. An extension of the Knox test was applied to Legionnaires' disease data by Bhopal
et al. (1992)[4]. Kulldorff et al.'s (2005)[5] spatial scan statistic is also the
basis of a test which van den Wijngaard et al. (2010)[6] apply to prospective syndromic
surveillance for infections including Legionnaires' disease. Traditional CUSUM methods have also been extended to
incorporate spatial features[7] and might be used to identify Legionnaires' disease clusters.
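
As an illustration of the idea behind a test for space-time interaction, the following is a minimal sketch of a Knox-type statistic: it counts pairs of cases which are close in both space and time and compares that count with counts obtained after randomly re-assigning onset times to locations. The function names, the critical distances and the example data are hypothetical, and the Monte Carlo permutation step is one common way of obtaining a reference distribution rather than the procedure of any study cited here.

```python
import itertools
import math
import random


def knox_statistic(points, times, space_cut, time_cut):
    """Count pairs of cases that are close in BOTH space and time."""
    count = 0
    for i, j in itertools.combinations(range(len(times)), 2):
        near_in_space = math.dist(points[i], points[j]) <= space_cut
        near_in_time = abs(times[i] - times[j]) <= time_cut
        if near_in_space and near_in_time:
            count += 1
    return count


def knox_permutation_test(points, times, space_cut, time_cut,
                          n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle onset times across fixed locations."""
    rng = random.Random(seed)
    observed = knox_statistic(points, times, space_cut, time_cut)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if knox_statistic(points, shuffled, space_cut, time_cut) >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical data: coordinates in km, onset times in days.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
times = [0, 1, 50, 51]
observed, p_value = knox_permutation_test(points, times,
                                          space_cut=1.0, time_cut=2.0)
```

In practice the critical distances in space and time would need to be chosen with reference to the incubation period and plausible dispersal range, and the result is sensitive to that choice.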
Model-based approaches to space-time disease surveillance frequently employ generalised linear
mixed models (GLMMs). These
are an extension of the generalised linear model (GLM) and work within a regression framework. They allow a
response variable such as disease counts to follow any distribution in the exponential family
and model a transformation of its expected value (the link function) as a linear function of
explanatory variables. This allows adjustments to be
made for factors such as season and day of the week. It also permits incorporation of random effects
which account for varying baseline risks in different geographic areas. Non-linear
relationships between an explanatory variable and the response variable can be incorporated by
transforming that variable. GLMMs are a flexible modelling tool and might therefore be
useful in prospective space-time surveillance for Legionnaires' disease clusters, although to
our knowledge this has not yet been attempted.
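
To make the structure concrete, one possible Poisson GLMM for daily case counts is written out below. This is an illustrative specification only and is not taken from any of the cited studies. All symbols are assumptions introduced here: y_{it} is the number of cases in area i on day t, n_i is the population of area i (used as an offset), the sine and cosine terms give a simple annual seasonal adjustment, \gamma_{d(t)} is an effect for the day of the week on which day t falls, and b_i is an area-level random intercept representing varying baseline risk.

```latex
\begin{align}
y_{it} \mid b_i &\sim \mathrm{Poisson}(\mu_{it}), \\
\log \mu_{it} &= \log n_i + \beta_0
  + \beta_1 \sin\!\left(\frac{2\pi t}{365}\right)
  + \beta_2 \cos\!\left(\frac{2\pi t}{365}\right)
  + \gamma_{d(t)} + b_i, \\
b_i &\sim \mathrm{N}(0, \sigma_b^2).
\end{align}
```

Under a model of this kind, a surveillance system might flag area-days on which the observed count is improbably large relative to the fitted \mu_{it}, although the choice of flagging rule is a separate decision.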
Another extension to the GLM
which might be useful in surveillance for Legionnaires' disease clusters is the generalised
additive model (GAM).
GAMs can deal with complex
relationships between explanatory variables and the response variable and are commonly used to
implement non-parametric smoothers in regression models. They are also sometimes used to elicit
the shape of response curves that are then fitted with a GLM or GLMM[8].
Other modelling approaches examine spatial patterns of disease by making the assumption that
incidence counts in a given area and time period are a realisation from a spatial point process
with a given statistical distribution. Rudbeck et al. (2010) assume that the number of
Legionnaires' disease cases in small areas in Denmark is described by a Poisson process[9].
Martínez-Beneito et al. (2006) also make a Poisson assumption in their study of
source detection for an outbreak of Legionnaires' disease in Spain[10].
More recently developed space-time surveillance methods include agent-based models[11],
bootstrap models[12] and hidden Markov models (HMMs). HMMs lend themselves to modelling of outbreak and non-outbreak
states by assigning different probability distributions to these two states. Any observed
number of cases is therefore more likely under one of the two states than under the other.
Movement between the two states from one time period to the next is
determined by transition probabilities. In a surveillance setting the probability of being in
an outbreak state can then be monitored and calibrated against historical outbreak data to set
an exceedance threshold which gives an appropriate false alarm rate. Watkins et al.
(2009) incorporate spatial structure into this framework by summing reported cases for each
postcode area with those of its nearest neighbours[13]. This effectively increases the
weighting for cases which occur in neighbouring areas. They find that HMMs provide an effective method for the
surveillance of sparse small area incidence data at low false alarm rates.
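
The monitoring step described above can be sketched as a forward filter for a two-state HMM, which updates the probability of being in the outbreak state as each new count arrives. This is a minimal sketch under stated assumptions and not the implementation of Watkins et al.: counts are assumed to be Poisson in both states, and the state means, transition probabilities, alarm threshold and example counts are all hypothetical values chosen for illustration. The spatial summing over neighbouring areas is not included.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing k cases when the mean is `mean`."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def outbreak_probabilities(counts, mean_normal, mean_outbreak,
                           p_start, p_stay, p_initial=0.01):
    """Return P(outbreak state at time t | counts up to time t) for each t.

    State 0 is non-outbreak and state 1 is outbreak.
    p_start: probability of moving from non-outbreak to outbreak.
    p_stay:  probability of remaining in the outbreak state.
    """
    trans = [[1 - p_start, p_start],
             [1 - p_stay, p_stay]]
    means = (mean_normal, mean_outbreak)
    belief = [1 - p_initial, p_initial]
    filtered = []
    for t, k in enumerate(counts):
        if t > 0:
            # Predict: propagate the previous belief through the transitions.
            belief = [belief[0] * trans[0][j] + belief[1] * trans[1][j]
                      for j in (0, 1)]
        # Update: weight each state by how well it explains the new count.
        weighted = [belief[j] * poisson_pmf(k, means[j]) for j in (0, 1)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        filtered.append(belief[1])
    return filtered


# Hypothetical weekly counts for one small area.
counts = [0, 1, 0, 0, 4, 5, 3, 0]
probs = outbreak_probabilities(counts, mean_normal=0.5, mean_outbreak=4.0,
                               p_start=0.02, p_stay=0.8)
threshold = 0.5  # hypothetical; would be calibrated on historical data
alarms = [p > threshold for p in probs]
```

In a real system the state means and transition probabilities would be estimated from data rather than fixed by hand, and the threshold would be tuned to the false alarm rate that is acceptable.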