1.4 Inferring an outbreak source location
Detailed case data can be analysed in isolation in an attempt to infer a region in space that
is likely to contain the source of an outbreak. In theory, the recorded movements of each case
define the exact spaces each case has occupied and therefore within that record will be the
exact location(s) at which each case has received an infectious dose of Legionella
bacteria. It is therefore assumed within the analyses described below that the infectious dose
has come from a source within relatively close spatial proximity to the locations in space
occupied by a case.
The approaches described below are all raster-based GIS operations. Initially, data for each case are analysed in
isolation to identify those areas in space where a potential source of infection could be
located (for that individual case). Each raster operation can treat case data in two ways:
1. Firstly, it can treat each point in space occupied by a case equally, no matter how long that
space has been occupied, when it was occupied or how many times it has been occupied.
2. Secondly, it can weight each point in space in an attempt to assign relative importance to
each location. Weighting might be assigned based on the time spent at a location, the number of
times a location is visited or the date a space was occupied (based on an incubation-period
assessment of when infection is most likely to have occurred).
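The second treatment above can be sketched as a simple per-location weight function. This is a hedged illustration only: the function name is invented here, and the 2–10 day incubation window is a commonly assumed range rather than a value prescribed by this guidance.

```python
from datetime import date

def location_weight(hours_spent, visit_date, onset_date,
                    incubation_min=2, incubation_max=10):
    """Hypothetical weighting scheme: weight a location by the time
    spent there, but zero it when the visit falls outside the assumed
    incubation window (here 2-10 days before symptom onset, a commonly
    assumed range for Legionnaires' disease)."""
    days_before_onset = (onset_date - visit_date).days
    if incubation_min <= days_before_onset <= incubation_max:
        return hours_spent
    return 0.0

onset = date(2015, 6, 20)
w_inside = location_weight(6.0, date(2015, 6, 14), onset)   # 6 days before onset
w_outside = location_weight(6.0, date(2015, 6, 19), onset)  # only 1 day before onset
```

In this sketch, `w_inside` keeps the full 6 hours of weight while `w_outside` is zeroed, reflecting that infection on the day before onset is implausible under the assumed incubation period.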
Having created a raster grid for each case, attempting to highlight the areas in space that
could potentially contain the outbreak source, the rasters are combined for all cases to form
a composite. The composite raster output is intended to illustrate areas of commonality in
space between all cases that could be suggestive of a common source of infection (and can
therefore be used to target additional investigation). When creating a composite raster it is
important that the relative importance of each of the input case rasters is equal, otherwise
a single case could significantly skew the results.
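The compositing step can be sketched as follows: each per-case raster is normalised before summation so that every case contributes the same total weight. This is a minimal illustration using NumPy; the grids and values are hypothetical, not drawn from any real investigation.

```python
import numpy as np

def composite(case_rasters):
    """Sum per-case raster grids after normalising each one, so that
    no single case can dominate the composite. Each input is a 2-D
    array of non-negative cell values for one case."""
    layers = []
    for grid in case_rasters:
        total = grid.sum()
        # Normalise so every case contributes the same total weight.
        layers.append(grid / total if total > 0 else grid)
    return np.sum(layers, axis=0)

# Two hypothetical case grids: case B covers far more cells than
# case A, but normalisation keeps their overall influence equal.
case_a = np.array([[0, 1, 1],
                   [0, 0, 0],
                   [0, 0, 0]], dtype=float)
case_b = np.full((3, 3), 4.0)
comp = composite([case_a, case_b])
```

Each normalised layer sums to 1, so the composite sums to the number of cases regardless of how extensively any one case moved through the study area.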
Generally speaking, it can be assumed that more time will have been spent at each 'point' location
visited than at each point in space along a travel route (although this may not always
be true). Assuming the likelihood of becoming infected with Legionnaires' disease
is dose-dependent, and that longer periods of time spent within a dose contour reflect higher
received doses, the 'point' locations visited are normally the more likely
spaces in which infection has taken place. It may therefore be more sensible to apply the
methods described below, in the first instance, to the point-based data describing each
location visited by a case. It is entirely plausible, however, that in certain scenarios
Legionnaires' disease has been contracted whilst travelling, and so the analysis of travel-route
data can also, in certain circumstances, produce meaningful results.
When considering the outputs of the analyses described below it is useful to overlay any data
available on potential source locations (e.g. cooling towers, air scrubbers etc) to
identify whether the region inferred by the analysis is within close spatial proximity to a
potential source location.
Considerations for a cross-border outbreak: the analytical methods discussed in this
section rely on individual-based case data rather than aggregated case data. They are therefore
unlikely to be used within a cross-border outbreak scenario where only aggregate data is
available.
1.4.1 Basic buffer analysis
Data requirements: Case data
Description: Buffering is the most basic approach that aims to identify areas of
commonality in space between cases. Buffers can be generated around the point locations
visited/travel routes taken by each case in an attempt to describe an area in which we assume
the location of the infectious source could potentially be found. If a grid-based buffering
operation is used we can buffer the locations visited/travel routes taken by each individual
case and then assign each cell within the buffer a value of 1. Combing the output for each case
to form a composite output would see higher counts in the areas that cases had been near to
more commonly. Those areas of commonality could perhaps suggest an area where the source of
infection is more likely to be located. A possible criticism of this methodology however, is
that it assumes any location within the buffer is of equal significance whether it is 1 m or 1
km from the location actually occupied by a case. Figure 1.5 shows an
example output from a grid-based buffer analysis.
GIS tools required: A raster-based buffer tool is required for buffering locations
visited/travel routes. A raster-based 'sum' tool is required to create a composite raster grid.
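A minimal sketch of the grid-based buffer and composite, assuming a raster grid indexed by (row, column) cells and entirely hypothetical case locations. SciPy's distance transform stands in here for a dedicated GIS buffer tool; the radius of 2 cells is illustrative only.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def buffer_raster(shape, visited_cells, radius):
    """Binary raster: 1 for every cell within `radius` cells of any
    location visited by the case, 0 elsewhere."""
    occupied = np.zeros(shape, dtype=bool)
    for row, col in visited_cells:
        occupied[row, col] = True
    # Distance from each cell to the nearest visited cell.
    dist = distance_transform_edt(~occupied)
    return (dist <= radius).astype(int)

# Hypothetical 20x20 grid and (row, col) locations for three cases.
shape = (20, 20)
cases = [[(5, 5), (6, 9)], [(6, 6)], [(14, 3), (5, 7)]]
buffer_composite = sum(buffer_raster(shape, pts, radius=2) for pts in cases)
# Cells with the highest counts fall inside the buffers of the most cases.
```

In this toy example the cells around (5, 6) sit within all three cases' buffers and so receive the maximum count, illustrating the area of commonality the method is intended to reveal.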
Figure 1.5 Example output from a basic buffer analysis.
1.4.2 'Euclidean' buffer analysis
Data requirements: Case data
Description: A potential improvement upon the basic buffering technique would be to use
a Euclidean-distance measure. In the same way as the basic buffer analysis described in section
1.4.1, the locations visited/travel routes taken by each case are 'buffered' individually;
however, the Euclidean distance function can be thought of as a type of continuous buffer, with
each cell in the raster grid being assigned a value corresponding to its closest distance to
the location visited/travel route taken. Those buffers can then be summed together in the same
way as the basic buffer analysis; however as the Euclidean distance technique assigns lower
values to those cells closer to the locations visited/travel routes taken, it will be those
areas in the composite raster with the lowest values that will be of interest. A potential
criticism of the Euclidean distance function is that it allocates a value to a cell based only
on its closest distance to a point location/travel route, therefore losing any further
information provided by other point locations/travel routes. Figure 1.6
shows an example output from a grid-based Euclidean distance buffer analysis.
GIS tools required: A raster-based Euclidean distance tool is required to perform
Euclidean-distance buffering of locations visited/travel routes. A raster-based 'sum' tool is
required to create a composite raster grid.
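The Euclidean variant can be sketched in the same way as the basic buffer, again using SciPy's distance transform on a hypothetical grid and case locations. Note that, as the text describes, it is the lowest composite values that mark areas of interest.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def euclidean_surface(shape, visited_cells):
    """Continuous 'buffer': each cell takes its distance (in cell
    units) to the nearest location visited by the case."""
    occupied = np.zeros(shape, dtype=bool)
    for row, col in visited_cells:
        occupied[row, col] = True
    return distance_transform_edt(~occupied)

# The same hypothetical grid and case locations as before.
shape = (20, 20)
cases = [[(5, 5), (6, 9)], [(6, 6)], [(14, 3), (5, 7)]]
dist_composite = sum(euclidean_surface(shape, pts) for pts in cases)
# Unlike the basic buffer, LOW values are of interest here.
low_row, low_col = np.unravel_index(np.argmin(dist_composite),
                                    dist_composite.shape)
```

In this toy data the minimum of the summed distance surface falls at cell (6, 6), the location jointly closest to all three cases, which is the kind of area the method flags for further investigation.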
Figure 1.6 Example output from a 'Euclidean' buffer analysis.
1.4.3 Kernel density analysis
Data requirements: Case data
Description: When analysing the locations visited/travel routes taken by each case,
kernel density analysis can overcome the criticism of the Euclidean distance approach because
instead of using only one distance to allocate a value to a cell, distances to every point
location visited (including each point along a travel route) can be considered. Essentially a
smoothed surface is produced by the kernel (of a given search radius or 'bandwidth') visiting
each point and weighting the surrounding raster cells based on the distance from that point.
Raster cell values are greatest at the point itself and diminish with distance from that point
until the search radius (or 'bandwidth') is reached where a value of 0 is assigned. The
functional form of the kernel can be chosen by the GIS user; a Gaussian (normal) form is often
used, with the bandwidth set according to the desired degree of smoothing. The values of each kernel surface, taken
at each point for an individual case, are then summed together to produce a single smoothed
surface. The output raster surface for each case can then be summed to form a composite. Within
that composite surface it is those areas with the highest values that we can assume are more
likely to be within close proximity of the source of infection.
An additional advantage of using a kernel density function is that features incorporated in the
analysis can be weighted such that certain features have greater influence upon the kernel
surface than others. For example, a higher weight could be given to those locations visited on
the days that infection was more likely to have occurred (this can be inferred based on the
date of symptom onset and the assumed incubation period) or based on the time spent at a
location. Assuming the likelihood of becoming infected with Legionnaires' disease is
dose-dependent and that longer periods of time spent within a dose contour reflect higher
received doses, weighting the analysis by time spent within a 'cell' could help better predict
a likely source location. By conducting kernel density analysis with and without various
weightings, different patterns may emerge that could add insight to an outbreak investigation.
A potential limitation to the kernel density method, however, is that it requires a search
radius (or 'bandwidth') parameter to be defined which can have a significant impact upon the
resulting output. If the search radius value is too great then the resulting output will be
'over-smoothed' revealing very general trends, whereas a very low value may identify localised
variation but to such an extent that it reveals no new insight above simply visualising the
source data itself. In the case of a Legionnaires' disease outbreak the search radius could
perhaps be seen as the assumed distance at which an outbreak source could potentially infect an
individual. However, due to the differing nature of the various aerosol emitting facilities
that could be responsible for an outbreak, this value is not fixed. It is therefore advisable
to perform any kernel density analyses with varying search radii and to compare the results.
Figure 1.7 provides an example output of a kernel density analysis, and
Figure 1.8 provides an example of a weighted kernel density analysis
based on the same underlying data. The data are weighted based on time spent at each location.
GIS tools required: A raster-based kernel density tool is required to perform kernel
density analysis on the locations visited/travel routes taken. A raster-based 'sum' tool is
required to create a composite raster grid.
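A weighted kernel density surface can be sketched as below. This is a minimal illustration with a Gaussian kernel (which tapers smoothly rather than cutting off exactly at the search radius, as an Epanechnikov or quartic kernel would); the grid, locations and hours-spent weights are all hypothetical. Running it with several bandwidth values, as the text advises, simply means calling the function with different `bandwidth` arguments and comparing the surfaces.

```python
import numpy as np

def kernel_density(shape, points, bandwidth, weights=None):
    """Weighted Gaussian kernel density surface on a raster grid.
    `points` are (row, col) cell locations; `weights` might encode
    time spent at each location (defaults to equal weights)."""
    if weights is None:
        weights = np.ones(len(points))
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    surface = np.zeros(shape, dtype=float)
    for (pr, pc), w in zip(points, weights):
        d2 = (rows - pr) ** 2 + (cols - pc) ** 2
        # Each visited point spreads its weight over nearby cells.
        surface += w * np.exp(-d2 / (2.0 * bandwidth ** 2))
    return surface

# Hypothetical locations and time-spent weights for a single case.
shape = (30, 30)
points = [(10, 10), (10, 14), (22, 5)]
hours = [8.0, 1.0, 0.5]  # far more time spent at the first location
surface = kernel_density(shape, points, bandwidth=3, weights=hours)
peak = np.unravel_index(np.argmax(surface), surface.shape)
```

With these weights the surface peaks at (10, 10), the heavily weighted location, showing how time-spent weighting pulls the inferred region towards the places a case occupied longest. Per-case surfaces would then be summed (after equalising their influence) to form the composite described above.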
Examples from the literature: Kernel density analysis was employed as part of the
response to the 2010 South Wales Legionnaires' disease outbreak. The analyses identified an
area in the Rhymney Valley in which two cooling towers were located.
Figure 1.7 Example output from kernel density analysis.
Figure 1.8 Example output from a weighted kernel density analysis (weighted by time spent at
each location).