2.1 Case Data
Case data are fundamental to any GIS-based investigation, basic or advanced, that aims to
identify the likely source of a Legionnaires' disease outbreak. The usual starting assumption
is that a common source is responsible for a local increase in Legionnaires' disease cases, so
it is also reasonable to assume that those infected have been in relatively close spatial
proximity to that source. The incubation period of Legionnaires' disease is not precisely
known and may vary, but it is broadly accepted to be between 2 and 14 days. It is therefore
important to collect data covering that period before each case's onset of symptoms (examples
in the literature extend up to 14 days). The following case data should be collected where
possible for use within GIS-based analysis:
2.1.1 Locations visited - as a minimum requirement these should include home and work
locations as well as any other locations visited during the period of time prior to the onset
of symptoms. Where possible, details should also be collected on the time spent at each
location. An example schema is given under 'Case Locations Schema' below.
The locations visited by each case (and the routes travelled) are important because at a
particular location in space each case will have come into contact with the Legionella
bacteria and contracted Legionnaires' disease. By identifying areas of commonality in space
between the outbreak cases it is possible to gain insight into the spatial location of a
potential source.
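One simple way to look for commonality in space is to overlay a regular grid and count how many distinct cases have been recorded in each cell. The sketch below is illustrative only: the function name, the grid cell size and the sample coordinates are assumptions, and coordinates are assumed to be in a projected system with linear units.

```python
from collections import defaultdict
from math import floor

def cases_per_cell(visits, cell_size):
    """Count distinct cases seen in each square grid cell.

    visits: iterable of (case_id, x, y) in a projected coordinate system.
    cell_size: cell edge length in the same units as x and y.
    """
    cells = defaultdict(set)
    for case_id, x, y in visits:
        key = (floor(x / cell_size), floor(y / cell_size))
        cells[key].add(case_id)  # a set, so repeat visits count once per case
    return {key: len(ids) for key, ids in cells.items()}

# Hypothetical visits for three cases.
visits = [
    ("C1", 120.0, 80.0),
    ("C1", 950.0, 400.0),
    ("C2", 180.0, 40.0),
    ("C3", 110.0, 90.0),
    ("C3", 130.0, 95.0),
]
counts = cases_per_cell(visits, cell_size=250.0)
hotspot = max(counts, key=counts.get)  # cell shared by the most cases
```

Counting distinct cases rather than visits avoids a single case with many recorded visits dominating a cell. The result is sensitive to the chosen cell size and grid origin.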
Locations collected as part of a case questionnaire are likely to be in the form of street
addresses or perhaps postal or ZIP codes, rather than spatial features that are more readily
accepted by a GIS. An operation known as geocoding is commonly used to convert addresses,
stored as text, into spatial data by referencing a pre-existing data layer containing address
information. In the absence of automated geocoding functionality, gazetteers (essentially
geographical dictionaries linking addresses to other information) can be used to search for
locations, which can then be plotted manually into a GIS data layer. However location-based
data are collected, it is important to record the method of data capture within the
attribution.
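As a rough illustration of the gazetteer approach, the sketch below matches a free-text address against a small in-memory lookup and records how the coordinates were obtained. The gazetteer contents, the function names and the `gazetteer_lookup` label are all hypothetical; a real gazetteer would be far larger and matching would need to tolerate spelling variants.

```python
def normalise(address):
    """Lower-case an address and collapse commas and repeated whitespace."""
    return " ".join(address.replace(",", " ").lower().split())

def geocode(address, gazetteer):
    """Return (x, y, data_capture) for an address, or None if unmatched."""
    match = gazetteer.get(normalise(address))
    if match is None:
        return None  # unmatched addresses are left for manual plotting
    x, y = match
    return (x, y, "gazetteer_lookup")

# Hypothetical gazetteer: normalised address -> projected coordinates.
gazetteer = {
    "1 high street exampletown": (451200.0, 206300.0),
    "22 mill lane exampletown": (451950.0, 205870.0),
}

hit = geocode("1 High Street,  Exampletown", gazetteer)
miss = geocode("99 Nowhere Road", gazetteer)
```

Returning the capture method alongside the coordinates keeps the provenance attached to every location record.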
In addition to the geographic location it is also useful to record the time spent at each
location. Assuming that Legionnaires' disease is dose-dependent and that more time spent
within a dose-contour reflects a higher received dose, it is reasonable to use 'time' as a
weighting parameter in investigatory analytical methods such as the kernel density analysis
described in section 1.4.3.
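The idea of using time as a weight can be sketched with a generic Gaussian kernel in which each visit contributes in proportion to its duration. This is not necessarily the formulation used in section 1.4.3; the function name, the bandwidth value and the sample visits are assumptions made for illustration.

```python
from math import exp, pi

def weighted_kernel_density(x, y, visits, bandwidth):
    """Gaussian kernel density at (x, y), weighting each visit by duration.

    visits: list of (vx, vy, duration_hours) in projected coordinates.
    bandwidth: kernel standard deviation, in the same units as the coordinates.
    """
    total_weight = sum(w for _, _, w in visits)
    if total_weight == 0:
        return 0.0
    norm = 1.0 / (2.0 * pi * bandwidth ** 2)
    density = 0.0
    for vx, vy, w in visits:
        d2 = (x - vx) ** 2 + (y - vy) ** 2
        density += w * norm * exp(-d2 / (2.0 * bandwidth ** 2))
    return density / total_weight  # weights are normalised to sum to 1

# Hypothetical visits: 8 hours at one site, 1 hour at another 1 km away.
visits = [(0.0, 0.0, 8.0), (1000.0, 0.0, 1.0)]
near_long = weighted_kernel_density(0.0, 0.0, visits, bandwidth=200.0)
near_short = weighted_kernel_density(1000.0, 0.0, visits, bandwidth=200.0)
```

With equal weights the two sites would produce the same density; with duration weights the surface is higher around the site where more time was spent.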
2.1.2 Travel Routes - where collection is possible these should include the routes taken
between each of the locations visited. The methods of transport can also be collected. An
example schema is given under 'Travel Routes Schema' below.
It is entirely possible that Legionnaires' disease will have been contracted whilst travelling
between two locations, rather than at a case's home or workplace, so it is important to collect
this information if achievable. It is also sensible to record the method of transport used, as
that could have influenced contact with contaminated air.
Travel routes may be more problematic to collect than the point locations visited. Unless
those collecting the information can present a map to each case and be told the precise travel
routes that have been taken, the next best step is to infer them. Network analysis can be used
to calculate the 'least cost path' between two locations along a topologically structured road
network data layer. A number of cross-European routable road networks are available, such as
those supplied by TeleAtlas and Navtech, as are 'within country' networks such as the
Integrated Transport Network (ITN) supplied by Great Britain's Ordnance Survey.
It should be understood, however, that network analysis makes the rather large assumption that
each case has actually taken the quickest (least cost) route. The appropriateness of inferring
travel routes may vary depending on the environment in which an outbreak occurs. For example,
in an area with a dense network of roads, where there are a variety of routes that could be
taken, inferring travel routes could result in a large degree of error. Conversely, in an area
with a low density road network, where a case can perhaps only have one realistic route of
travel between two locations, then the technique could be of more value. Mode of transport
should also be considered when making a decision whether or not to infer travel routes. For
example, in an inner city with an underground train network, it may be more likely for a case
to travel by train from point A to B rather than by road. The rules used to create the inferred
path must be explicitly stated and communicated to those receiving the information, with
override options if local expert knowledge is available. Inferring travel routes using network
analysis is potentially very useful; however, an understanding of when to apply the technique
is essential. It is also important to record the method of data capture within the attribution
so that it is clearly understood by those working with the data.
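To make the 'least cost path' idea concrete, the sketch below runs Dijkstra's algorithm over a toy road network held as a dictionary. It is a minimal illustration under stated assumptions: the node names and travel costs are invented, and a production analysis would use a routable network data layer inside a GIS rather than a hand-built graph.

```python
import heapq

def least_cost_path(graph, origin, destination):
    """Dijkstra's algorithm over a dict-of-dicts road network.

    graph[node][neighbour] is the non-negative cost (e.g. travel minutes).
    Returns (total_cost, [nodes]) or (float("inf"), []) if unreachable.
    """
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, edge_cost in graph.get(node, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return best[destination], path

# Hypothetical network; costs are travel minutes.
roads = {
    "home": {"junction_a": 4.0, "junction_b": 2.0},
    "junction_a": {"home": 4.0, "junction_b": 1.0, "work": 3.0},
    "junction_b": {"home": 2.0, "junction_a": 1.0, "work": 7.0},
    "work": {"junction_a": 3.0, "junction_b": 7.0},
}
cost, route = least_cost_path(roads, "home", "work")
# Record how the route was obtained so that users know it was inferred.
inferred = {"route": route, "cost": cost, "DATA_CAPTURE": "inferred_least_cost"}
```

The alternative route via junction_a alone costs only one minute more here, which shows how easily a case's real route could differ from the inferred one in a dense network.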
Considerations for cross-border outbreaks: those investigating an outbreak local to an
individual country should normally have access to individual-based case data (e.g. home
location, locations visited and travel routes). In a cross-border outbreak, due to legal
restrictions, it may not
be possible to share individual-based case data. In these situations case data should be
aggregated to small-area administrative units used across European states, such as the 'Local
Administrative Unit 2' (LAU2, formerly known as NUTS5). For each small-area, attribution should
include the number of cases who are resident within that area, the number of locations visited
by cases in that area, the number of times cases have travelled through that area, the number
of the total cases in the outbreak that have been within that area, and a resident population
figure for the administrative unit (if available). As well as including 'total' values it may
also be beneficial to include values for each date being considered in the outbreak to allow
the data to be viewed in time series. It should also be possible to share a cut-down version of
the individual-based case data by removing the coordinates of case home locations, work
locations and other locations visited, and replacing them with the small-area geography code.
By doing this the disclosive nature of point-based data would be removed, but the data could
still be examined in time series in relation to where, approximately (i.e. within a small-area
geographic unit), each case had been over the time period up to the onset of symptoms.
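The aggregation described above can be sketched as a simple grouping of location records by small-area code. The field names follow the Case Locations Schema in this section; the area codes, the function name and the output keys are hypothetical. The sketch assumes every location record, including home, counts as a location visited, and it omits the travel-through and population figures.

```python
from collections import defaultdict

def aggregate_by_area(locations):
    """Summarise individual location records per small-area code.

    locations: iterable of dicts with CASE_ID, ADMIN_CODE and LOCATION_TYPE.
    """
    residents = defaultdict(set)
    present = defaultdict(set)
    visited = defaultdict(int)
    for rec in locations:
        area = rec["ADMIN_CODE"]
        present[area].add(rec["CASE_ID"])
        visited[area] += 1  # assumption: home records count as visits too
        if rec["LOCATION_TYPE"] == "home":
            residents[area].add(rec["CASE_ID"])
    return {
        area: {
            "resident_cases": len(residents[area]),
            "locations_visited": visited[area],
            "cases_present": len(present[area]),
        }
        for area in present
    }

# Hypothetical records with coordinates already replaced by area codes.
records = [
    {"CASE_ID": "C1", "ADMIN_CODE": "LAU_001", "LOCATION_TYPE": "home"},
    {"CASE_ID": "C1", "ADMIN_CODE": "LAU_002", "LOCATION_TYPE": "work"},
    {"CASE_ID": "C2", "ADMIN_CODE": "LAU_002", "LOCATION_TYPE": "home"},
    {"CASE_ID": "C2", "ADMIN_CODE": "LAU_002", "LOCATION_TYPE": "supermarket"},
]
summary = aggregate_by_area(records)
```

Grouping additionally by date would give the per-date values needed to view the data in time series.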
For several of the analyses described in section 1 (Analytical methods for outbreak
investigation), aggregate data would not be suitable; however, analyses can take place using
individual-based case data on either side of a border, with the outputs then combined
(see section 1.3).
EXAMPLE SCHEMAS
Case Schema
These data are tabular in nature (not spatial) and contain generic data about each 'case'. The
attribution provided in red should be considered as the minimum requirement. The attribution
provided in green should be considered as desirable additions.
DATA FIELD | DESCRIPTION
CASE_ID (PK) | A unique identifier given to each case within the analysis
ONSET_DATE | The date of onset of symptoms
LD_TYPING | The case's Legionnaires' disease sequence-based type
AGE | The age of the patient
Case Locations Schema
The attribution provided in red should be considered as the minimum requirement. The
attribution provided in green should be considered as desirable additions.
DATA FIELD | DESCRIPTION
GEOMETRY | The geometric point representing the location
LOCATION_ID (PK) | A unique identifier for each location
CASE_ID (FK) | A unique identifier given to each case within the analysis
X_COORD | X coordinate of the location visited
Y_COORD | Y coordinate of the location visited
ADMIN_CODE | Administrative code of the small-area geographical unit the location falls within (e.g. a LAU2 code)
LOCATION_TYPE | The type of location visited (e.g. home, work, supermarket etc.)
DATE | The date the location was visited
TIME | The approximate time the location was visited
DURATION | The approximate duration of stay at the location
DATA_CAPTURE | The method used for capturing the data
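Linking the location DATE to the case ONSET_DATE allows visits to be checked against the collection window discussed at the start of this section. The sketch below is illustrative: the function name and sample dates are hypothetical, the 14-day upper bound comes from the text, and the 2-day lower bound is an assumption based on the quoted incubation range (an investigation may prefer 0 so that no visit before onset is discarded).

```python
from datetime import date

def in_exposure_window(onset, visit, min_days=2, max_days=14):
    """True if the visit falls min_days..max_days (inclusive) before onset."""
    days_before = (onset - visit).days
    return min_days <= days_before <= max_days

# Hypothetical case and location records using the schema field names.
onset_by_case = {"C1": date(2024, 6, 20)}
locations = [
    {"LOCATION_ID": 1, "CASE_ID": "C1", "DATE": date(2024, 6, 5)},   # 15 days before
    {"LOCATION_ID": 2, "CASE_ID": "C1", "DATE": date(2024, 6, 6)},   # 14 days before
    {"LOCATION_ID": 3, "CASE_ID": "C1", "DATE": date(2024, 6, 18)},  # 2 days before
    {"LOCATION_ID": 4, "CASE_ID": "C1", "DATE": date(2024, 6, 19)},  # 1 day before
]
relevant = [
    loc["LOCATION_ID"]
    for loc in locations
    if in_exposure_window(onset_by_case[loc["CASE_ID"]], loc["DATE"])
]
```

Keeping the bounds as parameters makes it straightforward to rerun the filter if a different incubation range is judged more appropriate.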
Travel Routes Schema
The attribution provided in red should be considered as the minimum requirement. The
attribution provided in green should be considered as desirable additions.
DATA FIELD | DESCRIPTION
GEOMETRY | The geometric polyline representing the travel route
ROUTE_ID (PK) | A unique identifier given to each route within the analysis
CASE_ID (FK) | A unique identifier given to each case within the analysis
ORIGIN | The LOCATION_ID of the starting point of the journey
DESTINATION | The LOCATION_ID of the end point of the journey
TRANS_TYPE | The transportation method used (e.g. car, train, walk etc.)
DATE | The date the travel took place
TIME | The approximate time the travel took place
DATA_CAPTURE | The method used for capturing the data