2.1 Case Data

Case data are fundamental to any GIS-based investigation, basic or advanced, that aims to identify the likely source of a Legionnaires' disease outbreak. One typically begins by assuming that a common source is responsible for a local increase in Legionnaires' disease cases, so it is also reasonable to assume that those infected have been in relatively close spatial proximity to that source. The incubation period of Legionnaires' disease is not precisely known and may vary, but it is broadly accepted to lie between 2 and 14 days. It is therefore important to collect data covering that period before each case's onset of symptoms (examples in the literature extend up to 14 days). The following case data should be collected where possible for use in GIS-based analysis:
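As a minimal sketch of how the collection period could be derived from a case's onset date, the following assumes the 2 to 14 day incubation range quoted above. The function and constant names are illustrative only and are not part of the toolbox.

```python
from datetime import date, timedelta
from typing import Tuple

# Assumed incubation bounds in days, taken from the range quoted in the text.
MIN_INCUBATION_DAYS = 2
MAX_INCUBATION_DAYS = 14


def exposure_window(onset_date: date) -> Tuple[date, date]:
    """Return the (earliest, latest) likely exposure dates for one case."""
    earliest = onset_date - timedelta(days=MAX_INCUBATION_DAYS)
    latest = onset_date - timedelta(days=MIN_INCUBATION_DAYS)
    return earliest, latest


def in_collection_period(visit_date: date, onset_date: date) -> bool:
    """True if a visit falls within the 14 days up to and including onset."""
    earliest, _ = exposure_window(onset_date)
    return earliest <= visit_date <= onset_date


if __name__ == "__main__":
    start, end = exposure_window(date(2010, 6, 20))
    print(start, end)  # 2010-06-06 2010-06-18
```

Visits recorded outside this window would not normally be excluded from the questionnaire, but a filter of this kind may help when deciding which locations to carry into the analysis.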

2.1.1 Locations visited - as a minimum requirement these should include home and work locations as well as any other locations visited during the period of time prior to the onset of symptoms. Where possible, details should also be collected on the time spent at each location (see the Case Locations Schema under EXAMPLE SCHEMAS).

The locations visited by each case (and the routes travelled) are important because at a particular location in space each case will have come into contact with the Legionella bacteria and contracted Legionnaires' disease. By identifying areas of commonality in space between the outbreak cases it is possible to gain insight into the spatial location of a potential source.

Locations collected as part of a case questionnaire are likely to be in the form of street addresses or perhaps postal or ZIP codes, rather than spatial features that are more readily accepted by a GIS. An operation known as geocoding is commonly used to convert addresses, stored as text, into spatial data by referencing a pre-existing data layer containing address information. In the absence of automated geocoding functionality, gazetteers (essentially geographical dictionaries linking addresses to other information) can be used to search for locations, which can then be plotted manually into a GIS data layer. However location-based data are collected, it is important to record the method of data capture within the attribution.
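The gazetteer lookup described above can be sketched as a simple dictionary match that also records how each location was captured. The gazetteer entries, coordinates and DATA_CAPTURE labels below are invented for illustration; a real investigation would use a national address layer or a GIS geocoding tool.

```python
# Hypothetical gazetteer: normalised address text -> (x, y) coordinates.
GAZETTEER = {
    "1 high street, exampletown": (451200.0, 206300.0),
    "22 mill lane, exampletown": (452050.0, 205875.0),
}


def normalise(address):
    """Lower-case the address and collapse repeated whitespace."""
    return " ".join(address.lower().split())


def geocode(address, manual_points=None):
    """Return coordinates plus the capture method for one address.

    manual_points is an optional dict of normalised address -> (x, y)
    for locations that an investigator has plotted by hand.
    """
    key = normalise(address)
    if key in GAZETTEER:
        x, y = GAZETTEER[key]
        return {"X_COORD": x, "Y_COORD": y, "DATA_CAPTURE": "gazetteer lookup"}
    if manual_points and key in manual_points:
        x, y = manual_points[key]
        return {"X_COORD": x, "Y_COORD": y, "DATA_CAPTURE": "manual plot"}
    # Unmatched addresses are kept so that they can be followed up later.
    return {"X_COORD": None, "Y_COORD": None, "DATA_CAPTURE": "not located"}


if __name__ == "__main__":
    print(geocode("1  High Street,  Exampletown"))
    print(geocode("Unknown Road"))
```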

In addition to the geographic location it is also useful to record the time spent at each location. Assuming that Legionnaires' disease is dose-dependent and that more time spent within a dose contour reflects a higher received dose, it is reasonable to use 'time' as a weighting parameter in investigatory analytical methods such as the kernel density analysis described in section 1.4.3.
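To illustrate what using 'time' as a weighting parameter might look like, the sketch below evaluates a duration-weighted Gaussian kernel density at a single point. The choice of a Gaussian kernel, the bandwidth value and the sample visits are assumptions made for this example; they are not necessarily those used by the analysis in section 1.4.3 or by any particular GIS package.

```python
import math


def weighted_kernel_density(x, y, visits, bandwidth):
    """Duration-weighted Gaussian kernel density at the point (x, y).

    visits is an iterable of (visit_x, visit_y, duration_hours) tuples;
    bandwidth is the kernel standard deviation in map units.
    """
    norm = 2.0 * math.pi * bandwidth ** 2
    total = 0.0
    for vx, vy, duration in visits:
        dist_sq = (x - vx) ** 2 + (y - vy) ** 2
        kernel = math.exp(-dist_sq / (2.0 * bandwidth ** 2)) / norm
        total += duration * kernel  # longer stays contribute more
    return total


if __name__ == "__main__":
    # Hypothetical visits: an 8 hour stay and a 1 hour stay, 1 km apart.
    visits = [(0.0, 0.0, 8.0), (1000.0, 0.0, 1.0)]
    print(weighted_kernel_density(0.0, 0.0, visits, bandwidth=250.0))
    print(weighted_kernel_density(1000.0, 0.0, visits, bandwidth=250.0))
```

With equal weights the two locations would score the same; weighting by duration makes the surface peak where cases spent the most time.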

2.1.2 Travel Routes - where collection is possible these should include the routes taken between each of the locations visited. The methods of transport can also be collected (see the Travel Routes Schema under EXAMPLE SCHEMAS).

It is entirely possible that Legionnaires' disease will have been contracted whilst travelling between two locations, rather than at a case's home or workplace, so it is important to collect this information if achievable. It is also sensible to record the method of transport used, as that could have influenced contact with contaminated air.

Travel routes may be somewhat more problematic to collect than the point locations visited. Unless those collecting the information can present a map to each case and be told the precise routes taken, the next best step is to infer them. Network analysis can be used to calculate the 'least cost path' between two locations along a topologically structured road network data layer. A number of cross-European routable road networks are available, such as those supplied by TeleAtlas and NAVTEQ, as are 'within country' networks, such as the Integrated Transport Network (ITN) supplied by Great Britain's Ordnance Survey.

It should be understood, however, that network analysis makes the rather large assumption that each case has actually taken the quickest (least cost) route. The appropriateness of inferring travel routes may vary depending on the environment in which an outbreak occurs. For example, in an area with a dense network of roads, where a variety of routes could be taken, inferring travel routes could result in a large degree of error. Conversely, in an area with a low-density road network, where a case may have only one realistic route between two locations, the technique could be of more value. Mode of transport should also be considered when deciding whether or not to infer travel routes. For example, in an inner city with an underground train network, a case may be more likely to travel from point A to B by train than by road. The rules used to create the inferred path must be explicitly stated and communicated to those receiving the information, with override options if local expert knowledge is available. Inferring travel routes using network analysis is potentially very useful; however, an understanding of when to apply the technique is essential. It is also important to record the method of data capture within the attribution so that it is clearly understood by those working with the data.
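The least-cost inference and the override option discussed above can be sketched with Dijkstra's algorithm on a small graph. The road network, node names, travel costs and DATA_CAPTURE labels are hypothetical; in practice the calculation would run against a routable road network layer inside a GIS network analysis tool.

```python
import heapq

# Hypothetical road network: node -> list of (neighbour, cost in minutes).
ROAD_NETWORK = {
    "home": [("junction_a", 4.0), ("junction_b", 7.0)],
    "junction_a": [("home", 4.0), ("junction_b", 2.0), ("work", 9.0)],
    "junction_b": [("home", 7.0), ("junction_a", 2.0), ("work", 5.0)],
    "work": [("junction_a", 9.0), ("junction_b", 5.0)],
}


def least_cost_path(network, origin, destination):
    """Dijkstra's algorithm: return (total_cost, path) or (inf, [])."""
    queue = [(0.0, origin, [origin])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, edge_cost in network.get(node, []):
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost + edge_cost, neighbour, path + [neighbour])
                )
    return float("inf"), []


def travel_route(network, origin, destination, reported_path=None):
    """Prefer a route reported by the case; otherwise infer one."""
    if reported_path:
        return {"path": list(reported_path), "DATA_CAPTURE": "reported by case"}
    _, path = least_cost_path(network, origin, destination)
    return {"path": path, "DATA_CAPTURE": "inferred: least cost path"}


if __name__ == "__main__":
    print(least_cost_path(ROAD_NETWORK, "home", "work"))
    print(travel_route(ROAD_NETWORK, "home", "work",
                       reported_path=["home", "junction_b", "work"]))
```

Storing the capture method alongside each route keeps inferred paths distinguishable from reported ones, so that later users can judge how much weight to give them.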

Considerations for cross-border outbreaks: an investigation of an outbreak confined to a single country should normally have access to individual-based case data (e.g. home location, locations visited and travel routes). In a cross-border outbreak, legal restrictions may prevent individual-based case data from being shared. In these situations case data should be aggregated to small-area administrative units used across European states, such as the 'Local Administrative Unit 2' (LAU2, formerly known as NUTS5). For each small area, attribution should include the number of cases resident within that area, the number of locations visited by cases in that area, the number of times cases have travelled through that area, the number of the total cases in the outbreak that have been within that area, and a resident population figure for the administrative unit (if available). As well as 'total' values, it may be beneficial to include values for each date being considered in the outbreak so that the data can be viewed as a time series. It should also be possible to share a cut-down version of individual-based case data in which the coordinates of home, work and other locations visited are removed and replaced with the small-area geography code. This removes the disclosive nature of point-based data while still allowing the data to be examined as a time series showing approximately (i.e. to within a small-area geographic unit) where each case had been during the period up to the onset of symptoms.
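A minimal sketch of the aggregation and of the cut-down individual records described above is given below. The sample records and area codes are invented, the output key names are illustrative, and the sketch assumes that home locations count towards 'locations visited'; travel-through counts and population figures are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical case location records using the schema field names.
LOCATIONS = [
    {"CASE_ID": 1, "ADMIN_CODE": "LAU_A", "LOCATION_TYPE": "home",
     "X_COORD": 10.0, "Y_COORD": 20.0},
    {"CASE_ID": 1, "ADMIN_CODE": "LAU_B", "LOCATION_TYPE": "work",
     "X_COORD": 30.0, "Y_COORD": 40.0},
    {"CASE_ID": 2, "ADMIN_CODE": "LAU_B", "LOCATION_TYPE": "home",
     "X_COORD": 31.0, "Y_COORD": 41.0},
    {"CASE_ID": 2, "ADMIN_CODE": "LAU_B", "LOCATION_TYPE": "supermarket",
     "X_COORD": 32.0, "Y_COORD": 42.0},
]


def aggregate_by_area(records):
    """Summarise case locations per small-area administrative code."""
    areas = defaultdict(
        lambda: {"residents": set(), "visits": 0, "cases": set()}
    )
    for rec in records:
        area = areas[rec["ADMIN_CODE"]]
        area["visits"] += 1
        area["cases"].add(rec["CASE_ID"])
        if rec["LOCATION_TYPE"] == "home":
            area["residents"].add(rec["CASE_ID"])
    return {
        code: {
            "resident_cases": len(a["residents"]),
            "locations_visited": a["visits"],
            "cases_present": len(a["cases"]),
        }
        for code, a in areas.items()
    }


def strip_coordinates(records):
    """Cut-down records: drop point geometry, keep the area code."""
    hidden = {"X_COORD", "Y_COORD", "GEOMETRY"}
    return [{k: v for k, v in rec.items() if k not in hidden}
            for rec in records]


if __name__ == "__main__":
    print(aggregate_by_area(LOCATIONS))
    print(strip_coordinates(LOCATIONS)[0])
```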

For several of the analyses described in section 1 (Analytical methods for outbreak investigation), aggregate data would not be suitable; however, analyses can take place using individual-based case data on either side of a border, and the outputs can then be combined (see section 1.3).

EXAMPLE SCHEMAS

Case Schema

These data are tabular in nature (not spatial) and contain generic data about each 'case'. The attribution provided in red should be considered as the minimum requirement. The attribution provided in green should be considered as desirable additions.

DATA FIELD	DESCRIPTION
CASE_ID (PK)	A unique identifier given to each case within the analysis
ONSET_DATE	The date of onset of symptoms
LD_TYPING	The case's Legionnaires' disease sequence-based type
AGE	The age of the patient

Case Locations Schema

The attribution provided in red should be considered as the minimum requirement. The attribution provided in green should be considered as desirable additions.

DATA FIELD	DESCRIPTION
GEOMETRY	The geometric point representing the location
LOCATION_ID (PK)	A unique identifier for each location
CASE_ID (FK)	A unique identifier given to each case within the analysis
X_COORD	X coordinate of the location visited
Y_COORD	Y coordinate of the location visited
ADMIN_CODE	Administrative code of the small-area geographical unit the location falls within (e.g. a LAU2 code)
LOCATION_TYPE	The type of location visited (e.g. home, work, supermarket etc.)
DATE	The date the location was visited
TIME	The approximate time the location was visited
DURATION	The approximate duration of stay at the location
DATA_CAPTURE	The method used for capturing the data
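
As one possible way of holding the Case and Case Locations schemas together, the sketch below creates both tables in an in-memory SQLite database and links them through CASE_ID. The table names, column types and sample values are assumptions; the GEOMETRY field is omitted because plain SQLite has no spatial type, and no columns other than the keys are marked as mandatory.

```python
import sqlite3


def create_schema(conn):
    """Create the two tables with CASE_ID as the linking key."""
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK in SQLite
    conn.executescript(
        """
        CREATE TABLE cases (
            CASE_ID    INTEGER PRIMARY KEY,
            ONSET_DATE TEXT,
            LD_TYPING  TEXT,
            AGE        INTEGER
        );
        CREATE TABLE case_locations (
            LOCATION_ID   INTEGER PRIMARY KEY,
            CASE_ID       INTEGER NOT NULL REFERENCES cases(CASE_ID),
            X_COORD       REAL,
            Y_COORD       REAL,
            ADMIN_CODE    TEXT,
            LOCATION_TYPE TEXT,
            "DATE"        TEXT,
            "TIME"        TEXT,
            DURATION      REAL,
            DATA_CAPTURE  TEXT
        );
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Hypothetical sample rows, for illustration only.
    conn.execute("INSERT INTO cases VALUES (1, '2010-06-20', NULL, 54)")
    conn.execute(
        "INSERT INTO case_locations VALUES "
        "(1, 1, 451200.0, 206300.0, 'LAU_A', 'home', "
        "'2010-06-10', '09:00', 12.0, 'gazetteer lookup')"
    )
    rows = conn.execute(
        "SELECT c.CASE_ID, c.ONSET_DATE, l.LOCATION_TYPE, l.ADMIN_CODE "
        "FROM cases c JOIN case_locations l ON l.CASE_ID = c.CASE_ID"
    ).fetchall()
    print(rows)
```

With the foreign key enforced, a location row that refers to a CASE_ID not present in the case table is rejected, which helps keep the two tables consistent.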

Travel Routes Schema

The attribution provided in red should be considered as the minimum requirement. The attribution provided in green should be considered as desirable additions.

DATA FIELD	DESCRIPTION
GEOMETRY	The geometric polyline representing the travel route
ROUTE_ID (PK)	A unique identifier given to each route within the analysis
CASE_ID (FK)	A unique identifier given to each case within the analysis
ORIGIN	The LOCATION_ID of the starting point of the journey
DESTINATION	The LOCATION_ID of the end point of the journey
TRANS_TYPE	The transportation method used (e.g. car, train, walk etc.)
DATE	The date the travel took place
TIME	The approximate time the travel took place
DATA_CAPTURE	The method used for capturing the data

Legionnaires' disease outbreak investigation toolbox
