Legionnaires' Disease Outbreak Study protocol
Background to the investigation
Aims and objectives of the investigation
Action plan with timeline, identifying roles and responsibilities
Case definitions and inclusion and exclusion criteria
Hypothesis and theme of enquiry
Sample size estimations based on the main study hypothesis
Case control ratio and recruitment of controls
Methods of data collection and plan for management of non-responders
Data management plan
Analytical strategy, outlining process and intended outputs
After an Outbreak Control Team (OCT) has been convened, and before embarking on an analytical
study, a draft analytical study protocol should be written and circulated to the OCT members
for discussion soon after the decision is made to proceed to an analytical study. This page
describes a template study protocol
for investigating an outbreak of Legionnaires' Disease and details the epidemiological steps
required: from preliminary investigation, identifying and notifying cases; collecting and
analysing data; managing and controlling the outbreak to disseminating findings and follow-up.
Each outbreak will be unique, but there are approaches to conducting an outbreak investigation
that are common to all.
This protocol is designed such that bold text describes actions to be considered by the
outbreak control team investigators when conducting a study. Other text is designed to assist
the team by providing background information on the actions.
1. Background of investigation
Briefly describe Legionella and legionellosis
Short introduction to the organism and the disease.
Record initial events surrounding the outbreak
Information needed to give the outbreak context.
State how the diagnosis was verified
One of the first steps in outbreak investigation is to confirm the signs, symptoms and test
results of the patients that led to the diagnosis. Time and resources need to be spent
appropriately investigating real disease clusters that are a threat to the health of the public.
State reasons for investigating the outbreak
Is the number of cases "unusually high", or is it similar to the number of cases that might be
expected for that population at that time of year? Define "unusually high".
Briefly describe outbreak by time, place and person: descriptive epidemiology
Initial information may also be obtained by interviewing the first few cases using a trawling questionnaire. The cases should be described by time, place and person.
Describe cases by time: plot an epidemic curve
This will help to clarify whether the outbreak is due to a point source or to continued
exposure to an ongoing source. An epidemic curve shows graphically when cases in an
outbreak occur over time. It is usually displayed as a histogram or bar chart.
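As an illustration only, the counting behind an epidemic curve can be sketched in a few lines of Python. The onset dates below are invented for the example, and the function name `epidemic_curve` is hypothetical; in practice the plot would normally be produced in whichever analysis package the team is using.

```python
from collections import Counter
from datetime import date, timedelta

def epidemic_curve(onset_dates):
    """Count cases per day of onset, keeping days with zero cases."""
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve

# Hypothetical onset dates, invented for illustration only.
onsets = [date(2024, 6, 3), date(2024, 6, 4), date(2024, 6, 4),
          date(2024, 6, 5), date(2024, 6, 5), date(2024, 6, 5),
          date(2024, 6, 7)]

# Print a simple text histogram: one '#' per case.
for day, n in epidemic_curve(onsets):
    print(f"{day.isoformat()} {'#' * n}")
```

Keeping the empty days in the output matters, because gaps between cases help to distinguish a point source from an ongoing source.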
Describe cases by person: including creating a line listing
Include demographic details (such as age distribution and proportions of males/females),
clinical features and pre-existing co-morbidities, and construct a line listing.
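A line listing is essentially one row per case with a fixed set of columns. The sketch below shows one possible shape for it and a basic demographic summary; the field names and all of the case records are hypothetical and would need to match the variables agreed by the investigators.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median

@dataclass
class Case:
    case_id: str
    age: int
    sex: str            # "M" or "F"
    onset: date
    comorbidities: str  # free text in this sketch

# Hypothetical line listing, invented for illustration only.
line_listing = [
    Case("C001", 45, "M", date(2024, 6, 3), "none"),
    Case("C002", 62, "M", date(2024, 6, 4), "diabetes"),
    Case("C003", 70, "F", date(2024, 6, 5), "COPD"),
    Case("C004", 58, "M", date(2024, 6, 5), "none"),
]

def summarise(cases):
    """Return basic 'person' descriptors for a list of cases."""
    ages = [c.age for c in cases]
    males = sum(1 for c in cases if c.sex == "M")
    return {
        "n": len(cases),
        "median_age": median(ages),
        "age_range": (min(ages), max(ages)),
        "percent_male": 100 * males / len(cases),
    }

print(summarise(line_listing))
```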
Describe cases by place:
Description should include home and work address, travel history and places where exposure may
have occurred. Beds or ward movements in a hospital outbreak; maps in community outbreaks.
The most critical information aiding identification of the source of infection is a clear
history of exposure for the two-week period prior to the onset of illness, from the patient,
relatives or friends. The full address and postal code/ zip code of place of residence, place
of work and details of travel (with overnight stays) should be obtained. In addition, details
of visits to, or overnight stays in, hospital should be ascertained, as well as information on
other potential common sites and exposures to Legionella. These include exposure to
industrial or commercial wet cooling systems, whirlpool spas in domestic, leisure, retail or
commercial settings, and showers and respiratory equipment in hospital or domestic settings.
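When exposure histories are recorded with dates, the two-week period of interest can be derived from each case's onset date. The following is a minimal sketch; the function names are hypothetical, and the 14-day default simply restates the two-week period given above.

```python
from datetime import date, timedelta

def exposure_window(onset, days=14):
    """Return (start, end) of the period of interest before onset."""
    return onset - timedelta(days=days), onset

def in_window(event_date, onset, days=14):
    """True if a visit or stay falls inside the case's exposure window."""
    start, end = exposure_window(onset, days)
    return start <= event_date <= end

# Hypothetical example: onset on 15 June, hotel stay on 5 June.
print(exposure_window(date(2024, 6, 15)))
print(in_window(date(2024, 6, 5), date(2024, 6, 15)))
```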
2. Aims and objectives of the investigation
State specific aims of this investigation
The fundamental aim of carrying out an analytical study is the possibility of detecting a
source of Legionella which would enable appropriate action to protect the health of the
public. However, each investigation will include aims and objectives unique to that outbreak.
Specific objectives will be to identify:
- Mode of transmission
- Population at risk
- Exposure causing disease (risk factors)
It is important to define the source population for the cases, i.e. the population at
risk of developing the disease. This will be the denominator of interest for any type of
epidemiological study that may be conducted during the investigation.
3. Action plan with timeline, identifying roles and responsibilities
It is important that the investigation is expedited so that any further harm to public health
is prevented. The outbreak investigation should be managed as a project. The draft analytical
study protocol should describe who would carry out specific tasks in a time bound manner.
Where an analytical study is thought necessary, the necessary resources must be discussed and
agreed at an early stage. The impact upon critical functions and arrangements for maintaining
resilience should be considered and planned for. Consultation or collaboration with specialist
divisions or with external centres of expertise is also advisable where information on
microbiological, environmental or other highly technical matters is sought.
4. Case definitions and inclusion and exclusion criteria
Clearly state case definition(s)
Having a case definition allows investigators to identify cases and standardise the
investigation by having clear criteria to determine whether an individual should be classified
as being a case - and therefore whether they are part of the outbreak. A case definition is
unique for every outbreak situation and may change as the outbreak progresses, but should
always be evidence based.
State the inclusion and exclusion criteria for cases and controls
Consider how the definition you choose may impact upon your study results: it may be
appropriate to exclude certain individuals from the study if confounding factors are thought
to have a disproportionate effect on specific individuals or groups. Confounding factors are
factors that affect the outcome, but are not affected by the exposure, and account for all or
part of an apparent association between the exposure and outcome (such as occupation).
Once a case definition is developed, it will be used to identify and count cases in the
population at risk of developing the disease.
5. Hypothesis and theme of enquiry
The hypothesis will be taken from the outputs of the descriptive epidemiology of the outbreak.
These outputs are organised in a line listing.
Generate hypotheses about the outbreak cause
This will be done by reviewing the information available (on the outbreak so far and in the
literature) and often by administering an initial open-ended hypothesis-generating
questionnaire to some of the case-patients to attempt to learn about potential exposures
("a trawling questionnaire").
Create a hypothesis-generating questionnaire/interview
Results from hypothesis-generating interviews (trawling questionnaires) give initial clues as
to possible sources of exposure and can help develop or refine the case definition, which can
aid deciding who to include (or not) as the investigation proceeds. From
hypothesis-generating interviews, a demographic profile can be developed that helps
identify the population at risk. By finding common factors among the case-patients, you can
begin to develop a list of possible exposures. For example, case-patients may have all visited
the same spa, or shopping centre or all live in the vicinity of a specific cooling tower.
N.B. Reviewing the medical, epidemiology and microbiology literature and talking to
other experts in the field to learn about previous similar outbreaks can provide valuable
insight into the potential exposure(s).
Specific hypotheses developed from the initial trawling questionnaire can be included into a
second more detailed and structured hypothesis-testing questionnaire. The results from
this second questionnaire can be used to test the hypotheses in an analytic study.
State your study hypothesis
The hypothesis(es) behind the study should be stated explicitly by giving both the null
hypothesis and the alternative hypothesis. The null hypothesis is a statistical hypothesis
that one variable has no association with another variable or set of variables; it states
that the differences observed in a study or test occurred as a result of chance alone.
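For a case-control study, one common way of writing the pair of hypotheses is in terms of the odds ratio (OR) for the exposure of interest. This is an illustrative formulation, not the only one; for a cohort study the relative risk would take the place of the odds ratio.

```latex
H_0:\ \mathrm{OR} = 1
\qquad \text{versus} \qquad
H_1:\ \mathrm{OR} \neq 1
```

Here the null hypothesis says that the odds of exposure are the same in cases and controls, and the alternative says that they differ.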
6. Sample size estimations based on the main study hypothesis
Before a study begins, the investigator needs to decide, mathematically, how many subjects
should be studied, though operationally a more pragmatic decision can be made depending on
available resources at the cost of statistical power. Explicit calculation should be attempted,
in spite of resources, so that assessment can be made of the analytic study by outside
reviewers, if eventual publication is intended.
Calculate sample size for cohort study
For a cohort study it is often not necessary to carry out a sample size calculation as one is
usually limited by the size of the cohort. But if there is a very large cohort and a
representative sample is being used, then it would be useful to carry out a sample size
calculation.
Calculate sample size for case-control study
When conducting a case-control study, a sample size calculation should ideally be undertaken.
This will allow the researcher to calculate the required sample size in order to ensure that
the study has sufficient power to detect a significant association (if present) between the
risk of becoming a case of legionellosis and a range of risk factors such as 'swimming', 'hotel
visit' or 'visit to town centre'. While the number of cases may be fixed, the sample size
calculation will guide recruitment of controls. It is not always possible to achieve the
desirable sample size.
Sample size can be calculated using software such as 'EpiInfo'.
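To show what such a calculation involves, the sketch below uses a standard normal-approximation formula for an unmatched case-control study (often attributed to Kelsey et al., without continuity correction). The function name, the default alpha and power, and the example inputs are assumptions made for illustration; dedicated software may apply a different formula or a continuity correction and so return somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist

def case_control_sample_size(odds_ratio, p_controls, controls_per_case=1.0,
                             alpha=0.05, power=0.80):
    """Approximate numbers of cases and controls for an unmatched study.

    odds_ratio        : smallest odds ratio the study should detect
    p_controls        : assumed proportion of controls who are exposed
    controls_per_case : ratio r of controls to cases
    """
    p0 = p_controls
    # Proportion of cases exposed implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    r = controls_per_case
    p_bar = (p1 + r * p0) / (r + 1)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    n_cases = ((z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (r + 1)
               / (r * (p1 - p0) ** 2))
    n_cases = ceil(n_cases)
    return n_cases, ceil(r * n_cases)

# Hypothetical inputs: detect OR = 3, 20% of controls exposed, 2 controls per case.
cases, controls = case_control_sample_size(3, 0.20, controls_per_case=2)
print(cases, controls)
```

Because the number of cases in an outbreak is usually fixed, the calculation is often run in reverse: try several values of `odds_ratio` or `controls_per_case` until the required number of cases is no larger than the number available.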
A number of other factors, such as response rates to questionnaires or the need to adjust for
confounding (see section 4) at the analysis stage, may also influence the estimated sample
size. If the estimated sample size required to identify an odds ratio of 3 is not thought to be
feasible, then a study may not be worthwhile. However, it should be considered that a
non-significant result may sometimes be helpful for focussing further microbiological or
environmental investigations for or against a suspected source and that it depends both on the
strength of the association and the width of the CI, rather than on a predefined 95%
7. Case control ratio and recruitment of controls
The numbers of cases and controls in a case control study depend upon:
- Availability of cases who would volunteer to participate in the study
- Availability of controls which are representative of the population from which the cases
arise and are not cases
- Sample size requirement
Controls may be selected from:
- Local community
- Case nominated
- GP/Health records
- Family members
- Laboratory records
- Health Authority telephone records
- Local schools
- Immunisation register
- Other users of the leisure facility
- Visitor list
- Random selection from club membership list
- Other town with similar characteristics
- Electoral roll
- Random telephone numbers
Consider case: control ratio
In general, it is advisable to recruit 2 to 3 controls per case from the beginning of the
outbreak and review this approach as the outbreak progresses. A study with <10 cases is
unlikely to provide a statistically significant result, though operationally this might still
be useful if resources allow.
Where the number of cases is small, consideration should be given to increasing the ratio of
controls to cases, up to 4 controls per case. Increasing the ratio above 4 controls per case is
unlikely to increase power substantially.
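The diminishing return from extra controls can be illustrated with a commonly quoted approximation: with r controls per case, the statistical efficiency relative to an unlimited supply of controls is roughly r / (r + 1). This approximation is brought in here for illustration and holds best when the association is weak; the function name is hypothetical.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case, relative to unlimited controls."""
    r = controls_per_case
    return r / (r + 1)

for r in range(1, 7):
    print(f"{r} control(s) per case: about {relative_efficiency(r):.0%} efficiency")
```

Under this approximation, moving from 1 to 4 controls per case raises efficiency from about 50% to about 80%, whereas a fifth control adds only around 3 percentage points.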
Consider a matched study design
A matched design (where cases and controls are made similar or identical with respect to their
distribution of extraneous factors) may occasionally be required where a major potential
confounder (such as age or occupation) is thought to exist. It is generally preferable to
adjust for confounding at the analysis stage, but if the sample size is small this may not be
possible and matching may need to be considered. For these reasons the decision to match should
be carefully considered. Matching can complicate study design and interpretation in a number of
ways:
- Complicating the identification of controls
- Case data may need to be excluded where data are missing from the control in a matched pair
8. Methods of data collection, quality control and plan for management of non-responders
Consider the logistics of the outbreak investigation
When the hypothesis-testing questionnaire has been developed and the study design has
been selected, the logistics of conducting the investigation should be further considered,
including the following:
Pre-test the questionnaire
If possible, allow the questionnaire to be reviewed by other researchers; train data collectors
(interviewers) and pre-test on persons similar to study subjects. This helps to identify
problems through feedback from pre-test subjects and data collectors and monitoring interviews,
so changes can be made ahead of time.
Train/brief data collectors (interviewers)
Train data collectors in the study procedures so that they are familiar with the study, the
questionnaire and how to deliver the interview. This will help to make sure that interviews are
conducted in a standardised way. Ensure that the importance of achieving complete, accurate and
high quality data is understood. Although outbreak investigations are time critical,
interviewer training is a crucial component that should not be left out, especially in a
situation where there are inexperienced interviewers or several interviewers are involved.
How will the questionnaire be administered?
A feasible method for administering and distributing the questionnaire should be discussed:
self-administered/personal interview; in person/by phone/by mail/by electronic mail/via the
Determine methods of data entry and data coding
The data entry programme or spreadsheet and method of entering data into the program should be
9. Draft questionnaire
Provide a draft hypothesis-testing questionnaire
An example of a hypothesis-testing questionnaire and interview script is provided, which can be
modified for use in an incident relating to possible sources of exposure to Legionella, whether
data collection is by face-to-face interview, telephone interview or self-administered
questionnaire. Not all of the questions will necessarily be relevant to a particular incident.
Additional questions may also need to be inserted and the questionnaire modified accordingly.
Standardise interview script and data recording
It is strongly recommended that interviewers are provided with a standard script if the
questionnaire is completed by a face-to-face or a telephone interview. The interviewers should
also have a list of frequently asked questions and information on
Legionella. If the questionnaire is distributed online or by post, supporting
information should be provided along with contact details should there be any queries.
10. Data management plan
Enter data into suitable database
Data should be entered into a suitable database depending on which software is to be used for
analysis.
Describe how data will be checked for errors and data quality ensured
Errors may be introduced into the data at any stage of data collection, data entry or data
analysis, and checking should take place at each stage. There are three main ways of reducing
data entry errors and maintaining data quality at the data entry stage: interactive checking,
double data entry and batch checking. However, none of these approaches can guarantee the
identification of all data entry errors.
Double data entry (where the data is entered twice, ideally by two different people,
with the two data sets then compared using verification software) is the gold standard, but
may be impractical in an outbreak setting. Interactive checking identifies errors or
anomalies in the data as they are entered, and can detect range errors (e.g. an age of
176) or consistency errors (e.g. a pregnant male).
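Where dedicated verification software is not available, the comparison step of double data entry can be approximated with a short script. The sketch below assumes each data entry pass has been loaded as a dictionary keyed by record identifier; the function name, the data layout and the example records are all hypothetical.

```python
def compare_entries(first, second):
    """List discrepancies between two independently entered data sets.

    first, second: dict mapping record id -> dict of field name -> value.
    Returns tuples of (record id, field, value in first, value in second);
    field is None when a whole record is missing from one data set.
    """
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id)
        b = second.get(record_id)
        if a is None or b is None:
            discrepancies.append((record_id, None, a, b))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies

# Hypothetical records: the two passes disagree on the age of case C001.
entry_1 = {"C001": {"age": 67, "sex": "M"}, "C002": {"age": 54, "sex": "F"}}
entry_2 = {"C001": {"age": 76, "sex": "M"}, "C002": {"age": 54, "sex": "F"}}
print(compare_entries(entry_1, entry_2))
```

Each discrepancy then needs to be resolved against the original questionnaire rather than by choosing one of the two entries.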
Interactive checking is best used when data collection proceeds in parallel with data
entry, and anomalies in the data can quickly be queried from the data source. However,
interactive checking interrupts data entry, and so batch checking, where checks are made on
the data after all the data are entered, or periodically during data entry, may be preferred.
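The range and consistency checks described above can be written once and then used either interactively (on each record as it is entered) or in batch (over the whole data set). This is a minimal sketch; the field names and the upper age limit of 120 are assumptions chosen for illustration.

```python
def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("age missing")
    elif not 0 <= age <= 120:  # assumed plausible range
        problems.append(f"range error: age {age}")
    if record.get("sex") == "M" and record.get("pregnant") is True:
        problems.append("consistency error: pregnant male")
    return problems

def batch_check(records):
    """Run check_record over all records; keep only those with problems."""
    results = {}
    for record_id, record in records.items():
        problems = check_record(record)
        if problems:
            results[record_id] = problems
    return results

# Hypothetical records, invented for illustration only.
records = {
    "C001": {"age": 176, "sex": "F", "pregnant": False},
    "C002": {"age": 30, "sex": "M", "pregnant": True},
    "C003": {"age": 45, "sex": "F", "pregnant": False},
}
print(batch_check(records))
```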
Interim analyses of the data, such as basic tabulations and plots, can identify
further errors in the data. Where errors are corrected, it is important to maintain an audit
trail of changes made to the data. One way of doing this is to leave the original data
untouched and to correct errors programmatically at the time of analysis. If it is not
possible to correct these errors, then it may be necessary to set their values to missing.
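One possible way to correct errors programmatically while leaving the original data untouched is to keep the corrections in a separate list and apply them to a copy at analysis time. The sketch below is illustrative only; the function name and the structure of the corrections list are assumptions.

```python
import copy

def apply_corrections(raw_records, corrections):
    """Apply corrections to a copy of the data and return an audit trail.

    corrections: list of (record id, field, new value, reason).
    A new value of None sets the field to missing.
    """
    cleaned = copy.deepcopy(raw_records)  # the raw data are never modified
    audit_trail = []
    for record_id, field, new_value, reason in corrections:
        old_value = cleaned[record_id].get(field)
        cleaned[record_id][field] = new_value
        audit_trail.append({"record": record_id, "field": field,
                            "old": old_value, "new": new_value,
                            "reason": reason})
    return cleaned, audit_trail

# Hypothetical example: an implausible age that cannot be verified is set to missing.
raw = {"C001": {"age": 176, "sex": "F"}}
corrections = [("C001", "age", None, "implausible age, source not available")]
cleaned, trail = apply_corrections(raw, corrections)
print(cleaned)
print(trail)
```

Storing the corrections list alongside the analysis scripts means the cleaned data set can be regenerated from the raw data at any time.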
11. Ethical considerations
Personally identifiable data obtained during the course of a study should be protected in such
a way as to render its disclosure to the detriment of the subject an extremely remote
possibility.
Describe how data confidentiality will be ensured
- Every record entered into the database has a unique identifier, which must be entered with
- Personal identifying information such as name or address does not need to be entered into
the study database (although names can alternatively be anonymised using Soundex codes) but
should be stored separately and securely along with the linking database identifier to allow
subjects to be linked with their records if required to correct errors.
- Data should be stored securely, with back-ups made at appropriate intervals.
- Data should be stored and transferred using an appropriate secure method that is compliant
with organisational and legal guidance on confidentiality and security of data and information.
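For reference, a Soundex code reduces a surname to its initial letter followed by three digits. The sketch below follows the commonly published American Soundex rules and is offered as an illustration rather than a validated implementation. Note that the code is a lossy phonetic key: many different names share one code, so on its own it offers only limited protection and identifying details should still be stored separately and securely.

```python
# Letter groups used by American Soundex; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(name):
    """Return the four-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so repeats across vowels count
        if len(code) == 4:
            break
    return code.ljust(4, "0")

print(soundex("Robert"), soundex("Rupert"))
```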
Consent must be obtained from each subject for their participation in epidemiological
Describe how informed consent will be obtained
Detail information to be provided allowing a subject to make an informed decision to
- Explanation of the purpose of the study and procedures to be followed (terms should be
clear and easily understandable)
- A description of any discomfort and possible hazards involved
- An accurate statement of how much of the subject's time will be needed
- A description of the potential benefits to them and to society
- A statement that they are free to withdraw from the study at any time
- A statement, when relevant, that their future interests will not be prejudiced in any way
by refusal to participate
- An offer to answer any questions that they may have
However, if an analytical study only involves participation in an interview or completion of a
self-administered questionnaire this can be seen as 'minimal risk' to the subject, and if a
subject participates then they have consented to the procedure so 'fully informed' and
'written' consent provisions may potentially be waived provided that subjects are given any
important information after their participation has ended.
Waiving written consent is also important in situations where it is problematic to obtain,
such as in telephone interviews. Written consent may also reduce the study participation rate.
However, the draft analytical study protocol must still specify what information will be
provided and how it will be given to subjects when written consent is not going to be obtained.
12. Analytical strategy, outlining process and intended outputs
Once the data are entered and cleaned, aim to answer some or all of the following questions:
- What is the size and time course of the outbreak so far?
- What are the demographics and other characteristics of the cases so far, and what does this
suggest about the population at risk?
- What are the clinical features and the outcomes of the cases at this point in time?
- What factors are associated with disease?
- Are any associations real, artefactual, confounded or due to chance?
- What do the findings suggest about the likely source?
- Are the data consistent with the hypothesis developed from the descriptive
epidemiology?
Important steps to consider in analysing outbreak data include:
- Re-evaluate the case definition (may be an iterative process) and ensure that persons
classified as cases or controls are eligible for inclusion
- Familiarise yourself with the data by examining the distribution of each individual
variable.
- Categorical variables can be examined as frequency tables or bar charts.
- Quantitative variables can be examined by computing numerical summaries (such as mean
and standard deviation, or median and interquartile range) or by histograms and box plots.
- Identify how much data is missing for each variable.
Orient the data in time
- Update any epidemic curves previously plotted.
- Where possible, compute the median and range for the estimated incubation and recovery
periods.
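Where a single date of exposure can be assumed for each case (for example a one-off visit to a suspected site), the incubation period summary is a short calculation. The sketch below uses invented dates and a hypothetical function name; it does not apply where exposure was repeated or its date is unknown.

```python
from datetime import date
from statistics import median

def incubation_summary(pairs):
    """Summarise incubation periods in days.

    pairs: iterable of (exposure date, onset date) for each case.
    """
    days = [(onset - exposure).days for exposure, onset in pairs]
    return {"median": median(days), "min": min(days), "max": max(days)}

# Hypothetical cases, all assumed to have been exposed on 1 June.
exposure = date(2024, 6, 1)
pairs = [(exposure, date(2024, 6, 4)), (exposure, date(2024, 6, 6)),
         (exposure, date(2024, 6, 7)), (exposure, date(2024, 6, 11))]
print(incubation_summary(pairs))
```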
Orient the data in terms of person characteristics
- Demographics of cases and controls.
- Clinical features of cases and controls.
- Outcomes of cases.
Each risk factor is examined individually for a possible association with the outcome. Given
that there is a 5% chance of each univariate analysis falsely demonstrating an association, the
more risk factors that are studied, the less likely it is that any associations observed are
real.
- If the study design was a retrospective cohort study, calculate the overall attack rate,
risk factor specific attack rates, and relative risks.
- If the study design was a case control study, calculate the risk factor specific odds
ratios.
- Test the null hypothesis of no association for each relationship of interest.
- The chi-square test and Fisher's exact test (which detect whether two or more population
distributions differ from one another) are commonly used methods.
- Where evidence is found for an association, calculate 95% confidence intervals for the
observed measure of effect.
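The calculations listed above can be sketched for a single 2x2 table as follows. This is an illustration under stated assumptions rather than a replacement for a statistical package: the confidence intervals use the usual log (Woolf-type) approximation, the chi-square test has no continuity correction, the two-sided Fisher p-value sums the probabilities of all tables no more likely than the one observed, and none of the functions handle zero cells. The function names and the example counts are hypothetical.

```python
from math import comb, erfc, exp, log, sqrt

Z95 = 1.959964  # standard normal quantile for a 95% confidence interval

# Table layout used throughout:
#   a = exposed and ill        b = exposed and not ill
#   c = unexposed and ill      d = unexposed and not ill

def odds_ratio(a, b, c, d):
    """Odds ratio with approximate 95% CI (case-control studies)."""
    est = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, exp(log(est) - Z95 * se), exp(log(est) + Z95 * se)

def risk_ratio(a, b, c, d):
    """Relative risk (ratio of attack rates) with approximate 95% CI (cohort studies)."""
    est = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return est, exp(log(est) - Z95 * se), exp(log(est) + Z95 * se)

def chi_square_test(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value, preferable when counts are small."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x exposed-and-ill
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    low, high = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: 20 of 30 exposed people ill, 10 of 30 unexposed people ill.
print(odds_ratio(20, 10, 10, 20))
print(risk_ratio(20, 10, 10, 20))
print(chi_square_test(20, 10, 10, 20))
print(fisher_exact_p(20, 10, 10, 20))
```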
Consider adjusting for the effect of confounding or related issues
Confounding refers to the influence of a third (or more) variable on the observed
exposure-disease association (see section 4). Specialist advice may be required to account for
this in the analysis. Methods which can be used include:
- Stratified analysis, which examines the outcome in relation to two possible risk factors
- Multivariate regression, which examines the outcome in relation to several possible risk
factors (examples include logistic regression, Poisson regression, Cox regression)
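As an example of stratified analysis, the sketch below computes a Mantel-Haenszel pooled odds ratio across strata of a suspected confounder and compares it with the crude odds ratio. Mantel-Haenszel is one widely used pooling method and is named here as an illustrative choice; the function names and the counts are invented, with the counts deliberately constructed so that the crude association disappears after stratification.

```python
# Each stratum is a 2x2 table (a, b, c, d):
#   a = exposed cases      b = exposed controls
#   c = unexposed cases    d = unexposed controls

def crude_odds_ratio(strata):
    """Odds ratio from the single table obtained by ignoring the strata."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data stratified by a suspected confounder (e.g. two age groups).
strata = [
    (16, 8, 4, 2),   # stratum 1: odds ratio within stratum = 1
    (2, 4, 8, 16),   # stratum 2: odds ratio within stratum = 1
]
print(crude_odds_ratio(strata))    # association before adjustment
print(mantel_haenszel_or(strata))  # association after adjustment
```

A marked difference between the crude and adjusted estimates, as in this constructed example, suggests that the stratifying variable is confounding the exposure-disease association.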
Interpret and evaluate the results
The measures of effect (relative risks or odds ratios), after adjustment for confounding if
required, then need to be interpreted for the support they give to the hypothesis or hypotheses
under investigation. If, for a possible risk factor, the measure of effect is not significant
at the 5% level (P-value >0.05), then we conclude that the data do not provide evidence
of an association. If the measure of effect is significant at the 5% level, we usually conclude
that the data do provide evidence of an association between this risk factor and the outcome.
Determine causality (if possible)
To judge whether this association may be causal, we need further information. The stronger
the association, i.e. the larger the measure of effect (a risk ratio >2 can be considered
strong), the more likely the association is to be causal.
Consider also whether a causal association between the risk factor and the outcome is
biologically plausible. The essential criterion in determining causality is having the correct
temporal relationship - does cause precede effect?
Consider sources of bias
Bias refers to a whole range of possible errors in the design or conduct of the investigation
which may lead to an incorrect conclusion being drawn. Observational study designs such as case
control studies are prone to particular types of bias - both random and systematic, and bias
should always be considered in the interpretation of the results of the investigation of an
outbreak or incident. The probability of selection and recall bias is high in case-control
studies but low in cohort studies. Loss to follow-up is a high risk in cohort studies, but a
low risk in case-control. The probability of confounding is medium risk in both.
Evaluate epidemiological findings in context with environmental and clinical evidence
The results of the epidemiological study should also be considered in the light of the results
of the microbiological and environmental parts of the investigation. Careful development of
epidemiological inferences combined with environmental and clinical evidence may provide
convincing evidence of the source and mode of spread of legionellosis.
13. Report writing
Write (draft and) final outbreak report
This section provides a template for a final report and presents issues to be considered when
writing it. The eventual structure and detail in the report will vary from outbreak to outbreak
and will be resource and situation dependent; some sections may not always be relevant and
likewise other sections may need to be created. There are likely to be several drafts of the
final report as new information or events are revealed and reviewed.
Executive Summary: Brief description of the outbreak, the main findings and relevant
- When the outbreak occurred
- How the outbreak was discovered
- Where or what sites were implicated
- Number of cases investigated
- Brief description of Legionella
- Local epidemiology of legionellosis
- Investigation of the outbreak
- Chronology of key dates and events.
- Investigation of the outbreak
- Descriptive: Description of initial cases / case definition and hypothesis
generation / demographic characteristics / geographical distribution of cases /
- Analytical: case control and/or cohort studies.
- Inspection of premises
- Environmental sampling
- Risk assessment
- Process enquiry
- Staff interviews
- Possible sources of infection
- Local laboratories or reference laboratories involved
- Clinical, water and environmental samples
- Types of tests carried out
- Number of responses and participation rate (in total, and by cases and
- Number of cases (i.e. met case-definition), and overall attack rate (for
- Symptoms of illness (table of symptoms & frequency in cases & non-cases)
- Duration of illness (median, range)
- Characteristics of cases and non-cases/controls: age (median, range, and by age
group), sex, status (e.g. guests/staff), ethnicity (if relevant). These
data may most informatively be expressed in tables, including attack rates
- Outcomes of illnesses: hospitalisations, deaths, lasting effects
- Incubation period (including median and range). It is usually useful to graph the
- Relationship of exposures to illnesses: Table showing attack rates, risk ratios,
odds ratios (as appropriate to study design), confidence intervals, and p-values
- Observations during the site visit
- Laboratory results including species identification, cross matching environmental
versus clinical isolates where appropriate and serology results
- Control measures
- Overall co-ordination and management of the outbreak
- Care of cases
- Prevention of further cases
- Outline of enforcement action
- Communication and media
- Brief information / description regarding communication throughout the investigation,
both internal and external to all organisations involved
- Details of which organisation took the lead for communications with the media
- Discussion and conclusion
- Risk factors
- Likely source
- Impact of bias and confounding on results
- Comparison with other outbreaks of legionellosis
- Efforts to control this and prevent further outbreaks
- Lessons learned and recommendations
- What should be done to control this outbreak
- What should be done to prevent future outbreaks
- What should be done to improve investigation of outbreaks in future
- This may vary depending on the audience for the report and may include the protocol or
questionnaire used for the analytical study.
From Armstrong, White and Saracci, "Principles of Exposure Measurement in Epidemiology".
- ARMSTRONG B.K., WHITE E. & SARACCI R. (1992) Principles of Exposure Measurement in
Epidemiology. ISBN 019262020. Oxford University Press.