Legionnaires' disease outbreak investigation toolbox

Legionnaires' Disease Outbreak Study protocol


  1. Background to the investigation
  2. Aims and objectives of the investigation
  3. Action plan with timeline, identifying roles and responsibilities
  4. Case definitions and inclusion and exclusion criteria
  5. Hypothesis and theme of enquiry
  6. Sample size estimations based on the main study hypothesis
  7. Case control ratio and recruitment of controls
  8. Methods of data collection, quality control and plan for management of non-responders
  9. Draft questionnaire
  10. Data management plan
  11. Ethical considerations
  12. Analytical strategy, outlining process and intended outputs
  13. Report writing

Once an Outbreak Control Team (OCT) has been convened and the decision has been made to proceed to an analytical study, a draft analytical study protocol should be written and circulated to the OCT members for discussion before the study begins. This page describes a template study protocol for investigating an outbreak of Legionnaires' disease and details the epidemiological steps required: preliminary investigation; identifying and notifying cases; collecting and analysing data; managing and controlling the outbreak; and disseminating findings and follow-up. Each outbreak will be unique, but there are approaches to conducting an outbreak investigation that are common to all.

This protocol is designed so that bold text describes actions to be considered by the outbreak control team investigators when conducting a study. Other text is designed to assist the team by providing background information to those actions.

1. Background to the investigation

Briefly describe Legionella and legionellosis

Short introduction to the organism and the disease.

Record initial events surrounding the outbreak

Information needed to give the outbreak context.

State how the diagnosis was verified

One of the first steps in outbreak investigation is to confirm the signs, symptoms and test results of the patients that led to the diagnosis. Time and resources need to be spent appropriately investigating real disease clusters that are a threat to the health of the public.

State reasons for investigating the outbreak

Is the number of cases "unusually high", or is it similar to the number of cases expected for that population at that time of year? Define "unusually high".

Briefly describe outbreak by time, place and person: descriptive epidemiology

Initial information may also be obtained by interviewing the first few cases using a trawling questionnaire. The cases should be described by time, place and person.

Describe cases by time: plot an epidemic curve

This will help to clarify whether the outbreak is due to a point source or to continued exposure to an ongoing source. An epidemic curve shows graphically when cases in an outbreak occur over time. It is usually displayed as a form of histogram or bar chart.
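As a minimal sketch of how an epidemic curve can be tabulated, the Python below counts cases per day of symptom onset and draws a text histogram. The function names and the onset dates are hypothetical and for illustration only; a real investigation would read onset dates from the line listing.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Return (day, case count) pairs for every day from first to last onset."""
    counts = Counter(onset_dates)
    first, last = min(onset_dates), max(onset_dates)
    n_days = (last - first).days + 1
    days = [first + timedelta(days=d) for d in range(n_days)]
    # Days without cases are kept so that gaps in the curve stay visible.
    return [(day, counts.get(day, 0)) for day in days]


def render_curve(curve):
    """Draw a text histogram: one row per day, one '#' per case."""
    return "\n".join(f"{day.isoformat()} {'#' * n}" for day, n in curve)


# Hypothetical onset dates, for illustration only.
onsets = [date(2024, 7, 1), date(2024, 7, 3), date(2024, 7, 3), date(2024, 7, 4)]
curve = epidemic_curve(onsets)
print(render_curve(curve))
```

A tight cluster of onsets would be compatible with a point source, while cases spread over a longer period would be more suggestive of continued exposure.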

Describe cases by person: including creating a line listing

Include demographic details (such as age distribution and the proportions of males and females), clinical features and pre-existing co-morbidities, and construct a line listing.
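A line listing can be held as one record per case. The sketch below is an assumption about how such a listing might be structured, not a prescribed format: the field names and the four example cases are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class CaseRecord:
    """One row of a line listing (hypothetical minimal set of fields)."""
    case_id: str
    age: int
    sex: str            # "M" or "F"
    onset: date
    comorbidity: bool


def summarise(listing):
    """Describe cases by person: count, age distribution and sex ratio."""
    ages = [c.age for c in listing]
    return {
        "n": len(listing),
        "median_age": median(ages),
        "age_range": (min(ages), max(ages)),
        "proportion_male": sum(c.sex == "M" for c in listing) / len(listing),
        "with_comorbidity": sum(c.comorbidity for c in listing),
    }


# Hypothetical cases, for illustration only.
line_listing = [
    CaseRecord("C01", 52, "M", date(2024, 7, 1), False),
    CaseRecord("C02", 67, "M", date(2024, 7, 3), True),
    CaseRecord("C03", 71, "F", date(2024, 7, 3), True),
    CaseRecord("C04", 80, "M", date(2024, 7, 4), False),
]
summary = summarise(line_listing)
```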

Describe cases by place:

The description should include home and work addresses, travel history and places where exposure may have occurred. Use bed or ward movements in a hospital outbreak, and maps in community outbreaks.

The most critical information aiding identification of the source of infection is a clear history of exposure for the two-week period prior to the onset of illness, from the patient, relatives or friends. The full address and postal code/zip code of place of residence, place of work and details of travel (with overnight stays) should be obtained. In addition, details of visits to, or overnight stays in, hospital should be ascertained, as well as information on other potential common sites and exposures to Legionella. These include exposure to industrial or commercial wet cooling systems, whirlpool spas in domestic, leisure, retail or commercial settings, and showers and respiratory equipment in hospital or domestic settings.
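The two-week exposure history can be expressed as a date window for each case. The following is a sketch under two assumptions: the window is 14 days and both end dates are treated as inclusive. The function names are hypothetical.

```python
from datetime import date, timedelta


def exposure_window(onset, days_before=14):
    """Return (start, end) of the period in which exposure is assumed to have occurred."""
    return onset - timedelta(days=days_before), onset


def in_window(visit, onset, days_before=14):
    """True if a reported visit falls inside the exposure window (inclusive)."""
    start, end = exposure_window(onset, days_before)
    return start <= visit <= end


# Hypothetical example: onset on 15 July, spa visit on 5 July.
onset_date = date(2024, 7, 15)
window = exposure_window(onset_date)
spa_visit_relevant = in_window(date(2024, 7, 5), onset_date)
```

Visits outside the window can still be recorded, but flagging those inside it helps to focus the search for common sites.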

2. Aims and objectives of the investigation

State specific aims of this investigation

The fundamental aim of carrying out an analytical study is the possibility of detecting a source of Legionella which would enable appropriate action to protect the health of the public. However, each investigation will include aims and objectives unique to that outbreak. Specific objectives will be to identify:

  • Mode of transmission
  • Source
  • Population at risk
  • Exposure causing disease (risk factors)

It is important to define the source population for the cases, i.e. the population at risk of developing the disease. This will be the denominator of interest for any type of epidemiological study that may be conducted during the investigation.

3. Action plan with timeline, identifying roles and responsibilities

It is important that the investigation is expedited so that any further harm to public health is prevented. The outbreak investigation should be managed as a project. The draft analytical study protocol should describe who would carry out specific tasks in a time bound manner.


Task                     Staff involved     By when
Study design/planning
Data collection
Data entry
Data interpretation
Report writing

External expertise

Where an analytical study is thought necessary, the necessary resources must be discussed and agreed at an early stage. The impact upon critical functions and arrangements for maintaining resilience should be considered and planned for. Consultation or collaboration with specialist divisions or with external centres of expertise is also advisable where information on microbiological, environmental or other highly technical matters is sought.

4. Case definitions and inclusion and exclusion criteria

Clearly state case definition(s)

Having a case definition allows investigators to identify cases and standardise the investigation by providing clear criteria for deciding whether an individual should be classified as a case, and therefore whether they are part of the outbreak. A case definition is unique to each outbreak and may change as the outbreak progresses, but should always be evidence based; see here for template definitions.

Examples of case definitions from published outbreak investigations can be found here.

State the inclusion and exclusion criteria for cases and controls

Consider how the definition you choose may affect your study results. It may be appropriate to exclude certain individuals from the study if confounding factors, such as occupation or lifestyle, are thought to have a disproportionate effect on specific individuals or groups. (Confounding factors are factors that affect the outcome, are not affected by the exposure, and account for all or part of an apparent association between the exposure and the outcome.)

Case finding

Once a case definition is developed, it will be used to identify and count cases in the population at risk of developing the disease.
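Applying a case definition consistently can be thought of as running every candidate through the same set of criteria. The sketch below is purely illustrative: the criteria, the outbreak period and the category names are hypothetical assumptions and are not a template case definition.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical outbreak period; a real definition would state its own dates.
OUTBREAK_START = date(2024, 7, 1)
OUTBREAK_END = date(2024, 7, 31)


@dataclass
class Candidate:
    pneumonia: bool        # clinical criterion (assumed)
    lab_confirmed: bool    # laboratory criterion (assumed)
    onset: date            # time criterion
    visited_area: bool     # place criterion (assumed)


def classify(person):
    """Apply the same person, place and time criteria to every candidate."""
    in_period = OUTBREAK_START <= person.onset <= OUTBREAK_END
    if not (person.pneumonia and in_period and person.visited_area):
        return "not a case"
    return "confirmed" if person.lab_confirmed else "probable"


example = Candidate(pneumonia=True, lab_confirmed=False,
                    onset=date(2024, 7, 10), visited_area=True)
status = classify(example)
```

Writing the criteria down in this explicit form makes it easier to re-classify all candidates if the definition changes as the outbreak progresses.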

5. Hypothesis and theme of enquiry

The hypothesis will be taken from the outputs of the descriptive epidemiology of the outbreak. These outputs are organised in a line listing.

Generate hypotheses about the outbreak cause

This will be done by reviewing the information available (on the outbreak so far and in the literature) and often by administering an initial open-ended hypothesis-generating questionnaire to some of the case-patients to attempt to learn about potential exposures ("a trawling questionnaire").

Create a hypothesis-generating questionnaire/interview

Results from hypothesis-generating interviews (trawling questionnaires) give initial clues as to possible sources of exposure and can help develop or refine the case definition, which can aid deciding who to include (or not) as the investigation proceeds. From hypothesis-generating interviews, a demographic profile can be developed that helps identify the population at risk. By finding common factors among the case-patients, you can begin to develop a list of possible exposures. For example, case-patients may have all visited the same spa, or shopping centre or all live in the vicinity of a specific cooling tower.

N.B. Reviewing the medical, epidemiology and microbiology literature and talking to other experts in the field to learn about previous similar outbreaks can provide valuable insight into the potential exposure(s).

Hypothesis testing

Specific hypotheses developed from the initial trawling questionnaire can be included into a second more detailed and structured hypothesis-testing questionnaire. The results from this second questionnaire can be used to test the hypotheses in an analytic epidemiological study.

Click here to download an example EpiInfo questionnaire

State your study hypothesis

The hypothesis or hypotheses behind the study should be stated explicitly, giving both the null hypothesis and the alternative hypothesis. (The null hypothesis is a statistical hypothesis that one variable has no association with another variable or set of variables; it states that the differences observed in a study or test occurred as a result of chance alone.)

6. Sample size estimations based on the main study hypothesis

Before a study begins, the investigator needs to decide, mathematically, how many subjects should be studied, though operationally a more pragmatic decision can be made depending on available resources, at the cost of statistical power. An explicit calculation should be attempted regardless of resources, so that outside reviewers can assess the analytic study if eventual publication is intended.

Calculate sample size for cohort study

For a cohort study it is often not necessary to carry out a sample size calculation as one is usually limited by the size of the cohort. But if there is a very large cohort and a representative sample is being used, then it would be useful to carry out a sample size calculation.

Calculate sample size for case-control study

When conducting a case-control study, a sample size calculation should ideally be undertaken. This will allow the researcher to calculate the required sample size in order to ensure that the study has sufficient power to detect a significant association (if present) between the risk of becoming a case of legionellosis and a range of risk factors such as 'swimming', 'hotel visit' or 'visit to town centre'. While the number of cases may be fixed, the sample size calculation will guide recruitment of controls. It is not always possible to achieve the desirable sample size.

Sample size can be calculated using software such as 'EpiInfo'.
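For orientation, the calculation can also be sketched by hand. The Python below uses a common uncorrected normal-approximation formula for an unmatched case-control study with a two-sided test; it is not necessarily the method any particular software package uses, and packages that apply a continuity correction will give somewhat larger numbers. The inputs (20% of controls exposed, odds ratio of 3, two controls per case, 80% power) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p0, odds_ratio):
    """Exposure proportion among cases implied by p0 (controls) and the odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def cases_needed(p0, odds_ratio, controls_per_case=1, alpha=0.05, power=0.80):
    """Approximate number of cases required (uncorrected formula, two-sided test)."""
    p1 = exposure_in_cases(p0, odds_ratio)
    r = controls_per_case
    p_bar = (p1 + r * p0) / (r + 1)              # pooled exposure proportion
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (r + 1)
         / (r * (p1 - p0) ** 2))
    return math.ceil(n)


# Illustrative inputs only.
n_cases = cases_needed(p0=0.20, odds_ratio=3, controls_per_case=2)
n_controls = 2 * n_cases
print(n_cases, n_controls)
```

If the number of cases is fixed by the outbreak, the same function can be re-run with different values of `controls_per_case` to see which ratio brings the required number of cases within reach.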

A number of other factors, such as response rates to questionnaires or the need to adjust for confounding (see section 4) at the analysis stage, may also influence the estimated sample size. If the sample size required to detect an odds ratio of 3 is not thought to be feasible, then a study may not be worthwhile. However, a non-significant result may sometimes still help to focus further microbiological or environmental investigations for or against a suspected source. Interpretation should rest on both the strength of the association and the width of the confidence interval (CI), rather than only on a predefined 5% significance level.

7. Case control ratio and recruitment of controls

The numbers of cases and controls in a case-control study depend upon:

  • Availability of cases who would volunteer to participate in the study
  • Availability of controls who are representative of the population from which the cases arise and are not cases
  • Sample size requirement

Select controls

Controls may be selected from:

  • Local community
    • Case nominated
    • GP/Health records
    • Family members
    • Laboratory records
    • Health Authority telephone records
    • Local schools
    • Immunisation register
  • Other users of the leisure facility
    • Visitor list
    • Random selection from club membership list
  • Other town with similar characteristics
    • Electoral roll
    • Random telephone numbers
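Where a sampling frame such as a club membership list is available, random selection of controls can be sketched as follows. The identifiers and the function name are hypothetical; the fixed seed is an assumption made only so that the draw can be reproduced and audited.

```python
import random


def select_controls(sampling_frame, case_ids, n_controls, seed=None):
    """Draw controls at random, without replacement, from people who are not cases."""
    eligible = [pid for pid in sampling_frame if pid not in case_ids]
    if n_controls > len(eligible):
        raise ValueError("not enough eligible people in the sampling frame")
    # A dedicated generator keeps the draw independent of other random calls.
    return random.Random(seed).sample(eligible, n_controls)


# Hypothetical membership list and case identifiers.
membership = [f"P{i:03d}" for i in range(1, 21)]
cases = {"P003", "P011"}
controls = select_controls(membership, cases, n_controls=6, seed=2024)
```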

Consider case:control ratio

In general, it is advisable to recruit 2 to 3 controls per case from the beginning of the outbreak and review this approach as the outbreak progresses. A study with <10 cases is unlikely to provide a statistically significant result, though operationally this might still be useful if resources allow.

Where the number of cases is small, consideration should be given to increasing the ratio of controls to cases, up to 4 controls per case. Increasing the ratio above 4 controls per case is unlikely to increase power substantially.
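The diminishing return from extra controls can be illustrated with a widely used approximation: with r controls per case, the statistical efficiency relative to an unlimited number of controls is roughly r / (r + 1). This is a rule of thumb that holds best when the association is weak, and it is used here only as a sketch of why ratios above 4 add little.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to an unlimited supply of controls."""
    return controls_per_case / (controls_per_case + 1)


# Efficiency for 1 to 6 controls per case, and the gain from each extra control.
ratios = range(1, 7)
efficiency = {r: relative_efficiency(r) for r in ratios}
gain = {r: efficiency[r] - efficiency[r - 1] for r in ratios if r > 1}

for r in ratios:
    print(f"{r} control(s) per case: {efficiency[r]:.0%}")
```

Under this approximation, moving from 1 to 4 controls per case raises efficiency from 50% to 80%, while a fifth control adds only about three percentage points.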

Consider a matched study design

A matched design (where cases and controls are made similar or identical with respect to their distribution of extraneous factors) may occasionally be required where a major potential confounder (such as age or occupation) is thought to exist. It is generally preferable to adjust for confounding at the analysis stage, but if the sample size is small this may not be possible and matching may need to be considered. For these reasons the decision to match should be carefully considered. Matching can complicate study design and interpretation in a number of ways:

  • Complicating the identification of controls
  • Case data may need to be excluded where data are missing from the control in a matched pair

8. Methods of data collection, quality control and plan for management of non-responders

Consider the logistics of the outbreak investigation

When the hypothesis-testing questionnaire has been developed and the study design has been selected, the logistics of conducting the investigation should be further considered, including the following:

Pre-test the questionnaire

If possible, have the questionnaire reviewed by other researchers, train data collectors (interviewers) and pre-test the questionnaire on persons similar to the study subjects. Feedback from pre-test subjects and data collectors, together with monitoring of interviews, helps to identify problems so that changes can be made ahead of time.

Train/brief data collectors (interviewers)

Train data collectors in the study procedures so that they are familiar with the study, and the questionnaire and how to deliver the interview. This will help to make sure that interviews are conducted in a standardised way. Ensure that the importance of achieving complete, accurate and high quality data is understood. Although outbreak investigations are time critical, interviewer training is a crucial component that should not be left out, especially in a situation where there are inexperienced interviewers or several interviewers are involved.

How will the questionnaire be administered?

A feasible method for administering and distributing the questionnaire should be discussed: self-administered/personal interview; in person/by phone/by mail/by electronic mail/via the Internet.

Determine methods of data entry and data coding

The data entry program or spreadsheet, and the method of entering data into it, should be considered.

9. Draft questionnaire

Provide a draft hypothesis-testing questionnaire

Click here to download an example EpiInfo questionnaire

An example of a hypothesis-testing questionnaire and interview script is provided, which can be modified for use in an incident relating to possible sources of exposure to Legionella, whether data collection is by face-to-face interview, telephone interview or self-administered questionnaire. Not all of the questions will necessarily be relevant to a particular incident. Additional questions may also need to be inserted and the questionnaire modified accordingly.

Standardise interview script and data recording

It is strongly recommended that interviewers are provided with a standard script if the questionnaire is completed by a face-to-face or a telephone interview. The interviewers should also have a list of frequently asked questions and information on Legionella. If the questionnaire is distributed online or by post, supporting information should be provided along with contact details should there be any queries.

10. Data management plan

Enter data into suitable database

Data should be entered into a suitable database depending on which software is to be used for analysis.

Describe how data will be checked for errors and data quality ensured

Errors may be introduced into the data at any stage of data collection, data entry or data analysis, and checking should take place at each stage. There are three main ways of reducing data entry errors and maintaining data quality at the data entry stage: interactive checking, double data entry and batch checking. However, none of these approaches can guarantee the identification of all data entry errors.

  • Double data entry (where the data are entered twice, ideally by two different people, with the two data sets then compared using verification software) is the gold standard, but may be impractical in an outbreak setting.
  • Interactive checking identifies errors or anomalies in the data as they are entered, and can detect range errors (e.g. an age of 176) or consistency errors (e.g. a pregnant male). It is best used when data collection proceeds in parallel with data entry, so that anomalies in the data can quickly be queried with the data source. However, interactive checking interrupts data entry.
  • Batch checking, where checks are made on the data after all the data are entered, or periodically during data entry, may therefore be preferred.
  • Interim analyses of the data, such as basic tabulations and plots, can identify further errors in the data. Where errors are corrected, it is important to maintain an audit trail of changes made to the data. One way of doing this is to leave the original data untouched and to correct errors programmatically at the time of analysis. If it is not possible to correct these errors, then it may be necessary to set their values to missing.
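The checks described above can be sketched in a few lines. The example below shows a range check, a consistency check and a comparison of two independently keyed data sets. The field names, the upper age limit of 120 and the record layout are assumptions for illustration; verification software used in practice will differ.

```python
def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    age = record.get("age")
    # Range check; the plausible limits are an assumption.
    if age is None or not 0 <= age <= 120:
        problems.append("age out of range")
    # Consistency check between two fields.
    if record.get("sex") == "M" and record.get("pregnant"):
        problems.append("pregnant male")
    return problems


def compare_double_entry(first, second):
    """Return (record id, field) pairs where two keyed data sets disagree."""
    differences = []
    for record_id in sorted(first.keys() | second.keys()):
        a, b = first.get(record_id, {}), second.get(record_id, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                differences.append((record_id, field))
    return differences


# Hypothetical data: the age in record 001 was keyed differently the second time.
entry_1 = {"001": {"age": 54, "sex": "F"}, "002": {"age": 61, "sex": "M"}}
entry_2 = {"001": {"age": 45, "sex": "F"}, "002": {"age": 61, "sex": "M"}}
mismatches = compare_double_entry(entry_1, entry_2)
flagged = check_record({"age": 176, "sex": "M", "pregnant": True})
```

Running `check_record` at the moment of entry corresponds to interactive checking; running it over the whole file afterwards corresponds to batch checking.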

11. Ethical considerations

Data confidentiality

Personal identifiable data obtained during the course of a study should be protected in such a way as to render its disclosure to the detriment of the subject an extremely remote possibility [1].

Describe how data confidentiality will be ensured

  • Every record entered into the database has a unique identifier, which must be entered with the record.
  • Personal identifying information such as name or address does not need to be entered into the study database (although names can alternatively be anonymised using Soundex codes). It should be stored separately and securely, along with the linking database identifier, so that subjects can be linked with their records if errors need to be corrected.
  • Data should be stored securely, with back-ups made at appropriate intervals.
  • Data should be stored and transferred using an appropriate secure method that is compliant with organisational and legal guidance on confidentiality and security of data and information.
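One way to keep identifying information out of the study database is to split each record at the point of entry. The sketch below assumes that a random identifier is acceptable as the unique identifier and that `name` and `address` are the identifying fields; both are illustrative assumptions, and secure storage of the linking table is outside the scope of the example.

```python
import uuid


def split_record(raw, identifying_fields=("name", "address")):
    """Split one record into study data and a separately stored linking entry."""
    study_id = uuid.uuid4().hex          # unique identifier shared by both parts
    link = {"study_id": study_id}
    study = {"study_id": study_id}
    for field, value in raw.items():
        (link if field in identifying_fields else study)[field] = value
    return study, link


# Hypothetical questionnaire response.
raw_response = {"name": "A. Example", "address": "1 Sample Street",
                "age": 67, "visited_spa": True}
study_row, link_row = split_record(raw_response)
```

Only `study_row` would go into the analysis database; `link_row` would be held separately so that a record can be traced back to its subject if an error needs correcting.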

Informed consent

Consent must be obtained from each subject for their participation in epidemiological research.

Describe how informed consent will be obtained

Detail information to be provided allowing a subject to make an informed decision to consent:

  • Explanation of the purpose of the study and procedures to be followed (terms should be clear and easily understandable)
  • A description of any discomfort and possible hazards involved
  • An accurate statement of how much of the subject's time will be needed
  • A description of the potential benefits to them and to society
  • A statement that they are free to withdraw from the study at any time
  • A statement, when relevant, that their future interests will not be prejudiced in any way by refusal to participate
  • An offer to answer any questions that they may have

However, if an analytical study involves only participation in an interview or completion of a self-administered questionnaire, this can be seen as 'minimal risk' to the subject, and a subject who participates has thereby consented to the procedure. The 'fully informed' and 'written' consent provisions may therefore potentially be waived, provided that subjects are given any important information after their participation has ended [1].

Waiving written consent is also important in situations where it is problematic to obtain, such as in telephone interviews. Requiring written consent may also reduce the study participation rate.

However, the draft analytical study protocol must still specify what information will be provided and how it will be given to subjects when written consent is not going to be obtained.

12. Analytical strategy, outlining process and intended outputs

Once the data are entered and cleaned, aim to answer some or all of the following questions:

  • What is the size and time course of the outbreak so far?
  • What are the demographics and other characteristics of the cases so far, and what does this suggest about the population at risk?
  • What are the clinical features and the outcomes of the cases at this point in time?
  • What factors are associated with disease?
    • Are any associations real, artefactual, confounded or due to chance?
    • What do the findings suggest about the likely source?
    • Are the data consistent with the hypothesis developed from the descriptive epidemiology?

Important steps to consider in analysing outbreak data (see here for help using EpiData) include:

  • Re-evaluate the case definition (may be an iterative process) and ensure that persons classified as cases or controls are eligible for inclusion
  • Familiarise yourself with the data by examining the distribution of each individual variable
    • Categorical variables can be examined as frequency tables or bar charts.
    • Quantitative variables can be examined by computing numerical summaries (such as mean and standard deviation, or median and interquartile range) or by histograms and box plots.
    • Identify how much data is missing for each variable.

Orient the data in time

  • Update any epidemic curves previously plotted.
  • Where possible, compute the median and range for the estimated incubation and recovery periods.
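Where an exposure date can be estimated for each case, the incubation period summary is a short calculation. The dates below are hypothetical, and the same approach would apply to recovery periods given onset and recovery dates.

```python
from datetime import date
from statistics import median


def incubation_summary(exposure_onset_pairs):
    """Median and range, in days, between estimated exposure and onset."""
    days = [(onset - exposure).days for exposure, onset in exposure_onset_pairs]
    return {"median": median(days), "range": (min(days), max(days))}


# Hypothetical (exposure date, onset date) pairs, for illustration only.
pairs = [
    (date(2024, 7, 1), date(2024, 7, 3)),
    (date(2024, 7, 1), date(2024, 7, 6)),
    (date(2024, 7, 2), date(2024, 7, 8)),
    (date(2024, 7, 2), date(2024, 7, 12)),
]
incubation = incubation_summary(pairs)
```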

Orient the data in terms of person characteristics

  • Demographics of cases and controls.
  • Clinical features of cases and controls.
  • Outcomes of cases.

Analyse univariately

Each risk factor is examined individually for a possible association with the outcome. At the 5% significance level, each univariate test of a risk factor that is truly unrelated to the disease has a 5% chance of falsely demonstrating an association, so the more risk factors that are studied, the more likely it is that at least one observed association is due to chance alone.
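The effect of testing many risk factors can be quantified under two simplifying assumptions: the tests are independent, and none of the risk factors is truly associated with disease. The chance of at least one false-positive result among k tests is then 1 - (1 - 0.05)^k, as in this sketch.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false-positive result among independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} risk factors tested: {chance_of_false_positive(k):.0%}")
```

In practice exposures are often correlated, so the figures are indicative only, but they show why an isolated significant result among many tested exposures should be interpreted with caution.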

  • If the study design was a retrospective cohort study, calculate the overall attack rate, risk factor specific attack rates, and relative risks.
  • If the study design was a case control study, calculate the risk factor specific odds ratios.
  • Test the null hypothesis of no association for each relationship of interest.
  • The chi-square test and Fisher's exact test, which detect whether two or more population distributions differ from one another, are commonly used methods.
  • Where evidence is found for an association, calculate 95% confidence intervals for the observed measure of effect.
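The measures listed above can be sketched for a single 2x2 table. The example computes the odds ratio, the risk ratio, a Woolf (log-based) 95% confidence interval for the odds ratio and a Pearson chi-square test without continuity correction. The counts are hypothetical; the function assumes no cell is zero, and Fisher's exact test (not shown) would usually be preferred when expected counts are small.

```python
import math
from statistics import NormalDist


def analyse_two_by_two(a, b, c, d):
    """a: exposed ill, b: exposed not ill, c: unexposed ill, d: unexposed not ill."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)                    # case-control measure
    risk_ratio = (a / (a + b)) / (c / (c + d))        # cohort measure only
    # Woolf 95% confidence interval on the log odds ratio scale.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.975)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    # Pearson chi-square with 1 degree of freedom; p-value from the normal tail.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"odds_ratio": odds_ratio, "risk_ratio": risk_ratio,
            "or_ci_95": ci, "chi2": chi2, "p_value": p_value}


# Hypothetical counts: 20 of 30 exposed people ill, 10 of 40 unexposed people ill.
result = analyse_two_by_two(a=20, b=10, c=10, d=30)
```

The risk ratio is meaningful only when the table comes from a cohort; in a case-control study the odds ratio is the appropriate measure.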

Consider adjusting for the effect of confounding or related issues

Confounding refers to the influence of a third (or more) variable on the observed exposure-disease association (see section 4). Specialist advice may be required to account for this in the analysis. Methods which can be used include:

  • Stratified analysis, which examines the outcome in relation to two possible risk factors
  • Multivariate regression, which examines the outcome in relation to several possible risk factors (examples include logistic regression, Poisson regression, Cox regression)
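As a sketch of stratified analysis, the example below computes a Mantel-Haenszel summary odds ratio across strata of a suspected confounder and compares it with the crude odds ratio. The two strata (for example, older and younger age groups) and all counts are hypothetical, chosen so that the crude association disappears after stratification.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Summary odds ratio across strata, each given as an (a, b, c, d) tuple."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts in two strata of the suspected confounder.
strata = [(18, 6, 6, 2), (2, 6, 6, 18)]
crude = odds_ratio(*(sum(cells) for cells in zip(*strata)))
adjusted = mantel_haenszel_or(strata)
print(f"crude OR {crude:.2f}, Mantel-Haenszel OR {adjusted:.2f}")
```

A marked difference between the crude and adjusted estimates, as here, would suggest that the stratifying variable is confounding the exposure-disease association.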

Interpret and evaluate the results

The measures of effect (relative risks or odds ratios), after adjustment for confounding if required, then need to be interpreted for the support they give to the hypothesis or hypotheses under investigation. If, for a possible risk factor, the measure of effect is not significant at the 5% level (P-value >0.05), then we conclude that the data do not provide evidence of an association. If the measure of effect is significant at the 5% level, we usually conclude that the data do provide evidence of an association between this risk factor and disease.

Determine causality (if possible)

To judge whether this association may be causal, we need further information. The stronger the association, i.e. the larger the measure of effect (a risk ratio >2 can be considered strong), the more likely the association is to be causal. Consider also whether a causal association between the risk factor and the outcome is biologically plausible. The essential criterion in determining causality is the correct temporal relationship: does cause precede effect?

Consider sources of bias

Bias refers to a whole range of possible errors in the design or conduct of the investigation which may lead to an incorrect conclusion being drawn. Observational study designs such as case-control studies are prone to particular types of error, both random and systematic, and bias should always be considered in the interpretation of the results of the investigation of an outbreak or incident. The probability of selection and recall bias is high in case-control studies but low in cohort studies. Loss to follow-up is a high risk in cohort studies but a low risk in case-control studies. The probability of confounding is medium in both.

Evaluate epidemiological findings in context with environmental and clinical evidence

The results of the epidemiological study should also be considered in the light of the results of the microbiological and environmental parts of the investigation. Careful development of epidemiological inferences combined with environmental and clinical evidence may provide convincing evidence of the source and mode of spread of legionellosis.

13. Report writing

Write (draft and) final outbreak report

This section provides a template for a final report and presents issues to be considered when writing it. The eventual structure and detail in the report will vary from outbreak to outbreak and will be resource and situation dependent; some sections may not always be relevant and likewise other sections may need to be created. There are likely to be several drafts of the final report as new information or events are revealed and reviewed.

Contents Page

Executive Summary: Brief description of the outbreak, the main findings and relevant recommendations

  1. Introduction
    • When the outbreak occurred
    • How the outbreak was discovered
    • Where or what sites were implicated
    • Number of cases investigated
  2. Background
    • Brief description of Legionella
    • Local epidemiology of legionellosis
    • Investigation of the outbreak
    • Chronology of key dates and events.
  3. Investigation of the outbreak
    1. Epidemiological
      • Descriptive: Description of initial cases / case definition and hypothesis generation / demographic characteristics / geographical distribution of cases / enhanced surveillance
      • Analytical: case control and/or cohort studies.
    2. Environmental
      • Inspection of premises
      • Environmental sampling
      • Risk assessment
      • Process enquiry
      • Staff interviews
      • Possible sources of infection
    3. Microbiological
      • Local laboratories or reference laboratories involved
      • Clinical, water and environmental samples
      • Types of tests carried out
  4. Results
    1. Epidemiological
      • Number of responses and participation rate (in total, and by cases and non-cases/controls)
      • Number of cases (i.e. those who met the case definition), and overall attack rate (for cohort study)
      • Symptoms of illness (table of symptoms & frequency in cases & non-cases)
      • Duration of illness (median, range)
      • Characteristics of cases and non-cases/controls: age (median, range, and by age group), sex, status (e.g. guests/staff), ethnicity (if relevant). These data may most informatively be expressed in tables, including attack rates
      • Outcomes of illnesses: hospitalisations, deaths, lasting effects
      • Incubation period (including median and range). It is usually useful to graph the 'epidemic curve'
      • Relationship of exposures to illnesses: Table showing attack rates, risk ratios, odds ratios (as appropriate to study design), confidence intervals, and p-values
    2. Environmental
      • Observations during the site visit
    3. Microbiological
      • Laboratory results including species identification, cross matching environmental versus clinical isolates where appropriate and serology results
  5. Control measures
    • Overall co-ordination and management of the outbreak
    • Care of cases
    • Prevention of further cases
    • Outline of enforcement action
  6. Communication and media
    • Brief information / description regarding communication throughout the investigation, both internal and external to all organisations involved
    • Details of which organisation took the lead for communications with the media
  7. Discussion and conclusion
    • Risk factors
    • Likely source
    • Impact of bias and confounding on results
    • Comparison with other outbreaks of legionellosis
    • Efforts to control this and prevent further outbreaks
  8. Lessons learned and recommendations
    • What should be done to control this outbreak
    • What should be done to prevent future outbreaks
    • What should be done to improve investigation of outbreaks in future
  9. Appendices
    • This may vary depending on the audience for the report and may include the protocol or questionnaire used for the analytical study.

[1] Armstrong B.K., White E. & Saracci R. (1992) Principles of Exposure Measurement in Epidemiology. Oxford University Press. ISBN 019262020.